Managing .csv data

The Building Height Attribute .csv contains the attribute OS_TOPO_TOID, which is the topographic identifier (TOID) from OS MasterMap Topography Layer. This TOID is the attribute that is used to join the data to your OS MasterMap Topography Layer data. More information about TOIDs is available in the OS MasterMap Topography Layer Product Guide and the product's Technical Specification.

Checking TOIDs

TOIDs are stored in GML as a character string, with the prefix ‘osgb’, and can be comprised of either a 13-digit integer or a 16-digit integer after ‘osgb’ (for example, osgb1000001799480255). Ordnance Survey recommends that TOIDs are stored in this format.

The changing of a TOID format means that you will not be able to confidently match features with other Ordnance Survey products or with datasets created by third parties using TOIDs or OpenTOIDs as the key feature identifier. We would therefore recommend that, except where absolutely necessary, you do not make changes to the identifier format.

Some of the methods available to translate OS MasterMap Topography Layer from GML alter the TOID from the format in which it is supplied in the GML. We have tested a small sample of these methods to ascertain how the TOID is dealt with during some translation processes. The following table details how some of the translators we tested deal with TOIDs; please look at your own data to investigate how your translation process will affect the TOIDs in your data.

TOID processing results for four translators.

Translator
Processing of TOIDs

Miso InterpOSe

Removes ‘osgb’ from the beginning of TOIDs and adds three leading zeroes to 13-digit TOIDs.

Astun Loader

Preserves raw GML TOID format but renames column from TOID to fid.

Pitney Bowes MapInfo

Removes ‘osgb’ from the beginning of TOIDs.

QGIS

Preserves raw GML TOID format.

It is recommended that the OS_TOPO_TOID attribute in the OS MasterMap Topography Layer – Building Height Attribute .csv files be edited if it is found that the OS MasterMap Topography Layer TOID has been altered during the translation process, although how TOIDs are managed is up to your discretion. To aid this process, we have created some script that can be run alongside a gawk application that will alter TOIDs in the .csv files in two different ways. It is recommended that you create a backup copy of your .csv file before you make any changes to it.

Databases

If your TOID column requires altering the steps to do so are outlined in the importing data into a database page - detailing how this can be achieved in a database environment. This can also be achieved using the gawk process outlined below.

Reformatting the TOID attribute

There are several options which will allow the user to reformat the TOID attribute before joining it with the TOID field in OS MasterMap Topography Layer.

For smaller amounts of data, the simplest option is to use the free open-source text editor called Notepad++. Using this text editor, simply load up the Building Height Attribute .csv data and then, using the search and replace option, simply replace the 'osgb' part of the TOID name with either nothing or three leading zeroes ('000'). Remember to save the file with a new name once this procedure has been carried out.

The following sub-sections detail other steps that you can take to reformat the TOID attribute in the Building Height Attribute .csv, so that the format matches that of the TOID attribute in your translated

OS MasterMap Topography Layer, Topographic Area data. The reformatting of TOID is a very quick option which will take five minutes to set up and process on a national set of data for OS MasterMap Topography Layer – Building Height Attribute.

If the format of the TOIDs in your translated OS MasterMap Topography Layer data matches that in the Building Height Attribute .csv, please skip to loading data into a GIS.

Two script reformatting options are presented below:

  • Script to remove ‘osgb’ from Building Height Attribute TOIDs: Recommended for use if your translator removes ‘osgb’ from your OS MasterMap Topography Layer TOIDs but does not add any leading zeroes to shorter 13-digit TOIDs.

  • Script to remove ‘osgb’ and add ‘000’ to 13-digit Building Height Attribute TOIDs: Recommended for use if your translator removes ‘osgb’ from your OS MasterMap Topography Layer TOIDs and adds three leading zeroes to shorter 13-digit TOIDs.

In order to use either reformatting fix, you will need to download gawk, a tool that uses pattern-matching principles to do simple data re-formatting tasks.

  1. Download the gawk tool and save the zip file in any location.

  2. Extract the ‘gawk.exe’ file to a directory containing the Building Height Attribute .csv file you wish to alter (this must contain only the .csv to alter and no other files). There must be no spaces in the entire file path to that directory as this will cause the process to fail.

  3. Both reformatting processes require you to create an .awk file and a .bat file alongside the gawk application. These need to be saved in the same directory as the gawk application and Building Height Attribute .csv file you wish to alter. The structure of the .awk file specified in the following sub-section remains unchanged regardless of which of the two script reformatting options you are using. Unlike the .awk file, the content of the .bat file is different for both reformatting options. The format to use in both instances is described after the .awk file detail below.

Generating the .awk file

  1. Copy and paste the following code into a new text editor document (such as Notepad++):

# COMMAND LINE gawk -v leadingZeroes=TRUE|FALSE -f bha_fixTOID.awk infile > outfile

BEGIN {
    FS = ","
    OFS = ","
}

{
    sub("osgb","",$1)
    if (leadingZeroes == "TRUE")
        $1 = sprintf("%016d",$1)
    print
}
  1. Save the file as bha_fixTOID.awk in the same directory as the gawk application and Building Height Attribute .csv file.

This code refers to the file name of this .awk file. If you choose to name your .awk file differently to our suggested name, please change the code to reflect the name of the .awk

Generating the .bat file

Script to remove ‘osgb’ from TOIDs:

If using the first reformatting option (i.e. Script to remove 'osgb' from TOIDs), please copy the following text into a new text editor document and save the document with a .bat extension in the same directory as the gawk application, .awk file and .csv file:

// Some code@rem --- Line below to remove OS_TOPO_TOID "osgb" prefix only
FOR /F "tokens=*" %%A IN ('dir *.csv /s/b') DO (gawk -v leadingZeroes=FALSE -f bha_fixTOID.awk
%%A > _%%~NA%%~XA)
pause
exit

This code refers to the file name of this .awk file. If you choose to name your .awk file differently to our suggested name, please change the code to reflect the name of the .awk file you are using. The .bat file does not have to have a specific name, only the .awk file has to have a specific name because the .bat file refers to the name of the .awk file.

Script to remove ‘osgb’ and add ‘000’ to 13-digit Building Height Attribute TOIDs:

If using the second reformatting option (i.e. Script to remove ‘osgb’ and add ‘000’ to 13-digit Building Height Attribute TOIDs), please copy the following text into a new text editor document and save the document with a .bat extension in the same directory as the gawk application, .awk file and .csv file:

@rem --- Line below to remove OS_TOPO_TOID "osgb" prefix + add leading zeroes
FOR /F "tokens=*" %%A IN ('dir *.csv /s/b') DO (gawk -v leadingZeroes=TRUE -f bha_fixTOID.awk
%%A > _%%~NA%%~XA)
pause
exit

This code refers to the file name of this .awk file. If you choose to name your .awk file differently to our suggested name, please change the code to reflect the name of the .awk file you are using. The .bat file does not have to have a specific name. Only the .awk file has to have a specific name because the .bat file refers to the name of the .awk file.

Executing the reformatting operation

  1. For both reformatting options, navigate to the directory where the .bat file is saved and execute it by double clicking the file. An MS-DOS window will appear.

  2. Once the process is complete, press the return key to close the window.

The directory should now contain a new Building Height Attribute .csv file with the same name as the original but starting with an underscore. This .csv file will contain a copy of the original with the TOID attribute altered.

If you wish to check how this process has altered the TOID attribute, do not open the .csv file with Microsoft Excel because Excel will reformat the TOID and read it as a number field. Instead, open the .csv file in a text editor as this will keep the TOID correctly formatted.

Merging .csv files

It is likely that the user will wish to merge 5 X 5km2 tiles together to create an area of interest before loading into a GI application. The following procedure can be followed to undertake this operation:

  1. Place all the .csv files into an empty folder.

  2. Copy and paste the following script into a text editor (such as Notepad++), and, once copied, give the file a name which explains its function. In this case, we have called the file mergedBHAdata.bat.

copy *.csv mergedBHAdata.csv

Navigate to the directory where the .bat file is saved and execute by double clicking the file. This will create a new file called mergedBHAdata.csv from all the component .csv files in the folder.

  1. Delete the original 5 X 5km2 from the folder, leaving just the newly created mergedBHAdata.csv file in the folder.

  2. Copy and paste the following script into a text editor (such as Notepad++), and, once copied, give the file a name which explains its function. In this example, we have called the file Append_BHA_Header.bat.

copy BHA_Header.csv+ mergedBHAdata.csv BHA_Data.csv
  1. Download the Building Height Attribute header file from the 'Links and downloads' section of the OS MasterMap Topography Layer Product Support page on the OS website.

  1. Place the downloaded header file for Building Height Attribute data into the same folder as the merged Building Height Attribute data file, along with the batch file that was created in Step 4.

  2. Run the batch file. A new merged .csv file containing the header will be created called BHA_Data.csv.

Last updated

#345: Adding What's next networking links

Change request updated