Working with CSV data

Preparing the CSV data

These instructions describe how to prepare the CSV format of AddressBase, AddressBase Plus and AddressBase Plus Islands data for processing.

Downloading header files

AddressBase and AddressBase Plus contain different attributes. This means that there is a separate header file for each of product. Download the file that matches your product using the links below. You will use this file in the Appending a header file to the CSV section.

Merging multiple CSV files
  1. Unzip all the CSV files into a single folder. Ensure there are no spaces in your chosen folder path, for example: C:\AddressBase_Data or C:\AddressBase_Plus_Data or C:\AddressBase_Plus_Islands_Data.

  2. We recommend merging all the CSV files together to save time importing individual files. You can do this manually using a text editor such as Notepad or TextPad, but it is much faster to use a .bat batch file or an MS-DOS command as described below.

  3. To use the .bat batch function, copy the following text and paste it into a new Notepad document: copy *.csv mergedABdata.csv In this example, mergedABdata.csv is the output name of the merged file which will be created, but this can be any user-defined filename with the extension .csv.

  4. Save the Notepad document with the file extension .bat (for example, mergedABdata.bat) in the same directory as the CSV files unzipped previously (for example, C:\AddressBase_Data or C:\AddressBase_Plus_Data).

  5. Close the .bat file and navigate to the directory where you just saved it. Double-click on the .bat file (for example, mergedABdata.bat) and an MS-DOS window will run. Once the process is complete, the MS-DOS screen will close automatically.

  6. If you look in the directory containing the AddressBase CSV files and batch file, you’ll see that there is now an additional single file called mergedABdata.csv (or the user-defined filename you picked when creating your batch file).

Appending a header file to the CSV
  1. Download and save the appropriate product CSV header file into the same folder as the merged AddressBase.csv file created in Merging multiple CSV files.

  2. For AddressBase data, copy the relevant text below and paste it into a new Notepad document: copy addressbase-header.csv+ mergedABdata.csv AB_Data.csv For AddressBase Plus and AddressBase Plus Islands data, copy the relevant text below and paste it into a new Notepad document: copy addressbase-plus-header.csv+ mergedAB_Plusdata.csv AB_Plus_Data.csv copy addressbase-plus-header.csv+ mergedAB_Plus_Islands_data.csv AB_Plus_Islands_Data.csv These examples use the name mergedABdata.csv or mergedAB_Plusdata.csv as the file that contains the AddressBase data merged into a single CSV file created above. If you have named this something else, amend that text above accordingly. The order that the documents are referred to in the above text is also important as it states which file is appended to the other. In this instance, the headers CSV file comes first so that the column headers are the first line of the final AddressBase file and the merged data is appended to the column headers.

  3. Save the above Notepad document with the file extension .bat (for example, append.bat) in the same directory as the column headers and the merged AddressBase data (for example, C:\AddressBase_Data or C:\AddressBase_Plus_Data or C:\AddressBase_Plus_Islands_Data).

  4. Close the .bat file and navigate to the directory where it was saved to (for example, C:\AddressBase_Data or C:\AddressBase_Plus_Data). Double-click on the new .bat file (for example, append.bat) and an MS-DOS window will open. Once the process is complete, the MS-DOS screen will close automatically.

  5. Navigate to the directory where the column headers and the merged AddressBase data are located. You will see that a new CSV file has been created, which is the merged column headers and AddressBase data (for example, AddressBase.csv or AddressBase_Plus.csv).

Loading CSV into GIS software

These instructions describe how to load the CSV format of AddressBase, AddressBase Plus and AddressBase Plus Islands data. In these examples, AddressBase Plus data will be used to describe the procedures in various GI systems.

It is assumed that the preparation of the AddressBase, AddressBase Plus or AddressBase Plus Islands CSV data has been carried out as instructed in Preparing the CSV data before attempting to load the data. If it has not been done, the full set of data will not load, and data loaded will not contain header information.

AddressBase, AddressBase Plus and AddressBase Plus Islands are also available from Ordnance Survey as a supply in GML format. Loading GML into most GIS applications requires the use of third-party translation software, which is not covered in this guide. If more information is required in the loading of GML format, please contact Ordnance Survey.

Loading CSV into ArcGIS Pro

Note - These instructions are based on ArcGIS Pro version 2.3.3.

Note - When using CSV data in ArcGIS Pro, it is necessary to have column headings. Please ensure that headings have already been prepared as instructed Preparing the CSV data.

  1. Launch ArcGIS Pro and start a new blank project.

  2. Select a folder to save the project to.

  3. Name your project and click OK. The project will then be created. Note - ArcGIS Pro automatically creates a new File Geodatabase (.gdb) within the project folder created. This is different to the creation process in the older ESRI application ArcMap.

  4. You can add a backdrop map for contextual purposes from the available backdrop maps supplied by ESRI or add one of your own from a different File Geodatabase. In this example, we have added a light grey backdrop map canvas supplied by ESRI.

  5. Open the Catalog pane on the right-hand side of the window and expand the listing to see the File Geodatabase created with the project.

  6. To import the AddressBase or AddressBase Plus data, right-click the File Geodatabase, then select Import and from that sub-menu, select Table. A new Geoprocessing window will display in the right-hand pane.

  7. Click the folder icon on the right-hand side of the Input Rows field. A new dialog will open.

  8. Navigate to the location with the merged AddressBase or AddressBase Plus CSV file with the appended headers that you created in Preparing the CSV data. Select the file and click OK.

  9. Back in the Geoprocessing window, type a name in the Output Name field, then click Run at the bottom of the window.

  10. Once the process has run, a green box will display at the bottom of the Geoprocessing window and the new AddressBase table will be listed in the left-hand panel.

    The data has loaded as a non-geometry table.

  11. To make the data visible against the mapping backdrop, the XY Coordinate fields need to be specified.

    • In the Contents pane, right-click the AB_Plus table (or the output name you chose) and in the dropdown click Display XY data.

  12. In the Geoprocessing window, the XY Table To Point parameters will be displayed.

    • Using the dropdown options, change the X Field to X_COORDINATE or Longitude and the Y Field to Y_COORDINATE or Latitude.

    • Then select Projected Co-ordinate Systems > National Grids > Europe > British National Grid. Note – If you selected X and Y as Longitude and Latitude in the step above, then you need to select ETRS89 [EPSG: 4258] instead.

  13. Click Run.

  14. Once the process has run, a green box will appear at the bottom of the Geoprocessing window and the output XYTableToPoint map layer should appear ticked on the left-hand Contents pane. In the Map window, the addresses will now be displayed as point features.

    You have now successfully loaded the data in ArcGIS Pro.

Loading CSV into ArcGIS Desktop

Note - These instructions are based on ArcGIS Desktop versions 9.3 and 10.

Note - When using CSV data in ArcGIS , it is necessary to have column headings. Please ensure that headings have already been prepared as instructed Preparing the CSV data.

  1. Launch ArcCatalog as a separate program, or within ArcMap if you are using version 10.

  2. Connect to a folder where the AddressBase data you wish to use can be accessed, for example, C:\AddressBase_Data or C:\AddressBase_Plus_Data. To do this:

    • Click File, or select Folder Connections if you are using version 10.

    • Click Connect Folder, or in version 10, right-click on Folder connections > Connect Folder and navigate to the relevant folder.

    • From the main window, select the folder to connect to and click OK.

  3. The folder should now appear in the navigation window to the left of the screen, or within your Catalog window if you have opened it within ArcGIS Map.

  4. Create a File Geodatabase to store the address data. Using the file tree, go to folder connections and navigate to the directory where you wish to create the File Geodatabase, for example: C:\AddressBase_Geodatabase\AddressBase_Plus. This may need to be set up as a new connection as per the above.

  5. Right-click on the folder where you the File Geodatabase should to be contained, then select New and File Geodatabase.

  6. A File Geodatabase will be created and named by default as New File Geodatabase. Rename the File Geodatabase to a name of your choice.

  7. Right-click on your new File Geodatabase, and select Import > Table (single)…

    • For Input Rows, navigate to the location of the CSV data file that contains the merged header and AddressBase or AddressBase Plus data file.

    • The Output Location should automatically populate with the location of the File Geodatabase that is to be updated; this should be the File Geodatabase you created above.

    • Insert a relevant name for the Output Table, for example: AddressBase_data. Ensure that there are no spaces in the table name. This name will appear under your geodatabase.

  8. Click OK.

  9. To create a map of the locations of the AddressBase records, they need to be geocoded.

    • Right-click on the AddressBase table in the geodatabase that you have just created and select Create Feature Class.

    • In the XY Table… window, you can use the dropdowns to change the X Field to either X_COORDINATE or Longitude, and the Y Field to Y_COORDINATE or Latitude.

    • Click on the Input Coordinates icon and navigate to Projected Co-ordinate Systems > National Grids > Europe > British National Grid. Note – If you selected X and Y as Longitude and Latitude in the step above, then you need to select ETRS89 [EPSG: 4258] instead.

  10. Double-click on the chosen Coordinate System, then click Apply and OK.

  11. Click on the folder icon alongside the Output field and navigate to the File Geodatabase you just created above. If you cannot see the File Geodatabase, ensure that the Save as type box at the bottom of the dialog box is set to File and Personal Geodatabase feature classes.

  12. Type in a name for it and click Save.

  13. Leave the Configuration keyword dropdown menu as DEFAULTS. Click OK. Note – You may need to right-click on the Personal Geodatabase where it was saved and select Refresh in order to see your points. At this stage, if you have completed the steps above in ArcCatalog and not within ArcMap, please continue to follow the steps below. Otherwise, if you have been using version 10 with the catalog inside ArcMap, the data can now be loaded into ArcMap.

  14. In ArcMap, select File > Add Data and navigate to the folder where the File Geodatabase was created above.

  15. Double-click on the File Geodatabase to open it, then select all the files inside.

  16. Click Add.

  17. Once the data has been loaded into ArcMap, you may wish to display more than the ESRI-defined Object ID in the Info tool. To change this:

    • Double-click on the spatial dataset.

    • Select the Fields tab.

    • Change the Primary Display Field to your desired field, for example, UPRN.

Loading CSV into MapInfo Pro

Note - These instructions are based on MapInfo Pro version 12.

Note – MapInfo has a size limit of 2Gb on each table. This equates to a maximum number of approximately 4 million AddressBase records.

Note - When using CSV data in MapInfo, it is not a critical requirement to have column headings. However, for ease of use we recommend using the headings supplied by Ordnance Survey. Instructions on how to merge the data and append the header files can be found in Preparing the CSV data.

  1. Launch MapInfo.

  2. Cancel the Quick Start prompt.

  3. Click File > Open and navigate to the folder that contains the AddressBase data.

  4. In the Files of Type dropdown menu, select Comma delimited CSV (*.csv), then click on the AddressBase data to be loaded. Click Open.

  5. In the next window, tick the Use First Line for Column Titles box and select the character set INSERT CHARACTER SET. Click OK. Note – When adding data this way, the field type classifications and field sizes of each column automatically try to fit the type of data that MapInfo believes is contained within the column and the largest value of that classification found within that column. This means that the classifications and field sizes of some attributes may not match the field types and sizes stated in the Technical Specification. The following instructions outline how to change these columns to match those values:

  6. Select File > Save Copy As… and select the AddressBase table that was loaded. Select Save As… and name the table to be created, then click Save.

  7. Open the table that was just created via File > Open. Navigate to and select the copy of the table you just named. Click Open.

  8. Navigate to Table > Maintenance > Table Structure and select the table to be edited. Click OK.

  9. Here you can change the Type and Width of each attribute to match the ones stated in the technical specification:

  10. Type and Width should be changed for all attributes, apart from the following (due to software-specific dependencies):

    • UPRN should be classified as Float.

    • All attributes that have a Field Type of Date in the technical specification should be classified as Character with a length of 10.

  11. After all changes have been made, click OK.

  12. To create a map of the location of the AddressBase records, they need to be geocoded:

    • Ensure the table of AddressBase records that you wish to geocode is open, then navigate to Table > Create Points.

    • Select the table you wish to geocode from the Create Points for Table dropdown menu.

    • Expand the Get X Coordinates from Column dropdown menu and select either X_Coordinate or Longitude.

    • Expand the Get Y Coordinates from Column dropdown menu and select either Y_Coordinate or Latitude.

    • Click on the Projection icon, then select the British Coordinate Systems option from the Category dropdown menu. Select the British National Grid [EPSG: 27700], or if you selected Longitude and Latitude in the steps above, select ETRS89 [EPSG: 4258].

    • Click OK to close that window and OK again to close the next window.

    • Finally, click Window > New Map Window to view the loaded geocoded points.

Loading CSV into QGIS

Note - These instructions are based on QGIS version 2.6.

  1. Launch QGIS and click Settings > Options.

  2. Select CRS from the left-hand menu and check that the Coordinate Reference System is set to British National Grid. Note - Check this is set for both Default CRS for new projects and the CRS for new layers sections. If these are not already set, click Select at the end of each section and type 27700 into the Filter Box to find and select British National Grid. Alternatively, if you intend to use Latitude and Longitude columns, select ETRS89 [EPSG: 4258].

  3. Click OK.

  4. Back in the QGIS UI, go to Layer and select Add Delimited Text Layer.

  5. Click Browse next to the filename and locate the CSV file that was created in Preparing the CSV data, containing the merged header files and AddressBase data.

  6. Select the CSV file and click Open.

  7. Accept the default or create a new layer name for the dataset.

  8. Ensure that the First record has field names box is ticked.

  9. For Field Options, select Decimal separator is comma.

  10. For Geometry Definition, select Point Coordinates.

  11. You should now be able to select the X_Coordinate field for the X Field dropdown and the Y_Coordinate field for the Y Field dropdown if this was not done automatically. Alternatively, if you wish to use the Latitude and Longitude columns, the Longitude column needs to be inserted into the X_COORDINATE field, and the Latitude column needs to be inserted into the Y_COORDINATE field.

  12. Click OK.

Loading CSV into a database

This section describes how to load AddressBase products into a few common databases.

Software dependencies:

ArcMap, ArcGIS Desktop and ArcGIS Server software do not support the BIGINT/NUMBER data type as an Object ID. Bear this in mind if the expectation is to use this data type directly with these ESRI products. An alternative method to facilitate using ESRI software is to store this data as a string and add a new Serial ID to act as the Object ID. If you are loading AddressBase data directly into a database, you may need to increase the column length to accommodate language characters such as '^'. Some databases treat this as an additional character and therefore, if you define the column length according to our specification, there is a chance that the load may fail. Please bear in mind such adjustments may be required depending on the database you use to load the data.

UPRN deletions:

If a UPRN is deleted and then reinserted, this does not compromise the integrity of the UPRN and its use as a primary key. If a delete is issued for a UPRN, this does not mean it will not reappear in subsequent supplies.

These are the reasons why this may happen:

  • The record has moved in location more than once, moving it out of your Area of Interest (AOI), hence the deletion, but then moved back into your AOI in the future. This would also occur if you altered your AOI.

  • A record has failed data validation upon a change being made. This can result, dependent on the change being made, in the record being deleted and then reintroduced when the error is fixed by the data supplier.

If a UPRN is deleted, it will not be reallocated to a different property and it therefore remains the unique identifier for a property.

Loading CSV into a PostGreSQL database

Note - These steps describe how to load AddressBase into a PostGreSQL database using the text files created by following the instructions in Preparing the CSV data to merge the CSV files.

Note - These instructions are based on PostGreSQL version 1.12.3 and assume that you have set-up your database with the PostGIS spatial extension. It is recommended that you have basic understanding of database terminology before following this guide.

  1. Prepare the text files as described in Preparing the CSV data.

  2. Check that there are no carriage returns (extra rows) at the end of the CSV output file as this will result in errors. To do this, open the CSV file and hit End on your keyboard. Your cursor should now be at the end of the last line, and not on any extra line below. If it is on the line below, hit Delete to remove the extra empty row.

  3. Open the PGAdmin tool (this can be found in Windows Start Menu > PostGreSQL).

  4. Either connect to an existing database or create a new database. It is recommended that the encoding is set to UTF-8.

  5. Open the public schema (although in a production environment, it is advised to use a different schema) and create the tables using the following steps:

  6. Once the table has been created, the data can be loaded into each table using the SQL COPY. Adding the CSV option as the first line contains a header record for each table. Please note that the examples below are for AddressBase, then AddressBase Plus and AddressBase Plus Islands, respectively. The path and filename may need to be changed to reflect your data set-up: COPY addressbase FROM 'C:/Address/AddressBase.csv' DELIMITER ', ' CSV HEADER; COPY addressbase_plus FROM 'C:/Address/AddressBase_Plus.csv' DELIMITER ', ' CSV HEADER; COPY addressbase_plus_islands FROM 'C:/Address/AddressBase_Plus_Islands.csv' DELIMITER ', ' CSV HEADER;

  1. Once loaded, you may want to add Primary Keys to the data. However, these can only be added on columns where the data values are unique. Where there are no unique data values, an index may be added which will aid searching. The UPRN provides the only unique value in AddressBase and AddressBase Plus. Primary Keys are added using the following steps:

  • Right-click on the table name and select Properties.

  • Select the Constraints tab.

  • Click the + to add a new primary key.

  • Enter a name to call the key under the general tab (for example, Key1).

  • Under the definition tab, select UPRN or any other unique value from the dropdown under columns.

  1. Click Save.

  1. You can also index the data by following these steps:

  • Right-click on the table name and select Create > Index.

  • Under general, enter a name (for example, Idx1).

  • Under the definition tab > Columns, click the +.

  • Select the UPRN for example, or any other unique value.

  • Click Save.

Converting coordinates to geometry

A PostGIS extension is required to create geometries. The AddressBase products contain both British National Grid (BNG) and ETRS89 coordinates. The SQL below shows how to create a column for BNG, but it can be altered to utilise the ETRS89 data.

  1. Add a geometry column called geom to make the data usable in a GIS: SELECT AddGeometryColumn ('public', 'addressbase_plus', 'geom', 27700, 'POINT', 2);

  2. Load the data into your new geometry column: UPDATE public.addressbase_plus SET geom = ST_GeomFromText('POINT(' || x_coordinate || ' ' || y_coordinate || ') ', 27700); This sets the geom column in the table to equal the values from the X_coordinate and Y_coordinate columns, with the spatial reference defined as 27700.

  3. Create a spatial index on the data using: CREATE INDEX idx_abp_geom ON public.addressbase_plus USING gist(geom); This adds the index name idx_abp_geom to the same table on the geom column.

Loading CSV into an Oracle database

Note - These instructions assume a basic knowledge of Oracle databases and SQLLDR (the package used to load the CSV files into the database). Other options are available for loading data into Oracle databases.

Using SQLLDR it is not necessary to merge all the AddressBase files into a single file, but it can load the data directly from the file provided as long as it has been unzipped first.

The following steps describe one method for loading a full supply of the data. Sections in italics denote where changes will need to be made to accommodate local file naming.

  1. Copy the data files from the disk to an appropriate location. It is worth noting that the files will need to be unzipped and therefore you will need in the region of 43Gb of free space.

  2. Once the data is copied, the next stage is to unzip the *.zip files to *.csv. This can be done using a package such as Winzip or 7Zip. Please see the data supply page for more information.

  3. With all the files unzipped, the latter stages are easier if you create a list of all the CSV files to be loaded. This can be done using a batch file that writes all the files out to a text file: dir *.csv /b/s >filelisting.txt pause This file will form the basis for loading the control file in a later step.

  4. Open the folder of your chosen product and you should see three files. Open the file ending createtable.sql in a text editor.

  5. Within the provided SQL there are references to <TablespaceName>, which need to be changed to the tablespace that is being worked in. When these are changed, copy and paste the SQL into Oracle to create the tables.

  6. Next, create a SQLLDR control file. An example of one of these files is Oracle AddressBase_Control.ctl, which is provided in the folder of the GitHub repository in Step 4 above. Open the SQLLDR control file for your chosen product in a text editor.

  7. Within the file you will see lines referencing INFILE. Populate these INFILE lines with the file listing created in Step 3, with one INFLE command for each file. This tells the process to open each of the files and carry out the other tasks listed below it. Note – The last section of the control file creates the Geometry for the X and Y coordinate (British National Grid) if you want to create a Geometry for the Latitude and Longitude values, this will need to be created separately.

  8. Once this file is created, it can be called from a .bat file to run it on the box that holds the database rather than a remote machine. If you wish to run it from a remote machine, contact your Oracle Administrator who will be able to advise on the best way to do this within your environment. The contents of the .bat file should be similar to the following: @sqlldr <username>/<password>@<service name> control= <name of ctl file created previously> Pause

  9. Once the load has completed the relevant indexes need to be built. The SQL statements to create the indexes can be found in the same GitHub repository linked in Step 4 above. As before, you can copy and paste the SQL statements from a text editor into Oracle to create the indexes. The example table name provided may be different to yours, so check if this needs to be changed before use.

Loading CSV into Microsoft SQL Server

Note - The following instructions assume that users have basic knowledge of Microsoft SQL Server and that the CSV data is already prepared as described in Preparing the CSV data.

Note – There are many ways to load AddressBase products into Microsoft SQL Server; this is just one suggested method for guidance.

  1. Open the SQL Server Management Studio (SSMS).

  2. Right-click on the database you are loading into and select Properties.

  3. Select Options on the left-hand side.

  4. Expand the dropdown box for Recovery Model and select Bulk-logged. This minimises the logfile size, otherwise the default logging for Microsoft SQL Server can cause logfiles to grow over 20Gb and this can cause issues with loading.

  5. Open the SQL Server Management Studio (SSMS) and right-click your database from the left-hand panel.

  6. Navigate to Tasks and click Import Data. This will open the SQL Server Import and Export Wizard.

  7. Click Next.

  8. On the next screen, change your Data Source to Flat File Source.

  9. Use the Browse button to navigate to your CSV file and select it. If you cannot see your files, ensure that the bottom right dropdown box has CSV files (*.csv) selected.

  10. Click Open.

  11. Your CSV file should have a header row already prepared in Preparing the CSV data. Ensure the Column names in the first data row is ticked.

  12. Check that the Text Qualifier is set to a double quote (“). This is to make sure that the quotations in the raw data supply are removed upon loading but that the data remains intact.

  13. On the left-hand side of this screen, select Columns and check that the Column delimiter is set to Comma.

  14. On the left-hand side of the screen, select Advanced.

  15. For each column of data you are loading, you will need to specify a DataType. The Microsoft SQL Server loader defaults each column to a String. The correct Data Type for each column is given in the technical specification:

  16. Once you have changed the Data Types for each column to match those given in the technical specification, click Next.

  17. Check that your table is going to be imported into the correct database and click Next.

  18. On this screen, you can edit the default table name that Microsoft SQL Server has chosen by clicking in the destination box. For example, for AddressBase Plus renaming to [dbo].[ADDRESSBASE_PLUS].

  19. Select Edit Mappings in the bottom right-hand corner.

  20. In the new window, you must remove the tick in the checkbox against the UPRN column, which needs to be the Primary Key of the table. Click OK once the Primary Key alterations have been completed.

  21. Click Next. On this screen, you can check that the Source column and Destination columns are correct.

  22. Click Next. A summary of your import will display. If you want to continue, click Finish.

  23. A report will be generated as your data is imported. Success should appear at the top once complete.

  24. You may need to right-click on your database and click Refresh to see your new table listed.

Setting Primary Keys

To create a Primary Key field, you can run an SQL statement, such as this example for AddressBase Plus below. Note - the columns you are creating these constraints on cannot be null or allowed to be null.

alter table dbo.ADDRESSBASE_PLUS add primary key ([UPRN]);

Creating the point geometry

You can also create point geometry using the X and Y coordinates or the Latitude and Longitude coordinate values. This is achieved by running the following SQL statement:

alter table dbo.ADDRESSBASE_PLUS
add geometry_column as geometry::Point([X_Coordinate],[Y_Coordinate], 27700); 

Note – This is using British National Grid coordinates, with 27700 representing the spatial reference of the data. To use the Latitude and Longitude coordinate, the spatial reference should be set to 4258 for ETRS89.

Last updated