1 of 3

Getting Started with CSV

What is a CSV file?

A comma-separated values (CSV) file is a common interchange format for spreadsheets and databases which facilitates a simplistic use of data. Each field is either textual or numeric. Within the CSV, each field is separated from the next by a comma. CSV file format is universally supported for easy ingestion into all major database products.

Please be aware that CSV files are designed to be opened in a database or GI system, and opening them in other software applications could corrupt the data. In particular, Excel has a row limit which might be exceeded by some of our CSV files containing OS NGD data, depending on the order you placed and its size. We recommend that you load CSV files containing OS NGD data directly into a database or GI system, rather than trying to open these files in Excel.

Change-Only Update (COU) files are only available for CSV data supplies of the OS NGD.

If you are using a large Area of Interest (AOI) or a full GB supply of data, for performance reasons we would suggest you load the CSV data into a database rather than trying to open directly in a GI system.

Key features of OS NGD CSV format

CSV offers users the following key features:

Change-Only Update (COU) files are only supplied in CSV format (they are not supplied in GeoPackage format)
Geometry provided as Well-Known Text (WKT)
Header rows included in the file
There will be one record per line in each file
Fields will be separated by commas
Where string fields contain commas, they will be delimited by double quotes
Double quotes inside strings will be escaped by doubling
Records will be terminated by Carriage Returns and Line Feeds
CSV files will be Unicode encoded in UTF-8

Loading CSV files

Loading OS NGD CSV files into databases

It is recommended that you have a basic understanding of database terminology before following the guides in the tabs below. The guides contain generic instructions, and it is recognised that there are multiple ways to load CSV files into databases which may be more suitable to your environment and existing processes.

Prior to loading the data into a database, it is necessary to create the relevant tables in the database. We have supplied the DDL statements that can be accessed in our OS NGD Resources GitHub repository.

These instructions are based on PostgreSQL version 14, but should work for all supported versions. The instructions assume that you have set-up your database with the PostGIS spatial extension.

Once connected to your PostgreSQL database, with the relevant schema and table created, the CSV file can be loaded with the following SQL statement using the COPY command:

COPY destination_schema.destination_table 
    FROM 'source_path_to_csv_file\source_name_of_csv_file.csv'
    DELIMITER ','
    CSV HEADER
    ENCODING 'UTF8';

PostGIS will automatically store the geometry data that is supplied in Well-Known Text (WKT) format.

There is a known bug affecting PostgreSQL versions 11, 12 and 13 in Windows environments, where the COPY command cannot load files larger than 4GB. As a workaround, version 14 (or later) of the COPY command can be used to load data into the affected database versions.

For reference, the error message states ERROR: could not stat file.

These instructions are based on Microsoft SQL Server 2019, but should work for all supported versions.

Once connected to your SQL Server database, with the relevant schema and table created, the CSV file can be loaded with the following SQL statement using the BULK INSERT command:

BULK INSERT destination_schema.destination_table
FROM 'source_path_to_csv_file\source_name_of_csv_file.csv'
WITH (
    FORMAT = 'CSV',
    FIRSTROW = 2
);

It is not possible to BULK INSERT the geometries directly in their Well-Known Text (WKT) format.

However, it is possible to change the destination geometry column to a nvarchar(max) type, and then either post process the table or use a a computed column to generate a geometry type column (see code examples below).

Post process example

-- Add new geometry column
ALTER TABLE destination_schema.destination_table
ADD new_geometry_column_name geometry;

-- Set new geometry column using existing WKT column
UPDATE destination_schema.destination_table
SET [new_geometry_column_name] =  geometry::STGeomFromText(destination_table.geometry, /* EPSG_CODE_OF_GEOM */); 

-- Optional drop original WKT column
ALTER TABLE destination_schema.destination_table
DROP COLUMN geometry;

Computed column example

ALTER TABLE destination_schema.destination_table
ADD new_computed_column_name AS
    geometry::STGeomFromText(destination_table.geometry, /* EPSG_CODE_OF_GEOM */) persisted;

It is not possible to load OS NGD CSV files into an Oracle database using the default SQL*Loader utility. The geometries are supplied in Well-Known Text (WKT) format and some of them are too large for SQL*Loader to process.

However, with the relevant schema and table created in your Oracle database, the CSV file can be loaded using ETL (extract, transform, load) tools, for example, GDAL or FME.

Processing COUs

Change-Only Update (COU) files are only available for CSV data supplies of the OS NGD. Further information about COU data supplies can be found on the page.

The process to load the COU files will depend on a number of factors based on your own environment and requirements, including:

Database software
Data loading method or software
Feature retention and lifecycle requirements

Feature retention options

There are two main scenarios that represent either extreme of the feature lifecycle options:

Latest feature view
Full feature archive

Latest feature view

In this scenario, the COU files are processed and only the current version of every feature is retained. This will result in the same data holdings that would be achieved if a new full supply was received on this date.

To achieve this, either post-processing will have to be done once all COU data is loaded into a database, or the loading process will need to evaluate the COU data against your existing data and process accordingly. This processing will be based on the supplied attributes which detail the type of change and the dates for that version.

Change type

Each feature type is supplied with a Change Type attribute (changetype), which is populated from a code list value (changetypevalue).

Below is a table of the possible change types and the resultant actions required to maintain a latest feature view of the data for the specific feature type. Individual features should be targeted based on that feature type's unique identifier. In the majority of feature types, this will be the osid, but this is not always the case; therefore, please check the appropriate feature type documentation.

Version dates

A single feature may be updated multiple times in a single COU file when multiple changes occur within the selected COU frequency. Instead of suppressing all changes other than the last edit, the COU will contain all of the edits which have been made to that feature.

To obtain the current 'live' view of a feature, the latest edit is required; all previous edits can be discounted. In essence, this requires the following two processes:

Discarding all feature records with the exception of the latest Version Available From Date (versionavailablefromdate)
Discarding all feature records that have a Version Available To Date (versionavailabletodate) populated, i.e. it is NOT NULL. These records will also have the Change Type Value (changetypevalue) of 'End Of Life' or 'Moved To A Different Feature Type'.

Full feature archive

In this scenario, every record is retained, giving a full lifecycle since the supply started of each feature. This will result in multiple records for each feature.

To enable this in a database, the default primary keys (as defined in the provided DDL scripts) will have to be changed to also include the version available from and to dates (versionavailablefromdate, versionavailabletodate).