Data formats

The AddressBase product will be distributed as comma-separated values (CSV) files or as Geography Markup Language (GML) version 3.2. Both formats can be supplied either as a full supply or as a change-only update (COU) supply.

CSV

The CSV supply of AddressBase means:

  • There will be one record per line in each file.

  • Fields will be separated by commas.

  • String fields will be delimited by double quotes.

  • No comma will be placed at the end of each row in the file.

  • Records will be terminated by Carriage Return / Line Feed.

  • Double quotes inside strings will be escaped by doubling.

Where a field has no value in a record, two commas will be placed together in the record (one for the end of the previous field and one for the end of the null field). Where the null field is a text field, double quotes will be included between the two commas, for example:

,"",
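The CSV rules above map directly onto a standard CSV parser. The following is a minimal sketch using Python's built-in csv module; the two sample records and their values are invented for illustration and do not come from a real AddressBase supply.

```python
import csv
import io

# Two made-up records that follow the rules above: CRLF terminators,
# quoted strings, a doubled quote inside a string, and null fields.
sample = (
    '21,"I",1,"THE ""OLD"" FORGE",,"",100\r\n'
    '21,"U",2,"ROSE COTTAGE",5,"",\r\n'
)

def read_records(text):
    """Parse CSV text into lists of values, mapping null fields to None."""
    # newline="" leaves the CRLF handling to the csv module.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",
        quotechar='"',
        doublequote=True,  # "" inside a quoted string means one literal "
    )
    return [[field if field != "" else None for field in row] for row in reader]

records = read_records(sample)
for record in records:
    print(record)
```

Note that csv.reader returns an empty string for both an empty unquoted field and an empty quoted field, so this sketch treats the two forms of null field identically.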

AddressBase CSV data will be transferred using Unicode encoded in UTF-8. Unicode includes all the characters in ISO-8859-14 (Welsh characters), but some accented characters are encoded differently: a character stored as a single byte in ISO-8859-14 is stored as more than one byte in UTF-8.
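The difference between the two encodings can be seen with a short sketch. The sample word is chosen only for illustration; the codec names are those used by Python's standard library.

```python
# The Welsh letter "ŵ" (U+0175) is a single byte in ISO-8859-14
# but two bytes in UTF-8.
text = "Tŵr"

latin8_bytes = text.encode("iso8859_14")
utf8_bytes = text.encode("utf-8")

print(len(latin8_bytes))  # one byte per character
print(len(utf8_bytes))    # one extra byte for the accented letter

# Reading UTF-8 bytes with the wrong codec does not fail, it silently
# produces the wrong characters, so the encoding should be stated explicitly.
misread = utf8_bytes.decode("iso8859_14")
correct = utf8_bytes.decode("utf-8")
```

When opening a supply file in Python, passing encoding="utf-8" to open() avoids relying on the platform default.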

The transfer will normally be in a single file, but the data can be split into multiple files using volume numbers. Most files will only be split where there are more than one million records.

The header row for the CSV is supplied separately and can be downloaded from the product support pages.
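Because the header row is supplied separately and a supply may be split into several volumes, a loader typically has to pair the header with each data file itself. The sketch below shows one way to do this; the column names and record values are placeholders invented for illustration, not the real AddressBase header.

```python
import csv
import io

def rows_with_header(header_text, volume_texts):
    """Yield one dict per record, pairing header names with record values."""
    field_names = next(csv.reader(io.StringIO(header_text, newline="")))
    for volume in volume_texts:
        for row in csv.reader(io.StringIO(volume, newline="")):
            yield dict(zip(field_names, row))

# Hypothetical header and two hypothetical volumes, for illustration only.
header = '"UPRN","COLUMN_B","COLUMN_C"\r\n'
volumes = [
    '1,"RD","AB1 2CD"\r\n2,"CO","AB1 2CE"\r\n',
    '3,"RD","AB1 2CF"\r\n',
]

rows = list(rows_with_header(header, volumes))
```

In practice the strings would be replaced by file objects opened with encoding="utf-8" and newline="", processed in volume-number order.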

GML

The GML Encoding standard is an Extensible Markup Language (XML) grammar for expressing geographical features. XML schemas are used to define and validate the format and content of GML. The XML specifications that GML is based on are available from the World Wide Web Consortium (W3C) website: http://www.w3.org. More information can be found in the Open Geospatial Consortium (OGC) PDF document, Geography Markup Language v3.2.1: https://portal.ogc.org/files/?artifact_id=20509. The GML 3.2.1 specification provides a set of schemas that define the GML feature constructs and geometric types. These are designed to be used as a basis for building application-specific schemas, which define the data content.

A GML document is described using a GML Schema. The AddressBase schema document (addressbase.xsd) defines the features in AddressBase GML.

It imports the GML 3.2.1 schemas, which rely on XML as defined by W3C at: http://www.w3.org/XML/1998/namespace.html.

The application schema uses the following XML namespaces, for which definitions are available as given here:

Information about Unicode and UTF-8, the character encoding we have chosen, is available on the Unicode Consortium website: http://www.unicode.org/.

Features

Each feature within the AddressBaseSupplySet:FeatureCollection is encapsulated in the following member element according to its feature type:

Member Element          Feature Type

<abpl:addressMember>    Address

The UPRN of the feature is provided in the gml:id XML attribute:

<abpl:addressMember>
<abpl:Address gml:id="uk.geoplace.uprn.1000011535314">
………………..
</abpl:Address>
</abpl:addressMember>

See Example records > GML for specific GML examples.
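The UPRN can be recovered from the gml:id attribute by removing the uk.geoplace.uprn. prefix. The following is a minimal sketch using Python's xml.etree.ElementTree. The GML namespace URI is the standard one for GML 3.2; the abpl namespace URI in the sample is a placeholder, because the real one is defined by the AddressBase schema and is not reproduced here.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # standard GML 3.2 namespace
UPRN_PREFIX = "uk.geoplace.uprn."

# The abpl namespace URI below is a placeholder, not the real one.
sample = f"""
<abpl:addressMember xmlns:abpl="http://example.com/abpl" xmlns:gml="{GML_NS}">
  <abpl:Address gml:id="uk.geoplace.uprn.1000011535314"/>
</abpl:addressMember>
""".strip()

def extract_uprns(xml_text):
    """Return the UPRN of every element whose gml:id carries the UPRN prefix."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        gml_id = element.get(f"{{{GML_NS}}}id")
        if gml_id and gml_id.startswith(UPRN_PREFIX):
            found.append(int(gml_id[len(UPRN_PREFIX):]))
    return found

uprns = extract_uprns(sample)
```

Matching on the attribute rather than on the element name keeps the sketch independent of the abpl namespace URI.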

Envelope

In the GML supply, you can determine the extent of your supply from the <gml:Envelope> element. For example:

<gml:boundedBy>
<gml:Envelope srsName="urn:ogc:def:crs:EPSG::27700">
<gml:lowerCorner>82643.6 5333.6</gml:lowerCorner>
<gml:upperCorner>655989 657599.5</gml:upperCorner>
</gml:Envelope>
</gml:boundedBy>
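The envelope can be read programmatically to obtain the bounding coordinates of a supply. The sketch below parses the example above with xml.etree.ElementTree; the xmlns:gml declaration is added to the sample so that the fragment parses on its own, and the function name and returned keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # standard GML 3.2 namespace

sample = f"""<gml:boundedBy xmlns:gml="{GML_NS}">
<gml:Envelope srsName="urn:ogc:def:crs:EPSG::27700">
<gml:lowerCorner>82643.6 5333.6</gml:lowerCorner>
<gml:upperCorner>655989 657599.5</gml:upperCorner>
</gml:Envelope>
</gml:boundedBy>"""

def read_extent(xml_text):
    """Return the spatial reference and corner coordinates of the first envelope."""
    root = ET.fromstring(xml_text)
    envelope = next(root.iter(f"{{{GML_NS}}}Envelope"))

    def corner(name):
        # Each corner is a space-separated coordinate pair.
        text = envelope.find(f"{{{GML_NS}}}{name}").text
        x, y = (float(value) for value in text.split())
        return (x, y)

    return {
        "srs": envelope.get("srsName"),
        "lower": corner("lowerCorner"),
        "upper": corner("upperCorner"),
    }

extent = read_extent(sample)
width = extent["upper"][0] - extent["lower"][0]
height = extent["upper"][1] - extent["lower"][1]
```

EPSG:27700 is the British National Grid, so the width and height computed here are in metres.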
