This getting started guide provides instructions for using AddressBase in different software applications. Users with limited technical knowledge will be able to follow this guide.
These instructions show you how to get started with AddressBase, AddressBase Plus and AddressBase Plus Islands.
AddressBase products are created by bringing together different address sources:
Local Authority Gazetteers across Great Britain, Northern Ireland, the Channel Islands and the Isle of Man
Royal Mail PAF data
References to Valuation Office Agency (VOA) data
Additional addresses and coordinates from Ordnance Survey
The data is supplied as comma-separated values (CSV) or Geography Markup Language (GML).
This getting started guide shows you how to obtain a data supply, load and work with AddressBase data. It includes the following sections:
AddressBase provides a current view of all Royal Mail PAF addresses that have been matched to the NLPG and OSG. The product provides Royal Mail attribution, enhances PAF with X and Y coordinates in the British National Grid and ETRS89 coordinate reference systems, and classifies each address to a primary level.
This product provides a single view of an address, allows you to locate that address on a map for a geographic view, and supports primary analysis of the function of the address, for example to distinguish residential from commercial properties.
This product is updated every six weeks.
Identify a property and locate it on a map with precision – using the X and Y coordinates we’ve assigned to the Royal Mail Postcode Address File (PAF)® data.
Every address geo data record provides Royal Mail address information from PAF, the Unique Delivery Point Reference Number (UDPRN) and X and Y coordinates.
Using the basic classifications in AddressBase you can quickly filter between residential and commercial addresses, which is ideal for marketing purposes.
Our basic classifications – residential or commercial – will reduce the costs and increase the effectiveness of your direct marketing.
In customer services, the need for accuracy is paramount. Have confidence in your front-line staff’s ability to look up addresses on a database of millions, quickly and efficiently.
Access: Download
Data theme: Address
Data structure: Vector - Points
Coverage: Great Britain
Scale: 1:1 250 to 1:10 000
Format: CSV, GML 3.2.1
Ordering area: All of Great Britain or customisable area (5km² tiles or user-defined polygon)
OS Data Hub plan: Public Sector Plan, Premium Plan, Energy & Infrastructure Plan
To find out details such as houses that have been converted to flats, you'll need AddressBase Plus. It includes current properties and addresses sourced from local authorities, Ordnance Survey and Royal Mail. These are all matched to the UPRN and structured in a flat-file model.
AddressBase Plus has more records than AddressBase as it includes objects without postal addresses such as places of worship and community centres – as well as sub-divided properties. It lets you locate an address or property on a map, through the assigned X and Y coordinates.
Crucially, the cross-referencing information with OS MasterMap products via Topographic Identifiers (TOIDs) means you can view address data within a wider context.
Royal Mail royalties are included in the licence fee. A separate Royal Mail royalty fee applies if you license the AddressBase data on External Transaction Solution (ETS) terms.
If the file size of your order is smaller than 2 GB, you can get it from our FTP server. In addition, public sector customers can download 5km chunk orders via our download service.
The database is a vital component of the single address gazetteer database and is in each of the AddressBase products where there has been a match confirming the address to the LLPG address.
AddressBase data is an addressing gazetteer that can be used within GIS and database systems. For details of Ordnance Survey’s licensed partners, who can incorporate the AddressBase products in their systems, please see the Ordnance Survey website.
Ordnance Survey does not recommend either suppliers or software products as the most appropriate system depends on many factors, such as the amount of data being taken, resources available within the organisation, the existing and planned information technology infrastructure and the applications that AddressBase products can be used for.
However, as a minimum, the following elements will be required in any system:
A means of reading the data, either in its native format or by translating it into another file format or loading it into a database.
A means of storing and distributing the data, perhaps in a database or through a web-based service.
A way of visualising and querying the data, typically a GIS.
You are advised to copy the supplied data to a backup medium.
For reading purposes, it is recommended to store the data on a single hard disc. This will speed up the ability of your computer to read the data. Unzipped file sizes for the full supply of each product are as follows:
Product | Unzipped CSV file size | Unzipped GML file size |
---|
GML is an XML dialect which can be used to model geographic features. It was designed by the Open Geospatial Consortium (OGC) as a means for people to share information regardless of the applications or technology that they use.
In the first instance, GML was used to overcome the differences between different GIS applications by providing a neutral file format as an alternative to proprietary formats. Because it is independent of applications, it can also be moved between databases or other types of application, which allows a wider application than just GIS data transfer.
GML data can be viewed and loaded into a database using software such as Safe FME.
These instructions describe how to prepare the CSV format of AddressBase, AddressBase Plus and AddressBase Plus Islands data for processing.
These instructions describe how to load the CSV format of AddressBase, AddressBase Plus and AddressBase Plus Islands data. In these examples, AddressBase Plus data will be used to describe the procedures in various GI systems.
It is assumed that the AddressBase, AddressBase Plus or AddressBase Plus Islands CSV data has been prepared before you attempt to load it. If this has not been done, the full set of data will not load, and the data that does load will not contain header information.
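As a rough illustration of what that preparation involves, the sketch below merges several CSV chunks into one file with a single header row. It is only a sketch under stated assumptions: the function name `merge_csv_chunks`, the chunk file names and the two-column header are all made up for the demonstration, the files are assumed to be UTF-8, and the real header row should come from the header files supplied by Ordnance Survey.

```python
import os
import tempfile


def merge_csv_chunks(chunk_paths, header_line, out_path):
    """Write header_line once, then every non-blank line of each chunk (sorted by path)."""
    rows = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        out.write(header_line.rstrip("\r\n") + "\n")
        for path in sorted(chunk_paths):
            with open(path, encoding="utf-8", newline="") as chunk:
                for line in chunk:
                    if line.strip():
                        # Lines are copied as text so the original quoting is kept.
                        out.write(line.rstrip("\r\n") + "\n")
                        rows += 1
    return rows


# Demonstration with two tiny, made-up chunks in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = [("chunk_001.csv", '1,"SO99 9ZZ"\n'), ("chunk_002.csv", '2,"RH12 1EQ"\n')]
    for name, body in samples:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(body)
    chunks = [os.path.join(folder, name) for name, _ in samples]
    merged_path = os.path.join(folder, "merged.csv")
    # "UPRN,POSTCODE" is an illustrative header, not the real AddressBase header.
    row_count = merge_csv_chunks(chunks, "UPRN,POSTCODE", merged_path)
    with open(merged_path, encoding="utf-8") as handle:
        merged_text = handle.read()
```

Copying lines as text keeps the supplied quoting untouched, at the cost of not validating the CSV structure; a field containing an embedded blank line would need a proper CSV parser instead.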
This section describes how to load AddressBase products into a few common databases.
ArcMap, ArcGIS Desktop and ArcGIS Server software do not support the BIGINT/NUMBER data type as an Object ID. Bear this in mind if you expect to use this data type directly with these ESRI products. An alternative that facilitates using ESRI software is to store this data as a string and add a new Serial ID to act as the Object ID.
If you are loading AddressBase data directly into a database, you may need to increase the column length to accommodate language characters such as '^'. Some databases treat this as an additional character, so if you define the column length according to our specification, the load may fail. Such adjustments may be required depending on the database you use to load the data.
If a UPRN is deleted and then reinserted, this does not compromise the integrity of the UPRN and its use as a primary key. If a delete is issued for a UPRN, this does not mean it will not reappear in subsequent supplies.
These are the reasons why this may happen:
The record has moved in location more than once, moving it out of your Area of Interest (AOI), hence the deletion, but then moved back into your AOI in the future. This would also occur if you altered your AOI.
A record has failed data validation upon a change being made. This can result, dependent on the change being made, in the record being deleted and then reintroduced when the error is fixed by the data supplier.
If a UPRN is deleted, it will not be reallocated to a different property and it therefore remains the unique identifier for a property.
The AddressBase products provide a variety of data fields, allowing you to construct different forms of an address for a given addressable object, dependent on how the address is to be used.
AddressBase contains the Delivery Point Address which is sourced from Royal Mail’s Postcode Address File (PAF) – a non-geocoded list of addresses. These addresses are used primarily as a ‘mailing list’ for postal purposes.
There are two types of address contained in the AddressBase Plus products:
Delivery Point Address
Geographic Address
These two address types come from different sources and are matched together by GeoPlace.
As noted above, the Delivery Point Address is sourced from Royal Mail’s PAF data. Geographic Addresses are maintained by contributing Local Authorities. The structure of a Geographic Address is based on the British Standard BS7666. These addresses are used to provide an accurate geographic locator for an object to support, for example, service delivery, asset management, or command and control operations. They also represent the legal form of addresses as created under street naming and numbering legislation.
Each UPRN in AddressBase Plus provides the Geographic Address and, where matched, the Delivery Point Address in a one-to-one relationship. If there is no match, then the following fields will be left empty:
DEPARTMENT_NAME |
---|
A common requirement for customers using the AddressBase products is to build a single address label from core address elements.
There are two types of address label. The simplest is a full address on a single line with different elements separated by commas and spaces. This type of label is suited for displaying a full address within a tabular display, such as within an on-screen data grid or spreadsheet, or where a single-line printed address is most appropriate (such as within the text, header or footer of a letter):
ROSE COTTAGE, 5 MAIN STREET, ADDRESSVILLE, LONDON, SE99 9EX
The other type of formatted address is a multi-line address label. These are most often used on envelopes or at the tops of letters, where different parts of an address are separated onto different lines:
The rules in this guide are suggestions only and can be used for visual display of full addresses. It is strongly recommended that address components are stored in the format in which they are provided in order to allow maximum flexibility of use and derived value.
A Delivery Point Address contains information sourced from Royal Mail (PAF). Stringent rules are used to match these addresses to the Geographic Address and assign a common UPRN to link addresses from the two addressing sources together in the data model.
To construct a single address label based purely on the Royal Mail PAF address fields, the following attributes can be used to build a Delivery Point Address label.
The table below provides details of the Delivery Point Address Components.
These address components are listed in the correct order in which they should appear on an address label. There may be a business need to replace the thoroughfare, locality and post_town attributes with the Welsh equivalent. The following examples use the English version of these attributes only.
It should be noted that most of the PAF fields are optional and may contain null values (or zero, in the cases of ‘BUILDING NUMBER’ and ‘PO BOX NUMBER’). In these cases, those fields should be omitted.
The following (entirely fictional) example shows all of the PAF fields filled in (apart from the PO Box number) and indicates how they should be ordered in a single address label.
In cases where a PO BOX NUMBER is present, it will only be described in the data as an integer. In order to properly format these addresses when generating an address label, these integers should be prefixed with the text ‘PO BOX’, as shown in the following example:
Where null or empty string values exist (for character fields) or zeros or nulls (for integer fields), those fields should be entirely omitted from the output. However, the order in which the fields should be concatenated always remains the same.
Building a single-line, formatted address for a Delivery Point is relatively straightforward. All the fields should be checked in the order shown previously in Table 1, and those that have values should be concatenated together into a single line. Generally, address components should be separated by a comma followed by a single space (‘, ’), although sometimes only a space is used between a building number and a thoroughfare name. You can use your preference.
The SQL operator for concatenating text is a double pipe (‘||’).
CASE blocks have been used to test each of the fields for null values before concatenating its contents (along with a suitable separator – either ‘, ’ or ‘ ’).
The field names and table names used are illustrative and may vary between databases.
Depending on the database schema and data loading method used, it may be necessary to test some fields for empty strings (‘’) or zero values (for integer fields) instead of, or as well as, testing for NULLs.
If you are using PostgreSQL (PostGIS), it might be beneficial to replace the ‘IS NOT NULL’ test with != ‘’, so that empty strings are skipped as well as NULLs. This should improve the overall appearance of the output.
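The considerations above can be sketched with Python's built-in `sqlite3` module, which supports both the `||` operator and CASE blocks. This is a minimal sketch, not the SQL published by Ordnance Survey: the table name `delivery_point`, the lower-case column names and the helper `case_block` are assumptions made for illustration, and the two rows are the fictional examples used in this guide. Each CASE block tests for NULL and for empty strings or zeros before appending the field and its separator.

```python
import sqlite3

# (column, kind, separator) in the order the components appear on a label.
PARTS = [
    ("department_name", "text", ", "),
    ("organisation_name", "text", ", "),
    ("sub_building_name", "text", ", "),
    ("building_name", "text", ", "),
    ("building_number", "int", " "),
    ("po_box_number", "po_box", ", "),
    ("dependent_thoroughfare", "text", ", "),
    ("thoroughfare", "text", ", "),
    ("double_dependent_locality", "text", ", "),
    ("dependent_locality", "text", ", "),
    ("post_town", "text", ", "),
    ("postcode", "text", ""),
]


def case_block(column, kind, sep):
    """Return a CASE block that yields '' when the column is NULL, empty or zero."""
    if kind == "text":
        return (f"CASE WHEN {column} IS NOT NULL AND {column} != '' "
                f"THEN {column} || '{sep}' ELSE '' END")
    prefix = "'PO BOX ' || " if kind == "po_box" else ""
    return (f"CASE WHEN {column} IS NOT NULL AND {column} != 0 "
            f"THEN {prefix}{column} || '{sep}' ELSE '' END")


LABEL_SQL = ("SELECT uprn, " + " || ".join(case_block(*p) for p in PARTS)
             + " AS label FROM delivery_point ORDER BY uprn")

conn = sqlite3.connect(":memory:")
columns = ", ".join(
    f"{name} {'TEXT' if kind == 'text' else 'INTEGER'}" for name, kind, _ in PARTS
)
conn.execute(f"CREATE TABLE delivery_point (uprn INTEGER PRIMARY KEY, {columns})")
conn.executemany(
    "INSERT INTO delivery_point VALUES (" + ", ".join("?" * 13) + ")",
    [
        (1, None, "TM MOTORS", None, "THE OLD BARN", 0, 0, None,
         "HORSHAM LANE", None, None, "HORSHAM", "RH12 1EQ"),
        (2, None, "JWS CONSULTING", "", "", None, 5422, None,
         "HIGH STREET", None, None, "SPRINGFIELD", "SP77 0SF"),
    ],
)
labels = dict(conn.execute(LABEL_SQL).fetchall())
for uprn, label in labels.items():
    print(uprn, label)
```

The sketch assumes POSTCODE is always populated, so it carries no trailing separator; if that does not hold for your data, trailing separators would need trimming.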
Splitting a Delivery Point Address into multiple lines is more complicated. There are several rules to consider in order to avoid having very short lines (for example, just a building number) or very long lines within the formatted address. A summary of these rules is as follows:
Generally, if there is a building number, it should appear on the same line as the thoroughfare (or dependent thoroughfare) name. If there is no thoroughfare name information, it should appear on the same line as the first locality name.
In cases where building numbers have been placed in the building name field due to the presence of a letter suffix (for example, ‘11A’) or a number range separator (for example, ‘3–5’), these should be detected and placed on the same line as the thoroughfare name in the same way as a building number (or on the first locality line if no thoroughfare name is present).
In most other cases, the building name, if present, should appear on a separate line above the thoroughfare (or dependent thoroughfare) name. If there is no thoroughfare name present, it should appear on the same line as the first locality name.
Similar tests should be applied to the SUB_BUILDING_NAME field: if this field contains a number, a number with a suffix, or a numeric range, it should precede the building name on the same line. In most other cases, it should appear on a separate line above the building name.
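A simplified sketch of these rules in Python is shown below. It is illustrative only: the function names, the lower-case dictionary keys and the regular expression are assumptions, not part of the product. As a simplification, named buildings always go on their own line, and number-like values lead the first available thoroughfare, locality or post town line.

```python
import re

# A bare number ("7"), a number with a letter suffix ("11A") or a range ("3-5", "3–5").
NUMBER_LIKE = re.compile(r"^\d+[A-Z]?(\s*[-–]\s*\d+[A-Z]?)?$")


def is_number_like(value):
    return bool(value) and bool(NUMBER_LIKE.match(value.strip().upper()))


def delivery_point_lines(rec):
    """Return the address as a list of lines; empty, null and zero fields are skipped."""
    def get(key):
        return str(rec.get(key) or "").strip()  # None, "" and 0 all become ""

    lines = [get("department_name"), get("organisation_name")]
    sub, name, number = get("sub_building_name"), get("building_name"), get("building_number")
    lead = []  # number-like items that prefix the first street or locality line

    if is_number_like(sub) and name and not is_number_like(name):
        name = f"{sub} {name}"          # e.g. "3 ROSE COURT"
    elif is_number_like(sub):
        lead.append(sub)
    elif sub:
        lines.append(sub)               # e.g. "UNIT 3" on its own line

    if is_number_like(name):
        lead.append(name)               # e.g. "11A" treated like a building number
    elif name:
        lines.append(name)
    if number:
        lead.append(number)
    if get("po_box_number"):
        lines.append("PO BOX " + get("po_box_number"))

    trailing = [get(k) for k in ("dependent_thoroughfare", "thoroughfare",
                                 "double_dependent_locality", "dependent_locality",
                                 "post_town")]
    trailing = [t for t in trailing if t]
    if lead and trailing:
        trailing[0] = " ".join(lead + [trailing[0]])
    elif lead:
        lines.append(" ".join(lead))
    return [line for line in lines + trailing + [get("postcode")] if line]


example = delivery_point_lines({
    "building_name": "ROSE COTTAGE", "building_number": 5,
    "thoroughfare": "MAIN STREET", "dependent_locality": "ADDRESSVILLE",
    "post_town": "LONDON", "postcode": "SE99 9EX",
})
print("\n".join(example))
```

Returning a list of lines rather than a joined string leaves the choice of line separator to the caller.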
The structure of a Geographic Address is based on the British Standard BS7666 and is split into several components. This means that in order to construct a complete address label (for example, on an envelope, database form or GIS display), the components need to be constructed according to a set of rules.
Within the AddressBase Plus products, the core property level address information is stored within the Primary Addressable Object (PAO) and Secondary Addressable Object (SAO) fields. The additional attributes required to build a full address label are la_organisation, street_description, locality, town_name, administrative_area and postcode_locator.
For a full description of PAOs and SAOs, and the complete set of AddressBase Plus fields, please refer to the Technical Specification for your respective product.
To construct a single address label based purely on the BS7666 address fields, the following attributes should be used to build a Geographic Address label.
*ADMINISTRATIVE_AREA is optional because it is common for this field to be the same as the TOWN_NAME. Sometimes, however, this field will help users construct a more complete address.
These address components are listed in the correct order in which they should appear on an address label. There may be a business need to use alternate language fields for the SAO_TEXT, PAO_TEXT and STREET_DESCRIPTION, which are also listed in the correct order above.
When building a single address label, it may be necessary to concatenate the various SAO fields and PAO fields together respectively. These fields contain any property names, numbers, number ranges or suffixes that apply to an address.
A PAO number/range string should be constructed from the PAO_START_NUMBER, PAO_START_SUFFIX, PAO_END_NUMBER and PAO_END_SUFFIX fields, as illustrated in the following table.
Similarly, a SAO number/range string should be constructed from the SAO_START_NUMBER, SAO_START_SUFFIX, SAO_END_NUMBER and SAO_END_SUFFIX fields, as illustrated in the following table.
In addition to the numeric range fields described above, there are also PAO_text and SAO_text fields. These fields may be populated instead of, or as well as, the numeric range fields. In both cases, if both text and a numeric range string are present, the text should appear before the numeric range in any formatted address.
For PAOs, there will always be either a text entry or a numeric/range entry, or both. This is not the case for SAOs, which may be entirely absent for a given address.
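The construction of the number/range strings can be sketched in Python as follows. The function names `number_range` and `addressable_object` are hypothetical, and the hyphen used as the range separator is an assumption based on the rendered examples in this guide. The same function serves both the PAO and the SAO fields.

```python
def number_range(start_number=None, start_suffix=None, end_number=None, end_suffix=None):
    """Build a string such as "1", "1A", "1-5" or "1A-5C"; nulls and zeros are skipped."""
    start = f"{start_number or ''}{start_suffix or ''}"
    end = f"{end_number or ''}{end_suffix or ''}"
    if start and end:
        return f"{start}-{end}"
    return start or end


def addressable_object(text, range_string):
    """Combine PAO/SAO text with its number/range string, text first."""
    return ", ".join(part for part in (text, range_string) if part)


print(number_range(1))                  # 1
print(number_range(1, "A"))             # 1A
print(number_range(1, None, 5))         # 1-5
print(number_range(1, "A", 5, "C"))     # 1A-5C
print(addressable_object("ROSE COTTAGE", number_range(1, "A")))  # ROSE COTTAGE, 1A
```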
The street description and administrative area names are always present, while the locality name and town name may be empty.
The ADMINISTRATIVE_AREA field always contains a value, but this value does not always enhance an address. In particular, check that it is not the same as the value in the TOWN_NAME field, as this is often the case.
In other cases, the administrative area name will simply contain the local authority name, which would not traditionally form part of a single or multi-line address but can be included to add additional information to an address label. Its inclusion is largely down to business requirements or personal preference; however, it may also be useful to 'de-duplicate' some Geographic Addresses.
The following (entirely fictional) example shows all of the BS7666 Geographic Address fields filled in and how they should be ordered in a single address label.
*The number/range strings are built from the relevant PAO/SAO START_NUMBER, START_SUFFIX, END_NUMBER and END_SUFFIX fields, as described above, and formatted as character strings.
Where an administrative area matches the town name, it should always be omitted.
Where null or empty string values exist (for character fields) or zeros or nulls (for integer fields), those fields should be entirely omitted from the output; however, the order in which the fields should be concatenated always remains the same.
The SQL operator for concatenating text is a double pipe (‘||’).
CASE blocks have been used to test each of the fields for null values before concatenating its contents (along with a suitable separator – either ‘, ’ or ‘ ’).
The field names and table names used are illustrative and may vary between databases.
Depending on the database schema and data loading method used, it may be necessary to test some fields for empty strings (‘’) or zero values (for integer fields) instead of or as well as testing for NULLs.
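For comparison with the SQL approach, the same single-line logic can be sketched directly in Python. The function names, the lower-case dictionary keys and the `number_separator` and `include_admin_area` parameters are assumptions made for illustration. The sketch preformats the PAO and SAO number/range strings, omits an administrative area that duplicates the town name, and skips null, empty and zero values while keeping the field order fixed.

```python
def number_range(start_number=None, start_suffix=None, end_number=None, end_suffix=None):
    """Build "1", "1A", "1-5" or "1A-5C" from the start/end number and suffix fields."""
    start = f"{start_number or ''}{start_suffix or ''}"
    end = f"{end_number or ''}{end_suffix or ''}"
    return f"{start}-{end}" if start and end else start or end


def geographic_single_line(rec, number_separator=" ", include_admin_area=True):
    def text(key):
        return str(rec.get(key) or "").strip()

    sao_range = number_range(*(rec.get(f"sao_{k}") for k in
                               ("start_number", "start_suffix", "end_number", "end_suffix")))
    pao_range = number_range(*(rec.get(f"pao_{k}") for k in
                               ("start_number", "start_suffix", "end_number", "end_suffix")))
    street = text("street_description")
    # The PAO number/range and the street share one component; the separator is a preference.
    street_part = pao_range + number_separator + street if pao_range and street else pao_range or street

    admin = text("administrative_area")
    if not include_admin_area or admin.upper() == text("town_name").upper():
        admin = ""  # a duplicate of the town name is always omitted

    parts = [text("la_organisation"), text("sao_text"), sao_range, text("pao_text"),
             street_part, text("locality"), text("town_name"), admin,
             text("postcode_locator")]
    return ", ".join(part for part in parts if part)


label = geographic_single_line({
    "la_organisation": "TM MOTORS", "pao_text": "THE OLD BARN", "pao_start_number": 1,
    "street_description": "HORSHAM LANE", "town_name": "HORSHAM",
    "administrative_area": "HORSHAM", "postcode_locator": "RH12 1EQ",
})
print(label)
```

Passing `number_separator=", "` gives the comma-separated style, for example "34, CROW LANE, RAMSBOTTOM, BURY, BL0 9BR".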
Splitting a Geographic Address into multiple lines is more complex. As with Delivery Point Addresses, there are several rules to consider in order to avoid having very short lines (for example, just a building number) or very long lines within the formatted address.
A summary of these rules is as follows:
Generally, if there is a PAO number/range string, it should appear on the same line as the Street Description. For example: 11A MAIN STREET
If there is a PAO_text value, it should always appear on the line above the Street Name (or on the line above the <PAO number string> + <Street Name> where there is a PAO number/range).
If there is a SAO_text value, it should appear on a separate line above the PAO_text line (or the PAO number/range + Street Name where there is no PAO_text value).
If there is a SAO number/range value, it should be inserted either on the same line as the PAO_text (if there is a PAO_text value), or on the same line as the PAO number/range + Street Name (if there is only a PAO number/range value and no PAO_text value). If there are both PAO_text and a PAO number/range, then the SAO number/range should appear on the same line as the PAO_text, and the PAO number/range should appear on the street line.
If there is a SAO_text value, it should always appear on its own line.
If there is an Organisation Name, it should always appear alone as the top line of the address.
The Locality (if present) should appear on a separate line beneath the Street Description, followed by the Town Name on the line below it. If there is no Locality, the Town Name should appear alone on the line beneath the Street Description.
If the Administrative Area name is required and it is not a duplicate of the Town Name, it can optionally be included on a separate line beneath the Town Name.
Finally, the Postcode Locator should be inserted on the final line of the address.
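The rules above can be sketched in Python as shown below. This is an illustrative sketch rather than a definitive implementation: the function name and dictionary keys are assumptions, and the `sao_range` and `pao_range` values are assumed to be number/range strings that have already been formatted (for example "1A" or "7–9").

```python
def geographic_lines(rec, include_admin_area=False):
    """Return a Geographic Address as a list of lines, top line first."""
    def text(key):
        return str(rec.get(key) or "").strip()

    sao_range, pao_text = text("sao_range"), text("pao_text")
    # The PAO number/range always shares a line with the street description.
    street = " ".join(p for p in (text("pao_range"), text("street_description")) if p)

    if sao_range and pao_text:
        pao_line = f"{sao_range} {pao_text}"          # e.g. "1A ROSE COURT"
    elif sao_range:
        pao_line = ""                                 # no PAO text: join the street line
        street = f"{sao_range}, {street}" if street else sao_range
    else:
        pao_line = pao_text

    admin = text("administrative_area")
    if not include_admin_area or admin.upper() == text("town_name").upper():
        admin = ""

    lines = [text("la_organisation"), text("sao_text"), pao_line, street,
             text("locality"), text("town_name"), admin, text("postcode_locator")]
    return [line for line in lines if line]


full = geographic_lines({
    "la_organisation": "JW SIMPSON LTD", "sao_text": "THE ANNEXE", "sao_range": "1A",
    "pao_text": "THE OLD MILL", "pao_range": "7–9", "street_description": "MAIN STREET",
    "locality": "HOOK", "town_name": "WARSASH", "administrative_area": "SOUTHAMPTON",
    "postcode_locator": "SO99 9ZZ",
}, include_admin_area=True)
print("\n".join(full))
```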
Given that AddressBase Plus contains two different types of address, a decision needs to be made as to whether to use the Geographic or Delivery Point Addresses, or a mixture.
The following two options should be considered:
Use Delivery Point Addresses whenever they are available, and when they are not, use a Geographic Address.
Use Geographic Addresses in all cases.
Depending on business requirements, in some user interfaces it may be worth considering displaying both forms of an address where possible, since this will provide the maximum information available about a given UPRN.
‘Mixing and matching’ components from the two different forms of address into a single address label is not recommended as this is likely to cause confusion in some instances.
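The two options can be expressed as a small selection function. In this sketch the function name, the dictionary keys and the returned tuple are assumptions; it takes labels that have already been built and returns one complete label together with its type, so components from the two address forms are never mixed.

```python
def choose_address(record, prefer_delivery_point=True):
    """Pick one whole label: Delivery Point first (if present), else Geographic."""
    delivery_point = (record.get("delivery_point_label") or "").strip()
    geographic = (record.get("geographic_label") or "").strip()
    if prefer_delivery_point and delivery_point:
        return ("delivery_point", delivery_point)
    return ("geographic", geographic)


matched = {
    "delivery_point_label": "TM MOTORS, THE OLD BARN, HORSHAM LANE, HORSHAM, RH12 1EQ",
    "geographic_label": "TM MOTORS, THE OLD BARN, 1 HORSHAM LANE, HORSHAM, RH12 1EQ",
}
unmatched = {"delivery_point_label": None,
             "geographic_label": "ROSE COTTAGE, MAIN STREET, ADDRESSVILLE"}

print(choose_address(matched))                               # option 1, match available
print(choose_address(unmatched))                             # option 1, falls back
print(choose_address(matched, prefer_delivery_point=False))  # option 2
```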
AddressBase Plus offers other attributes that could be used in conjunction with address labels. For example, classification can be used to target certain types of property, or OS MasterMap Topography TOID cross references can be used to link address labels to Topographic objects and viewed in a GIS.
TOID cross references are not available in AddressBase Plus Islands.
All the AddressBase products are available as a full supply or a COU. A COU means you will only be supplied with the features which have changed since your last supply. The following sections provide guidance on how you could potentially manage a COU supply of AddressBase and AddressBase Plus data.
If you receive a tile supply, you will receive Change Chunks. This means if a record within your tile has changed, all of the records in that tile will be provided to you as inserts, and no updates or deletes will be issued.
Tiles are only available for GB supplies, so this does not apply to AddressBase Plus Islands.
At a high level, there are three types of change found within a COU:
Deletes (CHANGE_TYPE ‘D’) are objects that have ceased to exist in your AOI since the last product refresh.
Inserts (CHANGE_TYPE ‘I’) are objects that have been newly inserted into your AOI since the last product refresh.
Updates (CHANGE_TYPE ‘U’) are objects that have been updated in your AOI since the last product refresh.
The diagram below shows how to implement an AddressBase, AddressBase Plus and AddressBase Plus Islands COU within a database.
Before a COU is applied, there may be a business requirement to archive existing address records. The table below shows how to implement archiving with an AddressBase COU within a database.
Within AddressBase and AddressBase Plus there will be no records with the same UPRN. This can be tested by checking the number of records that have the same UPRN. The following SQL code would notify you of any duplicates:
This query should return 0 rows, and this confirms that there are no duplicates. As there are no duplicate records, we can use the UPRN to apply the COU.
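A duplicate check of this kind might look like the sketch below, run here against an in-memory SQLite database from Python. The table name `addressbase`, its two columns and the sample rows are illustrative assumptions; the GROUP BY and HAVING query itself is standard SQL.

```python
import sqlite3

DUPLICATE_SQL = """
    SELECT uprn, COUNT(*) AS occurrences
    FROM addressbase
    GROUP BY uprn
    HAVING COUNT(*) > 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addressbase (uprn INTEGER, postcode TEXT)")
conn.executemany(
    "INSERT INTO addressbase VALUES (?, ?)",
    [(1, "SO99 9ZZ"), (2, "RH12 1EQ"), (3, "SE99 9EX")],
)

# With unique UPRNs the query returns no rows.
duplicates_before = conn.execute(DUPLICATE_SQL).fetchall()

# Deliberately add a second record for UPRN 2 to show what a failure looks like.
conn.execute("INSERT INTO addressbase VALUES (?, ?)", (2, "RH12 1EQ"))
duplicates_after = conn.execute(DUPLICATE_SQL).fetchall()

print(duplicates_before)
print(duplicates_after)
```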
Once confirmed, the following steps can be taken to apply the COU (without archiving):
Initially delete the existing records that will be updated and deleted:
Insert the new updated records and the new inserted records:
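The two steps above can be sketched as follows, using SQLite from Python. The table names `addressbase` (live data) and `addressbase_cou` (the loaded COU), the reduced column set and the sample rows are assumptions for illustration; both tables are assumed to share the same columns. The steps run inside one transaction so that a failure does not leave the live table half-updated.

```python
import sqlite3


def apply_cou(conn):
    """Apply a COU: remove updated/deleted UPRNs, then add inserted/updated records."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "DELETE FROM addressbase WHERE uprn IN "
            "(SELECT uprn FROM addressbase_cou WHERE change_type IN ('U', 'D'))"
        )
        conn.execute(
            "INSERT INTO addressbase "
            "SELECT uprn, change_type, postcode FROM addressbase_cou "
            "WHERE change_type IN ('I', 'U')"
        )


conn = sqlite3.connect(":memory:")
for table in ("addressbase", "addressbase_cou"):
    conn.execute(f"CREATE TABLE {table} (uprn INTEGER, change_type TEXT, postcode TEXT)")
conn.executemany("INSERT INTO addressbase VALUES (?, ?, ?)",
                 [(1, "I", "AA1 1AA"), (2, "I", "BB2 2BB"), (3, "I", "CC3 3CC")])
conn.executemany("INSERT INTO addressbase_cou VALUES (?, ?, ?)",
                 [(2, "U", "BB2 9ZZ"), (3, "D", "CC3 3CC"), (4, "I", "DD4 4DD")])

apply_cou(conn)
live_rows = conn.execute("SELECT * FROM addressbase ORDER BY uprn").fetchall()
print(live_rows)
```

In the sample data, UPRN 2 is replaced by its updated record, UPRN 3 is removed and UPRN 4 is added.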
Where there is a business requirement to keep the records that are being Updated and Deleted in a separate archive table, the following SQL will create an Archive Table. It will be populated with the records that are being Updated and Deleted from the live AddressBase or AddressBase Plus table.
The following command creates an archive table of the records that are being updated and deleted from the existing table.
If this table already exists, you can simply use INSERT INTO rather than CREATE TABLE.
The following command then deletes the records from the existing table, which are either updates or deletions:
The following command then inserts the new insert records and the new updated records into the live table:
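The archiving sequence can be sketched in the same way. The table names `addressbase`, `addressbase_cou` and `addressbase_archive`, the reduced column set and the sample rows are illustrative assumptions, and both source tables are assumed to share the same columns. To cover both the first run and later runs, the sketch creates an empty archive table only if it does not already exist and then always uses INSERT INTO.

```python
import sqlite3

CHANGED = "SELECT uprn FROM addressbase_cou WHERE change_type IN ('U', 'D')"


def apply_cou_with_archive(conn):
    with conn:  # one transaction for the archive, delete and insert steps
        # Create an empty archive with the live table's columns on the first run only.
        conn.execute("CREATE TABLE IF NOT EXISTS addressbase_archive AS "
                     "SELECT * FROM addressbase WHERE 0")
        # 1. Copy the records about to be updated or deleted into the archive.
        conn.execute("INSERT INTO addressbase_archive "
                     f"SELECT * FROM addressbase WHERE uprn IN ({CHANGED})")
        # 2. Remove those records from the live table.
        conn.execute(f"DELETE FROM addressbase WHERE uprn IN ({CHANGED})")
        # 3. Add the new inserts and the new versions of updated records.
        conn.execute("INSERT INTO addressbase "
                     "SELECT * FROM addressbase_cou WHERE change_type IN ('I', 'U')")


conn = sqlite3.connect(":memory:")
for table in ("addressbase", "addressbase_cou"):
    conn.execute(f"CREATE TABLE {table} (uprn INTEGER, change_type TEXT, postcode TEXT)")
conn.executemany("INSERT INTO addressbase VALUES (?, ?, ?)",
                 [(1, "I", "AA1 1AA"), (2, "I", "BB2 2BB"), (3, "I", "CC3 3CC")])
conn.executemany("INSERT INTO addressbase_cou VALUES (?, ?, ?)",
                 [(2, "U", "BB2 9ZZ"), (3, "D", "CC3 3CC"), (4, "I", "DD4 4DD")])

apply_cou_with_archive(conn)
archive_rows = conn.execute("SELECT * FROM addressbase_archive ORDER BY uprn").fetchall()
live_rows = conn.execute("SELECT * FROM addressbase ORDER BY uprn").fetchall()
print(archive_rows)
print(live_rows)
```

The archive ends up holding the previous versions of UPRNs 2 and 3, while the live table reflects the COU.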
Please see the getting started guide for instructions on how to load and work with AddressBase data. This is a composite guide for AddressBase, AddressBase Plus and AddressBase Plus Islands.
Access to this product is free for PSGA members. You can find out if you are a PSGA member or download a sample of AddressBase data using the links to the relevant resources. Alternatively, you can apply to try out the full product.
AddressBase, AddressBase Plus and AddressBase Plus Islands are also available from Ordnance Survey as a supply in GML format. Loading GML into most GIS applications requires the use of third-party translation software, which is not covered in this guide. If more information is required in the loading of GML format, please
Note - When using CSV data in ArcGIS Pro, it is necessary to have column headings. Please ensure that headings have already been prepared as instructed.
Navigate to the location of the merged AddressBase or AddressBase Plus CSV file with the appended headers that you created earlier. Select the file and click OK.
Change the Coordinate System to British_National_Grid by clicking the globe icon.
Note - When using CSV data in ArcGIS, it is necessary to have column headings. Please ensure that headings have already been prepared as instructed.
Note - When using CSV data in MapInfo, it is not a critical requirement to have column headings. However, for ease of use we recommend using the headings supplied by Ordnance Survey. Instructions on how to merge the data and append the header files can be found in the CSV preparation instructions.
Click Browse next to the filename and locate the CSV file, created during CSV preparation, that contains the merged header files and AddressBase data.
Note - These steps describe how to load AddressBase into a PostgreSQL database using the text files created by following the CSV preparation instructions to merge the CSV files.
Prepare the text files as described in the CSV preparation instructions.
Depending on the data to be loaded, download the SQL file from either the AddressBase or AddressBase_Plus_and_Island folder on:
Click the edit icon.
Once the data is copied, the next stage is to unzip the *.zip files to *.csv. This can be done using a package such as WinZip or 7-Zip.
Go to the OS GitHub repository:
Note - The following instructions assume that users have basic knowledge of Microsoft SQL Server and that the CSV data is already prepared as described in the CSV preparation instructions.
Your CSV file should already have a header row from CSV preparation. Ensure that the 'Column names in the first data row' option is ticked.
This guide outlines a methodology for structuring and layering a single address label, providing suggested logic to build both the Delivery Point and Geographic Address. The Delivery Point Address logic is applicable to AddressBase, AddressBase Plus and AddressBase Plus Islands. The Geographic Address logic is only applicable to AddressBase Plus and AddressBase Plus Islands.
Delivery Point Address Component | Type |
---|
Delivery Point Address Component | Example |
---|
Delivery Point Address Component | Data Content | Formatted output |
---|
Delivery Point Address Component | Data content | Formatted output |
---|
An example of SQL logic to create a single-line Delivery Point Address is on our GitHub repository which should be used under the following considerations:
Geographic Address Component | Type |
---|
Attribute | Example 1 | Example 2 | Example 3 | Example 4 |
---|
Attribute | Example 1 | Example 2 | Example 3 | Example 4 |
---|
Administrative area not included | vs | Administrative area included (BURY) |
---|
Geographic Address Component | Example |
---|
Delivery Point Address Component | Data content | Formatted output |
---|
Delivery Point Address Component | Data content | Formatted output |
---|
Building a single-line, formatted address for a Geographic Address is slightly more complicated than for a Delivery Point Address due to the need to preformat the SAO and PAO number/range strings. However, once this is done, the process is largely the same as before: the calculated fields should be checked in the order shown previously, and those that have values should be concatenated together into a single line. Generally, address components should be separated by a comma followed by a single space (‘, ’), although sometimes only a space is used between a PAO number/range string and a street description. This is down to personal preference.
Example SQL logic to create a single-line Geographic Address is also available and should be used under the following considerations:
ORGANISATION_NAME |
SUB_BUILDING_NAME |
BUILDING_NAME |
BUILDING_NUMBER |
PO_BOX_NUMBER |
DEPENDENT_THOROUGHFARE (and WELSH_DEPENDENT_THOROUGHFARE) |
THOROUGHFARE (and WELSH_THOROUGHFARE) |
DOUBLE_DEPENDENT_LOCALITY (and WELSH_DOUBLE_DEPENDENT_LOCALITY) |
DEPENDENT_LOCALITY (and WELSH_DEPENDENT_LOCALITY) |
POST_TOWN (and WELSH_POST_TOWN) |
POSTCODE |
ROSE COTTAGE 5 MAIN STREET ADDRESSVILLE LONDON SE99 9EX |
DEPARTMENT_NAME | Character |
ORGANISATION_NAME | Character |
SUB_BUILDING_NAME | Character |
BUILDING_NAME | Character |
BUILDING_NUMBER | Integer |
PO_BOX_NUMBER | Integer |
DEPENDENT_THOROUGHFARE (or WELSH_DEPENDENT_THOROUGHFARE) | Character |
THOROUGHFARE (or WELSH_THOROUGHFARE) | Character |
DOUBLE_DEPENDENT_LOCALITY (or WELSH_DOUBLE_DEPENDENT_LOCALITY) | Character |
DEPENDENT_LOCALITY (or WELSH_DEPENDENT_LOCALITY) | Character |
POST_TOWN (or WELSH_POST_TOWN) | Character |
POSTCODE | Character |
DEPARTMENT_NAME | CUSTOMER SERVICE DEPARTMENT |
ORGANISATION_NAME | JW SIMPSON LTD. |
SUB_BUILDING_NAME | UNIT 3 |
BUILDING_NAME | THE OLD FORGE 7 |
BUILDING_NUMBER | 7 |
PO_BOX_NUMBER |
DEPENDENT_THOROUGHFARE | RICHMOND TERRACE |
THOROUGHFARE | MAIN STREET |
DOUBLE_DEPENDENT_LOCALITY | HOOK |
DEPENDENT_LOCALITY | WARSASH |
POST_TOWN | SOUTHAMPTON |
POSTCODE | SO99 9ZZ |
ORGANISATION_NAME | ‘JWS CONSULTING’ | JWS CONSULTING |
PO_BOX_NUMBER | 5422 | PO BOX 5422 |
THOROUGHFARE | ‘HIGH STREET’ | HIGH STREET |
POST_TOWN | ‘SPRINGFIELD’ | SPRINGFIELD |
POSTCODE | ‘SP77 0SF’ | SP77 0SF |
DEPARTMENT_NAME | Null |
ORGANISATION_NAME | ‘TM MOTORS’ | TM MOTORS |
SUB_BUILDING_NAME | Null |
BUILDING_NAME | ‘THE OLD BARN’ | THE OLD BARN |
BUILDING_NUMBER | 0 (or null) |
PO_BOX_NUMBER | 0 (or null) |
DEPENDENT_THOROUGHFARE | Null |
THOROUGHFARE | ‘HORSHAM LANE’ | HORSHAM LANE |
DOUBLE_DEPENDENT_LOCALITY | Null |
DEPENDENT_LOCALITY | Null |
POST_TOWN | ‘HORSHAM’ | HORSHAM |
POSTCODE | ‘RH12 1EQ’ | RH12 1EQ |
LA_ORGANISATION | Character |
SAO_TEXT (or ALT_LANGUAGE_SAO_TEXT) | Character |
SAO_START_NUMBER | Integer |
SAO_START_SUFFIX | Character |
SAO_END_NUMBER | Integer |
SAO_END_SUFFIX | Character |
PAO_TEXT (or ALT_LANGUAGE_PAO_TEXT) | Character |
PAO_START_NUMBER | Integer |
PAO_START_SUFFIX | Character |
PAO_END_NUMBER | Integer |
PAO_END_SUFFIX | Character |
STREET_DESCRIPTION (or ALT_LANGUAGE_STREET_DESCRIPTION) | Character |
LOCALITY | Character |
TOWN_NAME | Character |
ADMINISTRATIVE_AREA* | Character |
POSTCODE_LOCATOR | Character |
PAO_START_NUMBER PAO_START_SUFFIX PAO_END_NUMBER PAO_END_SUFFIX | 1 | 1 A | 1 5 | 1 A 5 C |
Rendered PAO range | 1 | 1A | 1-5 | 1A-5C |
PAO (number string) PAO (text) | 1 | 1A | 1A Rose Cottage | Rose Cottage |
Rendered PAO (showing street name location) | 1 <street> | 1A <street> | Rose Cottage, 1A <street> | Rose Cottage, <street> |
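The PAO rendering rules in the tables above can be sketched in Python as follows. This is a minimal illustration, not an official implementation: the function names render_number_range and render_pao are hypothetical, and the sketch assumes the number and suffix components have already been read from the PAO_START_NUMBER, PAO_START_SUFFIX, PAO_END_NUMBER and PAO_END_SUFFIX fields.

```python
from typing import Optional


def render_number_range(start_number: Optional[int], start_suffix: str = "",
                        end_number: Optional[int] = None,
                        end_suffix: str = "") -> str:
    """Combine the four number/suffix components into one display string."""
    if start_number is None:
        return ""
    text = f"{start_number}{start_suffix or ''}"
    if end_number is not None:
        # A range is rendered as start-end, each with its own suffix.
        text += f"-{end_number}{end_suffix or ''}"
    return text


def render_pao(pao_text: Optional[str], number_string: str, street: str) -> str:
    """Put the PAO text first, then the number string directly before the street."""
    street_part = f"{number_string} {street}".strip()
    return ", ".join(part for part in (pao_text, street_part) if part)
```

For example, render_number_range(1, "A", 5, "C") gives 1A-5C, and render_pao("Rose Cottage", "1A", "MAIN STREET") gives Rose Cottage, 1A MAIN STREET.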
34, CROW LANE, RAMSBOTTOM, BL0 9BR | 34, CROW LANE, RAMSBOTTOM, BURY, BL0 9BR |
LA_ORGANISATION SAO_TEXT SAO (number/range string)* PAO_TEXT PAO (number/range string)* STREET_DESCRIPTION LOCALITY TOWN_NAME ADMINISTRATIVE_AREA POSTCODE_LOCATOR | JW SIMPSON LTD THE ANNEXE 1A THE OLD MILL 7–9 MAIN STREET HOOK WARSASH SOUTHAMPTON SO99 9ZZ |
PAO_TEXT | ‘HIGHBURY HOUSE’ | HIGHBURY HOUSE |
STREET_DESCRIPTION | ‘HIGH STREET’ | HIGH STREET |
TOWN_NAME | ‘SOUTHAMPTON’ | SOUTHAMPTON |
ADMINISTRATIVE_AREA | ‘SOUTHAMPTON’ |
POSTCODE_LOCATOR | ‘SO77 0SF’ | SO77 0SF |
ORGANISATION | ‘TM MOTORS’ | TM MOTORS |
SAO_TEXT | null |
SAO (number/range string)* | null |
PAO_TEXT | ‘THE OLD BARN’ | THE OLD BARN |
PAO (number/range string)* | ‘1’ | 1 |
STREET_DESCRIPTION | ‘HORSHAM LANE’ | HORSHAM LANE |
LOCALITY | null |
TOWN_NAME | ‘HORSHAM’ | HORSHAM |
ADMINISTRATIVE_AREA | ‘HORSHAM’ | * Duplicate name omitted |
POSTCODE_LOCATOR | ‘RH12 1EQ’ | RH12 1EQ |
PAO_text only | PAO_text and PAO number or range |
ROSE COTTAGE, MAIN STREET | ROSE COTTAGE, 11A MAIN STREET |
SAO_text value only, with PAO_text value only | SAO_text value only, with PAO number/range only |
THE ANNEXE, ROSE COURT, MAIN STREET | THE ANNEXE, 11A MAIN STREET |
SAO number/range value only, and PAO_text value only | SAO number/range value only, and PAO number/range value only | SAO number/range value only, and both PAO_text and PAO number/range values |
1A ROSE COURT, MAIN STREET | 1-3, 11A MAIN STREET | 1A ROSE COURT, 11A MAIN STREET |
SAO_text value only with PAO_text only | SAO_text and SAO number/range and PAO_text and PAO number/range |
THE ANNEXE, ROSE COTTAGE, MAIN STREET | WARDEN’S FLAT, 1A ROSE COURT, 11A MAIN STREET |
Organisation Name along with all PAO + SAO fields |
COTTAGE INDUSTRY LTD, THE ANNEXE, 1A ROSE COURT, 11A MAIN STREET |
Locality and Town Name present | Town Name only |
[first part of address, formatted as described above] MAIN STREET, HIGHFIELD, SOUTHAMPTON | [first part of address, formatted as described above] HIGH STREET, SOUTHAMPTON |
Administrative Area name included |
[first part of address, formatted as described above] MAIN STREET, WINDSOR, ROYAL BOROUGH OF WINDSOR AND MAIDENHEAD |
With Postcode_Locator on final line |
[first part of address, formatted as described above] HIGH STREET, MILTON, ML99 0WW |
AddressBase | 6GB | 32GB |
AddressBase Plus | 16GB | 78GB |
AddressBase Plus Islands | 450MB | 2GB |
This technical specification provides detailed technical information about AddressBase. It is targeted at technical users and software developers.
AddressBase provides an address product containing both residential and commercial addresses where a Local Authority address has been matched to a Royal Mail PAF address. This allows users to link additional information about a property to a single address. The product also provides enhancements to the Royal Mail PAF data by assigning an X and Y coordinate on British National Grid and an ETRS89 projection, as well as a primary level classification, and a representative point code describing the positional quality.
This technical specification includes the following sections:
All AddressBase products include the Unique Property Reference Number (UPRN) and are based on the same coordinate reference systems.
Please see the General AddressBase information section for additional information that applies across all AddressBase products.
When you receive an order via hard media (DVD), the following files will be supplied for the contracted area of interest (AOI):
Data
Doc
Order_Details.txt
Within the Data directory, data files will be found in their compressed format.
Within the Doc folder, a text file called Label Information.txt will contain information that is printed on the DVD.
The Order_Details text file will provide information about the order, including the order date, currency date and file structure.
When you receive an order of a Managed Great Britain Set (MGBS) via hard media (DVD), the following files will be supplied:
Data
Doc
Resources
readme.txt
There are several items contained within your supply:
Data folder – This folder contains all of your data supply.
Doc folder – This folder contains the Medialis.txt file, which outlines the contents of the data you have been supplied.
Resources folder – This folder contains lookup tables for the local custodian code and AddressBase classification scheme as well as the Header files for the product.
The readme text file – This document provides guidance notes on matters such as the filename referencing used and the directory structure of the DVD.
With a Secure File Transfer Protocol (SFTP) order, the same folder structure is supplied as for a DVD supply of an area of interest. The filenames will be slightly different, reflecting the SFTP order number, and the Doc folder will be empty.
Public Sector Geospatial Agreement (PSGA) customers can download their geographic chunk data for AddressBase and AddressBase Plus as well as a full supply of AddressBase Plus Islands via our download service.
The data is supplied as chunked files that cover your selected area. These files are named according to the convention shown below.
When you open your data, you will see a series of zip folders:
Using AddressBase Plus and Islands as an example:
AddressBasePlus_FULL_2020-01-21_001_csv.zip
(Full supply of GB CSV)
AddressBasePlus_ISL_FULL_2020-01-21_001_csv.zip
(Full supply of Islands CSV) or
AddressBasePlus_COU_2020-01-21_001_gml.zip
(COU supply of GB GML)
AddressBasePlus_ISL_COU_2020-01-21_001_gml.zip
(COU supply of Islands GML)
Using AddressBase Plus as an example:
AddressBasePlus_FULL_2011-07-29_TQ2020_csv.zip
(Full supply of CSV) or
AddressBasePlus_COU_2011-07-29_TQ2020_gml.zip
(COU supply of GML)
The AddressBase Plus Islands product is not available in geographic chunks.
The GML and CSV data is supplied in a compressed form (ZIP). Some software can access these files directly, while other software will require the files to be unzipped.
To unzip the zipped data files (.zip extension), use an unzipping utility found on most PCs, for example, WinZip. Alternatively, open-source zipping/unzipping software can be downloaded from the Internet, for example, 7-Zip.
When you unzip the files, the data will be extracted as CSV files, which are ready to use. For example, unzipping AddressBase Plus will extract files similar to the chunks below:
AddressBasePlus_FULL_2020-01-21_001.csv
AddressBasePlus_ISL_FULL_2020-01-21_001.csv
AddressBasePlus_2011-07-29_NC4040.csv
The primary supply mechanism of AddressBase data is referred to as non-geographic chunks. This is a way of dividing up the data into chunks that are supplied in separate volumes, each with a fixed maximum number of records. The data is supplied without any reference to the geographic position of records.
Public Sector Geospatial Agreement (PSGA) customers can order Geographic chunks (5km tiles) as well as non-geographic chunks, although geographic chunks are not considered the main form of supply.
All customers are also able to take a complete supply (referred to as a Managed Great Britain Set: MGBS) or an Area of Interest (AOI) as a full supply or a COU supply.
If you receive your data as non-geographic chunks, the filename will be constructed as follows:
productName_supply_ccyy-mm-dd_vvv.format
Where:
productName is AddressBase.
supply is defined as FULL or COU.
ccyy-mm-dd is the date the file was generated.
vvv is the volume number of the file.
format is the format of the files received, for example, csv or gml.
For example:
AddressBase_FULL_2013-05-28_001.gml
(GML full supply)
AddressBase_COU_2013-05-28_001.csv
(CSV COU supply)
If the data has been provided in a ZIP file, the filename will be constructed as follows:
productName_supply_ccyy-mm-dd_vvv_format.zip
For example:
AddressBase_FULL_2013-05-28_001_gml.zip
(GML full supply zipped)
If you receive your data as geographic chunks (PSGA customers only), the filename will be constructed as follows:
productName_supply_ccyy-mm-dd_ngxxyy.format
Where:
productName is AddressBase.
supply is defined as FULL or COU.
ccyy-mm-dd is the date the file was generated.
ngxxyy is the four-digit grid reference belonging to the 1km south-west corner of the 5km chunk.
format is the format of the files received, for example, csv or gml.
For example:
AddressBase_FULL_2013-05-28_NC4040.gml
(GML full supply)
AddressBase_COU_2013-05-28_NC4040.csv
(CSV COU supply)
If the data has been provided in a ZIP file, the filename will be constructed as follows:
productName_supply_ccyy-mm-dd_ngxxyy_format.zip
For example:
AddressBase_COU_2013-05-28_NC4040_csv.zip
(CSV COU supply zipped)
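The two naming conventions above can be checked programmatically. The following Python sketch is an illustration only: the SupplyFile type and parse_supply_filename function are hypothetical names, and the pattern assumes a three-digit volume number (vvv), a tile reference of two letters followed by four digits (ngxxyy), and an optional _ISL product suffix as seen in the AddressBase Plus Islands examples.

```python
import re
from typing import NamedTuple, Optional

# productName_supply_ccyy-mm-dd_vvv.format      (non-geographic chunk)
# productName_supply_ccyy-mm-dd_ngxxyy.format   (geographic chunk)
# ..._format.zip                                (either kind, zipped)
FILENAME_PATTERN = re.compile(
    r"^(?P<product>[A-Za-z]+(?:_ISL)?)"
    r"_(?P<supply>FULL|COU)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?:(?P<volume>\d{3})|(?P<tile>[A-Z]{2}\d{4}))"
    r"(?:\.(?P<ext>csv|gml)|_(?P<zipped>csv|gml)\.zip)$"
)


class SupplyFile(NamedTuple):
    product: str
    supply: str              # FULL or COU
    date: str                # ccyy-mm-dd, the date the file was generated
    volume: Optional[str]    # set for non-geographic chunks
    tile: Optional[str]      # set for geographic chunks
    data_format: str         # csv or gml
    zipped: bool


def parse_supply_filename(filename: str) -> Optional[SupplyFile]:
    """Return the filename components, or None if the name does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    zipped_format = match.group("zipped")
    return SupplyFile(
        product=match.group("product"),
        supply=match.group("supply"),
        date=match.group("date"),
        volume=match.group("volume"),
        tile=match.group("tile"),
        data_format=zipped_format or match.group("ext"),
        zipped=zipped_format is not None,
    )
```

A name that does not follow either convention returns None, which makes it straightforward to separate data files from other files in a supply folder.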
AddressBase is available as a full or COU supply.
A COU supply of data contains records or files that have changed between product refresh cycles. The primary benefit of supplying data in this way is that data volumes are smaller, which reduces the amount of data that requires processing when compared to a full supply.
COU data enables a user to identify three types of change:
Deletes (CHANGE_TYPE ‘D’) are objects that have ceased to exist in your AOI since the last product refresh.
Inserts (CHANGE_TYPE ‘I’) are objects that have been newly inserted into your AOI since the last product refresh.
Updates (CHANGE_TYPE ‘U’) are objects that have been updated in your AOI since the last product refresh.
A COU file for non-geographic chunked data can be identified by its naming convention. Any change record will be provided as a full record with the appropriate change type, as listed above.
A geographic chunked COU works differently from the non-geographic chunked COU outlined above; its file naming convention is described above. If a single record has changed within a 5km tile, the entire tile, containing all of its features, will be supplied. This means the user will need to remove all features that previously existed in the supplied tile(s) and insert the entire new tile(s) in their place.
When users are deleting, inserting or updating features, it is up to the user to consider their archiving requirements. If deleted records are important to your business requirements, you must take appropriate action to archive previous records.
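The two COU behaviours described above can be sketched in Python as follows. This is a hedged illustration rather than a recommended loading procedure: the function names apply_cou and replace_tiles are hypothetical, records are assumed to be dictionaries keyed by the CSV attribute names, and the sketch assumes each UPRN identifies one record in the holding.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def apply_cou(current: Dict[str, Record],
              cou_records: Iterable[Record],
              archive: List[Record]) -> None:
    """Apply non-geographic COU records to a holding keyed by UPRN."""
    for record in cou_records:
        uprn = record["UPRN"]
        change_type = record["CHANGE_TYPE"]
        if change_type not in ("I", "U", "D"):
            raise ValueError(f"Unexpected CHANGE_TYPE {change_type!r} for UPRN {uprn}")
        # Keep the superseded or deleted version if archiving matters to you.
        previous = current.pop(uprn, None)
        if previous is not None:
            archive.append(previous)
        # Inserts and updates arrive as full records, so they replace outright.
        if change_type != "D":
            current[uprn] = record


def replace_tiles(holding: Dict[str, List[Record]],
                  cou_tiles: Dict[str, List[Record]]) -> None:
    """Apply a geographic COU: each supplied 5km tile replaces the old tile."""
    for tile, records in cou_tiles.items():
        holding[tile] = list(records)
```

Because every change record is supplied as a full record, an update is handled here in the same way as an insert; only the archiving step distinguishes the two.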
The AddressBase product will be distributed as a comma-separated values (CSV) file or Geography Markup Language (GML) version 3.2. Both of these formats can either be supplied as a full supply or a change-only update (COU) supply.
The CSV supply of AddressBase means:
There will be one record per line in each file.
Fields will be separated by commas.
String fields will be delimited by double quotes.
No comma will be placed at the end of each row in the file.
Records will be terminated by Carriage Return / Line Feed.
Double quotes inside strings will be escaped by doubling.
Where a field has no value in a record, two commas will be placed together in the record (one for the end of the previous field and one for the end of the null field). Where the null field is a text field double quotes will be included between the two commas, for example - , “”,
AddressBase CSV data will be transferred using Unicode encoded in UTF-8. Unicode includes all the characters in ISO-8859-14 (Welsh characters). Some accented characters are encoded differently.
The transfer will normally be in a single file, but the data can be split into multiple files using volume numbers. Most files will only be split where there are more than one million records.
The header row for the CSV is supplied separately and can be downloaded from the product support pages.
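The CSV rules above map onto the standard Python csv module, as the following sketch suggests. It is an illustration under stated assumptions: read_addressbase_csv is a hypothetical helper, and the five-column header used in the usage note is a shortened, made-up header for demonstration, not the real column order (which comes from the separately supplied header file).

```python
import csv
import io
from typing import Dict, Iterator, List


def read_addressbase_csv(text: str, header: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per record, pairing each value with its header name.

    The default csv dialect already handles comma separators, double-quoted
    strings and quotes escaped by doubling. newline="" leaves the CR/LF
    record terminators for the csv module to interpret.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        if len(row) != len(header):
            raise ValueError(f"Expected {len(header)} fields but found {len(row)}")
        # Null fields (,, or ,"",) both arrive as empty strings.
        yield dict(zip(header, row))


# When reading from disk, open the file as UTF-8 in the same way:
#   open(path, encoding="utf-8", newline="")
```

For example, with the hypothetical header UPRN, ORGANISATION_NAME, DEPARTMENT_NAME, BUILDING_NUMBER, POSTCODE, the line 100000000001,"JW ""SIMPSON"" LTD","",,"SO99 9ZZ" yields an organisation name of JW "SIMPSON" LTD and empty strings for the two null fields.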
The GML Encoding standard is an Extensible Markup Language (XML) grammar for expressing geographical features. XML schemas are used to define and validate the format and content of GML. The XML specifications that GML is based on are available from the World Wide Web Consortium (W3C) website: http://www.w3.org. More information can be found in the Open Geospatial Consortium (OGC) document, Geography Markup Language v3.2.1: https://portal.ogc.org/files/?artifact_id=20509. The GML 3.2.1 specification provides a set of schemas that define the GML feature constructs and geometric types. These are designed to be used as a basis for building application-specific schemas, which define the data content.
A GML document is described using a GML Schema. The AddressBase schema document (addressbase.xsd) defines the features in AddressBase GML.
It imports the GML 3.2.1 schemas which rely on XML as defined by W3C at: http://www.w3.org/XML/1998/namespace.html.
The application schema uses the following XML namespaces, for which definitions are available as given here:
Information about Unicode and UTF-8, the character encoding we have chosen, is available on the Unicode Consortium website: http://www.unicode.org/.
Each feature within the AddressBaseSupplySet:FeatureCollection is encapsulated in the following member element according to its feature type:
The UPRN of the feature is provided in the gml:id XML attribute.
See the example records page for specific GML examples.
In the GML supply, you can determine the extent of your supply from the <gml:Envelope> element. For example:
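As a hedged illustration of reading the envelope, the Python sketch below extracts the corner coordinates using the standard library. The XML fragment and its coordinate values are hypothetical and are not taken from a real supply; the read_envelope function name is also an assumption. The sketch assumes a British National Grid envelope with easting before northing in each corner.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

GML_NS = {"gml": "http://www.opengis.net/gml/3.2"}

# Hypothetical fragment for illustration only.
SAMPLE = """<gml:boundedBy xmlns:gml="http://www.opengis.net/gml/3.2">
  <gml:Envelope srsName="urn:ogc:def:crs:EPSG::27700">
    <gml:lowerCorner>440000.0 110000.0</gml:lowerCorner>
    <gml:upperCorner>445000.0 115000.0</gml:upperCorner>
  </gml:Envelope>
</gml:boundedBy>"""


def read_envelope(xml_text: str) -> Optional[Tuple[float, float, float, float]]:
    """Return (min_x, min_y, max_x, max_y), or None if no envelope is found."""
    root = ET.fromstring(xml_text)
    envelope_tag = "{%s}Envelope" % GML_NS["gml"]
    if root.tag == envelope_tag:
        envelope = root
    else:
        envelope = root.find(".//gml:Envelope", GML_NS)
    if envelope is None:
        return None
    lower = envelope.findtext("gml:lowerCorner", namespaces=GML_NS)
    upper = envelope.findtext("gml:upperCorner", namespaces=GML_NS)
    if lower is None or upper is None:
        return None
    min_x, min_y = (float(value) for value in lower.split())
    max_x, max_y = (float(value) for value in upper.split())
    return (min_x, min_y, max_x, max_y)
```

This sketch parses the whole document in memory, which suits small fragments; a full GML supply file is likely to need a streaming parser instead.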
This address record follows the lifecycle of a Postcode Address File (PAF) record matched to a Local Authority record. As a matched record is inserted, deleted and updated within PAF, these changes are incorporated into the AddressBase product. Similarly, if the matched Local Authority address record updates an attribute contained within the AddressBase product, this change will be reflected.
The following sub-sections provide details about the attributes included with this feature, their data types in the different output formats, and other important metadata about them.
The following sub-sections provide details about the attributes included with the Address feature type.
Unique Property Reference Number (UPRN) assigned by the LLPG Custodian or Ordnance Survey.
Attribute Name: uprn (GML), UPRN (CSV)
Data Type: Integer (GML), Integer (CSV)
Multiplicity: [1]
Size: 12
Source: Contributing Local Authority / Ordnance Survey
Unique identifier provided by Ordnance Survey.
Attribute Name: osAddressTOID (GML), OS_ADDRESS_TOID (CSV)
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 20
Source: Ordnance Survey
Royal Mail’s Unique Delivery Point Reference Number (UDPRN).
Attribute Name: udprn (GML), UDPRN (CSV)
Data Type: Integer (GML), Integer (CSV)
Multiplicity: [1]
Size: 8
Source: Royal Mail
The organisation name is the business name given to a delivery point within a building or small group of buildings. For example: TOURIST INFORMATION CENTRE
This field could also include entries for churches, public houses and libraries.
Attribute Name: organisationName (GML), ORGANISATION_NAME (CSV)
Condition: Organisation Name or PO Box Number must be present if Building Name and Building Number are both absent.
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 60
Source: Royal Mail
For some organisations, department name is indicated because mail is received by subdivisions of the main organisation at distinct delivery points. For example, Organisation Name: ABC COMMUNICATIONS, Department Name: MARKETING DEPARTMENT
Attribute Name: departmentName (GML), DEPARTMENT_NAME (CSV)
Condition: If a Department Name is present, an Organisation Name must also be present.
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 60
Source: Royal Mail
Post Office Box (PO Box) number.
Attribute Name: poBoxNumber (GML), PO_BOX_NUMBER (CSV)
Condition: Organisation Name or PO Box Number must be present if Building Name and Building Number are both absent.
Data Type: CharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 6
Source: Royal Mail
The sub-building name and/or number are identifiers for subdivisions of properties.
For example: Sub-building Name: FLAT 3, Building Name: POPLAR COURT, Thoroughfare: LONDON ROAD
If the above address is styled 3 POPLAR COURT, all the text will be shown in the Building Name attribute and the Sub-building Name will be empty. The building number will be shown in this field when it contains a range, decimal or non-numeric character (see Building Number).
Attribute Name: subBuildingName (GML), SUB_BUILDING_NAME (CSV)
Condition: If a Sub Building Name is present, a Building Name or Building Number must also be present.
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 30
Source: Royal Mail
The building name is a description applied to a single building or a small group of buildings, such as Highfield House. This also includes those building numbers that contain non-numeric characters, such as 44A.
Some descriptive names, when included with the rest of the address, are sufficient to identify the property uniquely and unambiguously, for example, MAGISTRATES COURT.
Sometimes the building name will be a blend of distinctive and descriptive naming, for example, RAILWAY TAVERN (PUBLIC HOUSE) or THE COURT ROYAL (HOTEL).
Attribute Name: buildingName (GML), BUILDING_NAME (CSV)
Condition: Building Name must be present if Organisation Name, Building Number and PO Box Number are all absent.
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 50
Source: Royal Mail
The building number is a number given to a single building or a small group of buildings, thus identifying it from its neighbours, for example, 44. Building numbers that contain a range, decimals or non-numeric characters do not appear in this field but will be found in the buildingName or the subBuildingName fields.
Attribute Name: buildingNumber (GML), BUILDING_NUMBER (CSV)
Condition: Building Number must be present if Organisation Name, Building Name and PO Box Number are all absent.
Data Type: Integer (GML), Integer (CSV)
Multiplicity: [0..1]
Size: 4
Source: Royal Mail
A thoroughfare in AddressBase is fundamentally a road, track or named access route on which there are Royal Mail delivery points, for example, HIGH STREET.
Attribute Name: thoroughfare (GML), THOROUGHFARE (CSV)
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 80
Source: Royal Mail
The town or city in which the Royal Mail sorting office is located which services this record. There may be more than one, possibly several, sorting offices in a town or city.
Attribute Name: postTown (GML), POST_TOWN (CSV)
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [1]
Size: 30
Source: Royal Mail
This is used to distinguish between similar thoroughfares or the same thoroughfare within a dependent locality. For example, Millbrook Industrial Estate and Cranford Estate in this situation: BRUNEL WAY, MILLBROOK INDUSTRIAL ESTATE, MILLBROOK, SOUTHAMPTON and BRUNEL WAY, CRANFORD ESTATE, MILLBROOK, SOUTHAMPTON.
Attribute Name: doubleDependentLocality (GML), DOUBLE_DEPENDENT_LOCALITY (CSV)
Data Type: LocalisedCharacterString (GML), char (CSV)
Condition: If a Double Dependent Locality is present, a Dependent Locality must also be present.
Multiplicity: [0..1]
Size: 35
Source: Royal Mail
Dependent locality areas define an area within a post town. These are only necessary for postal purposes and are used to aid differentiation where there are thoroughfares of the same name in the same locality. For example, HIGH STREET in SHIRLEY and SWAYTHLING in this situation: HIGH STREET, SHIRLEY, SOUTHAMPTON and HIGH STREET, SWAYTHLING, SOUTHAMPTON.
Attribute Name: dependentLocality (GML), DEPENDENT_LOCALITY (CSV)
Data Type: LocalisedCharacterString (GML), char (CSV)
Multiplicity: [0..1]
Size: 35
Source: Royal Mail
A postcode is an abbreviated form of address made up of combinations of between five and seven alphanumeric characters. These are used by Royal Mail to help with the automated sorting of mail. A postcode may cover between 1 and 100 addresses.
There are two main components of a postcode, for example, NW6 4DP:
The outward code (or ‘outcode’). The first two to four characters of the postcode constituting the postcode area and the postcode district, for example, NW6. It is the part of the postcode that enables mail to be sent from the accepting office to the correct area for delivery.
The inward code (or ‘incode’). The last three characters of the postcode constituting the postcode sector and the postcode unit, for example, 4DP. It is used to sort mail at the local delivery office.
Attribute Name: postcode (GML), POSTCODE (CSV)
Data Type: CharacterString (GML), char (CSV)
Multiplicity: [1]
Size: 8
Source: Royal Mail
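The outward and inward code structure described above can be sketched in Python as follows. The split_postcode function is a hypothetical helper; it relies only on the rules stated above (five to seven alphanumeric characters, with the inward code always the last three) and does not validate that the postcode actually exists.

```python
from typing import Tuple


def split_postcode(postcode: str) -> Tuple[str, str]:
    """Split a postcode into (outward code, inward code)."""
    # Remove spaces so that 'NW6 4DP' and 'nw64dp' are treated alike.
    compact = "".join(postcode.split()).upper()
    if not compact.isalnum() or not 5 <= len(compact) <= 7:
        raise ValueError(f"Not a five to seven character postcode: {postcode!r}")
    # The inward code is always the final three characters.
    return compact[:-3], compact[-3:]
```

For example, split_postcode("NW6 4DP") returns the outward code NW6 and the inward code 4DP.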
Describes the address as a small or large user as defined by Royal Mail.
Attribute Name: postcodeType (GML), POSTCODE_TYPE (CSV)
Condition: If PO Box number is present Postcode Type must be ‘L’.
Multiplicity: [1]
Size: 1
Source: Royal Mail
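The conditions listed against the Royal Mail attributes above can be combined into a single check, sketched below in Python. This is an illustration only: check_conditions is a hypothetical helper, records are assumed to be dictionaries keyed by the CSV attribute names, and the sketch treats a value of 0 as empty on the assumption that unpopulated numeric fields may be supplied as 0 or null.

```python
from typing import Dict, List


def check_conditions(record: Dict[str, str]) -> List[str]:
    """Return a list of condition failures (empty if the record passes)."""

    def present(name: str) -> bool:
        # Assumption: '' and '0' both mean the field is not populated.
        return (record.get(name) or "").strip() not in ("", "0")

    problems = []
    # Taken together, the Organisation Name, PO Box Number, Building Name and
    # Building Number conditions require at least one of the four to be present.
    if not any(present(name) for name in ("ORGANISATION_NAME", "PO_BOX_NUMBER",
                                          "BUILDING_NAME", "BUILDING_NUMBER")):
        problems.append("No organisation, PO Box, building name or building number")
    if present("DEPARTMENT_NAME") and not present("ORGANISATION_NAME"):
        problems.append("Department Name without Organisation Name")
    if present("SUB_BUILDING_NAME") and not (present("BUILDING_NAME")
                                             or present("BUILDING_NUMBER")):
        problems.append("Sub Building Name without Building Name or Number")
    if present("DOUBLE_DEPENDENT_LOCALITY") and not present("DEPENDENT_LOCALITY"):
        problems.append("Double Dependent Locality without Dependent Locality")
    if present("PO_BOX_NUMBER") and record.get("POSTCODE_TYPE") != "L":
        problems.append("PO Box Number present but Postcode Type is not 'L'")
    return problems
```

A check of this kind may be useful after loading, to confirm that field mapping has not shifted columns.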
A value in metres defining the x and y location in accordance with the British National Grid.
Attribute Name: position (GML), X_COORDINATE, Y_COORDINATE (CSV)
Data Type: GM_Point (GML), Float (CSV)
Multiplicity: [1]
Size: X_COORDINATE (precision, scale) – (8, 2), Y_COORDINATE (precision, scale) – (9, 2)
Source: Contributing Local Authority/Ordnance Survey
A value defining the Longitude and Latitude location in accordance with the ETRS89 coordinate reference system.
Attribute Name: positionLatLong (GML), LATITUDE, LONGITUDE (CSV)
Data Type: GM_Point (GML), Float (CSV)
Multiplicity: [1]
Size: LATITUDE (precision, scale) – (9, 7), LONGITUDE (precision, scale) – (8, 7)
Source: Ordnance Survey
Representative Point Code. This code is used to reflect positional accuracy.
Attribute Name: rpc (GML), RPC (CSV)
Multiplicity: [1]
Size: 1
Source: Contributing Local Authority
The country in which a record can be found.
Attribute Name: country (GML), COUNTRY (CSV)
Multiplicity: [1]
Size: 1
Type of Record Change – please see Section 4 for more information.
Attribute Name: changeType (GML), CHANGE_TYPE (CSV)
Multiplicity: [1]
Size: 1
The date on which the address record was inserted into the database in the CCYY-MM-DD format.
Attribute Name: laStartDate (GML), LA_START_DATE (CSV)
Data Type: Date (GML), Date (CSV)
Multiplicity: [1]
Source: Contributing Local Authority
Date on which the Royal Mail address was loaded into the National Address Gazetteer (NAG) hub, as maintained by GeoPlace, in the CCYY-MM-DD format.
Attribute Name: rmStartDate (GML), RM_START_DATE (CSV)
Data Type: Date (GML), Date (CSV)
Multiplicity: [1]
Source: Royal Mail
The date on which any of the attributes on this record were last changed in the CCYY-MM-DD format.
Attribute Name: lastUpdateDate (GML), LAST_UPDATE_DATE (CSV)
Data Type: Date (GML), Date (CSV)
Multiplicity: [1]
Primary classification of the address record. For example, identifying the record as commercial (value of ‘C’) or residential (value of ‘R’).
Attribute Name: class (GML), CLASS (CSV)
Data Type: CharacterString (GML), char (CSV)
Multiplicity: [1]
Size: 1
Source: Contributing Local Authority
This section describes the features (one for CSV and two for GML) which make up the AddressBase product, giving the following information about each attribute.
The name of the attribute and what it is describing.
A condition associated with this attribute (optional).
The nature of the attribute, for example a numeric value or a code list value.
Describes how many times this element is expected to be populated in the data. An attribute may be optional or mandatory within the AddressBase product. These are denoted by:
‘1’ – there must be a value.
‘0..1’ – population is optional but a maximum of one attribute will be returned.
These values may be used in combination.
AddressBase is structured as a flat file. The data structure in this document is described by means of Unified Modeling Language (UML) class diagrams.
The AddressBase product is constructed as per the following UML diagrams.
Definition: This address record follows the lifecycle of a Postcode Address File (PAF) record matched to a Local Authority record. As a matched record is inserted, deleted and updated within PAF, these changes are incorporated into the AddressBase product. Similarly, if the matched Local Authority address record updates an attribute contained within the AddressBase product, this change will be reflected.
The UML model of AddressBase in CSV format can be seen in the UML diagram below; classes from the Ordnance Survey product specification are coloured orange; all code lists are coloured blue, while enumerations are coloured green.
Definition: This address record follows the lifecycle of a Postcode Address File (PAF) record matched to a Local Authority record. As a matched record is inserted, deleted and updated within PAF, these changes are incorporated into the AddressBase product. Similarly, if the matched Local Authority address record updates an attribute contained within the AddressBase product, this change will be reflected.
The UML model of AddressBase in GML format can be seen in the diagram below. In the UML diagram, classes from the Ordnance Survey product specification are orange, all code lists are coloured blue and enumerations are green.
A common requirement for customers using the AddressBase products is to search for properties using full or partial addresses. Address searches may return a large number of addresses, a short list of possibilities, a single match or no results, depending on the search criteria.
There are many methods of implementing an address search, from free text queries through to structured address component searches. This guide will step through two such approaches that may be used when working with AddressBase and/or AddressBase Plus.
These methods are not intended as recommendations; they are merely examples of how to get maximum value out of the product when implementing an address search function.
One type of search implementation involves a single ‘search engine’ style text box, into which a user can type all or some of an address. For example:
Find address | Results |
---|---|
In this scenario, the user can choose to type anything in Find address, which may be just one component of an address (for example, a postcode, street name or building name), several parts of an address (for example, street name + town name, house name + postcode, etc.) or even (rarely) a complete address.
There may or may not be commas between search items, and address components may be entered with or without capital letters. In short, with this search method, there is no structure to the user input and the search methodology must be designed with this in mind.
The other common type of implementation for address searches involves entering search criteria in a structured way (for example, with a different text box for each major address component).
This method guides the user to enter known components of an address and creates a predictable user input structure around which to build a search function. While generally simpler to use and implement, it can be less user-friendly, particularly in cases where it is not obvious which box to type an address component into, for example, is Richmond Terrace a building name or a street?
This guide suggests how to implement the two search methods described above. Both should be used alongside the instructions on formatting single address labels.
The methods described here may be adapted to work with AddressBase Plus, AddressBase Plus Islands and AddressBase; however, in the case of AddressBase, only Delivery Point Addresses are searchable, so the geographic guidance will not apply to this product.
An address search operation typically requires two stages of interaction from a user and several processing steps from the underlying IT system. These steps can be summarised in the following diagram:
The second user interaction can be omitted if there is only one result returned from the query. In almost all cases, there should be an option to ‘search again’ at the second and third stages in case no results are returned, or if none of the options shown is the required address.
Of course, different applications require different approaches; however, the general principles of the above process apply in all cases where an address is searched for based on user-entered criteria.
Within an interface that accepts structured user input for an address search, it is necessary to ‘map’ the fields presented to the user with those found within AddressBase or AddressBase Plus. In particular, any query will need to test multiple fields for a given input and will need to combine result sets from the two different address formats of AddressBase Plus (or the single address format of AddressBase) in order to produce the most complete result set.
Generally, a search form will describe a simplified view of an address in order to keep the user interface tidy and intuitive. Users may be given a set of text boxes to fill in, generally including building name, building number, street name, locality name, town name and postcode. The relationships between some common search fields and the fields found in AddressBase Plus are as follows:
The above mapping is an example only, and it is possible to break down the search fields differently, in which case a different mapping would be required. The important thing is to consider all possibilities for how data might be recorded. For example, a business name can sometimes appear as an organisation name or a building/PAO name depending on circumstances, so both must be checked when creating a search query.
Numbers need to be handled very carefully due to the presence of suffixes and ranges. There are two options for structuring the search input in these cases:
A single ‘number’ box can be used (as shown above in Flat/Subdivision Number and Building Number), which will then require some string manipulation to split the input into the appropriate numeric range and suffix components in order to search the geographic addresses; or
Four boxes can be provided for each number (start number, start suffix, end number and end suffix), which would then need to be combined into an appropriate string to search the Delivery Point Addresses.
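Both options involve converting between a single number string and its four components. The Python sketch below shows one possible approach; the names NumberRange, split_number_string and join_number_range are hypothetical, and the pattern assumes single-letter suffixes and a hyphen or en dash as the range separator.

```python
import re
from typing import NamedTuple, Optional

# Start number, optional suffix, then an optional '-end number + suffix' part.
NUMBER_RANGE = re.compile(
    r"^\s*(\d+)\s*([A-Za-z]?)\s*(?:[-\u2013]\s*(\d+)\s*([A-Za-z]?))?\s*$"
)


class NumberRange(NamedTuple):
    start_number: int
    start_suffix: str
    end_number: Optional[int]
    end_suffix: str


def split_number_string(text: str) -> Optional[NumberRange]:
    """Option 1: split a single 'number' box into range and suffix components."""
    match = NUMBER_RANGE.match(text)
    if match is None:
        return None  # Not a number string, for example a building name
    start, start_suffix, end, end_suffix = match.groups()
    return NumberRange(
        start_number=int(start),
        start_suffix=start_suffix.upper(),
        end_number=int(end) if end else None,
        end_suffix=(end_suffix or "").upper(),
    )


def join_number_range(parts: NumberRange) -> str:
    """Option 2: combine four boxes into one string for Delivery Point searches."""
    text = f"{parts.start_number}{parts.start_suffix}"
    if parts.end_number is not None:
        text += f"-{parts.end_number}{parts.end_suffix}"
    return text
```

Input that does not look like a number or range returns None, which an application could take as a cue to search the name fields instead.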
The basic rules to adhere to when generating a search query from structured input are as follows:
Ignore any search boxes that are not filled in with values.
Where a value is entered, assume that a match on at least one of the mapped fields is essential.
In SQL query terms, this means that each search term should generate a sub-query that searches each of the mapped fields (using OR), and that these sub-queries should then be combined together (using AND) into a single search query. The following SQL code illustrates this (for the Delivery Point Address search only) for an example where a street, locality and town name have been entered by the user:
In the above example, streetsearchtext, localitysearchtext and townsearchtext represent user-entered search terms (which could be parameters within an SQL function), and the GetFormattedAddress(*) function is a hypothetical user-defined function that returns the formatted address as a single string (suitable for display in the user interface). For more information on formatting addresses, please see Creating a single-line or multi-line address.
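The rules for generating a query from structured input can also be sketched in Python, here using the standard sqlite3 module. This is an illustration under stated assumptions, not a recommended design: the table name addressbase, the FIELD_MAP mapping of search boxes to Delivery Point Address columns, and the function names are all hypothetical, and only the UPRN is returned rather than a formatted address.

```python
import sqlite3
from typing import Dict, List, Tuple

# Assumed mapping of search boxes to Delivery Point Address columns.
FIELD_MAP = {
    "street": ["THOROUGHFARE", "DEPENDENT_THOROUGHFARE"],
    "locality": ["DEPENDENT_LOCALITY", "DOUBLE_DEPENDENT_LOCALITY"],
    "town": ["POST_TOWN"],
}


def build_search_query(criteria: Dict[str, str],
                       table: str = "addressbase") -> Tuple[str, List[str]]:
    """Build a parameterised query: OR within a box, AND between boxes."""
    clauses, params = [], []
    for box, value in criteria.items():
        value = (value or "").strip()
        if not value:
            continue  # Ignore search boxes that are not filled in
        columns = FIELD_MAP[box]
        clauses.append("(" + " OR ".join(f"{col} = ?" for col in columns) + ")")
        params.extend([value] * len(columns))
    if not clauses:
        raise ValueError("At least one search box must be filled in")
    # The table name comes from code, never from user input.
    return f"SELECT UPRN FROM {table} WHERE " + " AND ".join(clauses), params


def demo() -> List[int]:
    """Run the query against a small in-memory table of made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addressbase (UPRN INTEGER, THOROUGHFARE TEXT, "
        "DEPENDENT_THOROUGHFARE TEXT, DEPENDENT_LOCALITY TEXT, "
        "DOUBLE_DEPENDENT_LOCALITY TEXT, POST_TOWN TEXT)"
    )
    conn.executemany("INSERT INTO addressbase VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "MAIN STREET", "RICHMOND TERRACE", "WARSASH", "HOOK", "SOUTHAMPTON"),
        (2, "MAIN STREET", None, None, None, "HORSHAM"),
        (3, "HIGH STREET", "MAIN STREET", "SHIRLEY", None, "SOUTHAMPTON"),
    ])
    sql, params = build_search_query(
        {"street": "MAIN STREET", "locality": "", "town": "SOUTHAMPTON"})
    rows = conn.execute(sql + " ORDER BY UPRN", params).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Passing the search terms as parameters, rather than concatenating them into the SQL text, keeps user input out of the query string itself.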
On top of this, for a complete query, the two different types of addresses should be queried separately (Geographic and Delivery Point Addresses), and the two result sets should be amalgamated into a single set using a UNION. The following example builds upon the previous example to include Geographic Addresses as well as Delivery Point Addresses.
The SQL UNION operator will combine the two result sets, discarding any exact duplicates. (Retaining the exact duplicates requires the use of UNION ALL, but that is not desirable in this example.)
The resulting output from this query will be a set of search results as formatted addresses along with their UPRN. Exact duplicates will be omitted, but all ‘variations’ of the same address will be output, one row for each variation, so the same UPRN may appear more than once. It may be wise to return the Postal Address Flag values against each row to enable further filtering, for example, to restrict the results to postal addresses only. Note that the Postal Address Flag is only available in AddressBase Plus. All records in AddressBase are deemed postal as they are from Royal Mail’s PAF data.
A flaw in the above examples is the use of equality operators. In practice, people are not consistent with capitalisation, so the SQL ‘LIKE’ operator might work better in databases where it compares case-insensitively (this behaviour varies by database and collation). Depending on the nature of the application, a ‘%’ wildcard could also be appended to the end of each search term so that only the first few letters of an address component need to be entered. For example:
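A minimal sketch of a prefix search using LIKE, run against an in-memory SQLite database (where LIKE ignores case for ASCII letters by default). The table, column and function names and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery_point_address (uprn INTEGER, thoroughfare TEXT)"
)
# Illustrative rows only; not real address data.
conn.executemany(
    "INSERT INTO delivery_point_address VALUES (?, ?)",
    [(1, "HIGH STREET"), (2, "HIGHFIELD ROAD"), (3, "LOW ROAD")],
)


def search_street_prefix(streetsearchtext):
    """Return UPRNs whose thoroughfare starts with the search text."""
    rows = conn.execute(
        "SELECT uprn FROM delivery_point_address "
        "WHERE thoroughfare LIKE :pattern ORDER BY uprn",
        {"pattern": streetsearchtext + "%"},  # trailing wildcard
    )
    return [uprn for (uprn,) in rows]
```

With these rows, searching for high matches both HIGH STREET and HIGHFIELD ROAD.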
Alternatively, if exact matches are required but case sensitivity is not, then the UPPER() or LOWER() SQL functions can be used on each side of the equals sign in comparisons (a solution that should work in all databases):
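A minimal sketch of the case-insensitive exact match, using the same assumed table layout and made-up rows in an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery_point_address (uprn INTEGER, thoroughfare TEXT)"
)
# Illustrative rows only; not real address data.
conn.executemany(
    "INSERT INTO delivery_point_address VALUES (?, ?)",
    [(1, "HIGH STREET"), (2, "HIGHFIELD ROAD"), (3, "LOW ROAD")],
)


def search_street_exact(streetsearchtext):
    """Return UPRNs whose thoroughfare equals the search text, ignoring case."""
    rows = conn.execute(
        "SELECT uprn FROM delivery_point_address "
        "WHERE UPPER(thoroughfare) = UPPER(:streetsearchtext) ORDER BY uprn",
        {"streetsearchtext": streetsearchtext},
    )
    return [uprn for (uprn,) in rows]
```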
Finally, to combine all of the approaches, the following would work for maximum flexibility:
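A minimal sketch of the combined approach, applying UPPER() to both sides of a LIKE comparison with a trailing wildcard. The table layout and rows are assumptions for illustration. The || concatenation operator is standard SQL, but some databases use a different operator or function for joining strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery_point_address (uprn INTEGER, thoroughfare TEXT)"
)
# Illustrative rows only; not real address data.
conn.executemany(
    "INSERT INTO delivery_point_address VALUES (?, ?)",
    [(1, "HIGH STREET"), (2, "HIGHFIELD ROAD"), (3, "LOW ROAD")],
)


def search_street_flexible(streetsearchtext):
    """Case-insensitive prefix match that does not rely on LIKE ignoring case."""
    rows = conn.execute(
        "SELECT uprn FROM delivery_point_address "
        "WHERE UPPER(thoroughfare) LIKE (UPPER(:streetsearchtext) || '%') "
        "ORDER BY uprn",
        {"streetsearchtext": streetsearchtext},
    )
    return [uprn for (uprn,) in rows]
```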
When offering a ‘search engine’ style search feature with just a single text box to enter search terms, a wholly different approach is required. No assumptions can be made about the order, format or style of the user input, and the data will need to be ‘indexed’ in a way that facilitates searches of this type.
Search engine style searches are likely to require the creation of an additional index/lookup table for addresses. Such a table is likely to consist of just two main columns: a key value (UPRN) and a formatted address string. Additional columns may be required to allow filtering of results (such as the AddressBase Postal flag values from AddressBase Plus, which would allow the results to be filtered by different address statuses).
The following table shows a possible address index table structure:
Note how the addresses have been formatted as a single text string with a single space between each word (although leaving commas in would do no harm). All forms of each address (both PAF and geographic) have been added to the index, so there can be several rows with the same UPRN. To speed up complex searching, an appropriate index could be added to the Address Text field, such as a full text search index.
Once a suitable search index is in place, the query itself can be put together. The basic idea is to split the user input into search terms by removing commas, double spaces, and other unnecessary whitespace and then splitting it at each single space, as follows:
User input: 4, High Street, westville, wv17
Capitalised, with commas and double-spaces removed:
4 HIGH STREET WESTVILLE WV17
Split into separate search terms:
4
HIGH
STREET
WESTVILLE
WV17
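The pre-processing steps above can be sketched in a few lines of Python. The function name split_search_terms is hypothetical, and the sketch treats any run of commas and whitespace as a single separator.

```python
import re


def split_search_terms(user_input):
    """Capitalise the input, drop commas and extra whitespace, split on spaces."""
    cleaned = re.sub(r"[,\s]+", " ", user_input.upper()).strip()
    return cleaned.split(" ") if cleaned else []
```

For the user input shown above, this returns the five search terms 4, HIGH, STREET, WESTVILLE and WV17.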
Once the user input has been pre-processed into separate search terms, a query can be generated. The key assumption in this example will be that ALL search terms must be matched against the index table to be considered as a result. This implies a query where each value is matched using an ‘AND’ operator. In order to search the whole index, the ‘LIKE’ operator will need to be used along with a ‘%’ wildcard on either side of the search text. A suitable search query for the above example would be as follows:
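A minimal sketch of such a query, generated dynamically with one LIKE condition per search term and run against an in-memory SQLite database. The table name address_index, its column names and the function name search_index are assumptions. The sample rows are taken from the example index and result tables in this guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_index (uprn INTEGER, address_text TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO address_index VALUES (?, ?, ?)",
    [
        (123456789012, "4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB", "Local Authority"),
        (123456789013, "4 HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB", "Non-postal"),
        (894756389092, "4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL", "Geographic + PAF"),
        (894756389132, "ROSE COTTAGE 4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL", "Geographic"),
        (274859037849, "FLAT 4 HIGHBURY COURT HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL", "Geographic + PAF"),
        (482974769830, "MAPS4U LTD HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL", "Geographic + PAF"),
    ],
)


def search_index(conn, terms):
    """Return (uprn, address_text) rows that contain every search term."""
    if not terms:
        return []
    # One condition per term, all of which must match.
    where = " AND ".join("address_text LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]  # wildcard on either side
    sql = (
        "SELECT uprn, address_text FROM address_index "
        f"WHERE {where} ORDER BY uprn"
    )
    return conn.execute(sql, params).fetchall()


matches = search_index(conn, ["4", "HIGH", "STREET", "WESTVILLE", "WV17"])
```

Because the terms are matched anywhere in the text, the term 4 also matches MAPS4U LTD, so that row is returned alongside the addresses numbered 4. The search terms are passed as bound parameters rather than concatenated into the SQL text.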
This query would return all rows from the index table that contain all of the search terms, along with the appropriate UPRNs. The following table shows how the index table would be used in the above example to return relevant results:
This result set can then be presented to the user, who can select the most appropriate record, which can then be retrieved in full using the UPRN.
Of course, in a practical implementation, the above query would need to be dynamically generated, with a separate condition added for each search term. This example is quite a strict search query that requires all search terms to be present. Many layers of complexity could be added to allow partial and ‘fuzzy’ matches, and to return confidence scores, for example, but such enhancements are beyond the scope of this guide.
This guide is intended as an introduction to implementing address search functionality using AddressBase, AddressBase Plus and AddressBase Plus Islands. The following list is a summary of the main points:
A user front-end for an address search may contain a single, search engine style text box or multiple text boxes representing different parts of an address.
A typical address search function takes place in three stages:
A user enters search text.
A query is run, returning a set of possible matches.
The user selects the address of interest and the full record is then returned.
With a structured search interface, the addresses can be queried directly by mapping the various address fields to the text boxes supplied.
For an unstructured (single text box) interface, it is necessary to create an index table with fully formatted address strings against each UPRN. Queries can then be run against this index table by splitting the user input into individual search terms and requiring them all to be present.
It is possible to filter results by status in AddressBase Plus (for example, postal or non-postal).
Any search function should search all forms of an address (both Geographic and Delivery Point Addresses).
Careful consideration should be given to the use of ‘fuzzy’ search algorithms (such as using wildcard or sound-alike searches).
There are many Date columns within the AddressBase product. Where a type format of Date has been used in the above attribute tables the data will be defined in the following format.
Value | Type | Notes |
---|
This feature is formally known as the GML feature collection and is used to define a collection of features.
This is not supplied as part of the CSV supply. Please see and for more information.
The following sub-sections provide details about the attributes included with this feature, their data types in the different output formats, and other important metadata about them.
Time the data was extracted from the database.
Attribute Name: queryTime (GML), Not provided (CSV)
Data Type: DateTime (GML)
Multiplicity: [1]
Size: 1
The date given as part of a change-only query.
Attribute Name: queryChangeSinceDate (GML), Not provided (CSV)
Data Type: Date (GML)
Multiplicity: [1]
Size: 1
The naming of attributes between GML and CSV will be different due to the requirements of the file formats. For convenience the following table maps the CSV attribute name to the GML attribute name.
CSV | GML |
---|
The header files and local custodian codes for AddressBase are available for download from the.
This product is available to try out online using one of our three sets of sample data (Exeter, Newport and Inverness) through the OS MasterMap product viewer:
The following section provides example records for both the CSV and GML supplies. Please note that the data given is provided as an example only and should not be used as accurate data.
Please note that attributes are not provided where the field is null.
Prefix | Namespace Identifier | Definition available at |
---|---|---|
gml | | |
xsi | | Built into XML – http://www.w3.org/TR/xmlschema-1/ |
xlink | | Xlink – http://www.w3.org/1999/xlink |

Member Element | Feature Type |
---|---|
<abpl:addressMember> | Address |

Code List Name:
Code List Name:
Code List Name:
Code List Name:

Results |
---|
Rose Cottage, Main Street, Fieldtown, Addressville, SW99 9ZZ |
Rose Cottage, Main Street, Ashford, AS45 9PP |
Rose Cottage, Main Street, Buxtew, Monley, MO88 4TY |
And so on... |

Search Box | Mapped Delivery Point fields | Mapped geographic fields |
---|---|---|
Business Name | Organisation_Name | Organisation, PAO_Text, SAO_Text |
Flat/Subdivision Name | Sub_Building_Name, Department_Name | SAO_Text |
Flat/Subdivision Number | Sub_Building_Name | SAO_StartNumber, SAO_StartSuffix, SAO_EndNumber, SAO_EndSuffix |
Building Name | Building_Name | PAO_Text |
Building Number | Building_Number, Building_Name (in cases where a suffix or range is present) | PAO_StartNumber, PAO_StartSuffix, PAO_EndNumber, PAO_EndSuffix |
Street | Thoroughfare, Dependent_Thoroughfare | Street, PAO_Text |
Locality | Dependent_Locality, Double_Dependent_Locality | Locality, Town, Street |
Town | Dependent_Locality, Post_Town | Town, Locality |
Postcode | Postcode | Postcode_Locator |

UPRN | Address Text | Statuses (multiple fields) |
---|---|---|
123456789012 | 4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB | Local Authority |
123456789012 | FLAT 4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB | PAF |
123456789013 | 4 HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB | Non-postal |

UPRN | Address text | Statuses (multiple fields) |
---|---|---|
894756389092 | 4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL | Geographic + PAF |
894756389132 | ROSE COTTAGE 4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL | Geographic |
274859037849 | FLAT 4 HIGHBURY COURT HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL | Geographic + PAF |
482974769830 | MAPS4U LTD HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL | Geographic + PAF |

Value | Type | Notes |
---|---|---|
2007-10-24 | Date | Date columns will follow the structure: CCYY-MM-DD |

CSV | GML |
---|---|
UPRN | uprn |
OS_ADDRESS_TOID | osAddressTOID |
UDPRN | udprn |
ORGANISATION_NAME | organisationName |
DEPARTMENT_NAME | departmentName |
PO_BOX_NUMBER | poBoxNumber |
SUB_BUILDING_NAME | subBuildingName |
BUILDING_NAME | buildingName |
BUILDING_NUMBER | buildingNumber |
DEPENDENT_THOROUGHFARE | dependentThoroughfare |
THOROUGHFARE | thoroughfare |
POST_TOWN | postTown |
DOUBLE_DEPENDENT_LOCALITY | doubleDependentLocality |
DEPENDENT_LOCALITY | dependentLocality |
POSTCODE | postcode |
POSTCODE_TYPE | postcodeType |
X_COORDINATE | position |
Y_COORDINATE | position |
LATITUDE | positionLatLong |
LONGITUDE | positionLatLong |
RPC | rpc |
COUNTRY | country |
CHANGE_TYPE | changeType |
LA_START_DATE | laStartDate |
RM_START_DATE | rmStartDate |
LAST_UPDATE_DATE | lastUpdateDate |
CLASS | class |

CLOVER AVENUE, SW99 9ZZ
1, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
2, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
3, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
4, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
5, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
6, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
7, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ