🆕 IDs vs Spatial Relationships

What is an ID and what IDs are in OS Data?

An ID uniquely identifies each row of data or geographical feature.

OSID (Ordnance Survey Identifier) – The OS NGD primarily uses a new identifier, the OSID, to identify features. OSIDs are persistent and allow the unique identification of records in the OS NGD. Note that an OSID is unique only within a feature type, not across the whole OS NGD: where possible, the same OSID is used on multiple features when they represent the same geographical feature.
UPRN (Unique Property Reference Number) – A unique numeric identifier for every spatial address in Great Britain. It provides a common standard for addressable buildings and objects: one that is machine-readable and in a consistent reference format.
USRN (Unique Street Reference Number) – A unique and persistent identifier for every street, road, track, path, cycle track or cycleway in Great Britain.
TOID (Topographic Identifier) – A unique and persistent identifier for every feature found in OS MasterMap products, covering a wide range of landscape and built environment features.

What do OS Identifiers look like?

OSID – 9d697b8b-b351-4e9a-9c7b-d7d0d155f513
UPRN – 906483712
USRN – 7905946
TOID – osgb4000000073338613
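The example values above suggest a simple format check that can catch mixed-up columns before a join. The following is a minimal sketch, not an official validator: the function name `looks_like` is hypothetical, and the digit limits and the `osgb` pattern are assumptions inferred from the examples and commonly quoted lengths, so check the relevant OS specification before relying on them.

```python
import re
import uuid

# Assumed formats: inferred from the example values, not taken from a specification.
NUMERIC_LIMITS = {"UPRN": 12, "USRN": 8}        # assumed maximum digit counts
TOID_PATTERN = re.compile(r"osgb\d{13,16}")     # assumed: 'osgb' followed by 13 to 16 digits


def looks_like(id_type: str, value: str) -> bool:
    """Return True if value matches the assumed format for id_type."""
    if id_type == "OSID":
        try:
            # The OSID example is written in the canonical 8-4-4-4-12 UUID form.
            return str(uuid.UUID(value)) == value.lower()
        except ValueError:
            return False
    if id_type == "TOID":
        return TOID_PATTERN.fullmatch(value) is not None
    if id_type in NUMERIC_LIMITS:
        return value.isascii() and value.isdigit() and len(value) <= NUMERIC_LIMITS[id_type]
    raise ValueError(f"unknown identifier type: {id_type}")


print(looks_like("TOID", "osgb4000000073338613"))
print(looks_like("USRN", "906483712"))  # nine digits: too long under the assumed USRN limit
```

Because UPRNs and USRNs are both plain numbers, a format check alone cannot tell a short UPRN from a USRN; the column they come from still matters.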

Using IDs to join data

Pros:

IDs are unique across datasets, allowing multiple datasets with common ID fields to be joined with minimal difficulty.
Because IDs are unique, they help avoid data duplication and maintain data integrity.
UUIDs (Universally Unique Identifiers)/GUIDs (Globally Unique Identifiers) can be generated offline, without a central issuing authority.

Cons:

ID data types often vary between datasets, so the common ID fields must be converted to the same data type before joining.
Less flexibility – both tables must contain the same IDs.
UUIDs/GUIDs take up more database storage than integer IDs.
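The data type issue is easy to reproduce. The sketch below uses made-up rows and a hypothetical helper, `join_on_uprn`, to show an ID join that silently returns nothing when one source stores UPRNs as integers and the other as text, and the same join succeeding once both sides are cast to a common type.

```python
# Hypothetical sample rows: one source stores UPRNs as integers,
# the other (for example, a CSV export) stores them as text.
addresses = [
    {"uprn": 906483712, "street": "Example Road"},
    {"uprn": 100000001, "street": "Sample Lane"},
]
inspections = [
    {"uprn": "906483712", "result": "pass"},
    {"uprn": "0100000001", "result": "fail"},  # zero-padded text value
]


def join_on_uprn(left, right, key=lambda value: value):
    """Inner-join two lists of dicts on 'uprn' after applying key to each ID."""
    lookup = {key(row["uprn"]): row for row in right}
    joined = []
    for row in left:
        match = lookup.get(key(row["uprn"]))
        if match is not None:
            # Keep the left-hand UPRN and add the right-hand attributes.
            joined.append({**match, **row})
    return joined


naive = join_on_uprn(addresses, inspections)           # int keys never equal str keys
fixed = join_on_uprn(addresses, inspections, key=int)  # cast both sides to int first

print(len(naive), len(fixed))  # prints: 0 2
```

Casting to an integer also removes the zero padding; if padding is significant in your data, normalise to a fixed-width string instead.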

IDs in Ordnance Survey Data

Ordnance Survey publishes several datasets and services of unique identifiers covering a wide range of landscape and built environment features, for projects that require greater data connectivity. All are available for free via the Ordnance Survey website.

OS Open Linked Identifiers – a dataset containing the authoritative relationships between Unique Property Reference Numbers (UPRNs), Unique Street Reference Numbers (USRNs) and Topographic Identifiers (TOIDs).
OS Linked Identifiers API – gives direct access to the identifiers for a large number of feature types (address records, building outlines, road surface areas, road names, road sections and street records) and to the relationships between them, without having to download, store or manage large, complex datasets.
OS Open TOID – shows unique identifiers for a wide range of landscape and built environment features, with a generalised location, extracted from OS MasterMap products.
OS Open USRN – an open dataset containing all USRNs from OS MasterMap Highways Network with a simplified line geometry.
OS Open UPRN – enables users to share and link data related to UPRNs, which can be spatially analysed and visualised using their accurate location.

Spatial Querying to join data

Spatial relationships indicate how two geometries interact with one another. They are a fundamental capability for querying geometry.

Pros:

Increased flexibility when joining data.
Allows spatial analysis, and lets you enrich the data by combining it with other attributes.
Gives the opportunity to visualise data differently through mapping, which can make it easier to communicate patterns and findings.

Cons:

Complex spatial operations, such as queries involving many spatial objects or extensive calculations, can lead to performance degradation and require significant processing power and time. Optimising them may require advanced query optimisation and indexing techniques, along with hardware enhancements.
Inaccurate and inconsistent spatial data can lead to poor results and unreliable analysis.
Domain knowledge is required.

Spatial Query Examples

To make it easy to determine common spatial relationships, the Open Geospatial Consortium (OGC) defines a set of named spatial relationship predicates. Examples of these in PostGIS are as follows:

Relationship – Result

ST_Contains – Returns TRUE if geometry A contains geometry B.
ST_Within – Returns TRUE if geometry A is within geometry B.
ST_DWithin – Returns TRUE if the geometries are within a given distance of one another.
ST_Overlaps – Returns TRUE if geometry A and B "spatially overlap".
ST_Crosses – Compares two geometry objects and returns TRUE if their intersection "spatially crosses"; that is, the geometries have some, but not all, interior points in common.
ST_Touches – Returns TRUE if A and B intersect, but their interiors do not intersect.
ST_Intersects – Returns TRUE if two geometries intersect. Geometries intersect if they have any point in common. Any pair of geometries that satisfies ST_Contains, ST_Within, ST_Overlaps, ST_Crosses, ST_Touches or ST_Equals also satisfies ST_Intersects.
ST_Disjoint – Returns TRUE if two geometries do not intersect, i.e., they have no point in common. It is the inverse of ST_Intersects.
ST_Equals – Returns TRUE if the given geometries are "topologically equal", i.e., the geometries have the same dimension, and their point-sets occupy the same space.

SQL query using a spatial join

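Return all train stations within 10 km of my location.

-- The distance is in the units of the geometries' coordinate system,
-- so 10000 means 10 km only if both tables use a metre-based projection.
-- An inner join is used so that stations outside the distance are dropped.
SELECT a.*
FROM railway_stations AS a
INNER JOIN my_location AS b
ON ST_DWithin(a.geom, b.geom, 10000);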
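To make the distance test concrete, here is a minimal point-only sketch of the idea behind ST_DWithin. It is not how PostGIS implements the function: the coordinates are made up, they are assumed to be eastings and northings in metres, and the helper name `dwithin` is hypothetical.

```python
import math

# Hypothetical projected coordinates (easting, northing) in metres.
my_location = (530000.0, 180000.0)
railway_stations = {
    "Station A": (533000.0, 184000.0),  # 5 km away
    "Station B": (530000.0, 195000.0),  # 15 km away
    "Station C": (536000.0, 186000.0),  # roughly 8.5 km away
}


def dwithin(a, b, distance):
    """Point-only analogue of ST_DWithin: True if a and b are at most distance apart."""
    return math.dist(a, b) <= distance


nearby = [
    name
    for name, geom in railway_stations.items()
    if dwithin(geom, my_location, 10_000)
]
print(nearby)  # prints: ['Station A', 'Station C']
```

A spatial database does the same comparison for lines and polygons as well as points, and can use a spatial index to avoid measuring every pair.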

Indexing

Indexing the columns that you filter or join on can bring large performance benefits. A few of these are listed below, but it is worth experimenting with your own data to fully understand the benefits of indexing.

Indexes allow database engines to locate specific rows based on the indexed column faster without needing to scan the entire table.
Indexes organise the indexed values into a structure built for searching, which makes the data easier to navigate and retrieve.
They can help to ensure data consistency. Unique indexes ensure no two rows in the table have the same key values, helping to prevent duplicate rows.
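These points can be observed directly. The sketch below uses SQLite through Python's built-in `sqlite3` module rather than PostgreSQL, purely so that it runs without a database server; the table, index name and rows are hypothetical. It shows the query plan changing from a full scan to an index lookup, and a unique index rejecting a duplicate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (uprn INTEGER, street TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [(906483712, "Example Road"), (100000001, "Sample Lane")],
)

query = "SELECT street FROM addresses WHERE uprn = ?"


def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (906483712,)).fetchall()
    return [row[-1] for row in rows]


plan_before = plan(query)  # no index yet, so SQLite scans the whole table

# A unique index both speeds up lookups and rejects duplicate key values.
conn.execute("CREATE UNIQUE INDEX idx_addresses_uprn ON addresses (uprn)")
plan_after = plan(query)   # the lookup now goes through the index

try:
    conn.execute("INSERT INTO addresses VALUES (?, ?)", (906483712, "Duplicate Row"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```

The exact plan wording varies between SQLite versions. In PostgreSQL the equivalent check is `EXPLAIN`, and geometry columns in PostGIS are normally indexed with a GiST spatial index rather than a standard index.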

When To Use IDs vs When To Use Spatial Relationships

IDs

Advantages

Simple and fast.
Works with non-spatial data.
Easy to understand and implement.

Disadvantages

Requires datasets to have at least one common field.
Doesn’t consider spatial data, which limits analysis capabilities.

When?

You have a common field and need a quick and straightforward join.

Spatial Relationships

Advantages

Utilises spatial relationships between features.
Data can be analysed and visualised based on spatial criteria.

Disadvantages

Increased complexity and computational intensity.
Requires spatial data and GIS tools.

When?

Spatial data is crucial to your analysis, or could give it greater insight, for example in geographic studies or when working with spatial data layers.

Disclaimer:

Spatial querying can be accomplished in any software or program that has spatial functionality, using various programming languages.

For the purposes of this Lightning Talk, when referring to spatial querying, we will be using PostGIS in PostgreSQL.

This content has been developed from what was originally a Lightning Talk PowerPoint slide set. These slides are available to PSGA members to view and download from the PSGA members area of the OS website.


