GML supply options
There are several options available to customers when ordering data in GML format that provide additional metadata or aid data management.
To make large areas easier to manage, data is split into chunks. Two types of chunk are available: geographic chunks, each of which covers a nominal square area, and non-geographic chunks, each of which is a file of a nominated size. Geographic chunk boundaries are imposed purely to divide large supply areas into pieces of manageable size in a geographically meaningful way. Both Full Supply and updates (whether COU or Full Resupply) are chunked.
As OS MasterMap Topography Layer data is seamless, a GML file covering a large area could be very large. To keep files at a manageable size, data supplies are divided into chunks of a user-specified size, each of which is supplied in a separate GML file. The figure below illustrates how geographic chunks work:
The process of chunking has the following steps:
The user submits an AOI by either drawing an AOI or uploading a pre-defined AOI to the OS Data Hub platform (or OS Orders).
Both online ordering systems create a grid covering the entire area based on the user-specified size (for example, 25km²).
Each square within the grid forms a chunk file.
Each feature that intersects that square goes into the chunk file.
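The steps above can be sketched in a few lines of Python. This is only an illustration, not the ordering systems' actual implementation: it assumes each feature is reduced to a bounding box in metres, that the grid origin is known, and that a 25km² chunk is a 5km by 5km square. The names `chunk_ids_for_bbox`, `build_chunks` and the feature identifiers are hypothetical.

```python
from math import ceil, floor


def chunk_ids_for_bbox(bbox, origin, size):
    """Return the (col, row) grid squares that a bounding box intersects or touches."""
    min_x, min_y, max_x, max_y = bbox
    ox, oy = origin
    # ceil(...) - 1 on the lower edge makes a feature that only touches a
    # square's boundary count as belonging to that square as well.
    col0 = ceil((min_x - ox) / size) - 1
    col1 = floor((max_x - ox) / size)
    row0 = ceil((min_y - oy) / size) - 1
    row1 = floor((max_y - oy) / size)
    return [(c, r) for c in range(col0, col1 + 1) for r in range(row0, row1 + 1)]


def build_chunks(features, origin, size):
    """Group feature ids by grid square; squares with no features never appear."""
    chunks = {}
    for feature_id, bbox in features.items():
        for cell in chunk_ids_for_bbox(bbox, origin, size):
            chunks.setdefault(cell, []).append(feature_id)
    return chunks


# Hypothetical features: id -> (min_x, min_y, max_x, max_y) in metres.
features = {
    "A": (1000, 1000, 2000, 2000),  # wholly inside one square
    "B": (4000, 1000, 6000, 2000),  # straddles a vertical grid line
    "P": (5000, 5000, 5000, 5000),  # point sitting on a grid corner
}
chunks = build_chunks(features, origin=(0, 0), size=5000)
```

With these inputs, feature `B` lands in two chunk lists and the point `P` in four, which is the duplication that a loader has to handle when reading geographic chunks.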
National (GB) cover of OS MasterMap Topography Layer in GML format is supplied in 25km² chunks.
In the case shown in the image above, 10 chunks have been created. The central chunk is a complete grid square; the other chunks are partly bounded by the data selection polygon. The upper-left square shows the effect when the data selection polygon crosses a grid square twice – i.e. two or more separate chunks are created.
A consequence of chunking is that some features are supplied in more than one chunk. Systems reading OS MasterMap Topography Layer data must identify and provide the option to remove these duplicated features.
If a chunk contains no information relating to a user’s selected themes, then it is not supplied.
Chunks cannot be treated as persistent data management units. The chunking grid is floating, so its origin may differ between orders, particularly if the contract area changes or if a different chunk size is ordered.
The packaging of a seamless dataset into chunks means that where a feature lies across or touches the boundary of several chunks, it is supplied in all the related chunks. This is because an individual feature is the smallest unit within the OS MasterMap Topography Layer, and it cannot be physically split into two or more parts.
When a polygon falls across a chunk edge but its bounding line(s) lie outside the chunk, the bounding line(s) may not be included in that chunk. They will be included in the adjacent chunk, unless the polygon is at the edge of the contract area, in which case the line(s) will not be supplied at all.
When a polygon changes so that it no longer falls in the same chunk, it is reported differently in each chunk. For instance, when an OS MasterMap Topography Layer feature that used to lie partly inside a chunk is reduced in size so that it lies wholly within an adjacent chunk, it is reported as a deleted feature (a Delete) in the first chunk and as a modified feature (a new version, known as an Update) in the adjacent chunk. This is shown in the diagram below.
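For a loader that merges all chunks into a single holding, a Delete in one chunk must not remove a feature that another chunk in the same supply still delivers. The sketch below shows one possible way to reconcile this. It assumes change records have already been parsed into hypothetical `(chunk_id, toid, action, version)` tuples; neither the tuple layout nor the `reconcile` function comes from the product documentation.

```python
def reconcile(changes):
    """Collapse per-chunk change records into one net action per TOID.

    changes: iterable of (chunk_id, toid, action, version) tuples, where
    action is "insert", "update" or "delete" (an assumed, simplified model).
    """
    changes = list(changes)
    # TOIDs that at least one chunk still supplies as a live feature.
    still_supplied = {toid for _, toid, action, _ in changes if action != "delete"}

    net = {}
    for _, toid, action, version in changes:
        if action == "delete" and toid in still_supplied:
            # The feature only left this chunk; it survives in another one.
            continue
        net[toid] = (action, version)
    return net


# Hypothetical records: t1 shrank out of chunk c1 into c2, t2 was really deleted.
changes = [
    ("c1", "t1", "delete", None),
    ("c2", "t1", "update", 4),
    ("c1", "t2", "delete", None),
]
net_changes = reconcile(changes)
```

Here `t1` ends up as a single update and `t2` as a delete, regardless of the order in which the chunk files are read.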
It is possible for OS MasterMap Topography Layer features with point geometry to be included in multiple adjacent chunk files. This is because the query used to populate a chunk file includes all features that touch its boundary, and this boundary is shared with adjacent chunks. Therefore, loading software must be able to identify and remove duplicate point features across multiple files in the same way as features represented by lines and polygon geometries.
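A minimal sketch of duplicate removal across chunk files follows. It assumes the features have already been read from each file into hypothetical `(toid, version, payload)` tuples, and it keys on the TOID; the function name and record layout are illustrative, not part of any Ordnance Survey tool.

```python
def deduplicate(chunk_files):
    """Merge features from several chunk files, keeping one record per TOID.

    chunk_files: iterable of iterables of (toid, version, payload) tuples.
    If the same TOID appears with different versions, the highest is kept;
    identical duplicates simply collapse to the first one seen.
    """
    holding = {}
    for features in chunk_files:
        for toid, version, payload in features:
            kept = holding.get(toid)
            if kept is None or version > kept[0]:
                holding[toid] = (version, payload)
    return holding


# Hypothetical chunk contents: "t2" touches the shared boundary of both chunks.
chunk_a = [("t1", 1, "point"), ("t2", 3, "line")]
chunk_b = [("t2", 3, "line"), ("t3", 2, "polygon")]
merged = deduplicate([chunk_a, chunk_b])
```

The same approach applies to point, line and polygon features, because the identifier rather than the geometry decides whether two records are duplicates.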
The non-geographic chunk option delivers files of a fixed nominal size chosen by the user, rather than files that each cover a given geographic area.
Unlike in geographic chunking, each feature in non-geographic chunking appears in only one chunk file. Features from various geographic locations may appear in a single file, and adjacent features may appear in different files. Non-geographic chunk files are designed to be used as a set to load spatial databases, but they can also be used in a file-based system if all chunks are translated or imported at the same time.
It is not possible to tell in which file a particular feature will be found before reading the files. With non-geographic chunks, there are no duplicate features lying across chunk edges; this speeds up the translation process.
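The behaviour described here can be illustrated with a simple size-based packer. This is not the algorithm the supply system uses, which the documentation does not specify; it is a sketch that assumes features arrive as hypothetical `(toid, size_in_bytes)` pairs in an arbitrary order and are written to the current file until the nominal size would be exceeded.

```python
def pack_by_size(features, nominal_bytes):
    """Split features into files of roughly nominal_bytes, ignoring location.

    features: iterable of (toid, size_in_bytes) pairs (assumed layout).
    Each feature goes into exactly one file, so no duplicates arise.
    """
    files, current, used = [], [], 0
    for toid, size in features:
        # Start a new file when adding this feature would pass the nominal size.
        if current and used + size > nominal_bytes:
            files.append(current)
            current, used = [], 0
        current.append(toid)
        used += size
    if current:
        files.append(current)
    return files


# Hypothetical feature sizes; the arrival order decides which file each lands in.
packed = pack_by_size([("a", 4), ("b", 4), ("c", 4), ("d", 10)], nominal_bytes=10)
```

Because the split depends only on arrival order and size, a reader cannot predict which file holds a given feature, and the size is nominal: a single large feature can fill or exceed a file on its own.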
The features shown in red below can end up in the same non-geographic chunk even though they are not adjacent to each other.
The Feature Validation Dataset (FVDS) is a set of files that can optionally be supplied with either a Full Supply COU or an AOI COU of an OS MasterMap Topography Layer order in GML format. The FVDS allows a user to validate that the data holding contains the correct set of features after loading. It does this by reporting on all the data it expects to find in the holding after the application of the supply, not just what is contained in the supply.
The FVDS is intended to be used for periodic checks on data holdings maintained by a COU regime. Users do not need to order it with every supply, as processing it slows the translation process. It can also be used to check that an initial supply of OS MasterMap Topography Layer data has been correctly loaded. The FVDS can be used with both geographic and non-geographic chunk file options.
The FVDS is itself divided into files on a non-geographic basis, using a 10MB nominal file size.
The FVDS is supplied as comma-separated value (.csv) text files that give the TOID, version number and version date of every feature that should exist in the current data holding, based on the polygon extent, themes, polygon format and extraction date of the current order. Each .csv file is compressed to a .gz file using the same compression algorithm used for OS MasterMap Topography Layer GML files.
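A validation pass against the FVDS might look like the sketch below. It assumes the columns appear in the order TOID, version number, version date, that any row whose second column is not a whole number (such as a header row, if one exists) can be skipped, and that the local holding is available as a hypothetical `{toid: version}` mapping. The function names are illustrative.

```python
import csv
import gzip


def read_fvds(paths):
    """Read gzip-compressed FVDS .csv files into a {toid: version} mapping."""
    expected = {}
    for path in paths:
        with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                # Skip blank lines and anything that is not a data row
                # (assumed column order: TOID, version, version date).
                if len(row) < 2 or not row[1].strip().isdigit():
                    continue
                expected[row[0].strip()] = int(row[1])
    return expected


def compare_holding(expected, holding):
    """Return (missing, unexpected, wrong_version) lists of TOIDs."""
    missing = sorted(set(expected) - set(holding))
    unexpected = sorted(set(holding) - set(expected))
    wrong_version = sorted(
        toid for toid in expected.keys() & holding.keys()
        if expected[toid] != holding[toid]
    )
    return missing, unexpected, wrong_version
```

All three lists being empty would indicate that the holding contains exactly the features and versions the FVDS expects. Reading every FVDS file before comparing matters because the files are split on a non-geographic basis, so any TOID can appear in any file.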
An order summary file in GML format will be supplied with all OS MasterMap Topography Layer orders, containing the order information specified by the user. This information includes:
The order number
Query extent polygon(s) of the order
The order type: Full Supply or COU (for COU orders, the change since date will be included)
The themes requested
The chunk type: Non-geographic or geographic
The chunk size (in MB for non-geographic chunks; in km² for geographic chunks)