Data

If you’re looking at making a data visualisation, you may already have all the data you need, but you may also need to source some more. Remember that the visualisation you produce can only be as good as the data that goes into it. Before you start, think about the quality of the data and consider any licensing and copyright restrictions. Where did the data come from? Is it an authoritative source? Are you free to use and share the data?

This section also looks briefly at geospatial data formats and sources of geospatial data.

Data Quality

Data quality and completeness can vary hugely depending on the source of the data. We recommend that before starting to create a data visualisation you think about the following things:

  • Reputable source: Is your data coming from a reputable source? Can you trust it?

  • Completeness: Is the dataset complete or are there gaps?

  • Accuracy: Is the data fit for purpose in terms of its positional accuracy?

  • Source Scale: Is the data fit for purpose in terms of its granularity/scale of capture?

  • Currency: When was the data published? Is it up to date? Does this matter?

  • Consistency: If you’re overlaying or joining data from different sources, do they work together? Is the attribution consistent, and is the data based on the same coordinate reference system or projection?

It may be that the data you have is the only option available, but it’s worth considering its quality before you start. If there are doubts about the quality of the data, you could reflect this in your data visualisation design, perhaps by blurring the edges of symbols or increasing their transparency to show uncertainty, or by publishing a separate or inset map which addresses data uncertainty.
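As a rough first pass at the completeness question listed above, you could count how many records are missing a value for each attribute you plan to visualise. The sketch below is illustrative only: the function name count_missing, the field names and the sample records are all hypothetical, and it assumes your attribute data has already been read into a list of Python dictionaries.

```python
def count_missing(records, fields):
    """Count, per field, how many records have no usable value."""
    missing = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat absent keys, None and blank strings as gaps.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return missing


# Hypothetical attribute records, e.g. rows read from a CSV or attribute table.
sample = [
    {"name": "Site A", "height_m": 12.5},
    {"name": "", "height_m": 8.0},
    {"name": "Site C", "height_m": None},
]

gaps = count_missing(sample, ["name", "height_m"])
print(gaps)  # {'name': 1, 'height_m': 1}
```

A count like this does not tell you why values are missing, but it can help you decide whether a gap is small enough to ignore or needs to be shown in the design.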

Not all datasets are free to use or publish, and there may be limits on their use and distribution. Make sure you check the licensing terms carefully before publishing anything. On the data visualisation itself, make sure you credit the source of your data appropriately; this will likely include a copyright statement.

Data formats

Geospatial data can come in many different formats. The format of your data may influence the software you use to analyse and display it, or the software might determine what file format you need. Below we provide a brief summary of some of the main geospatial data formats.

There are two main types of geospatial data: vector data and raster data. Vector data is stored as points, lines and polygons, whilst raster data consists of a grid of cells, with a value (attribute) stored for each cell individually. In addition to these, geographic database files are commonly used to store a structured set of geographic data and associated information.

Vector file formats

  • Esri Shapefile (.shp) – a commonly used file format which most geospatial software can handle. A shapefile needs a set of at least three files to work: the .shp file contains the feature geometry, the .shx file contains the positional index of the shapes and the .dbf file contains the attribute data. Note also that shapefile attribute names are limited to 10 characters.

  • GeoPackage (.gpkg) – see Geographic Database Files below.

  • Geography Markup Language, GML (.gml) – a form of eXtensible Markup Language (XML) which can store geographic coordinates in text format. GML files are human and machine readable and can be edited in any text editor.

  • GeoJSON (.geojson .json) – commonly used for web-mapping, and stores the coordinates of points, lines and polygons as text in JSON notation. GeoJSON files are human and machine readable and can be edited in any text editor.

  • Google Keyhole Markup Language (.kml .kmz) – another XML based language which is used for Google Earth. The coordinates are defined in the WGS 84 coordinate reference system.

  • MapInfo TAB (.tab) – used in MapInfo software. Like shapefiles, TAB datasets require a set of files to define the geometry and attributes of features. These include TAB, DAT, ID, MAP and IND files.

  • OpenStreetMap (.osm) – an XML-based format developed by OpenStreetMap for its crowdsourced mapping.
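To show what "text in JSON notation" means in practice for GeoJSON, here is a minimal sketch using only Python's standard json module. The helper point_feature, the coordinates and the property values are made up for illustration; note that GeoJSON lists longitude before latitude.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature for a single point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical example point; the coordinates and name are illustrative only.
collection = {
    "type": "FeatureCollection",
    "features": [point_feature(-1.4706, 50.9380, {"name": "Example point"})],
}

# Serialise to plain text that could be saved with a .geojson extension.
text = json.dumps(collection, indent=2)

# Reading the text back gives ordinary dictionaries and lists again.
parsed = json.loads(text)
for feature in parsed["features"]:
    lon, lat = feature["geometry"]["coordinates"]
    print(feature["properties"]["name"], lon, lat)
```

Because the file is just text, the same content can be opened and edited in a text editor, which is what makes the format convenient for web mapping.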

Raster file formats

  • GeoTIFF (.tif .tiff) – commonly used for storing georeferenced imagery data such as that collected from satellites.

  • ASCII Grid (.asc) – a simple text format for storing raster data.

  • DEM (.dem) – an ASCII-based file format which is commonly used for capturing and storing digital elevation models.
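To illustrate how simple the ASCII Grid format is, the sketch below parses a tiny grid held in a string. It assumes the common Esri-style layout, where a few "keyword value" header lines (ncols, nrows, xllcorner, yllcorner, cellsize and an optional NODATA_value) are followed by rows of cell values. The function name parse_ascii_grid and the sample numbers are hypothetical, and real files may vary (for example, some use xllcenter and yllcenter).

```python
def parse_ascii_grid(text):
    """Parse an Esri-style ASCII grid into (header dict, list of rows of floats)."""
    header = {}
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Header lines are "keyword value"; data lines start with a number.
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(value) for value in parts])
    return header, rows


# Hypothetical 3 x 2 grid with one NODATA cell.
sample = """ncols 3
nrows 2
xllcorner 400000
yllcorner 100000
cellsize 50
NODATA_value -9999
10.5 11.0 -9999
12.0 12.5 13.0
"""

header, rows = parse_ascii_grid(sample)
print(header["ncols"], header["nrows"], header["cellsize"])
print(rows)
```

In practice you would normally let GIS software read the file for you; the point here is only that the header and cell values are plain, readable text.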

Geographic Database Files

  • GeoPackage (.gpkg) – GeoPackages are serverless SQLite databases and can contain vector and raster datasets plus associated attribute tables.

  • Esri File Geodatabase (.gdb) – for use in Esri software, GDBs can store vector and raster datasets plus associated attribute tables.

  • Mapbox MBTiles (.mbtiles) – developed by Mapbox and designed for use in Mapbox and other web applications, MBTiles files can store multiple sets of vector or raster tiles.
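Because a GeoPackage is an SQLite database, you can inspect one with any SQLite client. The sketch below uses Python's built-in sqlite3 module to list the layers recorded in the gpkg_contents table, which the GeoPackage standard uses to register each dataset and its type. The function name list_geopackage_layers is hypothetical, and the demo builds a tiny stand-in database with only two columns of that table so the example can run on its own; it is not a valid GeoPackage.

```python
import os
import sqlite3
import tempfile


def list_geopackage_layers(path):
    """Return (table_name, data_type) pairs from a GeoPackage's gpkg_contents table."""
    conn = sqlite3.connect(path)
    try:
        cursor = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Build a minimal stand-in file for demonstration (NOT a complete GeoPackage).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gpkg")
conn = sqlite3.connect(demo_path)
conn.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("roads", "features"), ("elevation", "tiles")],  # hypothetical layers
)
conn.commit()
conn.close()

layers = list_geopackage_layers(demo_path)
print(layers)  # [('elevation', 'tiles'), ('roads', 'features')]
```

With a real GeoPackage you would pass its file path straight to the function. A data_type of "features" indicates a vector layer and "tiles" a tiled raster layer.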

Other data formats

  • Lidar (.las) – a file format designed to store and transfer 3-dimensional lidar point cloud data in x,y,z format.

  • Point cloud (.xyz) – a format for storing point cloud data, often XYZ values but could also be RGB, intensity values or other lidar values.

  • CAD files (.dwg) – used for storing and transferring 2 and 3-dimensional design data and metadata from CAD (computer-aided design) software.
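As an illustration of the .xyz point cloud format described above, the sketch below reads a few text records into tuples of numbers. There is no single formal specification for .xyz files, so this assumes one point per line, values separated by spaces or commas, at least three values (x, y, z) and optional extra columns such as intensity. The function name parse_xyz and the sample values are hypothetical.

```python
def parse_xyz(text):
    """Parse whitespace- or comma-separated point records into tuples of floats."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (a convention some files use).
        if not line or line.startswith("#"):
            continue
        values = [float(value) for value in line.replace(",", " ").split()]
        if len(values) < 3:
            raise ValueError(f"expected at least x, y, z but got: {line!r}")
        points.append(tuple(values))
    return points


# Hypothetical sample: x, y, z followed by an intensity value.
sample = """# x y z intensity
400000.0 100000.0 12.5 87
400001.0,100000.5,12.7,91
"""

points = parse_xyz(sample)
heights = [point[2] for point in points]
print(len(points), min(heights), max(heights))  # 2 12.5 12.7
```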

Web Services

  • Web Map Service (WMS) – a standard protocol for serving maps as images over the web. Users can’t interact with features on the map.

  • Web Feature Service (WFS) – a standard protocol for serving data in vector format (i.e. points, lines and polygons) over the web. This allows users to request specific feature data from a server, perform queries and, if they have permission to do so, edit or update the data. Our OS Features API and OS NGD API – Features are examples of WFS.

  • Web Map Tile Service (WMTS) – similar to a WMS, however, the maps are served to the user as pre-rendered tiles allowing for more seamless panning and zooming. Our OS Maps API is a WMTS.

  • Vector Tiles – the vector equivalent of WMTS, serving vector data as a series of tiles. This allows for quicker rendering in a web browser. Our OS NGD API – Tiles and OS Vector Tile API are examples of Vector Tile services.
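To give a feel for how a client talks to one of these services, the sketch below builds a WMS 1.3.0 GetMap request URL using Python's standard library. The parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, CRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS standard, but the endpoint https://example.com/wms, the layer name and the bounding box are placeholders, not a real service. Real services may also require an API key or other parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for one layer and one bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, layer and extent, for illustration only.
url = wms_getmap_url(
    "https://example.com/wms",
    "hypothetical_layer",
    (-20000, 6700000, 0, 6720000),
    width=512,
    height=512,
)
print(url)

# Decode the query string again to confirm what the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["REQUEST"], query["BBOX"])
```

The server responds to a request like this with a rendered image, which is why users can't interact with individual features in a WMS.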

Cartographic style files

  • QGIS style file (.qml) – used to set the symbology of a specific data layer in QGIS.

  • Layer files (.lyr and .lyrx) – used to set the symbology of specific data layers in Esri ArcMap (.lyr) and ArcGIS Pro (.lyrx).

  • Styled Layer Descriptor (SLD) – stores and defines how a data layer will be styled and displayed in a WMS.

Where can you access geospatial data?

Ordnance Survey DataHub

If you’re looking for authoritative geospatial data for Great Britain, Ordnance Survey has a range of free and paid-for mapping products available through the OS DataHub. Our open datasets, such as OS Open Zoomstack and OS Open Map – Local (and many more), are freely available to everyone. Public sector organisations can get access to all of our premium data and services if they are signed up to the PSGA.

OpenStreetMap

OpenStreetMap is a crowdsourced mapping project which allows anyone to add information to the map. It’s a great source of geospatial data; however, it’s important to be aware that data completeness is not a given and some areas may have more detailed mapping than others. In addition, the accuracy will vary by contributor. Often, though, OSM data is the only (and therefore best) mapping data available for certain places.

Natural Earth

Natural Earth offers free, public domain, small-scale raster and vector (cultural and physical) data at 1:10 million, 1:50 million and 1:110 million scales with global coverage.

USGS Earth Explorer

USGS Earth Explorer is a free source of satellite and aerial imagery with global coverage. It includes Landsat, Sentinel-2 and land cover data as well as Digital Elevation Models.

Esri ArcGIS Hub

The Esri Open Data hub contains thousands of open data sets from organisations across the world. Data can be downloaded in a variety of file formats depending on the data and organisation.

Esri Living Atlas

If you’re using Esri products such as ArcGIS Pro or ArcGIS Online, the Living Atlas is a great source of data and is one of the largest curated collections of authoritative maps, apps and data layers in the world.

Inspire Geoportal

If you want to find out what datasets are available within Europe, the Inspire Geoportal allows you to search for datasets and discover who owns them and how you might get access to them.
