Searching for addresses

A common requirement for customers using the AddressBase products is to search for properties using full or partial addresses. Address searches may return a large number of addresses, a short list of possibilities, a single match or no results, depending on the search criteria.

There are many methods of implementing an address search, from free text queries through to structured address component searches. This section will step through two such approaches that may be used when working with AddressBase Premium products: free text search and structured component search.

These methods are not intended as recommendations; they are simply examples of how to get maximum value out of the products when implementing an address search function.

One type of search implementation involves a single ‘search engine’ style text box, into which a user can type all or some of an address. For example:

Find address
Results

CLOVER AVENUE, SW99 9ZZ

1, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

2, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

3, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

4, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

5, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

6, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

7, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ

In this scenario, the user can choose to type anything in Find address, which may be just one component of an address (for example, a postcode, street name or building name), several parts of an address (for example, street name + town name, house name + postcode, etc.) or even (rarely) a complete address.

There may or may not be commas between search items, or they may have been entered with or without capitalised letters, etc. In short, with this search method, there is no structure to the user input and the search methodology must be designed with this in mind.

The other common type of implementation for address searches involves entering search criteria in a structured way (for example, with a different text box for each major address component).

An example of a structured address component search.
An example of a structured address component search.
Results

Rose Cottage, Main Street, Fieldtown, Addressville, SW99 9ZZ

Rose Cottage, Main Street, Ashford, AS45 9PP

Rose Cottage, Main Street, Buxtew, Monley, MO88 4TY

And so on...

This method guides the user to enter known components of an address and also creates a predictable user input structure around which to build a search function. While generally simpler to use and implement, it can be less user-friendly, particularly in cases where it is not obvious which box to type an address component into; for example, is Richmond Terrace a building name or a street?

The following sub-sections suggest how to implement the two search methods described above. Both methods should be used alongside the instructions on formatting single address labels given in Section 11.

Understanding the different addresses available

As described in Creating a single-line or multi-line address above, at a high level, the AddressBase Premium products provide two different types of address: the Delivery Point Address and the Geographic Address. However, for some Geographic Addresses, an alternative, provisional or historical variant of the approved record may also be provided as well as the approved address (all sharing the same UPRN).

The table below outlines what these addresses are and how to access them in the products. It provides a breakdown of the location and definition of Delivery Point addresses and the four categories of Geographic Addresses available in AddressBase Premium products.

Address type
What is it?
Where is it?

Delivery Point Address

The postal address as assigned to the property by Royal Mail (and widely used by the public).

Delivery Point Address table.

Approved Geographic Address

The legal / approved address as assigned by the local naming and numbering authority.

LPI table with Logical Status = 1, joined to Street Descriptor, Organisation and BLPU tables.

Provisional Geographic Address

Provisional addresses may exist for a property from the moment that an address has been granted planning permission to be built to the time when construction has been completed.

LPI table with Logical Status = 6, joined to Street Descriptor, Organisation and BLPU tables.

Alternative Geographic Address

Any alternative addresses that may exist for this property (for example, alternative names). There may be more than one alternative address per property.

LPI table with Logical Status = 3, joined to Street Descriptor, Organisation and BLPU tables.

Historical Geographic Address

Any historical addresses (recorded since data collection began) that may have existed in the past for this property (for example, previous house names or business names, and so on). There may be more than one historical address per property.

LPI table with Logical Status = 8, joined to Street Descriptor, Organisation and BLPU tables.

The search operation

An address search operation typically requires two stages of interaction from a user, and several processing steps from the underlying IT system. These steps are summarised in the following diagram:

Diagram Summarising the steps needed to successfully implement an address search operation.
Diagram Summarising the steps needed to successfully implement an address search operation.

The second user interaction can be omitted if there is only one result returned from the query. In almost all cases, there should be an option to ‘search again’ at the second and third stages in case no results are returned, or if none of the options shown is the required address.

Of course, different applications require different approaches; however, the general principles of the above process apply in all cases where an address is searched for based on user-entered criteria.

Generating a search query from structured user input

Within an interface that accepts structured user input for an address search, it is necessary to ‘map’ the fields presented to the user with those found within the AddressBase Premium products. In particular, any query will need to test multiple fields for a given input and will need to combine result sets from the two different address formats (Delivery Point Address and Geographic Address) in order to produce the most complete result set.

Generally, a search form will describe a simplified view of an address in order to keep the user interface tidy and intuitive. Users may be given a set of text boxes to fill in, generally including building name, building number, street name, locality name, town name and postcode. The relationships between some common search fields and the fields found in AddressBase Premium and AddressBase Premium Islands are as follows:

The table below shows the relationships between some common search fields and the fields found in AddressBase Premium and AddressBase Premium Islands.

Search Box
Mapped Delivery Point fields
Mapped Geographic fields

Business Name

Organisation_Name

Organisation PAO_Text SAO_Text

Flat / Subdivision Name

Sub_Building_Name Department_Name

SAO_Text

Flat / Subdivision Number

Sub_Building_Name

SAO_StartNumber SAO_StartSuffix SAO_EndNumber

SAO_EndSuffix

Building Name

Building_Name

PAO_Text

Building Number

Building_Number

Building_Name (in cases where a suffix or range is present)

PAO_StartNumber PAO_StartSuffix PAO_EndNumber

PAO_EndSuffix

Street

Thoroughfare Dependent_Thoroughfare

Street

PAO_Text

Locality

Dependent_Locality Double_Dependent_Locality

Locality

Town Street

Town

Dependent_Locality Post_Town

Town Locality

Postcode

Postcode

Postcode_Locator

The above mapping is an example only and it is possible to break down the search fields differently, in which case a different mapping would be required. The important thing is to consider all possibilities for how data might be recorded. For example, a business name can sometimes appear as an organisation name or a building/PAO name, depending on circumstances, so both must be checked when creating a search query.

Numbers need to be handled very carefully due to the presence of suffixes and ranges. There are two options for structuring the search input in these cases:

  1. A single ‘number’ box can be used (as shown in the table above), which will then require some string manipulation to split the input into the appropriate numeric range and suffix components in order to search the Geographic Addresses; or

  2. Four boxes can be provided for each number (start number, start suffix, end number and end suffix), which would then need to be combined into an appropriate string to search the Delivery Point Addresses.

Structuring the query for a structured address search

The basic rules to follow when generating a search query from structured input are as follows:

  • Ignore any search boxes that are not filled in with values.

  • Where a value is entered, assume that a match on at least one of the mapped fields is essential.

In SQL query terms, this means that each search term should generate a sub-query that searches each of the mapped fields (using OR), and that these sub-queries should then be combined together (using AND) into a single search query. The following SQL code illustrates this (for the Delivery Point Address search only) where a street, locality and town name have been entered by the user:

SELECT dp.UPRN, GetFormattedAddress(dp.*) FROM abp_delivery_point dp
WHERE (dp.thoroughfare = streetsearchtext OR dp.dependent_thoroughfare = streetsearchtext) AND (dp.dependent_locality = localitysearchtext OR dp.double_dependent_locality = localitysearchtext) AND (dp.dependent_locality = townsearchtext OR dp.post_town = townsearchtext)

In the above example, ‘streetsearchtext’, ‘localitysearchtext’, and ‘townsearchtext’ (shown in blue) represent user-entered search terms (which could be parameters within an SQL function) and the GetFormattedAddress(*) function is a hypothetical user-defined function that returns the formatted address as a single string (suitable for display in the user interface). For more information on formatting addresses, please see Creating a single-line or multi-line address.

On top of this, for a complete query the two different types of addresses should be queried separately (Geographic and Delivery Point Addresses), and the two result sets should be amalgamated into a single set using a UNION. The following example builds upon the previous example to include Geographic Addresses as well as Delivery Point Addresses:

SELECT dp.UPRN, GetFormattedAddress(dp.*) FROM abp_delivery_point dp
WHERE (dp.thoroughfare = streetsearchtext OR dp.dependent_thoroughfare = streetsearchtext) AND (dp.dependent_locality = localitysearchtext OR dp.double_dependent_locality = localitysearchtext) AND (dp.dependent_locality = townsearchtext OR dp.post_town = townsearchtext)
UNION
SELECT b.uprn, GetFormattedAddress(b.*, l.*, s.*, o.*) FROM abp.blpu b INNER JOIN abp.lpi l ON l.uprn
= b.uprn
INNER JOIN abp.street_descriptor s ON s.usrn = b.usrn LEFT JOIN abp.organisation o ON o.uprn = b.uprn WHERE
(s.street_name = streetsearchtext OR l.pao_text = streetsearchtext) AND
(s.locality = localitysearchtext OR s.town = localitysearchtext OR s.street_name = localitysearchtext) AND (s.town = townsearchtext OR s.locality = townsearchtext)

The geographic query requires four joins between the BLPU, LPI, Street Descriptor and Organisation tables in order to access all the fields required to build an address.

The SQL UNION operator will combine the two result sets, discarding any exact duplicates (retaining the exact duplicates requires the use of UNION ALL, but that is not desirable in this example).

The resulting output from this query will be a set of search results: formatted addresses along with their UPRN. Exact duplicates will be omitted, but all ‘variations’ of the same address will be outputted (one row for each variation, with the same UPRN repeated more than once potentially). It may be wise to also return the ‘logical status’ and/or ‘postal address flag’ values against each to enable further filtering (that is, to include or exclude historical addresses for example, or to restrict the results to postal addresses only).

Supporting case-insensitive queries and partial matches

A flaw in the above examples is the use of equality operators. In practice, because people do not tend to be consistent with the capitalisation of letters, the SQL ‘LIKE’ operator might work better. Depending on the nature of the application, a ‘%’ wildcard could be appended to the end of each search term to allow only the first few letters of an address component to be entered. For example:

SQL code
Description

dp.post_town LIKE townsearchtext

Case insensitive search in some databases

dp.post_town LIKE (townsearchtext || ‘%’)

Matches post towns that start with the search text

dp.post_town LIKE (‘%’ || townsearchtext || ‘%’)

Matches post towns that contain the search text

Alternatively, if exact matches are required but case sensitivity is not, then the UPPER() or LOWER() SQL functions can be used on each side of the equals sign in comparisons (a solution that should work in all databases):

SQL code
Description

UPPER(dp.post_town) = UPPER(townsearchtext)

Case insensitive equality

Finally, to combine all of the approaches, the following would work for maximum flexibility:

SQL code

UPPER(dp.post_town) LIKE (‘%’ || UPPER(townsearchtext) || ‘%’)

Generating a search query from unstructured user input

When offering a ‘search engine’ style search feature with just a single text box to enter search terms, a wholly different approach is required. No assumptions can be made about the order, format or style of the user input, and the data will need to be ‘indexed’ in a way that facilitates searches of this type.

Creating a search index for addresses

Search engine style searches are likely to require the creation of an additional index/lookup table for addresses. Such a table is likely to consist of just two main columns: a key value (UPRN) and a formatted address string. Additional columns may be required to allow filtering of results (such as the ‘logical status’ values, which would allow the results to be filtered on ‘approved’, ‘provisional’ and ‘historical’ statuses, for example).

The table below shows a possible address index table structure:

UPRN
Address Text
Statuses (multiple fields)

123456789012

4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB

Approved

123456789012

FLAT 4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB

Alternative + PAF

123456789012

4 HIGH STREET WALTHAMSDALE CLOSE BURRIDGE BU27 9UB

Historical

947364758903

ROSE COTTAGE MAIN STREET HAVERSHAM SUDBURY SU45 9TY

Approved + PAF

947364758903

ROSE FARMHOUSE MAIN STREET HAVERSHAM SUDBURY

Historical

Note how the addresses have been formatted as a single text string with a single space between each word (although leaving commas in would do no harm). All forms of each address (both PAF and Geographic, current and historical, approved and alternative) have been added to the index, so there can be several rows with the same UPRN. To speed up complex searching, an appropriate index could be added to the Address Text field, such as a full text search index.

Structuring the query for an unstructured address search

Once a suitable search index is in place, the query itself can be put together. The basic idea is to split the user input into search terms by removing commas, double spaces and other unnecessary whitespace, and then splitting the user input at each single space, as follows:

User input: 4, High Street, Westville, wv17 Capitalised, with commas and double-spaces removed: 4 HIGH STREET WESTVILLE WV17

Split into separate search terms:

  • 4

  • HIGH

  • STREET

  • WESTVILLE

  • WV17

Once the user input has been pre-processed into separate search terms, a query can be generated. The key assumption in this example will be that ALL search terms must be matched against the index table to be considered as a result. This implies a query where each value is matched using an ‘AND’ operator. In order to search the whole index, the ‘LIKE’ operator will need to be used along with a ‘%’ wildcard on either side of the search text. A suitable search query for the above example would be as follows:

SELECT UPRN, AddressText FROM AddressSearchIndex 
WHERE
AddressText LIKE ‘%4%’ AND 
AddressText LIKE ‘%HIGH%’ AND 
AddressText LIKE ‘%STREET%’ AND 
AddressText LIKE ‘%WESTVILLE%’ AND
AddressText LIKE ‘%WV17%’;

This query would return all rows from the index table that contain all of the search terms, along with the appropriate UPRNs.

The table below shows how the index table would be used in the above example to return relevant results:

UPRN
Address Text
Statuses (multiple fields)

894756389092

4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL

Approved + PAF

894756389092

ROSE COTTAGE 4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL

Alternative

894756389092

ROSE COTTAGE HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL

Alternative

274859037849

FLAT 4 HIGHBURY COURT HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL

Approved + PAF

482974769830

MAPS4U LTD HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL

Approved

This result set can then be presented to the user, who can select the most appropriate record, which can then be retrieved in full using the UPRN.

Of course, in a practical implementation, the above query would need to be dynamically generated, with a separate condition added for each search term. This example is quite a strict search query that requires all search terms to be present. Many layers of complexity could be added to allow partial and ‘fuzzy’ matches, and to return confidence scores for example, but such enhancements are beyond the scope of this section.

Summary: Searching for addresses

This section introduces implementing address search functionality using AddressBase Premium products. The main points are summarised below:

  • A user front-end for an address search may contain a single, search engine style text box or multiple text boxes representing different parts of an address.

  • A typical address search function takes place in three stages:

    • A user enters search text.

    • A query is run, returning a set of possible matches.

    • The user selects the address of interest, and the full record is then returned.

  • With a structured search interface, the addresses can be queried directly by mapping the various address fields to the text boxes supplied.

  • For an unstructured (single text box) interface, it is necessary to create an index table with fully formatted address strings against each UPRN. Queries can then be run against this index table by splitting the user input into individual search terms and requiring them all to be present.

  • It is possible to filter results by status, for example, ‘approved’, ‘alternative’ and ‘historical’, as well as ‘postal’ or ‘non- postal’, etc.

  • Any search function should search all forms of an address (both Geographic and Delivery Point Addresses).

  • Careful consideration should be given to the use of ‘fuzzy’ search algorithms (such as using wildcard or sound-alike searches).

Last updated