Searching for addresses
Last updated
Last updated
A common requirement for customers using the AddressBase products is to search for properties using full or partial addresses. Address searches may return a large number of addresses, a short list of possibilities, a single match or no results, depending on the search criteria.
There are many methods of implementing an address search, from free text queries through to structured address component searches. This section will step through two such approaches that may be used when working with AddressBase Premium products: free text search and structured component search.
These methods are not intended as recommendations; they are simply examples of how to get maximum value out of the products when implementing an address search function.
One type of search implementation involves a single ‘search engine’ style text box, into which a user can type all or some of an address. For example:
CLOVER AVENUE, SW99 9ZZ
1, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
2, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
3, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
4, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
5, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
6, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
7, Clover Avenue, Fieldtown, Addressville, SW99 9ZZ
In this scenario, the user can choose to type anything in Find address, which may be just one component of an address (for example, a postcode, street name or building name), several parts of an address (for example, street name + town name, house name + postcode, etc.) or even (rarely) a complete address.
There may or may not be commas between search items, or they may have been entered with or without capitalised letters, etc. In short, with this search method, there is no structure to the user input and the search methodology must be designed with this in mind.
The other common type of implementation for address searches involves entering search criteria in a structured way (for example, with a different text box for each major address component).
Rose Cottage, Main Street, Fieldtown, Addressville, SW99 9ZZ
Rose Cottage, Main Street, Ashford, AS45 9PP
Rose Cottage, Main Street, Buxtew, Monley, MO88 4TY
And so on...
This method guides the user to enter known components of an address and also creates a predictable user input structure around which to build a search function. While generally simpler to use and implement, it can be less user-friendly, particularly in cases where it is not obvious which box to type an address component into; for example, is Richmond Terrace a building name or a street?
The following sub-sections suggest how to implement the two search methods described above. Both methods should be used alongside the instructions on formatting single address labels given in Section 11.
As described in Creating a single-line or multi-line address above, at a high level, the AddressBase Premium products provide two different types of address: the Delivery Point Address and the Geographic Address. However, for some Geographic Addresses, an alternative, provisional or historical variant of the approved record may also be provided as well as the approved address (all sharing the same UPRN).
The table below outlines what these addresses are and how to access them in the products. It provides a breakdown of the location and definition of Delivery Point addresses and the four categories of Geographic Addresses available in AddressBase Premium products.
Delivery Point Address
The postal address as assigned to the property by Royal Mail (and widely used by the public).
Delivery Point Address table.
Approved Geographic Address
The legal / approved address as assigned by the local naming and numbering authority.
LPI table with Logical Status = 1, joined to Street Descriptor, Organisation and BLPU tables.
Provisional Geographic Address
Provisional addresses may exist for a property from the moment that an address has been granted planning permission to be built to the time when construction has been completed.
LPI table with Logical Status = 6, joined to Street Descriptor, Organisation and BLPU tables.
Alternative Geographic Address
Any alternative addresses that may exist for this property (for example, alternative names). There may be more than one alternative address per property.
LPI table with Logical Status = 3, joined to Street Descriptor, Organisation and BLPU tables.
Historical Geographic Address
Any historical addresses (recorded since data collection began) that may have existed in the past for this property (for example, previous house names or business names, and so on). There may be more than one historical address per property.
LPI table with Logical Status = 8, joined to Street Descriptor, Organisation and BLPU tables.
An address search operation typically requires two stages of interaction from a user, and several processing steps from the underlying IT system. These steps are summarised in the following diagram:
The second user interaction can be omitted if there is only one result returned from the query. In almost all cases, there should be an option to ‘search again’ at the second and third stages in case no results are returned, or if none of the options shown is the required address.
Of course, different applications require different approaches; however, the general principles of the above process apply in all cases where an address is searched for based on user-entered criteria.
Within an interface that accepts structured user input for an address search, it is necessary to ‘map’ the fields presented to the user with those found within the AddressBase Premium products. In particular, any query will need to test multiple fields for a given input and will need to combine result sets from the two different address formats (Delivery Point Address and Geographic Address) in order to produce the most complete result set.
Generally, a search form will describe a simplified view of an address in order to keep the user interface tidy and intuitive. Users may be given a set of text boxes to fill in, generally including building name, building number, street name, locality name, town name and postcode. The relationships between some common search fields and the fields found in AddressBase Premium and AddressBase Premium Islands are as follows:
The table below shows the relationships between some common search fields and the fields found in AddressBase Premium and AddressBase Premium Islands.
Business Name
Organisation_Name
Organisation PAO_Text SAO_Text
Flat / Subdivision Name
Sub_Building_Name Department_Name
SAO_Text
Flat / Subdivision Number
Sub_Building_Name
SAO_StartNumber SAO_StartSuffix SAO_EndNumber
SAO_EndSuffix
Building Name
Building_Name
PAO_Text
Building Number
Building_Number
Building_Name (in cases where a suffix or range is present)
PAO_StartNumber PAO_StartSuffix PAO_EndNumber
PAO_EndSuffix
Street
Thoroughfare Dependent_Thoroughfare
Street
PAO_Text
Locality
Dependent_Locality Double_Dependent_Locality
Locality
Town Street
Town
Dependent_Locality Post_Town
Town Locality
Postcode
Postcode
Postcode_Locator
The above mapping is an example only and it is possible to break down the search fields differently, in which case a different mapping would be required. The important thing is to consider all possibilities for how data might be recorded. For example, a business name can sometimes appear as an organisation name or a building/PAO name, depending on circumstances, so both must be checked when creating a search query.
Numbers need to be handled very carefully due to the presence of suffixes and ranges. There are two options for structuring the search input in these cases:
A single ‘number’ box can be used (as shown in the table above), which will then require some string manipulation to split the input into the appropriate numeric range and suffix components in order to search the Geographic Addresses; or
Four boxes can be provided for each number (start number, start suffix, end number and end suffix), which would then need to be combined into an appropriate string to search the Delivery Point Addresses.
The basic rules to follow when generating a search query from structured input are as follows:
Ignore any search boxes that are not filled in with values.
Where a value is entered, assume that a match on at least one of the mapped fields is essential.
In SQL query terms, this means that each search term should generate a sub-query that searches each of the mapped fields (using OR), and that these sub-queries should then be combined together (using AND
) into a single search query. The following SQL code illustrates this (for the Delivery Point Address search only) where a street, locality and town name have been entered by the user:
In the above example, ‘streetsearchtext’, ‘localitysearchtext’, and ‘townsearchtext’ (shown in blue) represent user-entered search terms (which could be parameters within an SQL function) and the GetFormattedAddress(*) function is a hypothetical user-defined function that returns the formatted address as a single string (suitable for display in the user interface). For more information on formatting addresses, please see Creating a single-line or multi-line address.
On top of this, for a complete query the two different types of addresses should be queried separately (Geographic and Delivery Point Addresses), and the two result sets should be amalgamated into a single set using a UNION. The following example builds upon the previous example to include Geographic Addresses as well as Delivery Point Addresses:
The geographic query requires four joins between the BLPU, LPI, Street Descriptor and Organisation tables in order to access all the fields required to build an address.
The SQL UNION operator will combine the two result sets, discarding any exact duplicates (retaining the exact duplicates requires the use of UNION ALL, but that is not desirable in this example).
The resulting output from this query will be a set of search results: formatted addresses along with their UPRN. Exact duplicates will be omitted, but all ‘variations’ of the same address will be outputted (one row for each variation, with the same UPRN repeated more than once potentially). It may be wise to also return the ‘logical status’ and/or ‘postal address flag’ values against each to enable further filtering (that is, to include or exclude historical addresses for example, or to restrict the results to postal addresses only).
A flaw in the above examples is the use of equality operators. In practice, because people do not tend to be consistent with the capitalisation of letters, the SQL ‘LIKE’ operator might work better. Depending on the nature of the application, a ‘%’ wildcard could be appended to the end of each search term to allow only the first few letters of an address component to be entered. For example:
dp.post_town LIKE townsearchtext
Case insensitive search in some databases
dp.post_town LIKE (townsearchtext || ‘%’)
Matches post towns that start with the search text
dp.post_town LIKE (‘%’ || townsearchtext || ‘%’)
Matches post towns that contain the search text
Alternatively, if exact matches are required but case sensitivity is not, then the UPPER() or LOWER() SQL functions can be used on each side of the equals sign in comparisons (a solution that should work in all databases):
UPPER(dp.post_town) = UPPER(townsearchtext)
Case insensitive equality
Finally, to combine all of the approaches, the following would work for maximum flexibility:
UPPER(dp.post_town) LIKE (‘%’ || UPPER(townsearchtext) || ‘%’)
When offering a ‘search engine’ style search feature with just a single text box to enter search terms, a wholly different approach is required. No assumptions can be made about the order, format or style of the user input, and the data will need to be ‘indexed’ in a way that facilitates searches of this type.
Search engine style searches are likely to require the creation of an additional index/lookup table for addresses. Such a table is likely to consist of just two main columns: a key value (UPRN) and a formatted address string. Additional columns may be required to allow filtering of results (such as the ‘logical status’ values, which would allow the results to be filtered on ‘approved’, ‘provisional’ and ‘historical’ statuses, for example).
The table below shows a possible address index table structure:
123456789012
4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB
Approved
123456789012
FLAT 4 THE MEADOWS HIGH STREET WALTHAMSDALE BURRIDGE BU27 9UB
Alternative + PAF
123456789012
4 HIGH STREET WALTHAMSDALE CLOSE BURRIDGE BU27 9UB
Historical
947364758903
ROSE COTTAGE MAIN STREET HAVERSHAM SUDBURY SU45 9TY
Approved + PAF
947364758903
ROSE FARMHOUSE MAIN STREET HAVERSHAM SUDBURY
Historical
Note how the addresses have been formatted as a single text string with a single space between each word (although leaving commas in would do no harm). All forms of each address (both PAF and Geographic, current and historical, approved and alternative) have been added to the index, so there can be several rows with the same UPRN. To speed up complex searching, an appropriate index could be added to the Address Text field, such as a full text search index.
Once a suitable search index is in place, the query itself can be put together. The basic idea is to split the user input into search terms by removing commas, double spaces and other unnecessary whitespace, and then splitting the user input at each single space, as follows:
User input: 4, High Street, Westville, wv17 Capitalised, with commas and double-spaces removed: 4 HIGH STREET WESTVILLE WV17
Split into separate search terms:
4
HIGH
STREET
WESTVILLE
WV17
Once the user input has been pre-processed into separate search terms, a query can be generated. The key assumption in this example will be that ALL search terms must be matched against the index table to be considered as a result. This implies a query where each value is matched using an ‘AND’ operator. In order to search the whole index, the ‘LIKE’ operator will need to be used along with a ‘%’ wildcard on either side of the search text. A suitable search query for the above example would be as follows:
This query would return all rows from the index table that contain all of the search terms, along with the appropriate UPRNs.
The table below shows how the index table would be used in the above example to return relevant results:
894756389092
4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL
Approved + PAF
894756389092
ROSE COTTAGE 4 HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL
Alternative
894756389092
ROSE COTTAGE HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL
Alternative
274859037849
FLAT 4 HIGHBURY COURT HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL
Approved + PAF
482974769830
MAPS4U LTD HIGH STREET WESTVILLE SUNNYTOWN WV17 7HL
Approved
This result set can then be presented to the user, who can select the most appropriate record, which can then be retrieved in full using the UPRN.
Of course, in a practical implementation, the above query would need to be dynamically generated, with a separate condition added for each search term. This example is quite a strict search query that requires all search terms to be present. Many layers of complexity could be added to allow partial and ‘fuzzy’ matches, and to return confidence scores for example, but such enhancements are beyond the scope of this section.
This section introduces implementing address search functionality using AddressBase Premium products. The main points are summarised below:
A user front-end for an address search may contain a single, search engine style text box or multiple text boxes representing different parts of an address.
A typical address search function takes place in three stages:
A user enters search text.
A query is run, returning a set of possible matches.
The user selects the address of interest, and the full record is then returned.
With a structured search interface, the addresses can be queried directly by mapping the various address fields to the text boxes supplied.
For an unstructured (single text box) interface, it is necessary to create an index table with fully formatted address strings against each UPRN. Queries can then be run against this index table by splitting the user input into individual search terms and requiring them all to be present.
It is possible to filter results by status, for example, ‘approved’, ‘alternative’ and ‘historical’, as well as ‘postal’ or ‘non- postal’, etc.
Any search function should search all forms of an address (both Geographic and Delivery Point Addresses).
Careful consideration should be given to the use of ‘fuzzy’ search algorithms (such as using wildcard or sound-alike searches).