Address Structure, Formatting and Parsing
The Structure of Addresses
Consider the address “25 Lower Way, Thatcham RG19 3RR”. To a computer, this is just a string of characters with no inherent meaning, and it can easily be confused with other, similar strings. Now consider the same address as JSON:
{
  "street_number": "25",
  "street_name": "Lower Way",
  "city": "Thatcham",
  "postalcode": "RG19 3RR"
}
This is much easier and faster to search. We therefore now offer a second way to supply an address: the address_structured field, which takes the following JSON:
{
  "unit": Option<String>,
  "house_name": Option<String>,
  "street_number": Option<String>,
  "street_name": Option<String>,
  "city": Option<String>,
  "county": Option<String>,
  "state": Option<String>,
  "country": Option<String>,
  "postalcode": Option<String>
}
Every field is optional (written here as Option<String>): supply a string where you have the value and leave the field out otherwise, as the examples below do. The following list shows what kind of data goes into each field.
- Unit: a flat, apartment, suite or unit number e.g. “flat 6”, “building C”, “unit 2”
- House name: a named building e.g. “Buckingham Palace”, “Pearl Court”, “Falmer House”
- Street Number: the number on the street e.g. “2”, “25”, “1015”
- Street Name: the name of the street e.g. “Yew Court Road”, “3rd Ave”, “Mount Rosie Rd”
- City: the name of the city, town, village or hamlet e.g. “London”, “Bradpole”, “Catskill”
- County: the county name e.g. “Sussex”, “Berkshire”, “Mid Glamorgan”
- State: the state name e.g. “Texas”, “California”, “Maine”
- Country: the country name or abbreviation e.g. “USA”, “UK”, “Singapore”
- Postalcode: the postcode or Zip code e.g. “BN21 4LD”, “128130”, “33607”
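Because every field is optional, one way to build this object in client code is a small record type whose fields default to empty and are dropped before serialising. The following is a minimal Python sketch under that assumption; the class name StructuredAddress and its to_dict/to_json helpers are hypothetical and not part of any Naurt SDK.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StructuredAddress:
    """Hypothetical mirror of the address_structured fields; every field is optional."""
    unit: Optional[str] = None
    house_name: Optional[str] = None
    street_number: Optional[str] = None
    street_name: Optional[str] = None
    city: Optional[str] = None
    county: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    postalcode: Optional[str] = None

    def to_dict(self) -> dict:
        # Leave out fields that were not supplied, as the examples on this page do.
        return {key: value for key, value in asdict(self).items() if value is not None}

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


# The Thatcham address from the start of this page.
thatcham = StructuredAddress(
    street_number="25",
    street_name="Lower Way",
    city="Thatcham",
    postalcode="RG19 3RR",
)
print(thatcham.to_json())
```

Keeping street_number as a string rather than an integer matters here, for the reason given in the US example further down.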
Some notes:
- County is very common in UK addresses but rarely used in US addresses. In the USA, this field would hold counties such as “Alameda”, “Miami-Dade” or “Cook”.
- State, in the USA, is the state (Texas, New York and so on), which commonly appears in addresses. In the UK, state refers to the constituent nations, i.e. “England”, “Scotland”, “Wales” and “Northern Ireland”, and is not commonly used in addresses.
- Country is unnecessary in the structured address if the country field is set in the base request. In the base request, country only accepts official two-letter country codes, e.g. UK, SG, US. In the structured address, it accepts any country string, e.g. “United Kingdom”, “United States of America”, “America”, “United States” and so on.
- Postalcode is known as Zip code in the USA. You can use a standard Zip code, e.g. “80236”, or a Zip+4 code, e.g. “80236-2345”.
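The rules for the root-level country field and for US Zip codes lend themselves to simple client-side checks before a request is sent. The sketch below assumes the request body is a JSON object with address_structured and country at its top level; any other request fields are not described in this section and are left out. The helpers build_request and is_us_zip are hypothetical, and requiring uppercase letters for the country code is our own assumption.

```python
import re
from typing import Optional

# Assumed checks based on the notes above; Naurt's own validation may differ.
ROOT_COUNTRY_RE = re.compile(r"[A-Z]{2}")   # two-letter code, e.g. "UK", "SG", "US"
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")      # standard Zip or Zip+4


def build_request(address_structured: dict, country: Optional[str] = None) -> dict:
    """Assemble a hypothetical request body with an optional root-level country code."""
    body = {"address_structured": address_structured}
    if country is not None:
        if not ROOT_COUNTRY_RE.fullmatch(country):
            raise ValueError(f"root country must be a two-letter code, got {country!r}")
        body["country"] = country
    return body


def is_us_zip(postalcode: str) -> bool:
    """True for a standard Zip ("80236") or a Zip+4 code ("80236-2345")."""
    return ZIP_RE.fullmatch(postalcode) is not None


# The structured country is free text, so it is passed through unchecked;
# only the root-level code is validated.
request = build_request(
    {"street_number": "143-10", "street_name": "94th Ave", "city": "Jamaica",
     "state": "NY", "postalcode": "11435"},
    country="US",
)
```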
Finally, let’s look at some examples of correctly parsed addresses.
In the UK, a typical address might be
123 Valley Way
Stevenage
Hertfordshire
SG2 9DE
which would be correctly parsed as
{
  "street_number": "123",
  "street_name": "Valley Way",
  "city": "Stevenage",
  "county": "Hertfordshire",
  "postalcode": "SG2 9DE"
}
In the USA, a typical address might be
143-10 94th Ave
Jamaica, NY 11435
which would be correctly parsed as
{
  "street_number": "143-10",
  "street_name": "94th Ave",
  "city": "Jamaica",
  "state": "NY",
  "postalcode": "11435"
}
Note that street_number is a string, not a number, because many street numbers (such as “143-10” here) are not plain numbers. Also, in the USA you may use either full state names or abbreviations, e.g. Texas or TX.
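To make the mapping from the US last line (“Jamaica, NY 11435”) to the city, state and postalcode fields concrete, here is a toy Python sketch. The function split_us_last_line and its regular expression are hypothetical and only handle the simple “City, State Zip” shape; for real data, the Parsing Addresses section below recommends Naurt’s own parser instead.

```python
import re
from typing import Optional

# Toy pattern for a US last line such as "Jamaica, NY 11435" or "Austin, Texas 78701-1234".
# The state may be an abbreviation or a full name; the Zip may be standard or Zip+4.
US_LAST_LINE = re.compile(
    r"^(?P<city>[^,]+),\s*(?P<state>[A-Za-z .]+?)\s+(?P<postalcode>\d{5}(?:-\d{4})?)$"
)


def split_us_last_line(line: str) -> Optional[dict]:
    """Split 'City, State Zip' into structured fields, or return None if it doesn't match."""
    match = US_LAST_LINE.match(line.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


fields = split_us_last_line("Jamaica, NY 11435")
```

Anything outside that shape (a missing comma, a unit on the same line, a non-US postcode) simply returns None, which is one reason a dedicated parser is the better choice in practice.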
In Singapore, a typical address might be
Blk 123 Bishan St 12 #04-56
Singapore 570123
which would be correctly parsed as
{
  "unit": "#04-56",
  "street_number": "Blk 123",
  "street_name": "Bishan St 12",
  "city": "Singapore",
  "postalcode": "570123"
}
“Singapore” could also go in the country field here; the search works it out either way. If you know the address is in Singapore, you can also set SG in the root-level country field to speed up the search further.
Parsing Addresses
You might not have your addresses already parsed, and if you do, they might not follow the Naurt standard described above. This is no longer a concern, because Naurt has its own address parser that automatically converts address strings into this format. We therefore no longer recommend parsing addresses yourself for a structured geocode, unless you are absolutely sure your data conforms to the Naurt standard.
You can read more about our parser, its performance and its impact on search here.
Formatting Addresses
Formatting is the reverse of parsing: it turns a structured address back into a single address string. You will likely want addresses formatted in a clean, standard way. Different countries format addresses slightly differently, and Naurt already handles this for you.
As an example of why this matters, consider the address “489 Broome St, Unit 493, New York, NY 10013, United States”. This would be parsed as
{
  "city": "New York",
  "country": "United States",
  "postalcode": "10013",
  "state": "NY",
  "street_name": "Broome St",
  "street_number": "489",
  "unit": "UNIT 493"
}
In the UK, by contrast, “Flat A, 49 Upper Tulse Hill, London, SW2 2SQ, United Kingdom” has the structure
{
  "city": "London",
  "country": "United Kingdom",
  "postalcode": "SW2 2SQ",
  "state": "England",
  "street_name": "Upper Tulse Hill",
  "street_number": "49",
  "unit": "Flat A"
}
Both addresses use the same fields, but they are written differently as single address strings. In the USA the unit comes after the street, whereas in the UK the unit (usually a flat number) comes first. In the UK the state (here, England) appears in the structure but not in the address string.
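The two orderings described above can be sketched as a small formatter. This is a minimal Python illustration covering only the US and UK rules from this section, not Naurt’s formatter; format_address and _join are hypothetical names, and the comma-separated layout is copied from the two example strings.

```python
from typing import Optional


def _join(*parts: Optional[str], sep: str = " ") -> Optional[str]:
    """Join the non-empty parts with sep, or return None if nothing is left."""
    present = [p for p in parts if p]
    return sep.join(present) if present else None


def format_address(addr: dict, style: str) -> str:
    """Render a structured address as one line using the US or UK ordering.

    Hypothetical helper: it covers only the two orderings in this section.
    """
    street = _join(addr.get("street_number"), addr.get("street_name"))
    if style == "US":
        # Unit after the street; state and postal code share one component.
        parts = [
            street,
            addr.get("unit"),
            addr.get("city"),
            _join(addr.get("state"), addr.get("postalcode")),
            addr.get("country"),
        ]
    elif style == "UK":
        # Unit (usually a flat) first; the state (e.g. England) is left out of the string.
        parts = [
            addr.get("unit"),
            street,
            addr.get("city"),
            addr.get("postalcode"),
            addr.get("country"),
        ]
    else:
        raise ValueError(f"unsupported style: {style!r}")
    return ", ".join(p for p in parts if p)
```

The unit is copied verbatim, so the US example renders as “UNIT 493” rather than the “Unit 493” in the original string; any case normalisation is left out of this sketch.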