
Address Structure, Formatting and Parsing


Consider the address “25 Lower Way, Thatcham RG19 3RR”. To a computer this is just an unstructured string, which can easily be confused with other similar strings. Now consider the following JSON

{
"street_number": "25",
"street_name": "Lower Way",
"city": "Thatcham",
"postalcode": "RG19 3RR"
}

Structured data like this is much easier and faster to search. We therefore now offer a second way to include an address: the address_structured field, in which every field is an optional string. It can take the following JSON

{
"unit": Option<String>,
"house_name": Option<String>,
"street_number": Option<String>,
"street_name": Option<String>,
"city": Option<String>,
"county": Option<String>,
"state": Option<String>,
"country": Option<String>,
"postalcode": Option<String>
}
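As a minimal sketch, the field list above could be mirrored on the client side with a Python dataclass. The class name StructuredAddress and the to_json helper are hypothetical, not part of Naurt's API, and dropping unset fields (rather than sending them as null) is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class StructuredAddress:
    """Hypothetical client-side mirror of the address_structured fields."""
    unit: Optional[str] = None
    house_name: Optional[str] = None
    street_number: Optional[str] = None
    street_name: Optional[str] = None
    city: Optional[str] = None
    county: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    postalcode: Optional[str] = None

    def to_json(self) -> str:
        # Assumption: unset fields are omitted instead of being sent as null.
        present = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(present)


address = StructuredAddress(
    street_number="25",
    street_name="Lower Way",
    city="Thatcham",
    postalcode="RG19 3RR",
)
print(address.to_json())
```

Every value is kept as a string, including street_number, which matches the field types listed above.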

Let’s also look at some examples of what kind of data goes into which field.

  • Unit: a flat, apartment, suite or unit number e.g. “flat 6”, “building C”, “unit 2”
  • House name: a named building e.g. “Buckingham Palace”, “Pearl Court”, “Falmer House”
  • Street Number: the number on the street e.g. “2”, “25”, “1015”
  • Street Name: the name of the street e.g. “Yew Court Road”, “3rd Ave”, “Mount Rosie Rd”
  • City: the name of the city, town, village or hamlet e.g. “London”, “Bradpole”, “Catskill”
  • County: county name e.g. “Sussex”, “Berkshire”, “Mid Glamorgan”
  • State: state name e.g. “Texas”, “California”, “Maine”
  • Country: country e.g. “USA”, “UK”, “Singapore”
  • Postalcode: postcode or Zip code e.g. “BN21 4LD”, “128130”, “33607”

Some notes:

  • County is rarely used in USA addresses, but is very common in UK addresses. In the USA this would be a county such as “Alameda”, “Miami Dade” or “Cook”.
  • State, in the USA, means the states that commonly appear in addresses (Texas, New York and so on). In the UK, state refers to the constituent nations, i.e. “England”, “Scotland”, “Wales” and “Northern Ireland”. These are not commonly used in UK addresses.
  • Country is unnecessary in the structured address if the country tag is used in the base request. In the base request, country can only take official two-letter country codes e.g. UK, SG, US. In the structured address, it can take any arbitrary country string e.g. “United Kingdom”, “United States of America”, “America”, “United States” and so on.
  • Postalcode is known as Zip code in the USA. You can use a standard Zip code e.g. “80236”, or a Zip+4 code e.g. “80236-2345”
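To illustrate the two Zip code forms mentioned in the notes above, here is a small client-side check. The function name is_us_zip is hypothetical, and this only tests the shape of the string (five digits, optionally followed by a hyphen and four digits); it is not how Naurt validates postal codes.

```python
import re

# Five digits, optionally followed by "-" and four more digits (Zip+4).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_us_zip(postalcode: str) -> bool:
    """Return True if the string looks like a Zip or Zip+4 code."""
    return ZIP_PATTERN.fullmatch(postalcode.strip()) is not None


print(is_us_zip("80236"))       # True
print(is_us_zip("80236-2345"))  # True
print(is_us_zip("BN21 4LD"))    # False
```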

Finally, let’s look at some examples of correctly parsed addresses

In the UK a typical address could be

123 Valley Way
Stevenage
Hertfordshire
SG2 9DE

This would be correctly parsed as

{
"street_number": "123",
"street_name": "Valley Way",
"city": "Stevenage",
"county": "Hertfordshire",
"postalcode": "SG2 9DE"
}

In the USA a typical address could be

143-10 94th Ave
Jamaica, NY 11435

This would be correctly parsed as

{
"street_number": "143-10",
"street_name": "94th Ave",
"city": "Jamaica",
"state": "NY",
"postalcode": "11435"
}

Note that street number is a string, not a number, because many street numbers (such as “143-10”) are not purely numeric. Also, in the USA you may use either full state names or abbreviations e.g. Texas or TX.

In Singapore a typical address might be

Blk 123 Bishan St 12 #04-56
Singapore 570123

This would be correctly parsed as

{
"unit": "#04-56",
"street_number": "Blk 123",
"street_name": "Bishan St 12",
"city": "Singapore",
"postalcode": "570123"
}

Singapore could also go in the country field here; the search would work it out either way. If you know the address is in Singapore, you can also use SG in the root country field to speed up search even more.
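The sketch below shows how the Singapore example might be combined with a root country code in one request body. The field names address_structured and country come from the text above; the helper build_request_body and the exact layout of the body are assumptions made for illustration, so check them against the API reference before relying on them.

```python
import json
from typing import Optional


def build_request_body(structured: dict, country_code: Optional[str] = None) -> dict:
    """Hypothetical helper: wrap a structured address in a request body."""
    # Assumption: empty components are left out rather than sent as empty strings.
    body = {"address_structured": {k: v for k, v in structured.items() if v}}
    if country_code is not None:
        # Root-level country takes a two-letter code such as "SG".
        body["country"] = country_code
    return body


singapore = {
    "unit": "#04-56",
    "street_number": "Blk 123",
    "street_name": "Bishan St 12",
    "city": "Singapore",
    "postalcode": "570123",
}

body = build_request_body(singapore, country_code="SG")
print(json.dumps(body, indent=2))
```

Because the root country is set, the structured address here does not need its own country field.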

You might not have the address already parsed, and if you do, it might not follow the Naurt standard described above. This is no longer a concern, as Naurt has its own address parser, which automatically converts address strings into our format. We therefore no longer recommend making a geocode request with a pre-parsed address unless you are absolutely sure your data conforms to the Naurt standard.

You can read much more about our parser, its performance and the impact on search here

Formatting is the opposite process of parsing. You will likely require addresses to be formatted in a clean and standard way. Different countries require addresses formatted in slightly different ways, and Naurt has already handled this for you.

As an example of why this matters, consider the address “489 Broome St, Unit 493, New York, NY 10013, United States”. This would be parsed as

{
"city": "New York",
"country": "United States",
"postalcode": "10013",
"state": "NY",
"street_name": "Broome St",
"street_number": "489",
"unit": "UNIT 493"
}

Whereas in the UK, “Flat A, 49 Upper Tulse Hill, London, SW2 2SQ, United Kingdom” has the structure

{
"city": "London",
"country": "United Kingdom",
"postalcode": "SW2 2SQ",
"state": "England",
"street_name": "Upper Tulse Hill",
"street_number": "49",
"unit": "Flat A"
}

These two have the same underlying structure, but have different formats as single address strings. For example, in the USA units come after streets, but in the UK units (usually flat numbers) come first. In the UK, the state (in this case, England) does not appear in the address string, but is in the structure.
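To make the ordering difference concrete, here is a simplified formatter for the two examples above. It is an illustrative sketch only, not Naurt's formatter: the function format_address and the "US" and "UK" style labels are hypothetical, and real formatting rules cover many more cases.

```python
def format_address(addr: dict, style: str) -> str:
    """Join structured address fields into a single string (simplified)."""
    street = " ".join(
        p for p in (addr.get("street_number"), addr.get("street_name")) if p
    )
    if style == "US":
        # Unit follows the street; state and Zip code share one component.
        state_zip = " ".join(
            p for p in (addr.get("state"), addr.get("postalcode")) if p
        )
        parts = [street, addr.get("unit"), addr.get("city"), state_zip, addr.get("country")]
    elif style == "UK":
        # Unit comes first; the state (e.g. England) is left out of the string.
        parts = [addr.get("unit"), street, addr.get("city"), addr.get("postalcode"), addr.get("country")]
    else:
        raise ValueError(f"unsupported style: {style}")
    return ", ".join(p for p in parts if p)


us = {
    "city": "New York", "country": "United States", "postalcode": "10013",
    "state": "NY", "street_name": "Broome St", "street_number": "489",
    "unit": "UNIT 493",
}
uk = {
    "city": "London", "country": "United Kingdom", "postalcode": "SW2 2SQ",
    "state": "England", "street_name": "Upper Tulse Hill", "street_number": "49",
    "unit": "Flat A",
}

print(format_address(us, "US"))  # 489 Broome St, UNIT 493, New York, NY 10013, United States
print(format_address(uk, "UK"))  # Flat A, 49 Upper Tulse Hill, London, SW2 2SQ, United Kingdom
```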