How to untangle phone numbers
Have you ever noticed how everyone writes phone numbers differently? Some people use spaces, some use dashes, and some use parentheses. Different people group a different number of digits together.
This becomes a real problem when you're trying to store phone numbers in a database and need to retrieve records by phone number.
What is a normalized phone number?
In IT and computer science, to normalize means to make something consistent or standard. A normalized phone number is a phone number that is consistently formatted across all records.
Is there an international standard for phone numbers? The International Telecommunication Union (ITU) has published the E.123 standard, which defines standard notations for phone numbers, email addresses, and web addresses.
An international phone number written in the E.123 format looks like this:
+12 345 678 901
The + sign indicates the international dialing prefix. The country code is 12, and the phone number is 345 678 901.
But when asked to enter their phone number in a form field, most people would write it in their local convention or even personal preference. And these can vary widely from country to country and even from person to person.
Phone number formats are a mess
For example, in the US, the same phone number would be written as:
(345) 678-901
In the UK, a local phone number would be written as:
01234 56 7890
And in India, it would be written as:
0123-456-7890
And some people might use the international access code 00 instead of the + sign:
0012 345 678 901
Why is this a problem?
When you're storing phone numbers in a database, you want to be able to search for them regardless of how they were entered.
Imagine you have the following phone numbers in your database:
123-456-7890
(234) 567-8901
0345 / 678 9012
+456 789 0123
567-890-1234
If you get a call from +12 345 678 9012, how will you find it in your database to look up the customer's details?
How to normalize a phone number
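The least ambiguous format to store a phone number is the international format defined by E.123 without any spaces: +123456789012. This compact form is commonly referred to as E.164 format, after the ITU recommendation that defines the structure of international numbers.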
But how do you get there?
Via the user interface
The very best way is to try to get your users to enter their phone numbers in this format in the first place.
This could be done by providing a dropdown list of countries and then automatically formatting the phone number based on the country code.
For example, the International Telephone Input library offers this kind of UI (source: https://github.com/jackocnr/intl-tel-input).
Handling existing data
If you can't get your users to enter their phone numbers in the correct format because you're dealing with legacy data or you're importing data from another source, your journey will be harder - but not impossible.
First of all, consider using a library. Google has open-sourced a library called libphonenumber that can help you in many cases.
If you're using Python, take a look at the Python port of libphonenumber called phonenumbers. For JavaScript, there is google-libphonenumber.
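As a rough sketch of the library route in Python, the snippet below parses a number with phonenumbers and formats it in the compact international form. The helper name to_e164 and the "GB" default region are illustrative assumptions, not part of the library, and the guarded import is only there so the snippet still loads when the package is not installed.

```python
try:
    import phonenumbers  # third-party package: pip install phonenumbers
except ImportError:
    phonenumbers = None  # lets the sketch load without the package


def to_e164(raw, default_region="GB"):
    """Return `raw` in compact international form, or None if parsing fails.

    `to_e164` and the "GB" default are hypothetical names for this sketch.
    """
    if phonenumbers is None:
        raise RuntimeError("the phonenumbers package is not installed")
    try:
        # default_region is only consulted when the input has no country code
        parsed = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return None
    return phonenumbers.format_number(
        parsed, phonenumbers.PhoneNumberFormat.E164
    )
```

The region argument is the context the library needs for local numbers: the same digits can mean different things in different countries, so you have to tell it which country to assume.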
But sometimes you'll need to write your own normalization function or handle special cases the library can't handle because it doesn't have enough context.
I'll focus on the vast majority of cases where your main enemies are inconsistencies in the way phone numbers are written. For the weirdest local edge cases, take a look at the phone number philosophy section at the end of this article.
Here are the steps you'll need to take:
Strip some non-numeric characters
Be careful stripping non-numeric characters as your first step. Some can help you determine the area code.
The way someone writes a phone number can give you hints about the country and area codes. For example, in the US, the area code is often enclosed in parentheses.
(234) 567-8901
Or sometimes the area code is separated by a space or a slash.
0345 / 678 9012
01234 56 7890
Take these structures into account and save your first guess about the country or area code for later.
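One possible way to capture such hints before throwing the punctuation away is sketched below. The function name guess_area_code and the three patterns are assumptions made for illustration; real data will likely need more patterns, and the result is only a guess.

```python
import re


def guess_area_code(raw):
    """Return the digits that look like an area code, or None.

    Purely a heuristic based on how the number was written.
    """
    # Strongest hint: a leading group in parentheses, e.g. "(234) 567-8901"
    match = re.match(r"\s*\((\d+)\)", raw)
    if match:
        return match.group(1)
    # A leading group followed by a slash, e.g. "0345 / 678 9012"
    match = re.match(r"\s*(\d+)\s*/", raw)
    if match:
        return match.group(1)
    # Weaker hint: a leading group that starts with a single 0 and is
    # followed by whitespace, e.g. "01234 56 7890"
    match = re.match(r"\s*(0[1-9]\d*)\s", raw)
    if match:
        return match.group(1)
    return None
```

The patterns are tried from the strongest visual hint to the weakest, and a number without any recognizable grouping simply yields None.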
Determine the country code
If you're only dealing with phone numbers from one country, you can assume that all phone numbers either already have the country code in the format +1, +12 or +123, or that they are local numbers.
People might also have entered the country code without the + sign, like 1 or 12.
If a local number starts with a single 0, strip the 0 and prepend the country code. If it starts with 00, replace the 00 with a + sign, because the country code already follows it.
01234 56 7890 becomes +441234 56 7890
0044 1234 56 7890 becomes +441234 56 7890
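For the single-country case, these prefix rules could be sketched roughly as follows. The function name add_country_code is hypothetical, the country code is passed in as a string of digits, and the last branch (treating anything without a recognizable prefix as a local number) is an assumption that will misread numbers entered as a bare country code without the + sign.

```python
def add_country_code(raw, default_country_code):
    """Rewrite the leading prefix of `raw` so it starts with "+<country code>".

    Spaces and dashes in the rest of the number are left untouched.
    A prefix wrapped in parentheses, e.g. "(0345)", is not handled here.
    """
    number = raw.strip()
    if number.startswith("+"):
        # Country code already present
        return number
    if number.startswith("00"):
        # International access code: the country code follows it
        return "+" + number[2:].lstrip()
    if number.startswith("0"):
        # Single leading 0: drop it and prepend the country code
        return "+" + default_country_code + number[1:]
    # No recognizable prefix: assume a local number (an assumption)
    return "+" + default_country_code + number
```

The check for 00 has to come before the check for a single 0, otherwise every number with an international access code would be treated as a local one.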
If you're dealing with phone numbers from multiple countries, you'll need to know which country the user was assuming when they entered the phone number.
If you are lucky and only need to handle a few countries, area codes can give you a hint about the country: Some area codes are used in one country but not in another.
Only keep the digits and + sign
As the final step, strip all non-numeric characters except the + sign at the beginning.
Your final normalized phone number should look like this:
+123456789012
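The stripping step itself is small. A minimal sketch, assuming the country code has already been dealt with (keep_digits_and_plus is an illustrative name):

```python
import re


def keep_digits_and_plus(number):
    """Remove everything except digits and a single leading + sign."""
    number = number.strip()
    # Remember the leading +, because the substitution below removes it
    prefix = "+" if number.startswith("+") else ""
    # \D matches every character that is not a digit
    return prefix + re.sub(r"\D", "", number)
```

A + sign anywhere other than the first position is dropped along with the other punctuation.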
And now you're ready to store it as a string.
When you have to search for a phone number, normalize the search term in the same way and search.
To be safe, store the original phone number as well. You might need it later for verification or to display it to the user.
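To make the store-and-search idea concrete, here is a small sketch using SQLite from the Python standard library. The table layout, the customer names, and the compact normalize helper are all assumptions made for illustration, and "12" is the made-up country code used in the examples of this article.

```python
import re
import sqlite3


def normalize(raw, default_country_code="12"):
    """Compact, single-country version of the normalization steps."""
    number = raw.strip()
    has_plus = number.startswith("+")
    digits = re.sub(r"\D", "", number)
    if has_plus:
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]
    return "+" + default_country_code + digits


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, phone_raw TEXT, phone_normalized TEXT)"
)
conn.execute("CREATE INDEX idx_phone ON customers (phone_normalized)")

# Store the original input next to the normalized form
for name, raw in [("Alice", "0345 / 678 9012"), ("Bob", "0012 345 678 901")]:
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?)", (name, raw, normalize(raw))
    )


def find_by_phone(conn, search_term):
    """Normalize the search term the same way, then do an exact match."""
    cursor = conn.execute(
        "SELECT name, phone_raw FROM customers WHERE phone_normalized = ?",
        (normalize(search_term),),
    )
    return cursor.fetchall()
```

Because both the stored value and the search term go through the same function, a lookup for +12 345 678 9012 matches the record that was entered as 0345 / 678 9012, and the exact match can use the index.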
Phone number philosophy
I have to admit something: I made it sound more straightforward than it is. And while the rest of this article should cover most cases you'll ever encounter, reality is way messier.
For a deeper dive into the topic, I recommend reading the article Falsehoods Programmers Believe About Phone Numbers, which is part of the libphonenumber repository.
Here are the most important aspects that could bite you if you're not careful:
Phone numbers are not always unique: People can have multiple phone numbers, and one phone number can be shared by many people.
Phone numbers can change: People can change their phone numbers, and phone numbers can be reassigned.
Not all numbers are dialable or textable.
Non-ASCII characters: For example, in Egypt, the Arabic script is often used to write phone numbers.
Numbering plans change: Countries can change their numbering plans, and there may be a transition period during which both formats are valid.
Phone numbers are not numbers: Phone numbers are not numbers in the mathematical sense. 7 is not the same phone number as 007.
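The non-ASCII point can be handled before any other step. A minimal sketch, assuming you want to map every Unicode decimal digit to its ASCII counterpart (to_ascii_digits is an illustrative name):

```python
import unicodedata


def to_ascii_digits(text):
    """Replace any Unicode decimal digit with its ASCII equivalent.

    All other characters, such as +, spaces, and dashes, are kept as they are.
    """
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )
```

The result stays a string on purpose: converting a phone number to an integer would silently drop leading zeros, which is exactly the 7 versus 007 problem.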
Final thoughts
Storing phone numbers only as they were entered almost guarantees that you'll have trouble finding them later. Normalizing phone numbers early is a good idea if you want to be able to search for them later on.
Here is a reminder of the key points in this article: