
Geocoding US Address Data with the zipcode Package and Visualizing It
One of the amazing things about R is that there are more than 13,000 packages (as of this writing on 2/8/2019) available at the official repository, CRAN (The Comprehensive R Archive Network), and many more at other repositories like GitHub.
Among the many things those R packages do, from data wrangling and visualization to modeling, some packages include data that is super useful.
Today, I want to introduce a package called ‘zipcode’ from Jeffrey Breen that provides, as the name says, US zip code data including zip code, city name, state name, longitude, and latitude.
This is super useful when you have US address data that contains zip codes and you want to obtain geocodes (longitude and latitude).
You can import the US zip code data from this package and join it with your own data, using the zip code as the key.
Yes, by using the zip code you can ‘geocode’ your US address data and visualize it on a map like the one below!

This can be a good enough ‘geocoding’ solution, especially when you don’t want to run costly and slow geocoding operations through third-party web services like Google’s geocoding API. Keep in mind that the coordinates are per zip code, not per street address.
Importing data from an R package is actually super simple in Exploratory.
Let’s take a look at how to import the data and use it to visualize US address data.
To demonstrate, I’m going to use US hospital rating data from The Centers for Medicare & Medicaid Services, for which I want to geocode all the hospital locations.
Install zipcode R package
First, let’s install the ‘zipcode’ R package.
Select ‘Manage R Packages’ from the project dropdown menu.

Type ‘zipcode’ and click the ‘Install’ button under the ‘Install New Packages’ tab.

Make sure that the ‘zipcode’ package is installed and shows up under the ‘Installed Package’ tab.

Import zipcode data
Let’s import data from the ‘zipcode’ package.
Select ‘R script’ under the Data Frames dropdown menu.

Type the following in the code editor area.
library(zipcode)
data(zipcode)
zipcode

If you are not familiar with R: the first line loads the ‘zipcode’ package into the current R session. The second line uses the ‘data’ function to load the ‘zipcode’ dataset from the package as a data frame called ‘zipcode’. The last line calls the data frame to return the data.
If you want to know more about what the ‘zipcode’ package offers, take a look at the reference doc for the package.
Click the ‘Run’ button to get the data, then click the ‘Save’ button to create a data frame inside Exploratory.
Now you have the zip code data imported from the ‘zipcode’ R package.

It’s that simple!
Join Hospital Data with Zip Code Data
Here is the hospital data, a list of hospitals whose service quality was surveyed by patients.

This data can be downloaded from here.
We want to visualize the hospital locations on a map.
There are a few data problems I need to address first.
First, there are multiple rows per hospital because each hospital has a row for every survey question/answer. I want to keep only unique rows, one row per hospital.

Second, the zip code column in the hospital data has a numeric data type, and this is a problem.

A US zip code always has 5 digits. In the case above, the zip codes for the hospitals in Massachusetts have only 4 digits because the leading zero was dropped when the values were read as numbers. So 1040 should be 01040.
We can take two steps to address this problem. First, we’ll convert the column to the Character data type, then pad it with zeros at the beginning.
Third, we want to join this data with the zip code data that we previously imported from the ‘zipcode’ package.
1. Keep Only Unique Hospital Data
There is a hospital id column and we can use this to keep only the unique hospital rows.

This will remove all the duplicated rows and keep only the unique rows based on the hospital id.
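For readers who prefer to see the same idea as code, here is a minimal base R sketch of keeping one row per hospital. The data frame and its column names (hospital_id, hospital_name, question) are made up for illustration and are not the real dataset's columns. With dplyr, distinct(hospital_id, .keep_all = TRUE) would be a similar way to do it.

```r
# Toy survey data: several rows per hospital, one per survey question.
# All names and values here are hypothetical.
surveys <- data.frame(
  hospital_id   = c("A1", "A1", "B2", "B2", "C3"),
  hospital_name = c("Alpha", "Alpha", "Beta", "Beta", "Gamma"),
  question      = c("Q1", "Q2", "Q1", "Q2", "Q1"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of an id after its first occurrence,
# so negating it keeps the first row for each hospital.
hospitals <- surveys[!duplicated(surveys$hospital_id), ]

nrow(hospitals)
```

Note that this keeps whichever row comes first for each id, which is fine here because only the hospital-level columns (name, address, zip code) matter for geocoding.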

2. Convert ‘zip’ column to Character Type & Fill with 0
Convert to Character Type
We can change the data type of the ZIP Code column by selecting
Change Data Type -> Convert to Character
from the column header menu.

Notice that the data type for the ZIP Code column is now shown as Character.

Fill (Pad) 0 at the beginning
We can select
Work with Text -> Pad Text
from the column header menu.

This will bring up the Mutate dialog with the ‘str_pad’ function pre-populated.
We can edit it to look something like this.
str_pad(`ZIP Code`, pad="0", side="left", width=5)
Here’s how it looks in the dialog.

Once you run it, you can see that ‘0’ is added on the left-hand side of each value so that every zip code is 5 characters wide.
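If you want to try the two steps outside the dialog, here is a small base R sketch. It swaps in sprintf() for str_pad so it runs without any extra package, and the sample zip codes are just illustrative values.

```r
# Zip codes that were read as numbers have lost their leading zeros.
zip_numeric <- c(1040, 2115, 90210)

# "%05d" formats a whole number as text, left-padded with zeros to
# a width of 5, which covers both the type conversion and the padding.
zip_padded <- sprintf("%05d", as.integer(zip_numeric))

zip_padded
```

The result is a character vector, so the values can be matched against a zip code column that is also stored as text.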

Now it’s ready to join with the zip code data!
3. Join with the zip code data
We’ll use the ‘ZIP Code’ column as the key to join with the zip code data frame.
Select ‘Join (Add Columns)’ from the column header menu.

In the dialog, select the data frame that has the zip code and longitude/latitude information.
In my case, that is ‘zipcode_data’. Then select the ‘zip’ column as the key column of the target data frame.

Once that’s done, you’ll see the new columns added at the end.

The columns with the orange bar at the top are the ones from the zip code data frame.
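As a rough picture of what this join does, here is a base R sketch using merge(). It assumes the join keeps every hospital row even when no zip code matches (a left join). The two toy data frames, the column name zip_code, and the coordinates are made up for illustration and are not taken from the package.

```r
# Hypothetical hospital rows with zero-padded, character zip codes.
hospitals <- data.frame(
  hospital_name = c("Alpha", "Beta", "Gamma"),
  zip_code      = c("01040", "02115", "99999"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the zip code data frame (illustrative values).
zipcode_data <- data.frame(
  zip       = c("01040", "02115"),
  latitude  = c(42.2, 42.3),
  longitude = c(-72.6, -71.1),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every hospital; unmatched zip codes get NA coordinates.
joined <- merge(hospitals, zipcode_data,
                by.x = "zip_code", by.y = "zip", all.x = TRUE)

# Rows that could not be geocoded are worth a quick look before mapping.
unmatched <- joined[is.na(joined$latitude), ]
unmatched$hospital_name
```

Both key columns must be character and padded the same way; a numeric 1040 on one side would not match "01040" on the other.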
Visualize Geocoded Hospital Data with Map
We can quickly visualize the hospital locations on a map by using the longitude and latitude columns.
Under the Chart view, select Map — Long/Lat as the chart type.

You can zoom in if you like.

That’s it!
Try Exploratory!
If you don’t have Exploratory Desktop, you can sign up from the website!
R Packages used in this post