Removing a part of Google Spreadsheet data to fix the data types setting

Kan Nishida
learn data science
Published in
3 min readNov 3, 2016

--

Importing spreadsheet data like CSV, Excel, Google Spreadsheet, etc, into Exploratory Desktop is super easy. But, since it’s so easy to enter anything into those spreadsheets, there are some challenges with the data sometimes. One of the typical challenges is the data that include some metadata information.

Take a look at the picture below.

As you can see, the 1st row after the header row is for Japanese translation for the column header names. You can quickly remove this kind of row by using ‘slice’ command. But the problem is, when you import data like this, data types for the columns don’t get properly recognized based on the actual data because of those characters in the 1st row, which makes all the columns to be considered as ‘character’ type.

Luckily, we can fix this problem super easily with just a few steps.

Remove Row and Re-Evaluate

Here, I have imported the above sample data that is shown in Summary view. As you can see, all the columns are now registered as ‘character’.

When we go to Table view, we can see the 1st row contains the ‘character’ data as we have seen in the data import dialog above.

Remove 1st row

First, we can quickly remove the 1st row by using ‘slice’ command.

You can remove a range of the rows or keep only the range of the rows. In this case, we want to remove just the 1st row.

Re-evaluate the data type

Once we have removed the row, then we can use ‘type_convert’ command to have R evaluate the data for all the columns and re-register them with the most appropriate data types.

After selecting ‘type_convert’ command, you can see all the columns are properly registered as something like below. Now you see histograms for ‘numeric’ columns.

Once you get the data types right, then it would be much easier to visualize the data.

That’s all! ✨

If you have other data wrangling challenges, please post them to our Exploratory Community page.

R packages used in this post

Reproduce in a standalone R environment

# Set libPaths.
.libPaths("/Users/kannishida/.exploratory/R/3.3")

# Load required packages.
library(readr)
library(dplyr)
library(exploratory)

# Set path for OAuth token.
options(tam.oauth_token_cache = "/Users/kannishida/.exploratory/projects/8-23_cde4d6e946c2/rdata/placeholder.rds")

# Data Analysis Steps
exploratory::select_columns(exploratory::clean_data_frame(exploratory::getGoogleSheet('SFO Passengers Test',1,0,c(),TRUE,'','3a2c23dde6f7')),"ID","YEAR","MONTH","OPERATING_AIRLINE","OPERATING_AIRLINE_IATA_CODE","GEO_SUMMARY","GEO_REGION","CURRENT_COUNT","PREVIOUS_COUNT","CHANGE_COUNT") %>%
exploratory::clean_data_frame() %>%
slice(-1) %>%
type_convert()

--

--

CEO / Founder at Exploratory(https://exploratory.io/). Having fun analyzing interesting data and learning something new everyday.