Removing a part of Google Spreadsheet data to fix the data types setting

Published in

learn data science

3 min readNov 3, 2016

Importing spreadsheet data like CSV, Excel, Google Spreadsheet, etc, into Exploratory Desktop is super easy. But, since it’s so easy to enter anything into those spreadsheets, there are some challenges with the data sometimes. One of the typical challenges is the data that include some metadata information.

Take a look at the picture below.

As you can see, the 1st row after the header row is for Japanese translation for the column header names. You can quickly remove this kind of row by using ‘slice’ command. But the problem is, when you import data like this, data types for the columns don’t get properly recognized based on the actual data because of those characters in the 1st row, which makes all the columns to be considered as ‘character’ type.

Luckily, we can fix this problem super easily with just a few steps.

Remove Row and Re-Evaluate

Here, I have imported the above sample data that is shown in Summary view. As you can see, all the columns are now registered as ‘character’.

When we go to Table view, we can see the 1st row contains the ‘character’ data as we have seen in the data import dialog above.

Remove 1st row

First, we can quickly remove the 1st row by using ‘slice’ command.

You can remove a range of the rows or keep only the range of the rows. In this case, we want to remove just the 1st row.

Re-evaluate the data type

Once we have removed the row, then we can use ‘type_convert’ command to have R evaluate the data for all the columns and re-register them with the most appropriate data types.

After selecting ‘type_convert’ command, you can see all the columns are properly registered as something like below. Now you see histograms for ‘numeric’ columns.

Once you get the data types right, then it would be much easier to visualize the data.

That’s all! ✨

If you have other data wrangling challenges, please post them to our Exploratory Community page.

R packages used in this post

jennybc/googlesheets

googlesheets - Google Spreadsheets R API

github.com

hadley/dplyr

dplyr - Dplyr: A grammar of data manipulation

github.com

tidyverse/readr

readr - Read flat files (csv, tsv, fwf) into R

github.com

Reproduce in a standalone R environment

# Set libPaths.
.libPaths("/Users/kannishida/.exploratory/R/3.3")

# Load required packages.
library(readr)
library(dplyr)
library(exploratory)

# Set path for OAuth token.
 options(tam.oauth_token_cache = "/Users/kannishida/.exploratory/projects/8-23_cde4d6e946c2/rdata/placeholder.rds")

# Data Analysis Steps
exploratory::select_columns(exploratory::clean_data_frame(exploratory::getGoogleSheet('SFO Passengers Test',1,0,c(),TRUE,'','3a2c23dde6f7')),"ID","YEAR","MONTH","OPERATING_AIRLINE","OPERATING_AIRLINE_IATA_CODE","GEO_SUMMARY","GEO_REGION","CURRENT_COUNT","PREVIOUS_COUNT","CHANGE_COUNT") %>%
  exploratory::clean_data_frame() %>%
  slice(-1) %>%
  type_convert()

Removing a part of Google Spreadsheet data to fix the data types setting

Remove Row and Re-Evaluate

Remove 1st row

Re-evaluate the data type

R packages used in this post

jennybc/googlesheets

googlesheets - Google Spreadsheets R API

hadley/dplyr

dplyr - Dplyr: A grammar of data manipulation

tidyverse/readr

readr - Read flat files (csv, tsv, fwf) into R

Reproduce in a standalone R environment

Written by Kan Nishida