Quantitative Financial Analysis For Beginners with Exploratory.io and tidyquant

Matthew Dancho
learn data science
Published in
8 min readMar 11, 2017

--

Learning how to perform quantitative financial analysis can be a daunting task. There’s a lot to learn and your time is precious. In this article, we’ll learn how to perform a basic analysis of stock returns to understand the fundamentals of quantitative stock analysis using two tools for exploratory data analysis and visualization: Exploratory Desktop and the R package tidyquant. We’ll begin by showing how to get daily log returns, and we’ll end on a visual analysis of the log returns to characterize the stocks in terms of return quality, a combination of growth and volatility assessments.

Key Concepts

Before we dive into an analysis, there’s a few concepts we need to understand. Normally when downloading stock data, we get historical stock prices. The stock prices are usually obtained at a daily periodicity (aka daily time series frequency). The prices are not typically what we care about. Rather we want to understand the returns, which have statistical properties that can be analyzed. We will be using the log returns in this article. Log returns are commonly used for a number of reasons beyond the scope of this article. The interested reader can learn more in Why Log Returns?

Once returns are generated, we can understand the growth and volatility of the stock. The mean or average of the daily log returns is a measure of growth. The standard deviation of the daily log returns is a measure of volatility. These are important because we can use past return performance to screen and prospect stocks using a risk-reward profile consistent with our investment philosophy (e.g. high risk, low risk).

Getting Daily Log Returns from Stock Prices

We need to obtain the returns from the stocks. To do so, we’ll go through the following process:

  1. Getting historical prices for multiple stocks
  2. Grouping by stock symbol
  3. Creating a column of lagged prices — Used to calculate the difference between current and past prices.
  4. Calculating the daily log returns (DLR)

Step 1: Getting historical prices for multiple stocks

First, download stock prices. The following code snippet loads the tidyquant package and gets the stock prices for the FANG stocks, Facebook (FB), Amazon (AMZN), Netflix (NFLX), and Google (GOOG), by passing a list of stock symbols to tq_get() with get = “stock.prices". The time span is also passed using the from and to arguments. We’ll choose to analyze the last four full years of data.

library(tidyquant)
c("FB", "AMZN", "NFLX", "GOOG") %>%
tq_get(get = "stock.prices",
from = "2013-01-01",
to = "2016-01-01")

We are going to work with the “adjusted” stock prices because these are adjusted for stock splits. Use “Select” to select the columns “symbol”, “date”, and “adjusted”.

Use dplyr Select via “+Add” Button
Output following Select

Step 2: Grouping by stock symbol

We want to calculate the “daily log return”, which can be calculated as log(adjusted)-log(adj.lag). “adj.lag” is an adjusted price of the previous day and can be calculated with one of the Window function called lag. Note that the Window calculation should be done for each stock, so you want to group the data by the “symbol” column first. Once you implement the grouping by symbol, the workflow up to this point should look like this.

Step 3: Creating a column of lagged prices

Next, we need to add a column for the lag. dplyr has a nice function aptly named lag to help with this task. Just select “Mutate” and add a lag using lag(adjusted, n = 1).

Mutate to add a column using lag() function

We have some NA values to deal with. Otherwise our returns calculation will be incorrect. We can remove using filter. Just select “Filter”, the column “adj.lag” and the operator “is not NA”.

Remove NA values with Filter → is not NA

Here’s what the first few rows should look like up to this point.

Step 4: Calculating the daily log returns (DLR)

Get the logarithmic daily returns with Mutate and passing the calculation log(adjusted) — log(adj.lag).

Calculate Log Daily Returns

And finally, we have the returns for the FANG stocks. Here’s the workflow and data. Note that you should have four groups each with its own set of daily returns.

Reviewing the Workflow for Getting Log Returns

A Faster Way to Get Stock Returns

We went through quite a few steps to get daily log returns, which is great for learning but inefficient in practice. Exploratory includes a nice feature enabling R command line input. Let’s get the stock returns in one quick and easy tidyquant command.

First, you’ll need to load tidyquant from the R package dialog.

Click Dropdown For Project, Select R Package / Scripts
Load tidyquant

The tidyquant package has a few functions that allow us to use the infrastructure of various financial packages. We’ll use the tq_mutate() function to apply the periodReturn function from quantmod to get the daily log returns for each group of the daily stock prices. The following code allows us to send the adjusted column, Ad, to the period returns function, periodReturn. We can get the daily frequency using the additional argument period = "daily". We can add type = "log" to get log returns. And, to get the output column to match, we can set col_rename = "daily.returns.log".

tq_transmute(ohlc_fun   = Ad,
mutate_fun = periodReturn,
period = "daily",
type = "log",
col_rename = "daily.returns.log")

Select the “Command” button and enter the code snippet into one line as shown below. Click the green “Run” button, and watch as the daily log returns are added to the data frame as a new column. As an added benefit, the NA values are replaced with zero. No additional work is needed to visualize!

Custom Command Line Function

Visually Analyzing Stock Returns

Histogram: Visualizing Volatility

Now that we have returns, let’s visually explore them. A histogram is a nice way to view the distribution of the daily log returns. On the “Viz” tab select “Histogram” from the chart type, “daily.returns” from the x-axis, 50 buckets, and repeat by “symbol”.

Visualizing Daily Stock Returns

We can see that GOOG has a more narrow distribution than AMZN, FB and NFLX. This is good because it means lower volatility. NFLX has a wider distribution indicating more volatility.

Line Charts: Visualizing Return Quality Over Time

We can take this analysis a step further by investigating the mean and standard deviation of the daily log returns (referred to as MDLR and SDDLR going forward). The MDLR is the measure of central tendency: a positive value indicates the stock is growing on average. A negative value indicates the stock is declining. The SDDLR is the measure of the volatility: if the magnitude is low the stock is less risky and if the magnitude is high the stock is more risky. When combined, the MDLR and SDDLR give us an indication of return quality: how much risk are we willing to take for high returns?

First, let’s review the MDLR by year to get an idea of how consistently the stock is growing. To create the next visualization, we simply need to create a line chart. Go to the “Viz” tab and select a line chart. Select “date” for the x-axis and “daily.returns.log” for the y-axis. To the right of the x- and y-axis, select “YEAR” and “AVE”, respectively, so the mean of the daily log returns are charted by year.

Visualizing Growth (MDLR) Over Time

Reviewing the MDLR by year shows yet another picture. We can see that FB was the only stock that was consistently above zero. We can also see that NFLX had really good years in 2013 and 2015, but the MDLR by year has drastic swings.

We’ll do the same thing with SDDLR on a separate chart. Select “Copy” from the dropdown for the MDLR by year chart.

And, just change “MEAN” to “SD” next to the y-axis measure. The final chart should look something like this.

Visualizing Volatility (SDDLR) Over Time

We need to inspect the magnitude and trend of SDDLR. We can see that NFLX is consistently higher than the other stocks. This is the risk an investor is taking for growth.

Last, it’s nice to include all information on a single plot when reporting. Let’s create a fourth chart, this time adding a y-axis. We can arrange the repeated charts side-by-side by selecting the settings next to the “Repeat By” symbol and setting “Number of Columns” to 4. I also changed the MDLR to a bar chart to differentiate from the SDDLR. With a little customization you can achieve the final chart.

Ready to Present to Clients!

Conclusions

Putting all this information together, we can begin to make more informed investment decisions. By visually reviewing stocks, we can see how the investments tend to perform. NFLX had the best returns from 2013 through 2016, but it also had the greatest volatility. This stock can be characterized as “best return potential but not for the faint of heart”. If you are risk-averse, you should probably shy away. FB had the next best returns and was the only stock with above-zero MDLR in each year. This stock exhibits a good combination of moderate risk but consistently high returns. AMZN had the third best returns and volatility similar to FB. GOOG had the lowest returns but also had the lowest volatility, which might be desirable for analysts seeking consistent returns with minimum risk.

Recap

We introduced the basics of quantitative stock analysis using two tools: Exploratory Desktop and the tidyquant package. We started by collecting stock data from the web using tq_get and then converted to daily log returns. We then explored the stock returns by visually analyzing performance. We charted daily log returns using histograms and line charts to understand the stock return quality. From the visualizations and data analysis we drew several conclusions to characterize the stock performance.

Packages Used

--

--