Thinking Outside the Extract

Thinking Outside the Extract

Refreshable Google Sheets are a GAME CHANGER!

If you’ve read the recent blogs on scraping data from the web using Google Sheets AND refreshing this data on Tableau Public, I am just going to just assume you are thinking the same thing I was…

If you haven’t read them yet, check out Jeffrey Shaffer’sFlorian Ramseger’s, and Andy Kriebel’s posts.  Ok, now we are all on the same page and your inner data geek reading is off the chart!

I immediately set off to add this new tool to my bag of tricks.  After all, you can never have too many ways to get to data.

I started with a data source I have considered multiple times in the past: The 27 Club.  This is basically a list of musicians and other famous people who died at age 27.  Think Jim Morrison, Jimi Hendrix, Janis Joplin, Kurt Cobain, Brian Jones, and Amy Winehouse.  This is just a simple table on Wikipedia.  In the past, I’ve simply copied and pasted data into Tableau or used ParseHub to scrape the page, but thought I’d try this ImportHTML formula for Google Sheets.

=IMPORTHTML(“https://en.wikipedia.org/wiki/27_Club”,“table”,1)
The 27 Club in Google Sheets
The 27 Club in Google Sheets
Throwaway Data Exploration
Throwaway Data Exploration

Ok, that was way too easy!

Unfortunately, there really aren’t any interesting patterns in this data.  Also, it needs quite a bit of cleanup and this visualization stalled.  However, while it appears 27 year-old musicians are relatively safe on Friday nights, I’m assuming the spike on Saturday is at least related to the Friday night/early Saturday Shenanigans!

I decided to search for another data source to test.

I went to Google Finance and searched for DATA.  This is Tableau’s Stock Symbol on the New York Stock Exchange.  The summary didn’t have any tables so I went to the Historical Prices.

Tableau Historical Stock Prices - Google Finance
Tableau Historical Stock Prices – Google Finance
=IMPORTHTML(“https://www.google.com/finance/historical?q=NYSE%3ADATA&ei=-MO_V9i-J8rvmAGumLngBw”,“table”,4)

After a little guessing, I was back in business with the above formula.  However, I saw the download to spreadsheet link.  We don’t always have a perfect HTML table to read.  I remembered Florian’s blog had referenced an ImportData formula in Google Sheets.  I right clicked on the link and copied the URL.  Back to Google Sheets….

=IMPORTDATA(“http://www.google.com/finance/historical?q=NYSE%3ADATA&ei=10u-V4DEE8KQmgGLprPoBg&output=csv”)

I just pasted the link in there and a minute or two later it populated the table!

ImportData Formula
ImportData Formula

From here, it was simple to connect using the new Google Sheet connector in Tableau 10.

Ok, I have some data and it is going to refresh daily!

I am one of those set it and forget it guys.  This means my Tableau Public work has a short life span for me.  I am always thinking about what I learned and then move on to the next viz.  I don’t want to go back and maintain a bunch of live visualizations.  I do that at work.

What do I need to do differently so this viz looks good all the time on Tableau Public?

It may seem like the viz above is pretty simple, but there are a ton of tricks built in to ensure it stays looking nice and pretty.

The spark line 

1. Why does Google Finance exclude zero?  My first change was to include zero on the axis.

2. I added a parameter to allow the user to select which measure to plot.  This used a simple case statement so I’ll skip the formula.  I dragged the parameter to the row shelf and rotated the label to add a dynamic axis label.

3. I wanted to show labels on the min/max values and the line ends.  Simple enough.  I’ll create a dual axis chart and set the label on one to min/max and the other to line ends.

2016-08-26_00-58-54

Ok, this looks pretty good, but what about volume….

2016-08-26_01-02-57

Nope, this looks like crap!

3A. The first step to fix this was adding dynamic units.  I wish Tableau did this automagically, but the following formula will round millions and thousands to one decimal place and anything less than 1,000 to two decimal places.  This field can be used for labels while using the original measure on the axis.

Measure Units Amount

IF LEN(str(ROUND(SUM([Measure]),0)))<4
THEN STR(Round(SUM([Measure]),2))
ELSEIF LEN(str(Round(SUM([Measure]),0)))<7
THEN STR(Round(SUM([Measure])/1000,1)) + ‘K’
ELSEIF LEN(str(ROUND(SUM([Measure]),0)))<10
THEN STR(ROUND(SUM([Measure])/1000000,1)) + ‘M’
END

The best part is these are strings, so if you want to write out the units, just change ‘M’ to ‘Million’.  If you want to add a dollar sign, add ‘$’ in front of the string amount.  Need Billions? Add another condition.  Want to strip the .0 off?  That’s possible too, but requires another calc.  You get the idea, you can format these almost anyway you want.  So how does Volume look now?

Dynamic Unit Labels
Dynamic Unit Labels

Ok this is better, but still not great.

3B. We need to add some space on all four sides so the labels have some room.  In the past we could just fix the axis to give the labels some room, but now the data is going to refresh.  So let’s make these pads dynamic.

Space on the top: {MAX([Measure])}*1.07
Space on the bottom: {MAX([Measure])}*-.03
Space on the left: {MIN(DATEADD(‘day’,-14,[Date]))}
Space on the right: {MAX(DATEADD(‘day’,14,[Date]))}

Add these four reference lines and format using no label and no line.  They will create some blank white space all around the chart and the labels will have some breathing room.  How do we look now?

Chart with Ref Lines
Chart with Ref Lines

I am hovering over one of the reference lines, but don’t worry we’ll fix that.  The labels look a lot better.

4. Float text boxes over the reference line “zones” so the user doesn’t have the horrible experience when they accidentally hover.  I wish we could disable these.

Float Text Boxes
Float Text Boxes

Lastly, Images

I originally published this viz with several images because I wanted to link to the Source, Twitter, my blog, etc.  It appears the Google Sheet Refresh breaks embedded images.  I hosted the image of the Tableau logo and Twitter Icon and floated webpages in those spots.  Unfortunately, I can’t link to URLS, but now I don’t have blank spaces anymore.

I hope you found this useful and are now thinking outside the extract.

Note:  While I was writing this blog, Jeffrey Shaffer published almost the exact same viz here.  He gathered the data in  different manner, but great minds think alike I guess!

Leave a Reply