This week for #MakeoverMonday, Eva Murray hooked us up with a live connection to an EXASOL playground. This would be my first experience with EXASOL and lets just say…I could not wait to get my hands on the data. The source was open data from the City of Chicago looking at taxi trip, tips, etc.
Usually for #MakeoverMonday we get just a few records, on occasion we get a few thousand, but this week we get 105 MILLION ROWS! And, it is on a live connection. What?!?!
I decided this week to document my exploration of the data through story points. I don’t use these a lot, but since I was designing without interactivity in mind, I thought it would be a good opportunity.
I registered for the EXASOL access, but I didn’t wait for the tds. I just jumped right into joining the tables.
Sure enough, 105M rows.
The first thing I did was create a line chart of trips by day.
I immediately noticed outliers on both ends of the spectrum. The lower ones were no brainers as they were all December 25th. The ones on the top were different days in March. I dropped weekday on the chart and sure enough they were all Saturdays. I was pretty sure this was a parade or something, but my google search skills uncovered these were the dates of the Chicago Shamrock Pub Crawl. I don’t know about you, but I have added this to my bucket list!
A note on the EXASOL connection. I did not even notice a blip while building this view.
— Adam Crahen (@acrahen) February 5, 2017
Next, I built a quick heatmap.
Here, I was looking for any patterns I didn’t spot in the line chart. I noticed there was one day in 2017 so I filtered the data source to remove it. I saw a leap year, St. Patty’s Day and Xmas again jumped out.
EXASOL had no problem at all with this view. Everything was just as fast as a small dataset.
I duplicated the map, zoomed into downtown and floated it in Lake Michigan on top of the original map. On the original map, I created a point annotation so I could put a rounded outline behind the zoomed map and draw a line with an arrow to downtown.
EXASOL had no problems, do you notice a pattern? When you have nothing exceptional to say, it says something.
Ok, there is no way that I am done with maps.
I wanted to try and map polygons for the census tracts, but they are not a default geo field in Tableau. I found a web data connector on this blog. This was awesome, it allowed me to just pull in data for a specific County or State. I think I grabbed all of Illinois and blended it with the EXASOL data.
This didn’t produce any insight really. Chicago taxis pickup a lot of people downtown and at the airport. Whoo! But, I did get to test blending web data with EXASOL and play with maps. Surprise, surprise, there was no noticeable delay or performance hit.
Next, I created all kinds of bins on tips, trips, etc. I looked at them across census tracts and various date levels, but nothing excited me. I decided to focus on tips. I ended up with the following chart.
Adding the year dimension to this view clearly exposed the steady increase in tipping and clearly showed tipping patterns by weekday and time of day are consistent. I grouped hours into 4 buckets to simplify the view. There was only one thing left to do here, #ThanksObama.
You guessed it, EXASOL was perfect.
This ended my first submission to #MakeoverMonday, however, if you give a crazy person 105 Million records, what will they do with it? If you said Double it and make more maps, you win a cookie.
I wanted to see what a custom SQL connection to completely UNION the live data source to itself would do. Buckle up.
Ok, this isn’t as hard as it sounds. Here is a trick. Connect to your data source and join the tables as you normal would (shown below). Then click on Data -> Convert to custom SQL.
A window will appear will all the SQL you generated by dragging and dropping the tables into the view above and joining them. Add a comma and one row to create a new field for path order. We need a starting and endpoint for each path.
Now, copy all the SQL in the open window and close it. Remove the tables from the view and drag New Custom SQL into the canvas. Paste in the SQL you just copied. On the Last row, add UNION, then paste the SQL code again below UNION. Now just change the second path order from 1 to 2.
Congratulations, now you have 210 Million rows to play with via custom SQL and a live connection. That’s not crazy…
I wanted to create a pickup/dropoff map using Weekday and the same time of day groups I used in the #ThanksObama chart above. To do this, you need a single LAT/LONG field which we can determine based on the Path Order field we created. Here is the code for Latitude:
CASE [Path Order] WHEN 1 THEN [PICKUP_CENTROID_LATITUDE] ELSE [DROPOFF_CENTROID_LATITUDE] END
Logitude is the same thing, just replace the Latitude fields with the pickup/dropoff Longitude.
Here is what the worksheet looked like.
The rest is all map formatting (i.e., washout, layers, mark transparency, mark size, etc.). I put all this on it’s own dashboard to create a data poster of Chicago’s taxi patterns. You can filter this by year. But fair warning, it did take EXASOL 34 seconds to render the view. I think for what I asked it to do, draw over 1.1 Million marks on a worksheet via custom SQL on a live connection…..yeah, I’m not complaining. I’ve seen Tableau extracts with 6 Million records perform at about the same speed. EXASOL did it with 204 Million more records.
Here is the final Poster.
I am probably not done with test driving EXASOL, so if I do anything else cool, I will write it up. But, from what I have seen, it is amazing. Please take my money.
— Adam Crahen (@acrahen) February 7, 2017