Resources for Learning Data Science Visualization

Learning to visualize data effectively is a large component of a data science education. You might need it for exploratory data analysis, for presentations to co-workers or bosses, or just to show off something neat you found.

One of the content sections for Data 151: Introduction to Data Science tries to give students a very brief introduction to more complicated visualizations and functions, especially those available in R and Python. The books I’ve been using for the course (Doing Data Science by Cathy O’Neil and Rachel Schutt, and Data Science from Scratch by Joel Grus) include sections on visualization. In fact, I initially started collecting these resources from the “extra resources” section of Doing Data Science’s Chapter 9, specifically the references to FlowingData’s tutorials and Michael Dubakov’s Visual Encoding article.

Not all of FlowingData’s tutorials are free, though, so this page lists and links to the FREE tutorials available (see end of post). Additionally, I struggled to find anything for Python of equal quality or type to FlowingData’s tutorials in R. So, for now, I’ve resorted to sending the students to DataCamp.com’s courses on Python visualization (which seem great so far, but are a bit different).

More generally, for a summary of some common and useful Python visualization libraries, you can read either Mode Analytics’ article 10 Useful Python Data Visualization Libraries or KDnuggets’ article Overview of Python Visualization Tools. There’s significant overlap between the two. Ironically, I think Mode Analytics’ article does a better “overview,” while KDnuggets’ article actually shows some code.

There are also (perhaps obviously) entire books on this subject, but that seems like overkill for an introductory class. (I haven’t really researched books yet, though when I get to planning our Scientific Visualization course there’ll be more posts.) Similarly, even though D3 does amazing things, introducing another programming language seemed excessive. If you are interested in D3, Doing Data Science points to the D3 tutorials by Scott Murray at AlignedLeft.com. There are also a few (free) tutorials on using D3 (and Python) on FlowingData (mentioned below).


Free Tutorials from FlowingData in R:

Good, basic, initial tutorial:

Several Basics of Plotting:

Some more intermediate chart-types:

A few really nifty advanced charts:

Other: How to Download and Use Online Data with Arduino (Uses R)



Free Tutorials from FlowingData (not in R):

Python/D3
Getting Data (and some visualizations):

Visualization Focused:

JavaScript:

Other: How to Make an Interactive Area Graph with Flare (Uses Actionscript/Flash)


DataCamp.com allows academics to make free classes with full access to its premium content, but even if you can’t get free access, the first chapters are free for all and include several good visualization intros. I’ve only included the Python content here, but they also have material on visualization in R.

As a quick primer, matplotlib is the ‘grandfather’ of plotting in Python, so it is the most technical and labor-intensive option but also the most powerful. ggplot2 and Bokeh are far easier to use and are probably better places to start learning.
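
To give a feel for what “technical” means with matplotlib, here is a minimal sketch of a labeled scatter plot. It is only an illustration, not taken from any of the tutorials or courses linked here: the data values, the make_scatter helper, and the quiz_scores.png file name are all made up for this example.

```python
import matplotlib.pyplot as plt

# Made-up example data (not from a real dataset): hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 71, 78, 86]


def make_scatter(x, y):
    """Build a labeled scatter plot and return the Figure and Axes.

    Hypothetical helper written for this example only.
    """
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(x, y, color="tab:blue")  # one point per (x, y) pair
    ax.set_xlabel("Hours studied")
    ax.set_ylabel("Quiz score")
    ax.set_title("Hypothetical quiz scores")
    return fig, ax


fig, ax = make_scatter(hours, scores)
fig.savefig("quiz_scores.png", dpi=150)  # writes the chart to the working directory
```

Notice that every label and title is set by hand. That explicitness is where both the extra work and the extra control come from.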