A large component of a data science education is learning how to effectively visualize data. This could be part of exploratory data analysis, part of producing presentations for co-workers or bosses, or just a way to show off something neat you found.
One of the content sections for Data 151: Introduction to Data Science tries to give students a very brief introduction to more complicated visualizations and functions, especially those available in R and Python. The books I’ve been using for the course (Doing Data Science by Cathy O’Neil and Rachel Schutt, and Data Science from Scratch by Joel Grus) include sections on visualization. In fact, I started collecting these resources from the “extra resources” section of Chapter 9 of Doing Data Science, specifically its references to FlowingData’s tutorials and Michael Dubakov’s Visual Encoding article.
Not all of FlowingData’s tutorials are free, though, so this page lists and links to the free tutorials available (see the end of the post). Additionally, I struggled to find anything for Python of equal quality or type to FlowingData’s tutorials in R. So, for now, I’ve resorted to sending students to DataCamp.com’s courses on Python visualization (which seem great so far, but are a bit different).
More generally, for a summary of some common and useful Python visualization libraries, you can read either Mode Analytics’ article 10 Useful Python Data Visualization Libraries or KDnuggets’ article Overview of Python Visualization Tools. There’s significant overlap between the two. Ironically, I think Mode Analytics’ article does a better “overview,” while KDnuggets’ article actually shows some code.
There are also (perhaps obviously) entire books on this subject, but that seems like overkill for an introductory class. (I haven’t really researched books yet, though when I get to planning our Scientific Visualization course there’ll be more posts.) Similarly, even though D3 does amazing things, introducing another programming language seemed excessive. If you are interested in D3, Doing Data Science points to the D3 tutorials by Scott Murray at AlignedLeft.com. There are also a few (free) tutorials on using D3 (and Python) on FlowingData (mentioned below).
Free Tutorials from FlowingData in R:
Good, basic, initial tutorial:
Several Basics of Plotting:
- Getting Started with Charts in R
- How to Visualize and Compare Distributions
- How to Read and Use Histograms in R
- The Baseline and Working with Time Series in R
- How to make a scatterplot with a smooth fitted line
Some more intermediate chart-types:
- Moving Past Default R Charts
- How to make a Heatmap – A quick and easy solution
- How to make bubble charts
- Beeswarm Plot in R to show Distributions
- An Easy Way to Make a Treemap (for Hierarchical data)
A few really nifty advanced charts:
- Voronoi Diagram and Delaunay Triangulation in R
- How to map connections with great circles
- How to visualize data with cartoonish faces ala Chernoff
Other: How to Download and Use Online Data with Arduino (Uses R)
Free Tutorials from FlowingData (not in R):
Getting Data (and some visualizations):
- Downloading your Email Metadata
- How to make your own Twitter Bot – Python Implementation
- Grabbing Weather Underground Data with Beautiful Soup
- How to Make Interactive Linked Small Multiples (i.e. Multiple Plots)
- How to Make a US County Thematic Map using Free Tools (or a choropleth map)
Other: How to Make an Interactive Area Graph with Flare (Uses Actionscript/Flash)
DataCamp.com allows academics to make free classes with full access to their premium content, but even if you can’t get free access, the first chapters are free for all and include several good visualization intros. I’ve only included the Python content here but they also have material on visualization in R.
As a quick primer, matplotlib is the ‘grandfather’ of plotting in Python, so it is the most technical and labor-intensive option but also the most powerful. ggplot2 (which is an R package) and Bokeh are far easier to use and are probably better places to start learning.
- Chapter 1 of Data Visualization with ggplot2 (Part 1) (uses ggplot2)
- Chapter 1 of Part 2 seems to be largely statistics, with fewer new plots…
- Chapter 1 of Data Visualization with ggplot2 (Part 3) (uses ggplot2)
- Chapter 1 of Introduction to Data Visualization with Python (uses matplotlib)
- Chapter 1 of Interactive Data Visualization with Bokeh (uses Bokeh)
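To give a flavor of what the matplotlib style mentioned in the primer above looks like before starting any of these courses, here is a minimal sketch of two basic chart types: a histogram and a scatterplot with a fitted line. The data is made up for illustration, the function name `make_example_figure` is my own, and the line is a plain least-squares straight line rather than the smoothed curve used in the FlowingData tutorial. It is not taken from any of the courses or tutorials listed here.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt


def make_example_figure(seed=0, n=200):
    """Build a two-panel figure: a histogram and a scatterplot with a fitted line."""
    rng = random.Random(seed)

    # Made-up data: x is roughly normal, y depends linearly on x plus noise.
    x = [rng.gauss(50, 10) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0, 15) for xi in x]

    # Ordinary least-squares slope and intercept, computed by hand.
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: distribution of x.
    ax_hist.hist(x, bins=20, color="steelblue", edgecolor="white")
    ax_hist.set_title("Distribution of x")
    ax_hist.set_xlabel("x")
    ax_hist.set_ylabel("count")

    # Right panel: the raw points plus the fitted straight line.
    ax_scatter.scatter(x, y, s=10, alpha=0.6)
    ends = [min(x), max(x)]
    ax_scatter.plot(ends, [slope * v + intercept for v in ends], color="firebrick")
    ax_scatter.set_title("y versus x with a least-squares line")
    ax_scatter.set_xlabel("x")
    ax_scatter.set_ylabel("y")

    fig.tight_layout()
    return fig, slope, intercept


fig, slope, intercept = make_example_figure()
# fig.savefig("example_plots.png")  # uncomment to write the image to disk
```

Even for two simple charts, every title and axis label is set by hand, which is roughly what I mean by matplotlib being more technical than the alternatives.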