
Course/Curriculum Resource Sites

Last week I posted about specific websites you might use to host or pull assignments from. This week I want to take a broader look at overall curriculum design. This is by no means a comprehensive list of sites that have curriculum available; instead, it’s intended to help reduce your search time for this kind of material.

If you are looking for wholesale curriculums, including course materials, there are a few options available to start the creative juices flowing. The first, and probably most academic, is the European Data Science Academy (EDSA). The EDSA is grant-funded, with a large number of academic (university) and research institute partners from across Europe. The thing I like best about this work is that they started with a demand analysis study of the skills needed and current jobs in data science across the EU. Furthermore, from the start the project built in a feedback and revision cycle to improve and enhance the topics, delivery, etc. To understand their vision, see the image below.

This idea of continual improvement was more than just a grant-seeking ploy, as shown by their list of releases, revisions, and project deliverables. While the current site still lists four learning modules as unreleased, they are expected in July 2017.

Overall, their curriculum structure (I haven’t evaluated their deeper content) has a fairly high emphasis on computational topics, with less emphasis on statistical/mathematical underpinnings. You can experience their curriculum directly (it’s free/open access) through their online course portal. What might be far more valuable, though, is the grant’s actual deliverables. These deliverables include details on the overall design principles in their structure with learning objectives, individual courses with their own learning objectives, descriptions of lesson topics/content, and more. Using their outlines and ideas to guide your own construction of a curriculum is both reasonable and a great way to make sure you aren’t missing any major topic. However, this should be done with proper attribution and license checking (of course).

The other two places to look for curricular inspiration are also in the ‘open source’ category, but not funded by grants or (traditional) academic institutions. The Open Source Data Science Masters was constructed by Clare Corthell, who has gone on to found a data science consulting firm and other initiatives. While not every link on the site is actually to a free resource (there are several books to buy, etc.), it does a pretty nice job of highlighting the topics that will need to be covered (if possible), and provides lots of places to start pulling course materials from (or getting inspiration/ideas for content). The primary curriculum is Python focused; however, there is also a collection of R resources.

Corthell isn’t the only one, though, with an “open source” or “free” data science (masters) degree. Another collection of relatively similar material was assembled by David Venturi, who’s now a content developer at Udacity (writing data science curriculum, of course). For those designing curriculums, both Corthell and Venturi provide excellent resources and places to frame your learning. However, if you hit this page trying to get into data science, read this Quora post that I think accurately highlights the challenges of learning from/with these open source programs.

Another similar alternative, which I’d peg closer to an undergraduate degree, is the Open Source Society University’s data science curriculum. Their curriculum assumes a lot less prior knowledge in mathematics and statistics, providing links for Calculus, Intro Statistics, etc. This content is probably more in line with the recommendations for curriculum from the Park’s paper (see my Curriculum Resources page). What I particularly like about this (from a learning perspective) is that it actually details the amount of work per week required to learn from each course. You’ll see a large repetition of topics, but the OSS-Univ’s curriculum has a lot less advanced material, with only a few courses in big data, wrangling, etc.

At the end of the day, if you are looking to implement an undergraduate or graduate degree in data science, your university is going to have to offer duplicates of a significant subset of classes from these curriculums. While emulation might be the highest form of praise, we’ll each need our own, unique take on these courses while striving for sufficient similarity to have a semi-standardized knowledge base for practitioners. Good luck!


Why an Undergraduate Data Science Degree?

The job ‘Data Scientist’ was heralded as “The Sexiest Job of the 21st Century” by Harvard Business Review in 2012[1] at a crest of the ongoing publicity in the career fields associated with ‘big data.’ Articles on both the discipline and reality regularly appear in a variety of popular press outlets, including The Economist[2] and The New York Times[3], concurrently with growing discussion in more scholarly venues. The increased need for this specialty is driven by the fact that human activity is already generating petabytes of data each day and “data is projected by some experts to increase by 2,000 percent between now and 2020”[4]. Society will need more professionals and researchers capable of competently dealing with the huge influx of data that will be accumulated in the next decade and onward.

All this is great, and certainly helps motivate the creation of an undergraduate degree in data science (the language above came from our internal proposal), but it’s not what actually inspired me to start the process. That came from two sessions at SIGCSE 2014[5]. The first was on a paper by Paul Anderson, James Bowring, Renee McCauley, George Pothering and Christopher Starr titled “An Undergraduate Degree in Data Science: Curriculum and a Decade of Implementation Experience” (DOI: http://dx.doi.org/10.1145/2538862.2538936 also linked on the resource pages). The other was a panel session, “Data Science as an Undergraduate Degree,” with Paul Anderson, James McGuffee and David Uminsky (DOI: http://dx.doi.org/10.1145/2538862.2538868). At these sessions I got to hear what the College of Charleston (Paul Anderson) and the University of San Francisco (David Uminsky) were doing with undergraduate degrees. And it sounded like things that Valparaiso University was already offering, with the exception of perhaps an introductory course in data science. Moreover, it sounded like exactly the sort of degree I wish I’d been able to take as an undergraduate!

However, being able to actually follow through with offering the program depended on several additional factors (besides the excitement). Before diving further into the process of actually creating the curriculum and elements, I want to discuss what made Valpo ready to start a Data Science degree so you can evaluate for yourself if it’s even feasible…

Valparaiso University already had…

  • A large Mathematics & Statistics Department (14 tenured/tenure-track faculty, two full-time lecturers, and adjuncts)
  • Significant faculty experience in operations research, graph theory and scientific computing
  • A deep statistics curriculum including an actuarial science major
  • A complete computer science degree, covering all the basics
  • A Mathematics & Statistics Department and a Computing & Information Sciences Department that had only recently split into two departments, and so still had very strong communication and ties
  • A large, top-ranked college of engineering requiring more frequent offerings of mathematics and statistics electives, partially populated by engineers
  • A master’s degree in Information Technology with 150-250 students, regularly offering courses in data mining and information management systems (databases)
  • A master’s degree in Analytics and Modeling, where many of the courses were cross-listed with undergraduate courses

Together these factors allowed Valpo to start the new degree with very minimal curricular changes or additions, which is not something feasible at most schools. Now, you certainly don’t need all of these factors to start your own program, but I think you’ll probably at least need strong mathematics, statistics and computer science departments with good, clear communication between them. The rest just makes it easier.

[1] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

[2] 5,300 search hits for “Big Data” in the print edition of the Economist. www.economist.com

[3] 304 search hits for “Big Data” articles in the last 12 months of NY Times. www.nytimes.com

[4] http://www.wired.com/2015/01/a-new-generation-of-data-requires-next-generation-systems/

[5] SIGCSE refers to the Association for Computing Machinery (ACM)’s Special Interest Group on Computer Science Education. Specifically, SIGCSE is usually used to refer to the group’s annual Technical Symposium, typically held in early March.