
JMM Session: Technology and Resources for Teaching Statistics and Data Science

This blog post is a collection of the presentations from the session I chaired at the 2019 Joint Mathematics Meetings. The session was titled “Technology and Resources for Teaching Statistics and Data Science”. It was co-sponsored by the MAA Committee on Technology in Mathematics Education (CTiME) and the SIG-MAA: Statistics Education (Stat-Ed).

The abstract for the session was:

One of the five skill areas in the American Statistical Association’s curriculum guidelines is “Data Manipulation and Computation” (pg. 9), embracing the need for students to be competent with programming languages, simulation techniques, algorithmic thinking, data management and manipulation, as well as visualization techniques. Additionally, the emphasis on using real data and problems and their inherent complexity means that technology is often necessary outside of specifically prescribed computational courses. This session invites instructors to contribute talks exploring the use of any software or technology in statistics education. Talks may include effective instructional or pedagogical techniques for linking programming to statistics, interesting classroom problems and the use of technology to solve them, or more.

Abstracts for the talks can be found here.

Teaching a Technology-Rich Intro Stat Course in a Traditional Classroom, presented by Patti Frazer Lock, St. Lawrence University

Using the Islands in an Introductory Statistics Course, presented by Carl Clark, Indian River State College

StatPowers-A Simple Web-Based Statistics Suite for Introductory Statistics, presented by Brian R Powers, Arizona State University

Using R Programming to Enhance Mathematical and Statistical Learning, presented by Joseph McCollum, Siena College

Computational Experience for Linear Regression and Time Series using R, presented by Rasitha R. Jayasekare, Butler University

Statistics teaching and research with R, presented by Leon Kaganovskiy, Touro College

GAISEing into the Future with Fun, Flexible Mobile Data Collection and Analysis, presented by Adam F. Childers, Roanoke College

Written Vs. Digital Feedback; Which improves Student Learning?, presented by David R. Galbreath, United States Military Academy

Using Authentic Data in Spreadsheet Assignments and Quizzes to Improve Students’ Attitudes towards Elementary Statistics, presented by Daniel A. Showalter, Eastern Mennonite University

Democratizing Data: Expanding Opportunities for Students in Data Science, presented by Robin L. Angotti, University of Washington Bothell

I hope to post the recordings of the session in the future. Stay tuned for an updated post (I’ll also send an announcement).

NASEM Webinar 1: Data Acumen

This webinar aimed to discuss how to build undergraduates’ “data acumen”. If acumen isn’t a word you use regularly (I didn’t before last year), it means “the ability to make good judgments and quick decisions”. Data acumen, therefore, is the ability to make good judgments and quick decisions with data. It is certainly a valuable and important skill for students to develop! The webinar’s presenters were Dr. Nicole Lazar, University of Georgia, and Dr. Mladen Vouk, North Carolina State University. Dr. Lazar is a professor of statistics at the University of Georgia. Dr. Vouk is a distinguished professor of computer science and the Associate Vice Chancellor for Research Development and Administration at North Carolina State University.

Overall, this webinar seemed largely a waste of time if your goal was to understand what activities, curricular designs, and practices would help students develop data acumen (see my last paragraph for a suggested alternative). On the other hand, if you’d like a decent description of the design and implementation of a capstone course, and of the process of scaling one, listen to Dr. Lazar’s portion. If you still need an overview of the state of data science, Dr. Vouk’s portion provides reasonable context. The most valuable things in the entire webinar were slides 26 and 27 (about minute 48). Slide 26 shows an excellent diagram of an “End-to-End Data Science Curriculum” that articulates reasonably well how a student might mature (and thereby gain data acumen); see figure 1 below. Slide 27 provides well-articulated learning objectives for core, intermediate, and advanced data science courses (see the table below).

From the NASEM Data Acumen Webinar: North Carolina State University’s Curriculum Vision
  • Core
    • Able to master individual core concepts within Bloom’s taxonomy: Knowledge, Comprehension, Application, Analysis, Evaluation, and Synthesis
    • Able to adapt previously seen solutions to data science problems for target domain-focused applications utilizing these core concepts
  • Intermediate Electives
    • Able to synthesize multiple concepts to solve, evaluate and validate the proposed data science problem from the end-to-end perspective
    • Able to identify and properly apply the textbook-level techniques suitable for solving each part of the complex data science problem pipeline
  • Advanced Electives
    • Able to formulate new domain-targeted data science problems, justify their business value, and make data-guided actionable decisions
    • Able to research the cutting edge technologies, compare them and create the optimal ones for solving the DS problems at hand
    • Able to lead a small team working on the end-to-end execution of DS projects


An Alternative to the NASEM Webinar

While I found this particular webinar to be largely a waste of time, I also attended the NASEM Roundtable on “Alternative Educational Pathways for Data Science”. Although that roundtable was not focused on data acumen, its first presentation described an excellent overall curriculum structure that builds students’ data acumen. Eric Kolaczyk from Boston University described BU’s non-traditional master’s program in Statistical Practice. By integrating coursework, practicum experiences, and more, the program forces students to exercise and build their ability to make good judgments about data investigations, methods, and results. The talk is well worth your time if you’d like some ideas for non-standard ways to build student skills and abilities.

Webinar Summary: Data Science Education in Traditional Contexts

Introduction

This post is a summary of and reflection on the webinar “Data Science Education in Traditional Contexts”. The webinar was hosted on August 28th by the South Big Data Innovation Hub as part of its Keeping Data Science Broad: Bridging the Data Divide series. You can watch the entire webinar here. The webinar consisted of five speakers and a discussion section. I’ve provided a short summary of each panelist’s presentation and listed the questions discussed at the end. The speakers, in order, were:

  • Paul Anderson, College of Charleston
  • Mary Rudis, Great Bay Community College
  • Karl Schmitt, Valparaiso University
  • Pei Xu, Auburn University
  • Herman “Gene” Ray, Kennesaw State University

Summary of Presentation by Paul Anderson, College of Charleston

The first speaker was Paul Anderson, Program Director for Data Science at the College of Charleston. His portion of the presentation runs from 0:01:50-0:13:45 and expands on three challenges he has experienced: (1) being an unknown entity, (2) recruiting, and (3) designing an effective capstone. His first point, being an unknown entity, affects a broad range of activities related to implementing and running a data science program. It can make it difficult to convince administrators to support the program or new initiatives (such as external collaborations). It also means that other disciplines may not be interested in developing joint coursework (or in approving your curricular changes).

His second point covered what he has learned from several years of working on recruitment. His first observation here ties back to his first overall point: if your colleagues don’t know what data science is, how are most high school students (or even your own students) supposed to know? As a result, he has had limited success recruiting directly from high schools. Instead, he has focused on retooling the program’s Introduction to Data Science course to be a microcosm of the entire program, in terms of both process and rigor. He has also worked to make the program friendly to students switching majors or double majoring by keeping prerequisites limited.

His final portion discussed the various forms of capstone experience Charleston has experimented with, starting from one-to-one student-faculty project pairings and moving toward group-based projects with a general faculty mentorship model. If you are considering including a capstone experience (and you should!), this portion is probably worth listening to. However, not all colleges or universities will have enough students and faculty to adopt their final model.

Summary of Presentation by Mary Rudis, Great Bay Community College

The second speaker was Mary Rudis, Associate Professor of Mathematics at Great Bay Community College. Her portion runs 0:14:25-0:19:19 and 0:20:46-0:29:08. A significant portion of her presentation outlines the large enrollment and performance gaps for non-white and first-generation college students. Dr. Rudis saw building both an Associate Degree in Analytics and a Certificate in Data – Practical Data Science as the best way to combat these gaps. In researching the state of jobs and education, she found that community college students were struggling to compete with students from four-year colleges (such as local M.I.T. students) for the limited internships and entry-level jobs available in data science. Most hiring companies were looking for Master’s-level education or significant work experience in the field. To help her students succeed, she built an articulation program with UNH-Manchester so that, upon final graduation, students originally enrolled at GBCC would be fully qualified for the current job market.

Summary of Presentation by Karl Schmitt, Valparaiso University

The third speaker was Karl Schmitt, Assistant Professor of Mathematics and Statistics, Affiliate Professor of Computing and Information Sciences, and the Director of Data Sciences at Valparaiso University. His presentation runs from 0:30:30 – 0:45:20. The core of the presentation expanded on Dr. Anderson’s first point about data science being an unknown entity. He sought to provide ideas about how to differentiate data programs from similar programs, both within a single college or university and across institutions. Valparaiso has six data-focused programs:

His talk described how the programs can be differentiated in terms of the data user or professional that each program trains, and also in terms of course content and focus. He also talked about how Valpo is differentiating its program from those at other schools with a focus on Data Science for Social Good. This has been achieved in part by seeking partners from the government and non-profit sectors rather than traditional industry partners.

Summary of Presentation by Pei Xu, Auburn University

The fourth speaker was Pei Xu, Assistant Professor of Business Analytics, Auburn University. Her portion of the presentation runs from 0:46:05 – 0:57:55 and describes Auburn’s undergraduate Business Analytics degree. Auburn’s curriculum is designed around the data science process of Problem Formulation -> Data Prep -> Modeling -> Analysis -> Presentation. Each of the core classes covers one or two stages of this process, with the specialized degree courses typically beginning in a student’s sophomore year. The program also actively engages many businesses to visit and provide information sessions. Dr. Xu detailed four challenges she has faced with the program. First, it has been hard to recruit qualified faculty to teach courses, which she has overcome by hiring progressively over the last few years. Second, she has found that the highly quantitative and computational nature of the program turns many students away. This has been addressed by placing a stronger emphasis on project-based learning, and on interpretation rather than innovative process development. Third, she discussed how many of the program’s core courses overlap significantly; for example, courses in several different areas all need to discuss data cleaning and preparation. Auburn’s faculty has spent significant curriculum development time discussing and planning exactly what content is duplicated and where. Finally, choosing among the various analytics tools, both for the general curriculum and for specific classes, has proved challenging (I discuss Python, R, and other options at length here).

Summary of Presentation by Herman “Gene” Ray, Kennesaw State University

The fifth speaker was Herman “Gene” Ray, Associate Professor of Statistics and Director of the Center for Statistics and Analytics Research, Kennesaw State University. His presentation is from 0:58:36 – 1:07:35 and focuses on KSU’s Applied Statistics Minor. KSU’s program strongly focuses on domain areas, with most courses including a high level of applications and various experiential learning opportunities. Additionally, almost all of their courses use SAS, while also introducing students to a full range of data science software and tools. The first experiential learning model KSU uses integrates corporate datasets and guided tasks from businesses. The second model is a ‘sponsored research class’ in which teams of undergraduates, led by a graduate student, work on corporation-provided problems or data. Gene provided extended examples involving an epidemiology company and Southern Power Company. The key benefits KSU has seen are that students gain real-world exposure and practice interacting with companies, and some even receive awards, internships, and jobs. The largest challenge of this experiential learning model is that it requires a significant amount of time: first to develop relationships with companies, then to manage corporate expectations, and finally, for both faculty and students, to execute the projects.

Additional Webinar Discussion

The additional discussion begins at 1:08:32. Rather than summarize all the responses (which were fairly short), I’m simply going to list the questions in the order they were answered, and encourage interested readers to listen to that portion of the webinar or stay tuned for follow-up posts here.

  1. What can high schools do to prepare students for data science?
  2. What mix do programs have between teaching analysis and teaching presentation skills?
  3. Is it feasible for community colleges to offer only an Introduction to Data Science course?
  4. How have prerequisites or program design affected diversity in data science?
  5. How is ethics being taught in each program? (and a side conversation about assessment)