Sep 1, 2018

Undergraduate Data Science: Opportunities and Options

Nicholas Horton (Amherst College) was a co-author of the recent National Academies consensus study report that is available for free download from https://nas.edu/envisioningds. He writes:

Recent years have seen the dramatic rise of data science, which is revolutionizing industry and science. The NSF-funded National Academies consensus report entitled Data Science for Undergraduates: Opportunities and Options (NASEM, 2018) noted that “as more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.”

Much has been written on the growth of data science and the role statistics plays within it (see, for example, Donoho, 2017). Historically, working in data science has required a graduate degree. However, many reports indicate a shortage of well-trained data scientists to fill new positions, with many opportunities now available to those with appropriate undergraduate training. Given the demands of the workforce, the committee, chaired by Laura Haas (University of Massachusetts/Amherst) and Al Hero (University of Michigan), was charged with setting forth a vision for undergraduate data science with a focus on applications of and careers in data science.

The second chapter of the report laid out key concepts that data science professionals need to know. Building on the work of De Veaux et al. (2017), the report proposes “data acumen” as a framework for the education of future data scientists. This requires “exposure to key concepts in data science, real-world data and programs that can reinforce the limitations of tools, and ethical considerations that permeate many applications.” The committee outlined ten (overlapping) areas fundamental to developing data acumen: Mathematical foundations; Computational foundations; Statistical foundations; Data management and curation; Data description and visualization; Data modeling and assessment; Workflow and reproducibility; Communication and teamwork; Domain-specific considerations; and Ethical problem solving.

Mathematics is essential to data science, but questions remain about what type of mathematics, and how much, is needed by bachelor’s graduates. The committee identified key concepts that would be important for all students, including set theory and basic logic; multivariate thinking (via functions and graphical displays); basic probability theory and randomness; matrices and basic linear algebra; networks and graph theory; and optimization.

Statistics was also seen as foundational to data science. Key concepts identified by the committee include variability, uncertainty, sampling error, and inference; multivariate thinking; non-sampling error, design, experiments, biases, confounding, and causal inference; exploratory data analysis; statistical modeling and model assessment; and simulations and experiments.

The third chapter of the report focused on how to develop courses (e.g., data science for all, introduction to data science) and programs (e.g., certificates, minors, and majors) that would provide flexible pathways for students. The fourth chapter reviewed challenges and barriers that need to be addressed in developing data science programs. The fifth chapter reiterated the key role that formative and summative assessment and faculty development play in advancing data science.

What are the implications of the report and the growth of undergraduate data science for statisticians and the IMS? De Veaux et al. (2017) noted that: “Students should understand the basic statistical concepts of data collection, data wrangling, data analysis, modeling, and inference. … Successful graduates should be able to apply statistical knowledge and computational skills to formulate problems, plan data collection campaigns or identify and gather relevant existing data, and then analyze the data to provide insights.”

More work is needed to create courses and flexible pathways that can provide sufficient mathematical and statistical background without a long succession of prerequisite courses, while also ensuring that students have strength in algorithmic thinking, data technologies, and domain knowledge.

The report notes that data science is at a formative stage of development, with robust growth likely. It recommends that academic institutions “embrace data science as a vital new field” and “provide and evolve a range of educational pathways to prepare students for an array of data science roles in the workplace” (NASEM, 2018).

More discussion is also needed about future preparation at the graduate level, to ensure that interested data science graduates at the bachelor’s level are able to matriculate in and successfully complete doctoral programs in statistics.

At a time when many (most?) institutions are pioneering data science programs, it is important for mathematical statisticians to ensure that they are part of the process of attracting students with varied backgrounds and degrees of preparation and preparing them for success in a variety of careers.

References:

De Veaux, R., et al. (2017). Curriculum guidelines for undergraduate programs in data science, Annual Review of Statistics and Its Application, 4:15–30. https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-060116-053930.

Donoho, D. (2017). 50 Years of Data Science, Journal of Computational and Graphical Statistics, 26:4, 745–766. doi:10.1080/10618600.2017.1384734.

National Academies of Sciences, Engineering, and Medicine (2018). Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. doi:10.17226/25104.
