Program Structure

The program can be completed in 12 months (September to August). Students may elect to complete the program over 16, 21, or 24 months, and most do so. In some cases, exceptionally well-prepared students might be able to complete their work in 9 months. Fifth-year master's students must complete the program within a year (September to August). All students begin the program in September; there is no option for starting in the spring semester.

For students taking longer than 12 months, full-time status for visa purposes is two credits per semester (and only one credit in the final semester). 

Course requirements are as follows: 

  • DATA 1030. Hands-on Data Science. Develops all aspects of the machine learning pipeline in the context of real-world datasets: data acquisition and cleaning, handling missing data, exploratory data analysis, visualization, feature engineering, modeling, interpretation, and presentation. Fundamental considerations for data analysis are emphasized (the bias-variance tradeoff, training, validation, testing). Classical models and techniques for classification and regression are included (linear and logistic regression with regularization, support vector machines, decision trees, random forests, XGBoost). Uses the Python data science ecosystem (e.g., sklearn, pandas, matplotlib).
  • DATA 1050. Data Engineering. This course covers the storage, retrieval, and management of various types of data, along with the computing infrastructure (such as various types of databases and data structures), algorithmic techniques (such as searching and sorting algorithms), and query languages (such as SQL) for interacting with data, in the context of both transaction processing (OLTP) and analytical processing (OLAP). Students will be introduced to measures for evaluating the efficiency of different techniques for interacting with data (such as the "Big-O" measure of complexity and the number of I/O operations) and to various types of indexes for the efficient retrieval of data. The course will also cover several components of the Hadoop ecosystem for the processing of "big data." Additional topics include cloud computing, NoSQL databases, and modern data architectures. Some of the computer science concepts and techniques essential for data science will also be introduced.
  • APMA 1690. Computational Probability and Statistics. Examination of probability theory and mathematical statistics from the perspective of computing. Topics selected from random number generation, Monte Carlo methods, limit theorems, stochastic dependence, Bayesian networks, dimensionality reduction. Prerequisites: APMA 1650 or equivalent; programming experience is recommended.
  • DATA 2020. Statistical Learning. A modern introduction to inferential methods for regression analysis and statistical learning, with an emphasis on application in practical settings in the context of learning relationships from observed data. Topics will include basics of linear regression, variable selection and dimension reduction, and approaches to nonlinear regression. Extensions to other data structures such as longitudinal data and the fundamentals of causal inference will also be introduced.
  • DATA 2080. Data and Society. A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century.
  • CSCI 2470, or equivalent. Deep Learning. Deep Learning belongs to a broader family of machine learning methods. It is a particular kind of artificial neural network that emphasizes learning representations with multiple layers. Deep Learning, together with the specialized techniques that it has inspired (e.g., convolutional neural networks, recurrent neural networks, and transformers), has led to rapid improvements in many applications, such as computer vision, machine learning, sound understanding, and robotics. This course gives students an overview of the prominent techniques of Deep Learning and its applications in computer vision, language understanding, and other areas. It also provides hands-on practice in implementing deep learning algorithms in Python. For the final project, students will implement an advanced piece of work in one of these areas.
  • Machine Learning Theory: New course coming spring 2023. We will introduce the mathematical methods of data science through a combination of theory, computational methods, and visualization. We formally define the statistical learning framework, common assumptions in the data generation process, and learning models. The mathematical models behind common supervised and unsupervised techniques are discussed. Students will implement some of the algorithms from scratch using standard Python and NumPy. The course includes a final project: students will read a peer-reviewed publication on a machine learning topic of their choice, write a blog post or article about it, and give a presentation explaining the methods and results of the publication to a non-expert audience.
  • DATA 2050. Data Practicum. The practicum experience is a hands-on thesis project that entails an in-depth study of a current problem in data science. Students will synthesize their knowledge of probability and statistics, machine learning, and data and computational science. Students may use an internship in industry for the practicum, or work with a faculty member at Brown or elsewhere. The project must be approved beforehand by the DATA 2050 instructor and students must provide regular interim reports and a final presentation.
  • Elective. One credit of domain knowledge relevant to the student's individual interests. It must be a graduate-level course with a 4-digit course number starting with a nonzero digit. Most graduate-level CSCI and APMA courses qualify. Please contact the DGS if you plan to take a course from a different department.