DATA Courses

There are many courses at Brown that include data analysis and data science. Searching Courses @ Brown with the keyword "data" will bring up most of these. The courses offered by the DSI are open to all Brown students (some have prerequisites) and are listed here:

  • DATA 0080: Data, Ethics, and Society. Offered fall semester. A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century. This course is limited to undergraduates and fulfills a requirement for the Certificate in Data Fluency. Instructor: Deborah Hurley
  • DATA 0200: Data Fluency. Offered spring semester. As data science becomes more visible, are you curious about its unique amalgamation of computer programming, statistics, and visualizing or storytelling? Are you wondering how these areas fit together and what a data scientist does? This course offers all students regardless of background the opportunity for hands-on data science experience, following a data science process from an initial research question, through data analysis, to the storytelling of the data. Along the way, you will learn about the ethical considerations of working with data, and become more aware of societal impacts of data science. Course does not count toward CS concentration requirements. Prerequisites: CSCI 0111, 0150, 0170, 0190, CLPS 0950 or 1292. Instructor: Linda Clark
  • DATA 1150: Data Science Fellows. Offered fall semester. Data science is growing fast, with tools, approaches, and results evolving rapidly. This course is for students with some familiarity with data science tools and skills who want to apply these skills and teach others how to implement and interpret data science. Working with a faculty sponsor, students learn communication skills, how to determine the needs (requirements) for a project, and how to teach data science to peers. These valuable agile skills will be an incredible advantage moving forward in your professional development. Interested students must submit an application form. Override requests are granted only with instructor approval. Instructor: Linda Clark
  • DATA 1030. Hands-on Data Science. Offered fall semester. Develops all aspects of the machine learning pipeline: data acquisition and cleaning, handling missing data, exploratory data analysis, visualization, feature engineering, modeling, interpretation, and presentation, all in the context of real-world datasets. Fundamental considerations for data analysis are emphasized (the bias-variance tradeoff, training, validation, testing). Classical models and techniques for classification and regression are included (linear and logistic regression with regularization, support vector machines, decision trees, random forests, XGBoost). Uses the Python data science ecosystem (e.g., sklearn, pandas, matplotlib). Prerequisites: A course equivalent to CSCI 0050, 0150 or 0170. Instructor: Andras Zsom
  • DATA 1050. Data Engineering. Offered fall semester. This course covers the storage, retrieval, and management of various types of data, along with the computing infrastructure (such as various types of databases and data structures), algorithmic techniques (such as searching and sorting algorithms), and query languages (such as SQL) for interacting with data, in the context of both transaction processing (OLTP) and analytical processing (OLAP). Students will be introduced to measures for evaluating the efficacy of different techniques for interacting with data (such as the ‘Big-Oh’ measure of complexity and the number of I/O operations) and to various types of indexes for the efficient retrieval of data. The course will also cover several components of the Hadoop ecosystem for the processing of "big data." Additional topics include cloud computing, NoSQL databases, and modern data architectures. Some of the computer science concepts and techniques essential for data science will also be introduced. Prerequisites: CSCI 0150 or 0170 or equivalent programming experience. Instructor: Shekhar Pradhan
  • DATA 2020. Statistical Learning. Offered spring semester. A modern introduction to inferential methods for regression analysis and statistical learning, with an emphasis on application in practical settings in the context of learning relationships from observed data. Topics will include basics of linear regression, variable selection and dimension reduction, and approaches to nonlinear regression. Extensions to other data structures such as longitudinal data and the fundamentals of causal inference will also be introduced. Prerequisite: APMA 1690 or equivalent. Instructor: Roberta DeVito
  • DATA 2080. Data and Society. Offered spring semester. A course on the social, political, and philosophical issues raised by the theory and practice of data science. Explores how data science is transforming not only our sense of science and scientific knowledge, but our sense of ourselves and our communities and our commitments concerning human affairs and institutions generally. Students will examine the field of data science in light of perspectives provided by the philosophy of science and technology, the sociology of knowledge, and science studies, and explore the consequences of data science for life in the first half of the 21st century. Instructor: Deborah Hurley

New courses coming in spring 2023: 

  • Algorithmic Fairness (cross-listed with CS): We know we want to build more equitable technology, but how? In this course we’ll review the latest developments in how to build more equitable algorithms, including definitions of (un)fairness, the challenges of explaining how ML works, ensuring accountability, and much more. Prerequisites: knowledge of ML. Instructor: Suresh Venkatasubramanian
  • Machine Learning Algorithms: We will introduce the mathematical methods of data science through a combination of theory, computational methods, and visualization. We formally define the statistical learning framework, common assumptions in the data generation process, and learning models. The mathematical models behind common supervised and unsupervised techniques are discussed. Students will implement some of the algorithms from scratch using standard Python and NumPy. The course includes a final project: students will read a peer-reviewed publication on a machine learning topic of their choice, write a blog post or article, and give a presentation explaining the methods and results of the publication to a non-expert audience. Instructor: Andras Zsom
  • Text Analytics: In this course we will learn to extract information from documents that can be fed to an analytics pipeline. For example, from a corpus of police reports we might extract information about crime incidents: type of crime, location, and time. A significant portion of the course will be spent on the challenges of creating document corpora from PDF files, scanned documents, images, web documents, XML files, Word documents, PowerPoint slides, etc., and getting them in shape (cleaning, de-duping, etc.) for information extraction pipelines. Instructor: Shekhar Pradhan