Health Data Science Track

Data related to health and life sciences has never been more challenging and complex. Biostatistical data scientists with rigorous methodologic training, experience in modern computing and collaboration, and involvement in domain research are in high demand in the health sector and beyond!

Students in the Biostatistics Master’s Health Data Science track will receive in-depth training in biostatistical theory and methods, augmented by specialized training in modern data science techniques.

Data science is by its very nature interdisciplinary. It cannot be accomplished without mathematics, computer science, statistics and knowledge of the application field. The Health Data Science track in the Biostatistics Master's Graduate Program will give students in-depth skills in the key areas of data science that rely on statistics. In addition, students will acquire familiarity with all other aspects of data science. This Health Data Science track will provide training in:

  • Modeling, Statistical Learning, and High Dimensional Data
  • Fundamental Data Science Concepts and Approaches
  • Statistical Computing
  • Communication 
  • Collaboration
  • Familiarity with Full Process of Data Science

The Health Data Science track of the Biostatistics Master of Science degree program requires 9 courses plus the Public Health online course (PHP101). The curriculum for this track follows the sequence listed below. Interested candidates should apply for this track through the Biostatistics Master's application process described on our Master's Admissions page and make their interest known through the personal statement, which is required as part of the application.

Required Data Science Track 

Semester 1
Fundamentals of Probability and Statistical Inference (PHP 2515)
Statistical Programming in R (PHP 2560)
Introduction to Methods in Epidemiologic Research (PHP 2120)   
    or Foundations in Epidemiologic Research Methods (PHP 2150)
Semester 2
Applied Generalized Linear Models (PHP 2514)
Programming or Health Data Science Elective (See below)
Semester 3
Practical Data Analysis (PHP 2550)
Programming or Health Data Science Elective (See below)
Semester 4
Statistical Learning / Big Data (PHP 2650)
Programming or Health Data Science Elective (See below)


Programming or Health Data Science Electives

Methods in Informatics and Data Science for Health (PHP 2561 REQUIRED)
Machine Learning (CSCI 1420)
Deep Learning (CSCI 1470)
Design and Analysis of Algorithms (CSCI 1570)
Computational and Molecular Biology (CSCI 1810)
Introduction to Health Decision Analysis (PHP 2465A)

The Department of Biostatistics offers the following courses:

(See "Program Requirements" drop down for eligible courses for your degree/track)

Updated Fall 2018

Course # Course Name
PHP 0100 Statistics Everywhere (Undergraduate Course)
  Freshman Seminar: Statistics is the universal language behind data-enabled decision making. Examples include Google's page ranking, Amazon's customer recommendations, weather prediction, medical care and political campaign strategy. This seminar will expose students to a variety of problems encountered in the media, in science and in life for which solutions require analysis of and drawing inferences from data. We will introduce basic concepts such as randomness, probability, variation, statistical significance, accuracy, bias and precision. The course will discuss statistical problems from reading assignments and material identified by the students. We will use simulation to illustrate basic concepts, though previous programming experience is not required.
PHP 1501 Essentials of Data Analysis (Undergraduate Course)
  This course covers the basic concepts of statistics and the statistical methods commonly used in the social sciences and public health with an emphasis on applications to real data. The first half of the course introduces descriptive statistics and the inferential statistical methods of confidence intervals and significance tests. The second half introduces bivariate and multivariate methods, emphasizing contingency table analysis, regression, and analysis of variance. This is designed to be a first course in Statistics. The course is intended for Public Health or Statistics concentrators. Others can register with instructor's permission. There are no prerequisites.
PHP 2030 Clinical Trial Methodology
  We will examine the modern clinical trial as a methodology for evaluating interventions related to treatment, rehabilitation, prevention and diagnosis. Topics include the history and rationale for clinical trials, ethical issues, study design, protocol development, sample size considerations, quality assurance, statistical analysis, systematic reviews and meta-analysis, and reporting of results. Extensively illustrated with examples from various fields of health care research.
PHP 2507 Biostatistics and Applied Data Analysis I
  The objective of the year-long, two-course sequence is for students to develop the knowledge, skills and perspectives necessary to analyze data in order to answer public health questions. The year-long sequence will focus on statistical principles as well as the applied skills necessary to answer public health questions using data, including: data acquisition, data analysis, data interpretation and the presentation of results. Through lectures, labs and small group discussions, this fall semester course will focus on identifying public health data sets, refining research questions, univariate and bivariate analyses and presentation of initial results. Prerequisite: understanding of basic math concepts and terms; basic functional knowledge of Stata. Enrollment limited to 50 MPH and CTR students. Instructor permission required.
PHP 2508 Biostatistics and Applied Data Analysis II
  Biostatistics and Applied Data Analysis II is the second course in a year-long, two-course sequence designed to develop the skills and knowledge to use data to address public health questions. The courses are specifically for students in the Brown MPH program, and the training programs in Clinical and Translational Research. The sequence is completed in one academic year, not split across two years. The courses focus on statistical principles as well as the applied skills necessary to answer public health questions using data, including: acquisition, analysis, interpretation and presentation of results. Prerequisite: PHP 2507. Enrollment limited to 48. Instructor permission required.
PHP 2510 Principles of Biostats & Data Analysis
  Intensive first course in biostatistical methodology, focusing on problems arising in public health, life sciences, and biomedical disciplines. Summarizing and representing data; basic probability; fundamentals of inference; hypothesis testing; likelihood methods. Inference for means and proportions; linear regression and analysis of variance; basics of experimental design; nonparametrics; logistic regression.
PHP 2511 Applied Regression Analysis
  Applied multivariate statistics, presenting a unified treatment of modern regression models for discrete and continuous data. Topics include multiple linear and nonlinear regression for continuous response data, analysis of variance and covariance, logistic regression, Poisson regression, and Cox regression.
PHP 2514 Applied Generalized Linear Models
  This course provides a survey of generalized linear models (GLMs) for outcomes including continuous, binary, count, survival and correlated data. This course will work through the basic theories of GLMs. Emphasis will be on understanding the implications of this theory and the applications to solving real data problems. Extensive use of computer programming will be required to analyze the data in this class. This course is designed for graduate and advanced undergraduate students who will be analyzing data and want to develop a practical hands-on toolkit as well as an understanding of the theoretical underpinnings of regression.
PHP 2515 Fundamentals of Probability & Statistical Inference
  This course will provide an introduction to probability theory, mathematical statistics and their application to biostatistics. The emphasis of the course will be on basic mathematical and probabilistic concepts that form the basis for statistical inference. The course will cover fundamental ideas of probability, some simple statistical models (normal, binomial, exponential and Poisson), sample and population moments, finite and approximate sampling distributions, point and interval estimation, and hypothesis testing. Examples of their use in modeling will also be discussed.
PHP 2516 Applied Longitudinal Data Analysis
  This course provides a survey of longitudinal data analysis. Topics will include exploratory analysis, study design considerations, covariance structures, generalized linear models for longitudinal data, marginal models and mixed effects. Data and examples will come from medical/pharmaceutical applications, public health and social sciences.
PHP 2517 Applied Multilevel Data Analysis
  This course provides a survey of multilevel data analysis. Topics will include the structure of multilevel data, basic multilevel linear models, multilevel GLM, model testing and evaluation, and missing data imputation. Data and examples will be drawn from medical, public health and social sciences. Students will be using real data throughout this course.
PHP 2520 Statistical Inference I
  First of two courses that provide a comprehensive introduction to the theory of modern statistical inference. PHP 2520 presents a survey of fundamental ideas and methods, including sufficiency, likelihood-based inference, hypothesis testing, asymptotic theory, and Bayesian inference. Measure theory not required.
PHP 2530 Bayesian Inference
  Surveys the state of the art in Bayesian methods and their applications. Discussion of the fundamentals followed by more advanced topics including hierarchical models, Markov Chain Monte Carlo and other methods for sampling from the posterior distribution, robustness and sensitivity analysis, and approaches to model selection and diagnostics. Features nontrivial applications of Bayesian methods from diverse scientific fields, with emphasis on biomedical research.
PHP 2550 Practical Data Analysis
  Covers practical skills required for successful analysis of scientific data including statistical programming, data management, exploratory data analysis, simulation and model building and checking. Tools will be developed through a series of case studies based on different types of data requiring a variety of statistical methods. Modern regression techniques such as cross-validation, bootstrapping, splines and bias-variance tradeoff will be emphasized. Students should be familiar with statistical inference as well as regression analysis. The course will use the R programming language.
PHP 2560 Statistical Programming with R
  Statistical computing is an essential part of analysis. Statisticians need not only to be able to run existing computer software but also to understand how that software functions. Students will learn fundamental concepts (data management, data types, data cleaning and manipulation, databases, graphics, functions, loops, simulation and Markov Chain Monte Carlo) by working through various statistical analyses. Students will learn to write code in an organized fashion with comments. This course will be taught using both R and Julia languages in a flipped format.
PHP 2561 Methods in Informatics and Data Science for Health
  This course will teach informatics and data science skills needed for research in public health and biomedicine. Particular emphasis will be given to formalisms and algorithms used within the context of biomedical research and health care, including those used in biomolecular sequence analysis, electronic health records, clinical decision support, and public health surveillance. General programming language skills will be taught (in Julia) within these contexts. Mastery of informatics and data science skills will be assessed by a final project done within a health or biomedical context.
PHP 2570 Health Data Science
  This course is designed to introduce students to the practice of data science in health-related fields via presentation and in-depth discussion of case studies of current or recently completed projects. The case studies will be selected to highlight important areas of research and health policy analysis. It is intended for students with advanced training in data science methods and computing at the level of courses offered in the Biostatistics Master's Program.
PHP 2580 Statistical Inference II
  This sequence of two courses provides a comprehensive introduction to the theory of modern inference. PHP 2580 covers such topics as non-parametric statistics, quasi-likelihood, resampling techniques, statistical learning, and methods for high-dimensional Bioinformatics data.
PHP 2601 Linear Models
  This course will focus on the theory and applications of linear models for continuous responses. Linear models deal with continuously distributed outcomes and assume that the outcomes are linear combinations of observed predictor variables and unknown parameters, to which independently distributed errors are added. Topics include matrix algebra, multivariate normal theory, estimation and inference for linear models, and model diagnostics.
PHP 2602 Analysis of Lifetime Data
  Comprehensive overview of methods for inference from censored event time data, with emphasis on nonparametric and semiparametric approaches. Topics include nonparametric hazard estimation, semiparametric proportional hazards models, frailty models, multiple event processes, with application to biomedical and public health data. Computational approaches using statistical software are emphasized.
PHP 2605 Generalized Linear Models
  This course will focus on the theory and application of generalized linear models (GLMs), a unified statistical framework for regression analyses. Specifically, we will focus on using GLMs to model categorical outcomes. GLMs for categorical outcomes include logistic regression, the proportional odds model, and Poisson regression. Maximum likelihood estimation and inference will be introduced in the GLM context.
PHP 2610 Causal Inference & Missing Data
  Systematic overview of modern statistical methods for handling incomplete data and for drawing causal inferences from "broken experiments" and observational studies. Topics include modeling approaches, propensity score adjustment, instrumental variables, inverse weighting methods and sensitivity analysis. Case studies used throughout to illustrate ideas and concepts.
PHP 2620 Statistical Methods in Bioinformatics
  Introduction to statistical concepts and methods used in selected areas of bioinformatics. Organized in three modules, covering statistical methodology for: (a) analysis of microarray data, with emphasis on application in gene expression experiments, (b) proteomics studies, (c) analysis of biological sequences. Brief review and succinct discussion of biological subject matter will be provided for each area.
PHP 2650 Statistical Learning/Big Data
  This course introduces modern statistical tools to analyze big data, including three interconnected components: computing tools, statistical machine learning, and scalable algorithms. It introduces the principal techniques: extract and organize data from complex sources, explore patterns, frame statistical problems, build computational algorithms, and disseminate reproducible research. Topics include web data extraction, database management, exploratory data analysis, dimension reduction, convex optimization algorithms, high-dimensional linear/nonlinear models, tree/ensemble methods, and predictive modeling. These techniques are illustrated using big data examples from many scientific disciplines.
DATA 2020 Probability, Statistics & Machine Learning
  This course is offered for the Data Science Initiative. It covers topics in statistical learning, including regression, classification, model selection, and causal inference.


All course offerings are subject to change. Consult Banner for the most up-to-date schedule. The University Bulletin also contains a comprehensive list of all Public Health courses.

In response to the National Institutes of Health (NIH) notice NOT-OD-13-093 and the Brown University School of Public Health mandate regarding the use of Individual Development Plans (IDP), all students in the Department of Biostatistics, regardless of funding sources, are required to complete and submit, in consultation with their advisor, an IDP. Specifically:

  • Incoming, matriculating students must complete an IDP, in consultation with their advisor, by the beginning of their second semester.  
  • All students must submit an updated IDP, in consultation with their advisor, on an annual basis.  

The IDP is a valuable tool that gives students the opportunity to consider and address their short-term and long-term career goals. To comply with the IDP policy, please fill out the Individual Development Plan for Biostatistics, discuss it with your advisor, and submit your completed form.

Note: New students will be provided their login credentials following orientation.