Data Science Institute

Data Science Research @ Brown

Understanding the extremely variable, complex shape and venation characters of angiosperm leaves is one of the most challenging problems in botany.

Cancer is the second leading cause of death in the United States. Diagnosis and prognosis typically rely on a pathologist's histological analysis of tissue samples, a process that is time-consuming, costly, and prone to diagnostic inconsistency.

The goal of this project is to design and test mathematically well-founded algorithmic and statistical techniques for analyzing large-scale, heterogeneous and noisy data. 

Existing technology cannot directly measure the synaptic connectivity between individual brain cells in awake, behaving mammals.

We study the problem of measuring group differences in choices when the dimensionality of the choice set is large.

Impulsivity is a substantial risk factor for aberrant behaviors. This project seeks to understand the fundamental cognitive neuroscience mechanisms underlying distinct forms of impulsivity, using a combination of theory-driven and data-driven approaches.

An encryption scheme is a method for efficiently computing an encrypted form e(X) of a given input X. It should be invertible, but computing the inverse must require a secret key.
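As a concrete illustration of this definition, textbook RSA is one well-known scheme in which anyone holding a public key can compute e(X), while inverting it requires a secret exponent. The sketch below is only an example chosen for illustration (the project description does not say which schemes it studies); the helper names are hypothetical, and the tiny parameters p = 61, q = 53, e = 17 are standard toy values that offer no security.

```python
from __future__ import annotations

# Toy textbook RSA: illustrative only, NOT secure (tiny primes, no padding).
# Encryption uses only the public pair (n, e); decryption needs the secret d.

def make_toy_keys(p: int, q: int, e: int) -> tuple[tuple[int, int], int]:
    """Return the public key (n, e) and the secret exponent d."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e modulo phi (Python 3.8+)
    return (n, e), d

def encrypt(x: int, public_key: tuple[int, int]) -> int:
    """Compute the encrypted form e(x) = x^e mod n using only public data."""
    n, e = public_key
    return pow(x, e, n)

def decrypt(c: int, public_key: tuple[int, int], d: int) -> int:
    """Invert the encryption; this step requires the secret exponent d."""
    n, _ = public_key
    return pow(c, d, n)

public_key, secret_d = make_toy_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
print(ciphertext)                                  # 2790
print(decrypt(ciphertext, public_key, secret_d))   # 65
```

Real schemes use primes hundreds of digits long and randomized padding, so that recovering d from (n, e) is computationally infeasible.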

This project is establishing a first-of-its-kind computerized platform to identify and catalog therapeutic uses of plants. 

This project aims to develop novel statistical machine learning methods for big neuroimaging data.

The Large Hadron Collider (LHC), the world's largest particle accelerator, located at CERN near Geneva, Switzerland, collides particle bunches at a rate of 40 MHz.

Analysis of datasets created by linking two or more separate data sources is increasingly important as researchers and policy analysts seek to integrate administrative and clinical datasets while adapting to privacy regulations that limit access to unique identifiers. 

Network reconstruction is a useful tool in a number of areas, ranging from medical imaging to oil exploration.

This project develops and applies computational and information science approaches for integrating biological, clinical, and public health data to model complex health phenomena, with particular emphasis on pediatrics, psychiatry, emergency medicine, and critical care.

There is a large body of research on random graphs and networks, and many random graph models have been proposed, some more relevant to real-world networks than others.

A series of methods in genomics use multilocus genotype data to assign individuals membership in latent clusters that often correspond to geographic regions or modes of subsistence. These methods belong to the broad class of topic models, which includes latent Dirichlet allocation (LDA), used to analyze text corpora.
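The correspondence with topic models can be seen in the generative process such clustering methods assume. The sketch below simulates data under an admixture model of the kind used by programs such as STRUCTURE, assuming diploid individuals and biallelic loci: each individual is treated like a document with its own cluster proportions, each allele copy like a word, and each cluster like a topic with its own allele frequencies. The function names and parameters are hypothetical, and the code only generates data; it does not perform the inference the project is concerned with.

```python
import random

def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Draw a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_admixture(n_individuals: int, n_loci: int, n_clusters: int,
                       alpha: float = 1.0, seed: int = 0):
    """Simulate genotypes (0, 1, or 2 copies of allele 1 per locus) under an LDA-style model."""
    rng = random.Random(seed)
    # Allele-1 frequency per cluster and locus (analogous to a topic's word distribution).
    freqs = [[rng.betavariate(1.0, 1.0) for _ in range(n_loci)] for _ in range(n_clusters)]
    memberships, genotypes = [], []
    for _ in range(n_individuals):
        # Ancestry proportions for this individual (analogous to a document's topic mix).
        q = sample_dirichlet([alpha] * n_clusters, rng)
        row = []
        for locus in range(n_loci):
            count = 0
            for _ in range(2):  # two allele copies per diploid individual
                # Latent cluster of origin for this allele copy (analogous to a word's topic).
                z = rng.choices(range(n_clusters), weights=q)[0]
                count += int(rng.random() < freqs[z][locus])
            row.append(count)
        memberships.append(q)
        genotypes.append(row)
    return memberships, genotypes, freqs

memberships, genotypes, freqs = simulate_admixture(n_individuals=4, n_loci=6, n_clusters=3)
for q, row in zip(memberships, genotypes):
    print([round(p, 2) for p in q], row)
```

Inference runs this process in reverse: given only the genotypes, it estimates each individual's membership proportions q and each cluster's allele frequencies, just as LDA estimates document topic proportions and topic word distributions.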

This project leverages advanced computational methods to transform social, behavioral, and familial factors from electronic health records into a rich longitudinal resource for generating knowledge regarding various determinants of health including their temporal progression, severity, and relationship to health conditions.

The international team of scientists working on the Murchison Widefield Array (MWA), a low-frequency radio telescope in Western Australia, is pursuing a number of projects, including studies of the Milky Way and other galaxies, searches for pulsing and exploding stellar objects, and the study of space weather.

Existing and emerging genome-wide association (GWA) datasets, merged with medical record or survey data, enable testing for associations across dozens of phenotypes, yet methods for characterizing the shared genetic architecture of multiple traits are still not well established.