‘Datathon’ convenes high schoolers, scientists, clinicians to solve health care challenges

At a two-day datathon at Brown, local high school students teamed with computer scientists, health and medical professionals and other mentors to dig into data, unearth health inequities and find solutions.

PROVIDENCE, R.I. [Brown University] — Brown University’s Sayles Hall had been transformed into a computer lab: At round tables, teams huddled around monitors, scrolling through screens of numbers and graphs. Groups consisted of data scientists, clinicians, teachers and local high school students along with Brown master’s students and future physicians from its Warren Alpert Medical School.

The assignment was simple: Spot the biases hiding in the health data and create more equitable prediction models. But high school student Justin Hernandez described the organizers’ intentions in more ambitious terms:

“What they’re trying to do here is change the world,” said Hernandez, a sophomore at the Metropolitan Regional Career and Technical Center in Providence.

Ambitious, yes, but also on target, according to Brown faculty members Dr. Hamish Fraser and Dr. Jeremy Warner, hosts of the Brown University Health AI Systems Thinking for Equity Datathon. By creating collaborations among high school students, data scientists and clinicians, the datathon was intended to demonstrate that society needs people from diverse backgrounds to address bias in AI predictions, contribute to a more equitable health care system and mitigate future problems.

“Our goal is to educate people about bias in machine learning predictions and how that can impact every facet of our medical lives,” said Warner, a medicine and biostatistics professor.

To organize the event, Fraser and Warner collaborated with Kathryn Jessen Eller of the East Bay Educational Collaborative, one partner behind Data Science, AI and You in Health Care, a semester-long course that introduces local high school students to bias in machine learning (and how it can impact health care outcomes), critical data science, machine learning tools and skills to succeed in a data- and AI-oriented world.

The two-day datathon in early June was a culminating event for the 12 schools participating in the course this year, said Eller, who received funding for the program from the National Science Foundation and worked with science educators to create the curriculum for students with no prior statistics or coding knowledge.

Beyond expanding the students’ knowledge throughout the semester, the organizers also hoped that encouraging them to work with professionals at the datathon would increase their awareness of the role of data science in health care, and eventually contribute to broadening participation in the STEM workforce.

“During the datathon, students can see for themselves how truly dynamic data science can be, and experience how exciting and cutting edge this work is, how it’s associated with AI and machine learning, and how understanding data science in health care can improve diversity and create a better future for everyone,” Eller said. She added that after last year’s event, some students said that their experience inspired them to explore careers in data science and health care.

Digging into the data

The datathon at Brown was based on a model developed by Dr. Leo Celi, a scientist at the Massachusetts Institute of Technology who, through the MIT Critical Data consortium, has organized over 50 global datathons built to encourage a collective of data scientists and medical experts to cooperatively solve health care problems. Celi developed Health AI Systems Thinking for Equity as a forum to analyze, discuss and mitigate unintentional bias in the data that machine learning and generative AI systems use to inform health care decisions.

Gesturing around the buzzing hall at Brown, he praised the “hive learning strategy” of research: “We need everyone’s help, because the only way to regulate technology systems is to understand them,” Celi said. “In a datathon, everyone is a teacher, and everyone is a learner.”

Each team was working from a data set composed of de-identified information on over 130,000 ICU stays; the data had been manipulated to exacerbate existing racial and ethnic biases. The groups were to investigate the effect of a faulty pulse oximeter reading, the effect of a missing blood lactate level and the effect of the combination of the two on mortality prediction in the hospital. Their challenge was to create a model that predicted mortality while taking into account how the biases would influence the prediction model.

Jorge Reyes, a 10th grader at the Met School who is interested in a career in business, was familiar with data sets from the Data Science, AI and You in Health Care course, which he’d taken at school. However, the data set in the challenge went into much greater detail, he said, and was impressively comprehensive.

“It’s interesting to see how much data you can put in to create new graphs,” Reyes said.

Other students echoed that sentiment: East Greenwich High School ninth graders Mukti Patel and Mathew Claeson were both interested in seeing what they could do with their challenge, and how they could explore the different variables.

“I like the act of digging into the data, and I’m also interested in learning more about the power of AI and how it helps with coding,” Claeson said. “Through this event, I’m hoping to gain more knowledge about AI.”

Reyes, who said he’d been looking forward to meeting professional mentors at the event, had been assigned to the same table as Gabrielle Masse, a regulatory coordinator at the Lifespan Cancer Institute. Masse pointed out to her group that their data set included information about patients on ventilators. The group decided to include this factor in their prediction model — the idea being that patients on ventilators may have higher mortality risks.

The group’s ventilator graph was evidence that the datathon was working, said Dr. Sandeep Jain, a hematology/oncology fellow with the Warren Alpert Medical School who is affiliated with the Brown/Lifespan Center for Clinical Cancer Informatics and Data Science.

“This team is looking at different aspects of the data that we, as organizers, didn’t even think about,” Jain said. “They’re making discoveries on the spot. I got so excited when I heard them talking about that!”

This was the second year of the Health AI Systems Thinking for Equity Datathon; last year’s event took place at MIT. But commuting to Cambridge in rush hour traffic had proven challenging for the Rhode Island students. Brown was more centrally located and offered willing volunteers from the Warren Alpert Medical School and the University’s computer science graduate programs, who joined mentors from Brown, MIT and other universities around the world.

“I really want to celebrate that Jeremy, Hamish and Sandeep, as well as all the participating Brown mentors, stepped in and saved the day for this program by holding the datathon in Providence, which is much more convenient for the students,” Eller said.

The datathon planning committee, with members from Brown University, MIT Critical Data, the East Bay Educational Collaborative and Brown’s Health and Biomedical Library Services, worked together across disciplines in the spirit of hive learning: members learned from each other and discussed options from different points of view. The high school students’ participation was supported and praised by Rhode Island leaders, including Angélica Infante-Green, Rhode Island commissioner of elementary and secondary education, who spoke on the first morning of the datathon, and U.S. Congressman Gabe Amo, who delivered closing remarks on day two.

The experience was well worth the short trip from South Providence, said Doug Rademacher, who teaches the data science and AI in health care course at the Met School.

“I really value my students seeing professionals they can relate to, and people from a variety of disciplines working together to solve difficult problems,” Rademacher said. “I also think it’s great for them to hear a doctor or a professor saying ‘I don’t know’ and encouraging the student to share their own ideas.”