Stanford PRC Doctors Publish Landmark AI Paper on Accurate Biomarker Discovery

January 2, 2024

Scientists at the March of Dimes (MOD) Prematurity Research Center (PRC) at Stanford have created a breakthrough Machine Learning (ML) algorithm that makes reliable predictions about labor onset, preterm birth, and preeclampsia and also identifies the biological markers supporting those predictions.

The discovery represents a first in the field of ML and Artificial Intelligence (AI) and paves the way for targeted medical care for pregnant women. Not only does it signal a new era of data science and analytics in various fields, including economics and environmental science, but it also makes possible substantial improvements in pregnancy outcomes through the use of precision medicine and drug discoveries.

“This work has allowed us to translate the insights of AI to doctors in an interpretable manner,” said Dr. Brice Gaudillière, an associate professor of anesthesiology, perioperative and pain medicine and of pediatrics at Stanford Medicine, as well as the March of Dimes PRC investigator who led the effort. “Uncovering the exact biomarkers that lead to adverse pregnancy outcomes can guide repurposing of existing drugs and accelerate discovery of new therapeutics to prevent preterm birth and preeclampsia; it moves us closer to the realm of precision medicine, where each patient is treated based on their own biology.”

“This is an exciting development toward improving maternal and infant health,” he added.


Currently, AI/ML algorithms being developed for potential medical care can make risk predictions for conditions and return a large number of associated biomarkers. But because these models pick candidate biomarkers from a list of all available biomarkers, their results can vary widely, even within the same dataset. The more the results vary, the less confidence there is that the biomarkers chosen by the model are truly correlated with the outcome risk and informative to clinicians.

“The features that were reliably associated with the clinical outcomes were difficult to identify or pinpoint,” said Gaudillière of existing models.
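The variability described above can be illustrated with a small simulation. This is a minimal sketch, not the method from the paper: the data are synthetic, the study size is invented, and the selector (keep the 10 features most correlated with the outcome) is a simple stand-in for the sparse models typically used. It re-runs the same selection on resampled versions of one dataset and measures how much the chosen lists agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study: 40 patients, 500 measured features, 5 truly informative.
n_patients, n_features, n_true = 40, 500, 5
X = rng.normal(size=(n_patients, n_features))
y = X[:, :n_true].sum(axis=1) + 2.0 * rng.normal(size=n_patients)

def top_k_features(X, y, k):
    """Return the indices of the k features most correlated with y."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / len(y)
    return set(np.argsort(corr)[-k:].tolist())

# Re-run the same selection on 20 bootstrap resamples of the same patients.
selections = []
for _ in range(20):
    idx = rng.choice(n_patients, size=n_patients, replace=True)
    selections.append(top_k_features(X[idx], y[idx], k=10))

# Jaccard overlap between every pair of runs: 1.0 means identical lists.
overlaps = [
    len(a & b) / len(a | b)
    for i, a in enumerate(selections)
    for b in selections[i + 1:]
]
mean_overlap = float(np.mean(overlaps))
all_selected = set().union(*selections)

print(f"mean pairwise overlap: {mean_overlap:.2f}")
print(f"distinct features ever selected: {len(all_selected)}")
```

If the selection were stable, every run would return the same 10 features and the overlap would be 1.0. An overlap below that, together with far more than 10 distinct features selected overall, is the kind of inconsistency that makes individual biomarkers hard to trust.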

Not surprisingly, traditional AI/ML models, with their often confusing, conflicting, and widely variable results, aren’t convincing enough for researchers or healthcare providers to definitively identify diagnostic or therapeutic interventions. Add to this the many complex biological processes that can lead to elevated risk for adverse pregnancy outcomes, and it becomes clear why healthcare providers are often left playing catch-up when complications happen. The root causes are unknown, so targeted, effective interventions are all but impossible.

The algorithm designed by the Stanford PRC team and published recently in Nature Biotechnology addresses these issues by focusing on a more defined list of candidate biomarkers for some of the most complex and common pregnancy outcomes: risk of preeclampsia, risk of preterm birth, and timing of labor onset. From a long list of 200 biomarkers, the PRC’s algorithm narrows to a far more accurate short-list of about 20 biomarkers. This results in an unprecedented, highly focused view of the biology that leads to elevated risk.

“This means that the biomarkers selected by our algorithm are not just random or false discoveries, but have a higher likelihood of being genuinely associated with the outcomes, making them more trustworthy leads for clinical applications,” Dr. Gaudillière said.

Dr. Gaudillière’s model measures the health of the immune system during pregnancy using parameters found in the blood—immune cells, proteins, genes, RNAs, and metabolites. More formally, these parameters are referred to as omics, and include metabolomics, proteomics, transcriptomics, genomics, and epigenomics. They all provide information about the overall biology underlying the health of mom and baby.

Using this omics data, the algorithm can help doctors pinpoint women at risk of specific adverse pregnancy outcomes, help identify the biological processes most likely responsible, and eventually provide targeted diagnostics or therapeutics to predict and prevent poor outcomes.

“We built this algorithm with the goal of understanding preterm birth, preeclampsia and other complications but also to move the needle toward practical clinical applications,” Dr. Gaudillière said. “We would like to develop a test to predict preterm labor and to find a targeted drug to prevent preterm birth or preeclampsia in the first place.”

While the biomedical and computational science community has long understood the powerful potential of omics—through models and algorithms like this one—those benefits are only now being unlocked to help doctors and patients.


The paper, whose first author was Julien Hedou, a research data analyst in Dr. Gaudillière’s lab, outlines how the Stanford team unlocked the “black box” at the heart of AI/ML models.

Within the field of AI/ML, there’s a conceptual mystery surrounding how an algorithm generates predictions for a specified objective. It is much like being asked what you’re thinking about while your mind wanders across many thoughts to solve a problem: it’s hard to explain accurately. You may be making connections between areas unrelated to the actual problem, looking for common threads, mental models, or abstractions to discover a novel solution to a complicated issue. AI/ML algorithms are similar. A set of real-world biological events is connected to adverse pregnancy outcomes like preterm birth, preeclampsia, and birth timing, but how these data come together into a risk profile or calculation is the mystery, or “black box.”

The PRC researchers opened the black box—and they found the reliable biomarkers behind the risk predictions.

“When you’re a clinician, you’re interested in these parameters,” Mr. Hedou said. “And you want to know what are the ten, twenty features that are important so you can treat patients accordingly.”

“Our hope is that now we can finally go from predictive omics models to something that is usable by doctors,” he added.

This information will give healthcare providers unprecedented early insight into pregnancy, allowing them to narrow in on more tailored treatment. The model is also translatable to other fields using AI/ML models to find answers in mass quantities of data. And because it’s an open-source algorithm, it’s now publicly available, for free, to the greater computational community.

“This model truly represents the crux of what we strive for at our PRC,” said Stanford PRC lead investigator, neonatologist, and study co-author Dr. David Stevenson. “It’s translational science that could one day give actionable information to clinicians to help intervene before severe preeclampsia sets in, to delay an expected preterm birth or prepare for a high-risk delivery—and those interventions are just the start.”

“And while this model is specifically geared toward pregnancy and preterm birth, its basic open-source blueprint is available to data scientists across disciplines today to use to unlock their own data mysteries.”


The algorithm’s key element is the way it handles and processes real-world, “noisy” data. Noise in data refers to the variability in how, when, where, and what data are recorded. It shows up as disorganized, incomplete, and difficult-to-interpret data that can confuse, distort, and impersonate meaningful insights, called “signals.” The end goal of almost all AI/ML models is to find the signal within the noise (or the needle in a haystack). Examples of noise in data include handwritten medical notes; discrepancies between different testing devices, testing times, and even personnel doing the testing; missed doctor’s appointments, which lead to inconsistent timestamps; typos or formatting errors during medical data entry; and even variability within an individual’s biomarkers, which can change on a minute-to-minute timescale based on factors like stress, sleep, or, in the case of pregnancy, gestational week.

Since the features researchers measure outnumber the study patients, signals can become harder to detect; in other words, the numbers problem creates a noise problem. Because there are far more biological features (like immune cells, proteins, bacteria, etc.) tested per patient than there are patients that those samples are coming from, there is a significant imbalance of features to patients. This is commonly referred to as “the curse of dimensionality.”

“When so much information comes from so few study patients, your data is imbalanced, and you get noise…” Dr. Gaudillière said. “And the harder it is to determine whether a biomarker is really predictive of a certain outcome.”
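A short simulation shows why an imbalance of features to patients produces misleading signals. The sketch below is illustrative only and uses invented numbers: 30 hypothetical patients and up to 10,000 features, all drawn as independent random values, so no feature has any real relationship to the outcome. It reports the strongest correlation found by chance as more features are examined.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients = 30
max_features = 10_000

# Outcome and features are independent random numbers: no real signal exists.
outcome = rng.normal(size=n_patients)
features = rng.normal(size=(n_patients, max_features))

# Absolute Pearson correlation of every feature with the outcome.
Xc = (features - features.mean(axis=0)) / features.std(axis=0)
yc = (outcome - outcome.mean()) / outcome.std()
abs_corr = np.abs(Xc.T @ yc) / n_patients

# Strongest chance correlation among the first p features.
strongest = {p: float(abs_corr[:p].max()) for p in (10, 100, 1_000, 10_000)}
for p, r in strongest.items():
    print(f"{p:>6} features -> strongest chance correlation {r:.2f}")
```

Because each larger set contains the smaller ones, the strongest chance correlation can only grow as features are added. With thousands of features and a few dozen patients, pure noise can look as convincing as a genuine biomarker.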

To tackle this problem, the researchers partnered with Stanford mathematician Professor Andrea Montanari to develop the new method, called Stabl, which uses synthetic, artificial noise “injected” into the real data set. By adding into the real-world dataset artificial noise that looked—and behaved, in terms of its interaction with other features—like real noisy, or uninformative, features, the researchers were able to mirror the environment real-world data exists in. They then used a complex signal-to-noise stability threshold approach to separate the actual signals from the background noise. What emerged were the biomarkers behind certain risk predictions—in the case of this study—for preeclampsia, labor onset, and preterm birth.
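The general idea of injecting artificial noise and then thresholding on selection stability can be sketched in a few lines. What follows is a simplified illustration under stated assumptions, not the Stabl implementation: the data are synthetic, the decoys are made by shuffling each real column (which does not preserve interactions between features, unlike the noise described above), the selector is a plain correlation ranking rather than a sparse regression model, and the threshold rule and all names (`selection_frequencies`, `top_k`, `best_t`) are hypothetical.

```python
import numpy as np

def selection_frequencies(X, y, n_subsamples, top_k, rng):
    """Fraction of random half-samples in which each column ranks in the top_k."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs, ys = X[idx], y[idx]
        Xc = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        yc = (ys - ys.mean()) / ys.std()
        corr = np.abs(Xc.T @ yc) / len(idx)
        counts[np.argsort(corr)[-top_k:]] += 1
    return counts / n_subsamples

rng = np.random.default_rng(0)
n, p, informative = 200, 500, [0, 1, 2]   # hypothetical study dimensions
X = rng.normal(size=(n, p))
y = X[:, informative].sum(axis=1) + 0.5 * rng.normal(size=n)

# Artificial noise: shuffle each real column so it keeps its distribution
# but loses any link to the outcome.
decoys = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
X_aug = np.hstack([X, decoys])

freq = selection_frequencies(X_aug, y, n_subsamples=50, top_k=8, rng=rng)
real_freq, decoy_freq = freq[:p], freq[p:]

# Assumed rule: pick the frequency threshold where decoys pass least often
# relative to real features.
thresholds = np.linspace(0.2, 1.0, 17)
ratio = [
    (1 + np.sum(decoy_freq >= t)) / max(1, np.sum(real_freq >= t))
    for t in thresholds
]
best_t = float(thresholds[int(np.argmin(ratio))])
selected = np.flatnonzero(real_freq >= best_t)

print("threshold:", round(best_t, 2), "selected features:", selected.tolist())
```

Because the decoys are known to be uninformative, how often they are selected gives an estimate of how often a real but uninformative feature would be selected by chance. Real features that clear a threshold the decoys rarely reach are the ones most likely to be genuine signals.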

“We essentially created a digital twin of the real noise in the data so we could separate the signal from the noise,” Dr. Gaudillière said. “When we first began engineering our algorithm to analyze real-world, complex data about two years ago, and saw signals clearly emerge, we knew we were onto something, so we kept going.”


By opening the black box, the team cut through that noise. After extensive benchmarking on synthetic datasets, the team tested the algorithm on five real world clinical studies. Three of these clinical studies focused on pregnancy outcomes, including two from the Stanford PRC (prediction of preeclampsia and prediction of labor onset) and one from the University of California San Francisco (UCSF) PRC (prediction of preterm birth).

For each pregnancy outcome, the algorithm selected a list of predictive biomarkers that was significantly shorter and more reliable than the lists selected using previous predictive modeling methods. For example, one of the studies aiming to predict the onset of labor involved a complex dataset with over 6,000 proteomic, metabolomic, and immune cell parameters. Using their algorithm, the team was able to predict the onset of labor in both term and preterm pregnancies with high accuracy using fewer than 20 biomarkers, compared to several hundred when using previous algorithms. Similarly, to predict preeclampsia, the model narrowed 37,000 candidates down to 9 predictive biomarkers; for microbiome-driven preterm birth, the model settled on 12 candidates from a list of over 9,000.

“This testing brings us closer to the bedside,” said Dr. Stevenson. “Closer to the doctors having to make decisions about what really is important in someone’s care.”

“And for us, now that we know the source of a particular prediction, we can ask what pathway it came from and where we can manipulate it, which brings us into the exciting realm of new treatment discoveries for preterm birth and preeclampsia,” he added.

“The algorithm consistently selected fewer features compared to other methods,” said Mr. Hedou. “What’s equally important is that it achieved comparable predictive performance despite using fewer features—this trade off of maintaining predictive performance with fewer features would be very challenging for most other models. Our model was developed to reach a sweet spot between sparsity and predictive performance.”


The team’s technical next steps include improving the algorithm’s noise modeling ability so it can work with any type of data, and extending its use to non-linear biomarkers, or ones that don’t simply increase or decrease during pregnancy but have an ebb and flow, or pulse.

“A good example of a non-linear signal is the response of certain immune features during pregnancy, which ramp up over the course of pregnancy but then dampen closer to labor,” Dr. Gaudillière said.
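A toy example helps show why such rise-and-fall signals are hard for linear methods. The sketch below uses an invented curve, not study data: a hypothetical feature that climbs, peaks at week 20, and then declines, with a little random noise added. The peak position and noise level are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
weeks = np.linspace(0, 40, 41)

# Hypothetical feature: rises, peaks mid-pregnancy, then falls, plus noise.
feature = 1.0 - ((weeks - 20.0) / 20.0) ** 2 + 0.05 * rng.normal(size=weeks.size)

# A linear measure of association sees almost nothing.
linear_r = float(np.corrcoef(weeks, feature)[0, 1])

# A curved (quadratic) fit captures most of the variation.
coeffs = np.polyfit(weeks, feature, deg=2)
fitted = np.polyval(coeffs, weeks)
residual = np.sum((feature - fitted) ** 2)
total = np.sum((feature - feature.mean()) ** 2)
r2_quadratic = float(1.0 - residual / total)

print(f"linear correlation with gestational week: {linear_r:.2f}")
print(f"share of variation explained by a curved fit: {r2_quadratic:.2f}")
```

The feature is tightly tied to gestational week, yet its straight-line correlation is close to zero because the rise and the fall cancel out. A method that only looks for steady increases or decreases would overlook it.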

Clinically, the group is moving to validate the model’s labor onset findings in larger and more diverse pregnant populations. For example, one of the predictive markers for labor onset is a protein receptor for the inflammatory cytokine IL-33. This receptor, known to researchers, could be a signal of labor detectable in the bloodstream. The group discovered that this was one of the strongest predictors of labor identified by the algorithm. Using mass cytometry, a technology that quickly measures thousands of cell signals in a blood sample, the team plans to check for any existing drugs that could quiet the immune response associated with the IL-33 receptor without causing any negative effects to the immune system.

They’re also looking to validate the model’s findings in new datasets from partners at universities around the world, and have initiated their own study to validate the results.

“Given the numerous ways our method can leverage big data to impact patient care, the possibilities are limitless,” Dr. Gaudillière added.

Stanford scientist Dr. Ivana Maric, postdoctoral fellow Jakob Einhaus, and engineering student Gregoire Bellan were co-first authors of the study. UCSF PRC lead Dr. Marina Sirota and Stanford scientists and professors Dr. Nima Aghaeepour and Dr. Martin Angst also contributed to this work.