When Lab-Trained AI Meets Real World, Mistakes Can Happen
"We train AIs to tell 'A' versus 'B' in a very clean, artificial environment, but, in real life, the AI will see a variety of materials that it hasn't trained on. When it does, mistakes can happen," said corresponding author Dr. Jeffery Goldstein, director of perinatal pathology and an assistant professor of perinatal pathology and autopsy at Northwestern University Feinberg School of Medicine. The findings were published in the journal Modern Pathology.
"Our findings serve as a reminder that AI that works incredibly well in the lab may fall on its face in the real world. Patients should continue to expect that a human expert is the final decider on diagnoses made on biopsies and other tissue samples. Pathologists fear -- and AI companies hope -- that the computers are coming for our jobs. Not yet."
In the new study, scientists trained three AI models to scan microscope slides of placenta tissue to (1) detect blood vessel damage; (2) estimate gestational age; and (3) classify macroscopic lesions. They trained a fourth AI model to detect prostate cancer in tissues collected from needle biopsies. When the models were ready, the scientists exposed each one to small portions of contaminant tissue (e.g., bladder or blood) that were randomly sampled from other slides. Finally, they tested the AIs' reactions.
Each of the four AI models paid too much attention to the tissue contamination, which resulted in errors when diagnosing or detecting vessel damage, gestational age, lesions and prostate cancer, the study found.
It marks the first study to examine how tissue contamination affects machine-learning models.
Tissue contamination is a well-known problem for pathologists, but it often comes as a surprise to non-pathologist researchers or doctors, the study points out. A pathologist examining 80 to 100 slides per day can expect to see two to three with contaminants, but they've been trained to ignore them.
When humans examine tissue on slides, they can only look at a limited field within the microscope, then move to a new field and so on. After examining the entire sample, they combine all the information they've gathered to make a diagnosis. An AI model performs in the same way, but the study found AI was easily misled by contaminants.
"The AI model has to decide which pieces to pay attention to and which ones not to, and that's zero sum," Goldstein said. "If it's paying attention to tissue contaminants, then it's paying less attention to the tissue from the patient that is being examined. For a human, we'd call it a distraction, like a bright, shiny object."
The AI models gave a high level of attention to contaminants, indicating an inability to encode biological impurities. Practitioners should work to quantify and improve upon this problem, the study authors said.
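The "zero sum" point can be illustrated with a toy example. The article does not describe the models' architecture, so the sketch below assumes a simple softmax attention pooling over slide patches, and every number in it is hypothetical. Because the attention weights must sum to 1, any weight a contaminant patch receives is taken away from the patient's own tissue.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(patch_values, patch_scores):
    """Combine per-patch values into one slide-level value, weighted by attention."""
    weights = softmax(patch_scores)
    pooled = sum(w * v for w, v in zip(weights, patch_values))
    return pooled, weights

# Hypothetical per-patch outputs and attention scores for patient tissue.
patient_values = [0.2, 0.3, 0.25]
patient_scores = [1.0, 1.2, 0.8]
clean_pred, clean_weights = attention_pool(patient_values, patient_scores)

# Assumption: one contaminant patch that the model scores as highly "interesting".
contaminant_value = 0.9
contaminant_score = 3.0
contaminated_pred, contaminated_weights = attention_pool(
    patient_values + [contaminant_value],
    patient_scores + [contaminant_score],
)

# Share of total attention still going to the patient's own tissue.
patient_share = sum(contaminated_weights[: len(patient_values)])

print(f"clean prediction:        {clean_pred:.3f}")
print(f"contaminated prediction: {contaminated_pred:.3f}")
print(f"attention left on patient tissue: {patient_share:.1%}")
```

In this toy setup, a single high-scoring contaminant patch takes most of the attention and pulls the slide-level output toward its own value, which is the kind of distraction Goldstein describes.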
AI researchers in pathology have previously studied other kinds of image artifacts, such as blurriness, debris on the slide, folds or bubbles, but this is the first time tissue contamination has been examined.
Perinatal pathologists, such as Goldstein, are incredibly rare. In fact, there are only 50 to 100 in the entire U.S., mostly located in big academic centers, Goldstein said. This means only 5% of placentas in the U.S. are examined by human experts. Worldwide, that number is even lower. Embedding this type of expertise into AI models can help pathologists across the country do their jobs better and faster, Goldstein said.
"I'm actually very excited about how well we were able to build the models and how well they performed before we deliberately broke them for the study," Goldstein said. "Our results make me confident that AI evaluations of placenta are doable. We ran into a real-world problem, but hitting that speedbump means we're on the road to better integrating the use of machine learning in pathology."