Could Artificial Intelligence Shortcuts Misdiagnose COVID-19?

ARTIFICIAL intelligence (AI) is emerging as a promising technique that may soon become an essential part of the diagnosis and treatment of disease. New evidence, however, highlights the risk that clinical AI models may misdiagnose disease.

Researchers from the University of Washington (UW), USA, have recently shown that AI models can rely on ‘shortcut learning’, which could lead to diagnostic errors. The evidence emerged from an assessment of multiple diagnostic models designed to detect COVID-19 from chest X-rays.

The scientists found that these models tended to take shortcuts rather than learn the information required for an accurate diagnosis of COVID-19. The models disregarded clinically relevant factors and instead based their diagnoses on medically insignificant features, such as patient positioning and text markers on the images.

Alex DeGrave, co-lead author of the study, explained that physicians diagnose COVID-19 from an X-ray by looking for specific patterns in the image, whereas AI models may instead reach a diagnosis through indirect inferences. “Rather than relying on those patterns, a system using shortcut learning might, for example, judge that someone is elderly and thus infer that they are more likely to have the disease because it is more common in older patients,” DeGrave explained. Associations such as this, which do not reflect the disease visible in the image, increase the risk of misdiagnosis.

Although some AI models have been publicised for accurately diagnosing COVID-19 from chest X-rays, the researchers were sceptical about how trustworthy these models were. They suspected that the models were subject to ‘worst-case confounding’, a situation in which all of the COVID-19-positive images come from one dataset and all of the negative images from another, so that a model can learn to recognise the dataset rather than the disease. This situation, which arose from the novelty of COVID-19 and the resulting scarcity of training data, increases a model’s tendency to rely on shortcuts. To investigate, the UW scientists used explainable AI, which reveals how each data input contributes to a model’s output.

When the models were tested on a series of internal and external datasets, their performance dropped significantly on the external datasets. This strongly suggests that worst-case confounding was responsible for much of the models’ initial success. Further testing showed that confounding remained a problem even when the datasets were obtained from similar sources.

Su-In Lee, senior author and professor at the Paul G. Allen School of Computer Science and Engineering, UW, USA, stated: “My team and I are still optimistic about the clinical viability of AI for medical imaging. I believe we will eventually have reliable ways to prevent AI from learning shortcuts, but it’s going to take some more work to get there.” Although AI is a valuable tool, this study emphasises the need for clinicians to understand how AI models reach their conclusions, and for models to be thoroughly assessed before they are applied to procedures such as disease detection.