Researchers from Stanford University and Harvard Medical School have developed an artificial intelligence-based diagnostic tool that can identify diseases on chest X-rays by learning from the natural-language descriptions in the accompanying clinical reports. The advance is seen as a significant development in clinical AI design because most existing AI models require laborious human annotation of vast volumes of data before the labeled data are fed into the model to train it. According to an article presenting the findings, published in Nature Biomedical Engineering, the model, called CheXzero, performed on par with human radiologists in its capacity to recognize diseases on chest X-rays. The group has also made the model's source code freely available to other researchers.
Most AI systems require labeled datasets for their “training” to accurately detect diseases. This process is particularly challenging for tasks involving the interpretation of medical images, since it requires extensive, often expensive, and time-consuming annotation by human clinicians. To label a chest X-ray collection, for example, skilled radiologists would have to examine hundreds of thousands of X-ray images one at a time and individually annotate each one with the conditions it shows. Even though more recent AI models have tried to ease this labeling bottleneck by learning from unlabeled data during a “pre-training” stage, they ultimately need fine-tuning on labeled data to attain high performance. The new model, by contrast, is self-supervised: it learns without manually labeled data either before or after training. The only supervision the model uses comes from the English-language notes found in the reports that accompany the chest X-rays.
According to senior author Pranav Rajpurkar, assistant professor of biomedical informatics in the Blavatnik Institute at HMS, “We’re living in the early days of the next-generation medical AI models that are able to do flexible tasks by directly learning from text. Up to this point, most AI models have relied on manual annotation of huge amounts of data, to the tune of 100,000 images, to achieve high performance. Our method needs no such disease-specific annotations.” Rajpurkar continued: “With CheXzero, one can simply feed the model a chest X-ray and associated radiology report, and it will learn that the image and the wording in the report should be considered as similar—in other words, it learns to match chest X-rays with their associated report.” Eventually, the model can figure out which concepts in the unstructured text correspond to which visual patterns in the image.
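The matching idea described above can be made concrete with a small sketch. The snippet below is a minimal, hedged illustration of a symmetric contrastive (InfoNCE-style) objective of the kind used by CLIP-like models: given a batch of paired image and report embeddings, matched pairs are pulled together and mismatched pairs pushed apart. The function names, the numpy-only setup, and the `temperature` value are illustrative assumptions, not CheXzero's actual implementation.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays where row i of each is a
    matched X-ray / report pair. The loss is low when each image is most
    similar to its own report and dissimilar to every other report.
    Illustrative sketch only; not the authors' code.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (batch, batch) similarity grid
    labels = np.arange(len(logits))                # pair i matches report i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image-to-text and text-to-image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

With correctly paired embeddings the loss is near zero; permuting the reports so the pairing is wrong drives it up, which is exactly the pressure that teaches the model to align images with their reports.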
The program was “trained” on a publicly accessible dataset containing more than 377,000 chest X-rays and more than 227,000 associated clinical notes. Its performance was then tested on two separate datasets of chest X-rays and notes from two different institutions, one of them in another country. This diversity of datasets was meant to ensure the model performed just as well when tested against clinical notes that might use different vocabulary to describe the same finding. During testing, CheXzero successfully identified diseases that human clinicians had not explicitly annotated. It outperformed other self-supervised AI tools and achieved accuracy comparable to that of human radiologists.
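The zero-shot recognition described above works because, once images and text share an embedding space, a disease can be scored by comparing an X-ray's embedding against embeddings of a positive and a negative text prompt. The sketch below illustrates this under stated assumptions: the prompt phrasing ("pneumonia" vs. "no pneumonia"), the function name, and the plain-numpy setup are hypothetical, and real embeddings would come from trained image and text encoders.

```python
import numpy as np

def zero_shot_score(image_emb, pos_prompt_emb, neg_prompt_emb):
    """Probability-like score that a finding is present in an X-ray.

    Compares the image embedding's cosine similarity to a positive prompt
    embedding (e.g. "pneumonia") and a negative one (e.g. "no pneumonia"),
    then softmaxes over the two similarities. Illustrative sketch only.
    """
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    sims = np.array([cos(image_emb, pos_prompt_emb),
                     cos(image_emb, neg_prompt_emb)])
    exp = np.exp(sims - sims.max())  # stable two-way softmax
    return exp[0] / exp.sum()        # score near 1 → finding likely present
```

Because the prompts are just text, new findings can be scored without any retraining, which is why no disease-specific labels are needed at test time.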
The method, according to the researchers, could potentially be applied to imaging modalities well beyond X-rays, including CT scans, MRIs, and echocardiograms. “CheXzero shows that accuracy of complex medical image interpretation need no longer be at the mercy of enormous labeled datasets,” said Ekin Tiu, a Stanford undergraduate student and visiting researcher at HMS. “We use chest X-rays as a motivating example, but CheXzero’s capability is generalizable to a wide range of medical settings where unstructured data is the norm, and it precisely embodies the promise of bypassing the large-scale labeling bottleneck that has dogged the field of medical machine learning.”