In-depth: Landmark moment as AI beats humans in reading X-rays for TB
In 1997, the then reigning world chess champion Garry Kasparov lost a six-game series to a chess-playing computer programme called Deep Blue. Deep Blue’s victory was seen as emblematic of the growing power of computers to do things once thought primarily the domain of human brains.
Another area until recently thought the exclusive domain of human experts is the reading of X-ray images to determine whether someone has tuberculosis. Doctors have been interpreting X-ray images since soon after X-rays were discovered in 1895, although their use to detect TB only really took off in the 1930s.
But according to some, computers are now better than humans at this as well. The Stop TB Partnership (a unique UN-linked partnership) recently proclaimed that “the results are in: artificial intelligence outperforms humans at reading chest X-rays for signs of tuberculosis”.
They were responding to a landmark study (led by them) published in the journal Lancet Digital Health. Though the exact details of the study are quite technical, it is fair to say that as with chess, interpreting chest X-rays for TB is no longer exclusively the domain of clever, well-trained humans.
This development may have major implications for TB control efforts, since it may set the stage for more people being diagnosed early when they have not yet developed symptoms.
How it works
Zhi Zhen Qin, technical officer at the Stop TB Partnership, explains that AI is a broad term that simply refers to machines demonstrating human-like intelligence. Most of the AI that is currently used is called ‘narrow’ AI, meaning it is a computer programme trained to perform one particular task. In this case, that task is detecting signs of TB on chest X-rays.
The type of AI used for TB detection can more precisely be described as a form of machine learning. The computer programme is trained by showing it a series of chest X-rays, each labelled as showing TB or not (the technical term for this is supervised learning). In the process, Qin explains, “it learns what signs to look for in an X-ray image that is correlated with TB presence”.
When faced with a new X-ray image, the programme then uses what it has learnt to estimate the likelihood that a TB-related abnormality is present on that image. Such a probability can then be expressed as a number and/or using a heat map or other type of visual representation. Qin points out that, while such likelihood estimations can be related to the severity of TB disease, these systems are not in fact measuring severity.
Putting it a bit differently, Professor Keertan Dheda, general physician, pulmonologist, and a critical care specialist who heads up the Division of Pulmonology at the University of Cape Town, says such computer-assisted systems produce a textured probabilistic heat map with a score outlining the likelihood of TB being present.
“For example, if the computer-assisted system detects shadowing in the upper parts of the lung or detects cavities (holes in the lungs), it is more likely to report probable TB. It can only infer the likelihood of TB and not definitively prove it. Thus, sputum-based testing and confirmation of the diagnosis are still required. Indeed, drug-resistant TB cannot be diagnosed using the X‑ray approach,” he says.
‘Better than humans’
The latest findings are not unexpected: the World Health Organization in March recommended the wider use of such computer-aided detection systems. But the findings will bolster the case for making these technologies more widely available more quickly and may help governments in deciding which of a number of competing systems to choose.
The Lancet Digital Health study compared the performance of five different computer-aided detection (CAD) systems with each other and with a panel of three human radiologists. All the CAD systems and the panel of radiologists evaluated the same series of X-ray images from just under 24 000 people. The images were from three treatment centres in Bangladesh and date from 2014 to 2016. People who were X-rayed were also asked about their symptoms and given the gold standard GeneXpert molecular test – although this information was, of course, withheld from the CAD systems and panel of radiologists.
“We sort of did a blind test – we used a dataset that has never been seen by any AI company,” explains Qin. She says none of the five CAD systems were retrained in Bangladesh prior to the study since they wanted to see how off-the-shelf solutions perform out in the real world.
All five of the CAD systems were found to “significantly outperform” the panel of human radiologists in the study. Two of the CAD systems (QXR and CAD4TB) also met the aspirational 90% sensitivity and 70% specificity target product profile set by the WHO for a TB triage test. Sensitivity is a measure of how often a test correctly detects a condition in someone who has the condition. Specificity is a measure of how often a test correctly gives a negative result in someone who does not have the condition being tested for.
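To make the two measures concrete, here is a minimal sketch of how they are calculated from test results. The counts in the example are invented purely for illustration and are not taken from the Lancet Digital Health study; the function names are likewise our own. Only the 90% and 70% thresholds come from the WHO target mentioned above.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of people WITH the condition whom the test correctly flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of people WITHOUT the condition whom the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)


def meets_triage_target(sens: float, spec: float) -> bool:
    """Check against the 90% sensitivity / 70% specificity target."""
    return sens >= 0.90 and spec >= 0.70


# Hypothetical counts, for illustration only:
# 100 people with TB, of whom the test flags 92 and misses 8;
# 1 000 people without TB, of whom the test clears 750 and wrongly flags 250.
sens = sensitivity(true_positives=92, false_negatives=8)
spec = specificity(true_negatives=750, false_positives=250)

print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
print("Meets triage target:", meets_triage_target(sens, spec))
```

In this made-up example the test would clear the target, but a quarter of the people without TB would still be sent for confirmatory testing, which is why a triage test is followed by a more definitive one.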
The study findings are significantly more nuanced than what we can capture here – we strongly recommend looking at the publication itself.
Will findings apply in SA?
Delft Imaging’s partnership development director, Ayumi Gosho, says their CAD4TB product (one of those evaluated in the Bangladesh study) was trained on over one million X-ray images from numerous countries and continents and does not need to be retrained for the South African context. But that does not necessarily mean there is no need to evaluate the performance of CAD4TB and other systems in the country.
According to Qin, the Stop TB Partnership is collaborating with the South African Medical Research Council (SAMRC) to conduct an evaluation study in South Africa similar to that conducted in Bangladesh – given that the patient population is very different here, due to, for example, high HIV rates. Results from this evaluation are expected later this year or early next year.
“A major problem is that this approach is not a one-size-fits-all and does not work as well in certain sub-groups of patients,” says Dheda. “For example, performance may be sub-optimal in HIV-infected patients and those who are smear-negative (i.e. have a low (TB) bacterial load in the sputum). Other groups where this approach may not be as good would include elderly patients and those with immunosuppressive conditions. However, in other sub-groups, they seem to perform better.”
Dheda points out that another drawback of CAD systems is that, unlike human readers, they may not be as good at picking up other conditions such as cancers and smoking-related lung disease. He says that the main utility of these systems is for mass screening.
In the Lancet Digital Health article, CAD systems were also found not to be particularly good at distinguishing current TB from previous TB. TB scarring can sometimes still be seen on X-rays after someone has been cured. Given South Africa’s high TB rates, this is an obvious concern.
“For programmes this is likely to result in people with a history of TB, but without active TB, being flagged for further diagnostic testing – which might cause more recall than in people without prior TB history,” says Qin. “However – humans can’t do this either – and the bottom line is that someone or something has to read the chest X-ray image – and now AI is better doing it (and likely cheaper) than humans.”
Enabling earlier diagnosis
But even with these caveats, the potential benefits of mass X-ray screening to South Africa’s fight against TB remain very tantalising. While estimates vary, an often-quoted estimate is that one person with TB transmits TB to around 15 others if not treated early. Once on treatment, someone with TB becomes non-infectious within a few weeks. When you add to this that around 150 000 of the estimated 390 000 people thought to develop TB in South Africa every year are never diagnosed, the benefit of diagnosing more people and starting them on treatment quicker seems clear.
The main advantage of X-ray screening is that it can help diagnose TB in people who do not have the typical signs and symptoms of TB, especially cough, says Professor Martie van der Walt, director of the TB Platform at the SAMRC. “Our diagnostic procedures are very good for diagnosing TB in people with a cough and other signs and symptoms, but those that do not have a cough are those patients with a delay in diagnoses, which continue to spread the disease and who become sicker and sicker,” she explains.
Of the 234 people (out of around 35 000) found to have TB in South Africa’s first National TB Prevalence Survey (findings of which were made public in February), more than half, around 58%, had abnormal X-rays without any TB symptoms, 35% had both abnormal X-rays and TB symptoms, and around 7% had symptoms only. Around 9 000 people had to be screened to find the 135 cases of active TB where people had no symptoms but had X-rays suggestive of TB.
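The arithmetic behind these figures can be checked with a short sketch. It takes the rounded numbers in the paragraph above at face value; the "screened per case found" figure is our own derived illustration, not a number reported in the survey.

```python
# Figures as quoted above (the 9 000 is an approximate, rounded number).
CASES_TOTAL = 234            # people found to have TB in the survey
CASES_XRAY_ONLY = 135        # abnormal X-ray but no TB symptoms
SCREENED_FOR_XRAY_ONLY = 9_000


def share_without_symptoms(xray_only: int, total: int) -> float:
    """Fraction of all TB cases that showed up on X-ray without symptoms."""
    return xray_only / total


def screened_per_case(screened: int, cases: int) -> float:
    """How many people were screened for each case found."""
    return screened / cases


share = share_without_symptoms(CASES_XRAY_ONLY, CASES_TOTAL)
per_case = screened_per_case(SCREENED_FOR_XRAY_ONLY, CASES_XRAY_ONLY)

print(f"Cases with no symptoms: {share:.1%}")            # 57.7%
print(f"People screened per case found: {per_case:.0f}")  # 67
```

On these numbers, roughly 67 X-rays had to be read for every symptom-free case found, which gives a sense of the reading volumes involved.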
Dr Francesca Conradie, from the Clinical HIV Research Unit at the University of the Witwatersrand, says the findings from the TB Prevalence Survey show that there is a place for chest X-rays in finding the so-called missing TB cases. “But the volume of chest X-rays that is needed to find cases is huge and I think that AI will enable us to get it done,” she says. “If we had quality readings of chest X-rays via AI, we could screen more people. And we could consider redirecting some of our human resources to more patient-facing services including adherence support, monitoring of adverse events, and contact tracing, to mention a few.”
Faster than humans
Making use of CAD systems can speed things up and increase volumes in multiple ways. These systems can interpret an X-ray image in seconds, while it typically takes human radiologists at least five minutes to read an image (which caps them at between 80 and 100 images a day, according to van der Walt). CAD systems also don’t get tired and theoretically have near-infinite capacity, while radiologists are in short supply in most healthcare systems.
“The main bottleneck in terms of human readers is throughput and cost. AI systems can quickly process hundreds of X‑rays so that test results are quickly available and at low cost,” says Dheda.
“Results from AI are fast, a matter of seconds, largely depending on internet speed,” says Qin. “Some of the AI software can be installed locally and work without internet, then it is instantaneous. Usually, the rate limitation step for using AI to read CXR is how fast can an X-ray film be taken in a day.”
Another concern with humans is that different people may interpret the same X-ray differently.
Variation is well documented between different human readers, Qin says. “This variation can be a result of the level of training received, exhaustion, or just human error. Even, the same radiologist looking at the same X-ray at a different time may have a different interpretation as well (intra-variability),” she says, explaining that AI, in comparison, is not exhaustible, performs to a consistent standard, and can be taken wherever a screening programme needs to go.
Taking X-rays to rural areas
“With the AI tools, it is not needed to have a radiologist or medical doctor available, and it can be installed to work with a mobile chest X-ray unit. This way X-ray services are not restricted to clinics or hospitals, but can be offered to a community or be available at the smallest clinic,” says van der Walt, adding that this makes AI-enabled X-ray diagnosis of TB very cheap and cost-effective. A number of mobile X-ray pilot projects are already underway in South Africa.
Such systems may be particularly valuable in understaffed and under-resourced rural areas where they could help reduce the load on healthcare workers.
Qin says many high TB-burden countries do not have easy access to qualified and well-trained human readers. Where human readers are available, they may be concentrated in urban areas and may be unable or unwilling to travel to screen rural or hard-to-reach populations where TB is also prevalent. “In settings without trained radiologists, AI could be the difference between being able to provide chest X-ray screening for TB, or not,” she says.
“We do not find physicians or radiologists in all (primary) health care facilities,” says Dr Norbert Ndjeka, Director of HIV, TB, and Drug-resistant TB at the national Department of Health. He adds that with AI, one can gain speed and address the shortages of healthcare professionals and expertise.
Even once these technologies are available in clinics, TB diagnosis will remain significantly more complicated than just a single test result. When such a CAD system flags a person as likely to have TB, van der Walt says, the person will still have to undergo a full clinical examination, like symptomatic patients, before treatment can be started. A suggestive X-ray would thus still have to be followed up with a GeneXpert molecular test.
As Spotlight previously reported, the more widespread use of X-rays for TB detection, especially using newer mobile X-ray units, may, however, be delayed by regulatory red tape.
In addition to regulatory issues with the actual X-ray machines, it is not clear whether CAD systems would also have to be registered with the South African Health Products Regulatory Authority (Sahpra). When asked about this, Sahpra spokesperson Yuven Gounden simply told Spotlight that Sahpra hasn’t started doing product registration as yet for medical devices (suggesting that the software will be regulated as a medical device). He says all products are listed when applicants apply for establishment licenses (Spotlight previously explored how such establishment licenses work in South Africa).