Guest blog on how AI could outperform 120 pulmonologists in a PFT interpretation study.
August 11, 2020
This article was originally published as a LinkedIn Article by Nilakash Das.
Pulmonary function tests (PFTs) are a group of common medical tests that measure how well the lungs work. The term usually refers to a combination of spirometry, body plethysmography and a diffusion capacity test. PFTs are used to evaluate respiratory symptoms, to diagnose chronic respiratory diseases such as asthma, COPD and lung fibrosis, and to monitor response to treatment.
In clinical practice, a physician refers a patient for PFTs when the patient presents with common respiratory problems. The tests generate a report such as the one shown in figure 2. The task of the physician is then to interpret this report and diagnose a respiratory disorder.
A PFT report is filled with many parameters, and interpreting it requires a high degree of expertise. In hospitals, respiratory specialists called pulmonologists interpret PFTs for disease diagnosis on a daily basis. However, this practice has a few problems:
- PFT interpretation is not standardized: It is primarily based on expert opinion, which differs from one person to another.
- Limited interpretability: Guidelines by the American Thoracic Society/European Respiratory Society (ATS/ERS) require only a few PFT parameters for diagnosis and neglect most of the others. This constrains the interpretability of a PFT report.
- Lack of experts: Highly trained pulmonologists are not available at all levels of healthcare.
- Redundant diagnostic tests: Tests that do not add value to the diagnostic work-up (e.g. CT, blood biochemistry) are often requested to confirm a diagnosis, which increases the cost of care.
Over the past several years, our research group at Leuven University has been extensively involved in developing AI-based software that considers the entire PFT report and clinical characteristics (age, sex, smoking etc.) in arriving at a diagnosis. Our motivation was that such software would add consistency and accuracy to the interpretation of PFT reports, leading to an improvement in overall clinical outcomes.
Currently, the software can diagnose eight respiratory disorders, including common diseases like COPD and asthma as well as less prevalent ones like neuromuscular disease and pulmonary vascular disease. We had previously reported that the software reached a diagnostic accuracy of 74% in internal validation, which was comparable to an expert panel of pulmonologists.
Nevertheless, to demonstrate the efficacy of our software, a clinical study was required. So last year, we conducted a study in which 120 pulmonologists from 16 European hospitals participated. The idea was simple: we asked the pulmonologists to review 50 PFT reports and, for each one, to identify the pattern of lung function (normal, restrictive, obstructive or mixed) based on a set of rules by ATS/ERS and to diagnose a respiratory disorder. Then, we ran these reports through our AI-based software and compared the two outcomes.
The pulmonologists’ pattern recognition matched the medical guidelines in 74% (±5) of the cases, and we observed high inter-pulmonologist agreement, with a kappa value of 0.67. However, when it came to disease diagnosis, the average accuracy of the pulmonologists (44.6%) was significantly (p<0.0001) lower than that of our AI-based software (82%). As expected, agreement between pulmonologists on the diagnosis was low (kappa=0.35), confirming that a high degree of variability exists in PFT interpretation. Surprisingly, the diagnostic accuracy of senior and junior pulmonologists did not differ significantly!
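For readers unfamiliar with kappa: it measures how much raters agree beyond what chance alone would produce, where 1 means perfect agreement and 0 means chance-level agreement. The minimal Python sketch below computes Fleiss’ kappa, a common variant for more than two raters. This article does not state which kappa variant the study used, so the choice of Fleiss’ kappa, the function name `fleiss_kappa` and the toy ratings are all assumptions for illustration, not the study’s analysis code.

```python
import math
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of labels (one per rater).

    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]
    categories = sorted({label for r in ratings for label in r})

    # Observed agreement: average share of agreeing rater pairs per subject.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: based on how often each category is used overall.
    shares = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
              for cat in categories]
    p_expected = sum(s ** 2 for s in shares)

    if p_expected == 1.0:
        return math.nan  # only one category was ever used: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical): 4 reports, each diagnosed by 3 raters.
toy_ratings = [
    ["COPD", "COPD", "COPD"],
    ["asthma", "asthma", "COPD"],
    ["ILD", "ILD", "ILD"],
    ["asthma", "COPD", "ILD"],
]
example_kappa = fleiss_kappa(toy_ratings)
print(f"Fleiss' kappa on toy data: {example_kappa:.2f}")
```

Two reports with full agreement and two with disagreement give a kappa well below 1, which is how a value such as 0.35 can arise even when raters agree on some cases.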
It is worth exploring why such a large discrepancy in diagnostic accuracy between pulmonologists and AI was observed. Pulmonologists are primed to look at a few key parameters of the PFT report to arrive at a conclusion. They compare these parameters with cut-off values derived from the distribution in a healthy population. For example, FEV1/FVC is a key clinical parameter, and a measured value below the lower limit of normal (mean − 1.64 × standard deviation in a healthy population) indicates airway obstruction.
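The cut-off rule can be written down in a few lines. The sketch below is illustrative only: the function names and the reference mean and standard deviation are hypothetical placeholders, whereas in practice the reference values come from published equations that depend on age, sex, height and other factors.

```python
def lower_limit_of_normal(reference_mean, reference_sd, z=1.64):
    """Lower limit of normal: mean - 1.64 x SD of the healthy reference population."""
    return reference_mean - z * reference_sd


def z_score(measured, reference_mean, reference_sd):
    """Distance of a measurement from the healthy mean, in standard deviations."""
    return (measured - reference_mean) / reference_sd


def is_below_lln(measured, reference_mean, reference_sd):
    """Yes/no cut-off decision: True when the value is below the lower limit."""
    return measured < lower_limit_of_normal(reference_mean, reference_sd)


# Hypothetical reference values for FEV1/FVC in a matched healthy population.
reference_mean = 0.78
reference_sd = 0.06

for measured in (0.65, 0.70):
    print(
        f"FEV1/FVC={measured:.2f}",
        f"z={z_score(measured, reference_mean, reference_sd):.2f}",
        f"below LLN: {is_below_lln(measured, reference_mean, reference_sd)}",
    )
```

The yes/no output of `is_below_lln` discards how far a value sits from the healthy mean, while the z-score keeps that information as a continuous quantity.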
The practice of cut-off based diagnosis is the norm in clinical practice. However, this approach allows for a very limited interpretation, as real-life pathological processes are reflected by a continuum of parameter values, rather than a yes or no indicated by a threshold. Secondly, pathological processes may also be captured by parameters that are not immediately perceptible to us, but which a data-driven process may uncover.
Thus, each disease can be considered as having a unique fingerprint when all the PFT parameters are taken together. The AI model identifies subtle and defining characteristics that are challenging to detect, and incorporates them into a powerful algorithm for differential diagnosis.
It does so by mapping the input data onto a high-dimensional space and finding the optimal hyperplanes that separate the disease categories. Once presented with the data of a new patient, the AI maps them onto the same high-dimensional space and predicts which category they belong to (Figure 4). This approach to interpreting a PFT report is completely different from human practice, and proved superior in diagnostic accuracy in our study.
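To make the idea of "map to a higher-dimensional space, then separate with a hyperplane" concrete, here is a toy Python sketch. It is not the software’s actual algorithm, which this article does not specify. As a simple stand-in it uses a perceptron with an explicit quadratic feature map, which finds a separating hyperplane but not necessarily the optimal (maximum-margin) one. The 2-D points, labels and function names are all made up for illustration.

```python
def feature_map(point):
    """Lift a 2-D point into a 5-D space using an explicit quadratic map."""
    x, y = point
    return [x, y, x * x, y * y, 1.0]  # the trailing 1.0 acts as a bias term


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(points, labels, max_epochs=100):
    """Find a hyperplane in the lifted space that separates the two classes."""
    weights = [0.0] * len(feature_map(points[0]))
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in zip(points, labels):
            features = feature_map(point)
            if label * dot(weights, features) <= 0:  # wrong side or on the plane
                weights = [w + label * f for w, f in zip(weights, features)]
                mistakes += 1
        if mistakes == 0:  # every training point is on its correct side
            break
    return weights


def predict(weights, point):
    """Map a new point into the same space and report its side of the hyperplane."""
    return 1 if dot(weights, feature_map(point)) > 0 else -1


# Synthetic data: class +1 clusters near the origin and class -1 surrounds it,
# so no straight line in the original 2-D space separates the two classes.
inner = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5), (0.3, 0.3)]
outer = [(2, 0), (0, 2), (-2, 0), (0, -2),
         (1.5, 1.5), (-1.5, -1.5), (1.5, -1.5), (-1.5, 1.5)]
points = inner + outer
labels = [1] * len(inner) + [-1] * len(outer)

weights = train_perceptron(points, labels)
print("prediction for a new point (0.2, -0.1):", predict(weights, (0.2, -0.1)))
```

In the lifted space the squared terms let a flat hyperplane act like a curved boundary in the original space, which is the same principle the paragraph above describes for many PFT parameters at once.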
Now that we have shown AI is superior to individual pulmonologists, does that mean AI will replace them?
It is important to understand that AI outperforms the pulmonologists only in this narrowly defined task of PFT interpretation. In reality, pulmonologists have access to a plethora of clinical information like medical history, symptoms, lab biochemistry etc. in addition to PFT reports, which allows them to make better clinical decisions.
In addition to diagnostics, pulmonologists also decide on optimal treatment and management strategies for the patient. Let us not forget that they provide an emotional touch to patient care as well. Under no circumstances can AI replace the human physician, who handles a diverse range of tasks and responsibilities.
So where does that leave our software?
It can act as a powerful decision support tool for clinicians. By providing consistent and accurate interpretation of PFTs, it reduces the pulmonologists’ chances of misdiagnosis. It also reduces reliance on additional, redundant diagnostic tests.
In lung function laboratories, where a medical resident often has to provide hundreds of preliminary interpretations a day, the AI-based software can help by sharing that workload. Finally, in primary care settings where respiratory specialists are not available, our software can support general practitioners in interpreting the results of a PFT.
Thus, there exists a large potential for our software to improve the overall standards of care by empowering and augmenting our clinicians.
Disclaimer: Our laboratory’s spin-off company ArtiQ commercializes the AI-software for PFT interpretation under the name ArtiQ.PFT. I collaborate with ArtiQ to valorise our research activities. The complete study can be accessed here.