AI performance in image-based medical diagnosis is equivalent to human performance, study finds

A review conducted by UK researchers argues that, when it comes to classifying diseases from medical images, artificial intelligence (AI) models perform on a par with humans. The researchers searched for studies published between January 2012 and June 2019 that compared the diagnostic performance of deep learning models and health-care professionals, based on medical images, for any disease. Although the initial search returned over 30 000 results, only 14 studies were found suitable for an accurate comparison between human and machine performance.

Looking closely at these 14 studies, the researchers found that deep learning systems correctly detected a disease state 87% of the time, while health-care professionals did so 86% of the time. Moreover, the doctors in these studies were not given the additional patient information that they would have in the real world and would use to guide their diagnosis, so the comparison may, if anything, have put them at a disadvantage.

The review showed not only that AI does not outperform humans in this specific field, but also that many studies claiming the opposite are poorly reported, lack externally validated results, and do not compare deep learning models and health-care professionals on the same sample. As one of the authors noted, while deep learning can indeed be a very useful technique in health care, there is a ‘massive hype over AI in medicine [which] obscures the lamentable quality of almost all evaluation studies’.