New framework helps radiologists improve diagnostic accuracy

Calibration — making sure expressed confidence matches actual accuracy — is vital for both humans and computational models, especially in healthcare. Radiologists often use phrases like “maybe” or “likely” to convey uncertainty when interpreting medical images. These terms influence decisions, such as whether to order further tests, and directly affect diagnosis and treatment.

A new MIT study found that radiologists tend to be overconfident when using phrases like “very likely” and underconfident with terms like “possibly.” Existing calibration methods, by contrast, usually depend on numerical confidence scores that AI models attach to their predictions — scores that have no direct counterpart in a radiologist’s written report.

Building on this, MIT researchers, in collaboration with Harvard-affiliated clinicians, developed a framework based on real-world data that evaluates the reliability of radiologists’ certainty phrases in clinical settings.

The researchers used this framework to offer actionable suggestions, guiding radiologists to select certainty phrases that enhance the accuracy of their clinical reports. They also demonstrated that the same method could improve the calibration of large language models, ensuring their confidence expressions align better with their prediction accuracy.

This innovation could significantly improve the reliability of crucial medical information by helping radiologists more precisely describe the likelihood of pathologies in medical images.

The language radiologists use holds significant weight, directly influencing doctors’ decisions and, ultimately, patient outcomes. For example, the phrase “consistent with” is widely interpreted by radiologists as indicating a high likelihood—90% to 100%—of a pathology being present. On the other hand, terms like “may represent” convey much greater uncertainty, with probabilities spread more evenly around 50%.

Traditional calibration methods compare a model’s predicted probability scores with the actual outcomes. The researchers built on this idea, accounting for the fact that certainty phrases convey probability distributions rather than fixed probabilities.
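To make the distribution-based idea concrete, here is a minimal sketch (not the authors’ code) of checking a phrase’s calibration when each phrase is treated as a probability distribution — here a Beta distribution, with the phrase-to-parameter mapping entirely hypothetical — whose mean is compared against the observed rate at which the finding was actually present:

```python
# Minimal sketch: distribution-aware calibration check for certainty
# phrases. The phrase-to-Beta mapping below is a hypothetical
# illustration, not the study's actual values.
from statistics import mean

# Hypothetical mapping: phrase -> (alpha, beta) of a Beta distribution
# over the probability that the finding is present.
PHRASE_DISTS = {
    "consistent with": (19.0, 1.0),  # mean ~0.95: high certainty
    "likely":          (7.0, 3.0),   # mean 0.70
    "may represent":   (5.0, 5.0),   # mean 0.50: spread around 50%
}

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def phrase_calibration_gap(phrase, outcomes):
    """Compare the phrase's implied probability (the Beta mean) with
    the observed frequency that the finding was actually present.
    outcomes: list of 1 (present) / 0 (absent) for reports using the phrase.
    Returns >0 if the phrase is underconfident, <0 if overconfident."""
    implied = beta_mean(*PHRASE_DISTS[phrase])
    observed = mean(outcomes)
    return observed - implied

# Toy data: reports saying "likely" where the finding was present 9/10 times.
gap = phrase_calibration_gap("likely", [1, 1, 1, 1, 1, 1, 1, 1, 1, 0])
print(round(gap, 2))  # 0.9 observed vs 0.7 implied -> 0.2 (underconfident)
```

A positive gap, as here, signals the kind of underconfidence the study observed for phrases like “possibly.”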

To enhance calibration, they created an optimization model that adjusts the frequency of phrase usage, aligning confidence levels more closely with reality. Their work produced a calibration map, offering tailored guidance to radiologists for selecting phrases that make reports more accurate for specific pathologies.
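One simple way to picture such a calibration map — a hypothetical sketch, not the paper’s optimization model — is to suggest, for each phrase a radiologist uses, the phrase whose implied probability best matches the observed rate for that pathology. All phrase probabilities below are illustrative assumptions:

```python
# Hypothetical sketch of a "calibration map": suggest the phrase whose
# implied probability is closest to the observed accuracy. The implied
# probabilities are illustrative, not the study's values.
IMPLIED = {
    "unlikely":        0.15,
    "possibly":        0.35,
    "may represent":   0.50,
    "likely":          0.70,
    "consistent with": 0.95,
}

def calibration_map(observed_rates):
    """observed_rates: phrase -> empirical frequency that the finding
    was actually present when the radiologist used that phrase.
    Returns a suggested replacement phrase for each input phrase."""
    suggestions = {}
    for phrase, rate in observed_rates.items():
        best = min(IMPLIED, key=lambda p: abs(IMPLIED[p] - rate))
        suggestions[phrase] = best
    return suggestions

# Toy data: "possibly" turned out to be right 65% of the time
# (underconfidence), so the map suggests moving up to "likely".
print(calibration_map({"possibly": 0.65}))  # {'possibly': 'likely'}
```

The real framework adjusts how often phrases are used via an optimization model, but the nearest-match idea above captures the intuition of steering phrase choice toward observed accuracy.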

Initial findings show that radiologists are often underconfident when diagnosing common conditions, such as atelectasis, but overconfident in more ambiguous cases, like infections.

The study also demonstrated how these methods could measure the reliability of large language models, offering a more refined approach than traditional methods that rely solely on numerical confidence scores.

This work holds promise for the future of medical imaging. By helping radiologists refine their communication, the framework could improve patient outcomes by ensuring accurate and actionable diagnostic information.

The researchers are expanding their study to include abdominal CT scans and exploring how well radiologists can adopt these calibration suggestions in practice.

Journal Reference

  1. Peiqi Wang, Barbara Lam et al. Calibrating Expressions of Certainty. arXiv:2410.04315v2

Source: Tech Explorist
