RSNA2023 Leading Through Change
Daily Bulletin

Using ChatGPT To Generate Differential Diagnoses In Abdominal Radiology

Wednesday, Nov. 29, 2023

By Melissa Silverberg

Large language model AI tools, such as ChatGPT, garnered the attention of radiologists this year. While the technology is still developing, a study presented Tuesday found promising improvement in ChatGPT's ability to generate differential diagnoses in abdominal radiology.

Shawn Sun, MD, a radiology resident at the UCI School of Medicine in Irvine, CA, presented a study that evaluated 70 gastrointestinal and genitourinary imaging cases drawn from two textbooks.

The case images and histories were converted into standardized prompts that contained purely descriptive language and a query for the most likely diagnosis, the top three differential diagnoses, and the corresponding explanations and references from the medical literature. These prompts were submitted to ChatGPT-3.5 and ChatGPT-4. Generated responses were analyzed for accuracy by comparison with the original literature, and for reliability through manual verification of the generated explanations and citations.
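The study's exact prompt wording was not published; as a purely hypothetical sketch, a standardized prompt of the kind described could be assembled from a case's transcribed findings and history like this (the function name and example case are illustrative assumptions, not the authors' code):

```python
# Hypothetical sketch of building a standardized, purely descriptive prompt
# from a case's transcribed findings and history. The wording and example
# below are illustrative, not the study's actual prompt template.

def build_prompt(findings: str, history: str) -> str:
    """Combine descriptive findings and clinical history into one query."""
    return (
        f"Clinical history: {history}\n"
        f"Imaging findings: {findings}\n"
        "Provide the most likely diagnosis, the top three differential "
        "diagnoses, and for each an explanation with references from the "
        "medical literature."
    )

# Hypothetical example case
prompt = build_prompt(
    findings="Well-circumscribed, hyperenhancing renal mass on CT.",
    history="55-year-old presenting with painless hematuria.",
)
print(prompt)
```

The same prompt text would then be sent to each model, so that any difference in the responses reflects the model rather than the query.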

Dr. Sun said the most impressive part of the work was the improvement from ChatGPT-3.5 to ChatGPT-4, which was released while the team was doing its work. In the study, top-1 accuracy and top-3 accuracy were defined as the percentage of generated responses that matched the original diagnosis and the complete differential provided by the original literature, respectively.

Top-1 accuracy improved from 35.7% with ChatGPT-3.5 to 51.4% with ChatGPT-4, and top-3 accuracy improved from 7.1% to 10.0%.
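As a minimal illustration of how such figures are computed, the sketch below scores a set of cases against the definitions above; the match flags are invented for demonstration and are not the study's data:

```python
# Minimal sketch of computing top-1 and top-3 accuracy over a case set.
# The match flags below are hypothetical, not the study's data.

def accuracy(matches: list[bool]) -> float:
    """Percentage of cases flagged as matching."""
    return 100.0 * sum(matches) / len(matches)

# top-1: did the model's most likely diagnosis match the original diagnosis?
top1_matches = [True, False, True, False]
# top-3: did the generated differential match the published differential?
top3_matches = [False, False, True, False]

print(f"top-1 accuracy: {accuracy(top1_matches):.1f}%")  # 50.0%
print(f"top-3 accuracy: {accuracy(top3_matches):.1f}%")  # 25.0%
```

Top-3 accuracy is the stricter criterion here, since the model's whole differential must match the textbook's, which is consistent with the lower percentages reported for it.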

"This technology is going to continue to improve, and it already is improving at a lightning speed," he said.

Seeking Ways For ChatGPT To Help With Workflow

However, the technology is far from perfect. Dr. Sun said the results still included several examples of the "hallucination effect," a documented issue in which ChatGPT fabricates reference material, such as citing sources that do not exist.

ChatGPT-3.5 hallucinated 38.2% of the references it provided and generated 23 false statements in total, compared with 18.8% of references and four false statements for ChatGPT-4, a substantial improvement in a short time, according to Dr. Sun.

"It is promising but not ready for prime time, which is probably the theme of the decade for AI technologies right now," Dr. Sun said. "But the results are quite promising given the context. ChatGPT was not trained for medical diagnosis. It was trained to achieve general tasks for the public, and yet it is still able to achieve performance that almost matches the best other informed decision-making AI tools."

Dr. Sun acknowledged that while some may see AI as a threat, AI has the potential to help radiologists be more efficient and effective if used for the right kind of work.

"Tasks can be separated into those that require cognitive thinking and those that require iterative labor. Scrolling through scans and looking at each one is an example of iterative labor which machines and algorithms can help with," Dr. Sun said. "The area where we see the algorithms struggle is with context, such as combing through all of that information to come to a conclusion while accounting for a patient's history and presentation, that's where radiologists add value."

Access the presentation, "Testing the Ability of ChatGPT to Generate Differential Diagnoses from Transcribed Radiological Findings in Abdominal Radiology," (T7-SSGI11-1) on demand at