RSNA2023 Leading Through Change
Daily Bulletin

GPT Can Boost Efficiency in Pancreatic Cancer Reporting, Help Categorize Resectability

Monday, Nov. 27, 2023

By Richard Dargan

Large language models like GPT show impressive accuracy in extracting key parameters from free-text pancreatic cancer staging reports to automatically create synoptic reports, improving referring surgeon efficiency, according to research presented Sunday.

Structured radiology reports for the staging of pancreatic ductal adenocarcinoma (PDAC) are superior to free-text reports, but radiologist adoption varies. Reports that describe important information in blocks of free text are often difficult to read, and clinicians and surgeons can easily miss or misinterpret key findings that dictate management.

"As a radiologist for our pancreatic cancer center, I often noticed important findings communicated in reports were not received by clinicians and surgeons at initial review," said study lead author Rajesh Bhayana, MD, assistant professor in the Department of Medical Imaging at the University of Toronto.

Dr. Bhayana and colleagues studied the potential of two large language models developed by OpenAI, GPT-3.5-turbo and GPT-4, to bridge that gap by extracting key information from PDAC staging free-text reports to create synoptic reports.

The study included 180 pancreatic cancer staging CT reports of patients referred to a quaternary care pancreatic cancer center in 2018. With few-shot learning, the two GPT models were given free-text reports and asked to create synoptic reports with 14 key parameters, including tumor location, the status of major vessels and metastases. Few-shot learning refers to providing a language model with only a few examples of a particular task. Unlike traditional language models, which must be trained on many labeled examples to support each narrow use case, large language models can perform very well on a broad range of tasks with few or even no examples.
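As a rough illustration of what few-shot prompting looks like in practice, the sketch below assembles a prompt from a small number of worked report-to-synoptic examples followed by the new case. The example reports, parameter names, and prompt wording are hypothetical simplifications; the study's actual prompts and full 14-parameter template are not reproduced here.

```python
# Hypothetical sketch of few-shot prompting for synoptic report extraction.
# The instruction text, parameters, and example reports below are illustrative
# placeholders, not the study's actual prompt.

def build_few_shot_prompt(examples, new_report):
    """Assemble a few-shot prompt: instruction, worked examples, then the new case."""
    parts = [
        "Extract a synoptic report from the free-text pancreatic cancer staging report.",
        "Report these parameters: tumor location; SMA involvement; metastases.",
    ]
    for report, synoptic in examples:
        parts.append(f"Report: {report}\nSynoptic: {synoptic}")
    # The query case is formatted exactly like the examples, ending where
    # the model is expected to continue.
    parts.append(f"Report: {new_report}\nSynoptic:")
    return "\n\n".join(parts)

examples = [
    (
        "3 cm mass in the pancreatic head abutting the SMA <180 degrees. No liver lesions.",
        "Tumor location: head | SMA: abutment (<180) | Metastases: none",
    ),
]
prompt = build_few_shot_prompt(examples, "2 cm uncinate mass. Liver metastases present.")
```

The resulting string would then be sent to a model such as GPT-4 through the usual chat completion interface; no fine-tuning of the model itself is required.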

"This makes them much more versatile," Dr. Bhayana said. "The same foundational model can adapt to many different tasks with much less steering. This is attractive in medicine—integration into electronic medical records could enable a number of helpful applications."

AI-Generated Reports Improve Accessibility

For extracting key parameters into synoptic reports, GPT-4 outperformed GPT-3.5 with a nearly perfect recall of 99.6% and a precision of 99.7%.

For the more difficult task of categorizing resectability, model accuracy improved significantly with various advanced prompting techniques. GPT-4 with a chain-of-thought strategy categorized 92% of the 180 tumors accurately. Chain-of-thought prompting breaks a complex multi-step problem into smaller intermediate steps, helping the model reason through it more reliably.
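A chain-of-thought prompt for this task might walk the model through each resectability criterion before asking for a final category, as in the sketch below. The step wording and criteria are simplified placeholders, not the study's actual prompt or the full NCCN-style criteria.

```python
# Hypothetical chain-of-thought prompt for resectability categorization.
# The steps and category names are illustrative simplifications.

STEPS = [
    "Step 1: Describe contact between tumor and SMA or celiac axis (none, abutment <=180 degrees, encasement >180 degrees).",
    "Step 2: Describe contact with the SMV/portal vein and whether reconstruction would be feasible.",
    "Step 3: Note any distant metastases.",
    "Step 4: Combine the findings above into one category: resectable, borderline resectable, locally advanced, or metastatic.",
]

def build_cot_prompt(report):
    """Ask the model to reason through each criterion before committing to a category."""
    header = ("Categorize resectability for the staging report below. "
              "Work through each step in order before giving your answer.")
    return "\n".join([header, report, *STEPS, "Answer:"])

prompt = build_cot_prompt("Mass encases the SMA >180 degrees; no metastases.")
```

The key design choice is that the model must emit its intermediate vessel-by-vessel reasoning before the final label, rather than jumping straight to a category.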

In the study, hepato-pancreato-biliary surgeons asked to triage tumor resectability were more than twice as efficient reviewing AI-generated synoptic reports compared to the original reports. Surgeons also preferred the AI-generated synoptic reports and found it easier to extract key information from them. "Tools like this can help improve efficiency and quality in the clinic," Dr. Bhayana said.

Dr. Bhayana stressed that these tools should only be used in supervised settings, or for preliminary review. Surgeons should have access to both original and synoptic reports, and tumors should be reviewed by a multidisciplinary team.

The research continues, with Dr. Bhayana and colleagues looking to further fine-tune the models and apply similar principles elsewhere.

"There are a number of other areas in radiology where we can apply similar principles to help radiologists, clinicians and patients," Dr. Bhayana said.

Access the presentation, "GPT Language Models for Automated Synoptic Reporting and Determination of Resectability in Pancreatic Cancer," (S4STCE1-2) on demand at