The National Academies

Workshop Explores the ‘Opportunity and Perils’ of Using AI in Medical Diagnosis

Feature Story | October 14, 2024
Most people will experience at least one diagnostic error in their lifetime, sometimes with devastating consequences, the 2015 National Academies report Improving Diagnosis in Health Care found. Diagnostic errors are a contributing factor in approximately 10 percent of patient deaths.
A recent workshop hosted by the National Academies explored some of the potential benefits and risks involved in using artificial intelligence and other digital tools to improve medical diagnoses.
Daniel Yang, vice president of AI and emerging technologies at Kaiser Permanente and chair of the workshop planning committee, noted that his current role and his prior work in philanthropy have given him a front-row seat to witness "both the incredible opportunity and the perils of applying new technology to diagnosis."
"These days it's so easy to be entranced by the bright and shiny object that is AI," said Yang. "The antidote to that is to maintain a posture of skepticism and to remain laser-focused not on the tools, but on the problem you're solving for. The problem we're solving for today is one of the most important issues in health care … Diagnostic errors are both the most common and consequential medical errors experienced by patients in the U.S."

Symptoms and searches

The diagnostic process starts before a patient enters a doctor's office, multiple speakers noted. It begins when a person first experiences symptoms, which often prompts an online hunt for answers.
"There are a billion health-related queries [on the internet] every day … and many of those relate to symptoms," said John Whyte, chief medical officer at WebMD. "You have a fever, you have chills, you have a cough, you're concerned about what that might mean, so you're searching."
Lucia Savage, chief privacy and regulatory officer at Omada Health, explained some of the pitfalls patients need to navigate online, including a lack of privacy protections - the nation's health care privacy regulations do not apply to internet searches and chatbot queries - and widespread scientific misinformation. "That's the environment in which the patient or the care partner might be using the internet and may or may not know that what they're reading is good or bad information," she said.
Patient advocate Grace Cordovano of Enlightening Results urged participants and clinicians not to dismiss patients' online inquiries. "Patient communities are deeply afraid of the stigma of being associated with saying, 'I used ChatGPT, and here's what I came up with,'" she said. "I see that in my daily advocacy work … there are some physicians who immediately glaze over, and I will get a scoff and an eyeroll."
But digital tools and queries can help patients navigate the health care system, understand symptoms, and connect with the right type of specialists, said Cordovano. Online searches for answers are also an example of patient engagement - something often seen as a "holy grail" in the health care community. "Having a patient come into point of care, or leave a message on the portal, or speak to the nurse navigator, or reach out to a community health worker, and say, 'I think I have this' - that needs to be already recognized as patient engagement," said Cordovano.

Applying image recognition

Michael Howell, chief clinical officer at Google, spoke about the history of AI's applications in diagnosis. Swift progress in image recognition - AI's ability to identify and classify objects in images - started around 2011, driven by the development of deep learning and neural networks.
These advances were quickly applied in medicine, Howell said. In 2016, a research paper described a deep-learning algorithm that could identify diabetic retinopathy in images of retinas. The paper was tagged by JAMA as one of the 10 most influential papers of the decade.
"We've subsequently seen work not just in diabetic retinopathy but in lung cancer screening, more complicated eye diseases, pathology, variant calling in genetics, breast cancer, and on and on," he added. "There are thousands of papers like this now."
Radiologist Jason Poff explained how these advances are being applied in the field of radiology. His organization, Radiology Partners - which provides more than 10 percent of all radiology services in the U.S. - processes tens of millions of patients' imaging exams with AI every year, and thousands of its radiologists use these technologies, said Poff. "We're right on the frontier of these tools."
Poff offered an example in which a patient came to the emergency department with left-side chest pain. AI detected a rib fracture that may have come from a fall the patient had forgotten to tell the doctor about, and which explained the pain. "It turns out this diagnosis is quite difficult for radiologists to make, particularly when they're not given the right story," said Poff. He noted another example in which AI spotted a small brain aneurysm in an atypical location that had been missed by a radiologist.
But AI needs human oversight because it can also make errors, said Poff, citing an instance where AI mistook a bit of mucus for a blood clot in a patient's lung - an example of a false positive. In another case, a false negative, AI failed to spot a patient's brain hemorrhage.
Poff said that he and his colleagues delve into the ways AI tools can fail before deploying them. "You have to teach your radiologists, your physicians, about all the ways it can lead you down the wrong path, so they can avoid those pitfalls," he said. "We really need our humans to overrule the AI when it's wrong."

The emergence of generative AI

While AI that depends on deep learning and image recognition can only be trained to do one specific task - such as screening for lung cancer - generative AI, including large language models (LLMs), is more flexible and can respond to varied queries, Howell explained.
Jonathan Chen, a biomedical informatics researcher at Stanford University, described his research on LLMs' capabilities, including a recent experiment that tested the performance of OpenAI's GPT-4 on an exam that assesses open-ended medical reasoning - a test Stanford uses to decide whether medical students are ready to see patients. The exam presents complex cases that include both relevant and irrelevant information, just as a real patient history would, and then asks: Can you summarize this case, give a differential diagnosis, and justify your reasoning? GPT-4 outscored the average Stanford medical student by a couple of points, Chen said.
He cautioned that AI chatbots cannot replace doctors - "We provide very different specific and unique value than a computer ever will." The relevant question is how physicians can learn to effectively use and work with generative AI, which may require further education and training, Chen said.
Multiple attendees asked about the possibility of LLMs delivering inaccurate information, since these models can hallucinate and users do not know which sources a model is drawing from.
"There's very real, credible risk," said Chen. "These things can generate believable misinformation - no intention of harm, just confabulation [or] hallucinations start happening. That is very dangerous because of how believable the misinformation is."
One possible solution is retrieval-augmented generation, Chen noted - an approach that enables an LLM to draw upon specific, trusted sources of information. A physician could tell a chatbot: "Read this document, read this patient's chart - what does this say on that? And make it traceable so I can go back and trust and verify," he said. "It still needs to be refined and perfected, but that's also clearly the direction that these are going, and already are starting to be used."
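As a rough illustration of the retrieval-augmented generation approach Chen described, the sketch below pulls the most relevant passages from a small set of trusted sources and assembles a prompt that cites each passage by a source ID, so an answer can be traced back and verified. The source snippets, IDs, function names, and keyword-overlap scoring are hypothetical stand-ins; a real system would retrieve from indexed guidelines or chart data, typically with embeddings, and pass the prompt to an LLM.

import re
from collections import Counter

# Hypothetical "trusted sources" -- in a real system these might be clinical
# guidelines or a patient's chart, indexed with embeddings rather than keywords.
SOURCES = {
    "guideline:retinopathy-screening": (
        "Patients with diabetes should receive an annual dilated eye exam "
        "to screen for diabetic retinopathy."
    ),
    "chart:2024-09-visit-note": (
        "Patient reports blurred vision; last eye exam was over two years ago."
    ),
    "guideline:hypertension": (
        "Adults with blood pressure above 130/80 mm Hg should be evaluated "
        "for hypertension."
    ),
}

def tokenize(text):
    # Lowercase bag of words; a production retriever would use embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, k=2):
    # Return the k passages whose word overlap with the question is largest.
    q = tokenize(question)
    scored = sorted(
        ((sum((q & tokenize(text)).values()), source_id, text)
         for source_id, text in SOURCES.items()),
        reverse=True,
    )
    return [(source_id, text) for _, source_id, text in scored[:k]]

def build_prompt(question):
    # Cite each retrieved passage by its source ID so answers stay traceable.
    cited = "\n".join(f"[{sid}] {text}" for sid, text in retrieve(question))
    return (
        "Answer using ONLY the passages below, and cite their source IDs.\n"
        f"{cited}\n\nQuestion: {question}"
    )

# The assembled, source-labeled prompt would then be sent to whatever LLM the system uses.
print(build_prompt("Is this patient with diabetes overdue for a dilated eye exam?"))

The key design choice in this kind of pipeline is that the model is constrained to, and must cite, a defined set of documents, which is what makes "go back and trust and verify" possible.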
"At Children's Hospital of Philadelphia, we're doing a lot of work with large language models and training them on our specific pediatric data," said Kenrick Cato, a professor of informatics at Children's Hospital. He added that he and his colleagues are also careful not to stretch LLMs beyond their current limits; for example, they are using LLMs on some administrative tasks, where error rates are low, but are "waiting until the science catches up" before using them on high-stakes clinical issues, an approach he recommended.

Mitigating bias and advancing health equity

The workshop also explored another challenge with using AI - racial and ethnic biases in data and algorithms that could perpetuate health inequities - as well as opportunities for digital tools that could help reduce inequities.
Michael Cary, Elizabeth C. Clipp Term Chair of Nursing at the Duke University School of Nursing, noted technical strategies at the algorithm level that can mitigate bias - using larger or more diverse training data, for example, or using new methodologies and strategies to adjust algorithms. It's also important to look beyond technical solutions to approaches at the health system level - such as using inclusive data processes and auditing algorithms for safety and equity, he said.
"AI has potential to provide personalized and fair care, but we really need to take a hard look at the bias within these tools before we implement them and prepare to scale them," said Cary.
Kadija Ferryman, assistant professor at Johns Hopkins University, also urged participants to look beyond purely technical remedies to ethical and policy approaches. For example, if there is missing data for a racial or ethnic group, rather than simply imputing more data on that group as a remedy, institutions could look into why the data is missing in the first place: "Is it because there is lack of access by that racial group to the clinical context?" asked Ferryman. "Is it because there is earned mistrust [among] that group [of] that health care institution?"
Irene Dankwa-Mullan, chief health officer at Marti Health, spoke about how AI could be leveraged to advance equitable diagnostic excellence, which she defined as "the pursuit of the highest standards in accurately identifying and understanding medical conditions, and at the same time acknowledging that there is diversity, there is variability in which similar medical conditions, including pathophysiology, affect different patients and populations."
AI and other digital tools can support this goal by enhancing the accuracy and speed of diagnosis and by strengthening patient engagement, she said. Wearable devices can enable earlier detection and prompt medical intervention, for example, and telemedicine can allow timelier consultations for patients in rural and remote areas. AI's predictive analytics can be valuable in personalizing treatment plans and managing chronic conditions.
Looking to the future, Dankwa-Mullan said, we need to ensure that all patients, regardless of their background, have access to these tools to improve their care. "We need to seize this opportunity that we have in this room to make equitable diagnostic excellence not just an aspiration but a standard for everyone."
Watch sessions from the workshop, which was organized by the National Academies' Forum on Advancing Diagnostic Excellence.
