
Health Care AI, Intended To Save Money, Turns Out To Require a Lot of Expensive Humans

Preparing cancer patients for difficult decisions is an oncologist's job. They don't always remember to do it, however. At the University of Pennsylvania Health System, doctors are nudged to talk about a patient's treatment and end-of-life preferences by an artificially intelligent algorithm that predicts the chances of death.

But it鈥檚 far from being a set-it-and-forget-it tool. A routine tech checkup revealed the algorithm decayed during the covid-19 pandemic, getting 7 percentage points worse at predicting who would die, according to a 2022 study.

There were likely real-life impacts. Ravi Parikh, an Emory University oncologist who was the study's lead author, told KFF Health News the tool failed hundreds of times to prompt doctors to initiate that important discussion with patients who needed it, possibly heading off unnecessary chemotherapy.

He believes several algorithms designed to enhance medical care weakened during the pandemic, not just the one at Penn Medicine. "Many institutions are not routinely monitoring the performance" of their products, Parikh said.
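The routine monitoring Parikh describes can be surprisingly simple in principle. As an illustrative sketch only (not Penn Medicine's actual system; the function names, the accuracy metric, and the 7-point threshold are assumptions for the example), a hospital could periodically compare a model's recent accuracy against its validated baseline and flag it when the drop exceeds a tolerance:

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that matched the observed outcome."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(predictions)

def drift_alert(baseline_acc, recent_preds, recent_outcomes, max_drop=0.07):
    """Flag the model if accuracy fell more than `max_drop` below baseline
    (0.07 mirrors the 7-percentage-point decay reported in the 2022 study)."""
    recent_acc = accuracy(recent_preds, recent_outcomes)
    return (baseline_acc - recent_acc) > max_drop, recent_acc

# Hypothetical example: a model validated at 80% accuracy now scores
# 60% on a recent window of patients, so the check raises an alert.
alerted, recent = drift_alert(
    0.80,
    [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],  # model predictions (1 = high risk)
    [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # observed outcomes
)
```

Real deployments would use richer metrics (calibration, AUC) and proper statistical tests, but even a check this crude would have surfaced the pandemic-era decay between scheduled audits.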

Algorithm glitches are one facet of a dilemma that computer scientists and doctors have long acknowledged but that is starting to puzzle hospital executives and researchers: Artificial intelligence systems require consistent monitoring and staffing to put in place and to keep working well.

In essence: You need people, and more machines, to make sure the new tools don't mess up.

"Everybody thinks that AI will help us with our access and capacity and improve care and so on," said Nigam Shah, chief data scientist at Stanford Health Care. "All of that is nice and good, but if it increases the cost of care by 20%, is that viable?"

Government officials worry hospitals lack the resources to put these technologies through their paces. "I have looked far and wide," FDA Commissioner Robert Califf said at a recent agency panel on AI. "I do not believe there's a single health system, in the United States, that's capable of validating an AI algorithm that's put into place in a clinical care system."

AI is already widespread in health care. Algorithms are used to predict patients' risk of death or deterioration, to suggest diagnoses or triage patients, to record and summarize visits to save doctors work, and to approve insurance claims.

If tech evangelists are right, the technology will become ubiquitous, and profitable. The investment firm Bessemer Venture Partners has identified some 20 health-focused AI startups on track to make $10 million in revenue each in a year. The FDA has approved nearly a thousand artificially intelligent products.

Evaluating whether these products work is challenging. Evaluating whether they continue to work, or have developed the software equivalent of a blown gasket or leaky engine, is even trickier.

Take a recent study at Yale Medicine evaluating six "early warning systems," which alert clinicians when patients are likely to deteriorate rapidly. A supercomputer ran the data for several days, said Dana Edelson, a doctor at the University of Chicago and co-founder of a company that provided one algorithm for the study. The process was fruitful, showing huge differences in performance among the six products.

It's not easy for hospitals and providers to select the best algorithms for their needs. The average doctor doesn't have a supercomputer sitting around, and there is no Consumer Reports for AI.

"We have no standards," said Jesse Ehrenfeld, immediate past president of the American Medical Association. "There is nothing I can point you to today that is a standard around how you evaluate, monitor, look at the performance of a model of an algorithm, AI-enabled or not, when it's deployed."

Perhaps the most common AI product in doctors' offices is called ambient documentation, a tech-enabled assistant that listens to and summarizes patient visits. Last year, investors at Rock Health tracked $353 million flowing into these documentation companies. But, Ehrenfeld said, "There is no standard right now for comparing the output of these tools."

And that's a problem when even small errors can be devastating. A team at Stanford University tried using large language models (the technology underlying popular AI tools like ChatGPT) to summarize patients' medical history. They compared the results with what a physician would write.

"Even in the best case, the models had a 35% error rate," said Stanford's Shah. In medicine, "when you're writing a summary and you forget one word, like 'fever', I mean, that's a problem, right?"

Sometimes the reasons algorithms fail are fairly logical. For example, changes to underlying data can erode their effectiveness, like when hospitals switch lab providers.

Sometimes, however, the pitfalls yawn open for no apparent reason.

Sandy Aronson, a tech executive at Mass General Brigham's personalized medicine program in Boston, said that when his team tested one application meant to help genetic counselors locate relevant literature about DNA variants, the product suffered "nondeterminism": that is, when asked the same question multiple times in a short period, it gave different results.
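Nondeterminism of this kind can be probed with a simple repeat-query test: ask the model the same question several times and check whether the answers agree. The sketch below is purely illustrative; `query_model` is a hypothetical stub (not the Mass General Brigham tool) rigged to return varying answers so the check has something to catch:

```python
from itertools import cycle

# Stub standing in for a nondeterministic model: successive identical
# queries yield different answers, as Aronson's team observed.
_answers = cycle(["variant likely benign", "variant of uncertain significance"])

def query_model(question):
    return next(_answers)

def is_deterministic(question, trials=5):
    """Ask the same question `trials` times; True only if every answer matched."""
    answers = {query_model(question) for _ in range(trials)}
    return len(answers) == 1

consistent = is_deterministic("Summarize the literature on a BRCA1 variant")
# consistent is False: the same question produced more than one answer
```

A real evaluation would also need to decide how much variation counts as disagreement, since two differently worded but clinically equivalent summaries should not trip the alarm.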

Aronson is excited about the potential for large language models to summarize knowledge for overburdened genetic counselors, but "the technology needs to improve."

If metrics and standards are sparse and errors can crop up for strange reasons, what are institutions to do? Invest lots of resources. At Stanford, Shah said, it took eight to 10 months and 115 man-hours just to audit two models for fairness and reliability.

Experts interviewed by KFF Health News floated the idea of artificial intelligence monitoring artificial intelligence, with some (human) data whiz monitoring both. All acknowledged that would require organizations to spend even more money, a tough ask given the realities of hospital budgets and the limited supply of AI tech specialists.

"It's great to have a vision where we're melting icebergs in order to have a model monitoring their model," Shah said. "But is that really what I wanted? How many more people are we going to need?"

KFF Health News is a national newsroom that produces in-depth journalism about health issues and is one of the core operating programs at KFF, an independent source of health policy research, polling, and journalism.
