New Scientist

Technology

Typos and slang spur AI to discourage seeking medical care

AI models change their medical recommendations when people ask them questions that include colourful language, typos, odd formatting and even gender-neutral pronouns

By Jeremy Hsu

30 June 2025

Be cautious about asking AI for advice on when to see a doctor

Chong Kee Siong/Getty Images

Should you see a doctor about your sore throat? AI’s advice may depend on how carefully you typed your question. When artificial intelligence models were tested on simulated writing from would-be patients, they were more likely to advise against seeking medical care if the writer made typos, included emotional or uncertain language – or was female.

“Insidious bias can shift the tenor and content of AI advice, and that can lead to subtle but important differences” in how medical resources are distributed, says Karandeep Singh at the University of California, San Diego, who was not involved in the study.

Researchers at the Massachusetts Institute of Technology used AI to help create thousands of patient notes in different formats and styles. For example, some messages included extra spaces and typos to mimic patients with limited English proficiency or less ease with typing. Other notes used uncertain language in the style of writers with health anxiety, colourful expressions that lent a dramatic or emotional tone, or gender-neutral pronouns.
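The kind of surface-level perturbation described above can be sketched in a few lines of Python. This is a hypothetical illustration of the general idea, not the researchers' actual pipeline; the function name, typo rate and choice of perturbations are all assumptions:

```python
import random

def perturb_note(note: str, typo_rate: float = 0.05, seed: int = 0) -> str:
    """Return a copy of `note` with random wrong letters and doubled
    spaces, roughly mimicking the typo-style perturbations the study
    describes. Illustrative sketch only -- not the authors' code."""
    rng = random.Random(seed)  # fixed seed so perturbations are reproducible
    out = []
    for ch in note:
        r = rng.random()
        if ch.isalpha() and r < typo_rate:
            # swap the letter for a random (possibly wrong) lowercase letter
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
        elif ch == " " and r < typo_rate:
            out.append("  ")  # occasionally double a space
        else:
            out.append(ch)
    return "".join(out)
```

Each perturbed note would then be paired with its clean original, so any change in the model's advice can be attributed to the formatting alone rather than the clinical content.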

The researchers then fed the notes to four large language models (LLMs) commonly used to power chatbots and told the AI to answer questions about whether the patient should manage their condition at home or visit a clinic, and whether the patient should receive certain lab tests and other medical resources. These AI models included OpenAI’s GPT-4, Meta’s Llama-3-70b and Llama-3-8b, and the Palmyra-Med model developed for the healthcare industry by the AI company Writer.

The tests showed that the various format and style changes made all the AI models between 7 and 9 per cent more likely to recommend that patients stay at home instead of seeking medical attention. The models were also more likely to recommend that female patients remain at home, and follow-up research showed they were more likely than human clinicians to change their recommended treatments because of the gender and language style of the messages.
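To see what a shift of a few percentage points means in practice, here is a toy calculation of the metric. The `baseline` and `perturbed` label lists below are invented for illustration and are not the study's data:

```python
def stay_home_rate(recommendations) -> float:
    """Fraction of model replies that advise managing the condition at home."""
    return sum(r == "home" for r in recommendations) / len(recommendations)

# Hypothetical model outputs for ten patients: clean notes vs. perturbed notes
baseline  = ["clinic", "home", "clinic", "clinic", "home",
             "clinic", "clinic", "clinic", "clinic", "clinic"]
perturbed = ["clinic", "home", "home", "clinic", "home",
             "clinic", "clinic", "clinic", "clinic", "clinic"]

# Percentage-point increase in "stay home" advice after perturbation
shift = (stay_home_rate(perturbed) - stay_home_rate(baseline)) * 100
# shift is about 10 percentage points in this toy example
```

In the study itself the comparison was made across thousands of notes per model, which is what makes a consistent 7 to 9 per cent shift meaningful rather than noise.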


OpenAI and Meta did not respond to a request for comment. Writer does not “recommend or support” using LLMs – including the company’s Palmyra-Med model – for clinical decisions or health advice “without a human in the loop”, the company says.

Most operational AI tools currently used in electronic health record systems rely on OpenAI’s GPT-4o, which was not specifically studied in this research, says Singh. But he says one big takeaway from the study is the need for improved ways to “evaluate and monitor generative AI models” used in the healthcare industry.

Journal reference:

FAccT ’25: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
