A new study finds that artificial intelligence-powered chatbots provide problematic medical guidance in roughly half of cases, raising concerns about the risks of relying on such tools for health information.
Researchers from the United States, Canada and the United Kingdom assessed five widely used platforms—ChatGPT, Gemini, Meta AI, Grok and DeepSeek—by posing 10 medical questions across five different health categories.
Their findings, published in the medical journal BMJ Open, showed that around half of all responses were considered problematic, including nearly 20% classified as highly problematic.
The study found that performance varied with the type of question: the chatbots fared better on closed-ended queries and on topics such as vaccines and cancer, but were less accurate on open-ended questions and on more complex areas such as stem cell research and nutrition.
Researchers also noted that responses were often delivered confidently despite lacking clinical accuracy or proper referencing. None of the chatbots produced a complete and accurate reference list for any of the prompts tested.
Only two refusals to respond were recorded during the study, both from Meta AI, according to the researchers.
The findings underscore growing concerns about the use of generative AI tools in healthcare-related queries, particularly as these systems are not medically certified and lack the clinical expertise required for diagnosis or treatment advice.