A recent study published in Nature Medicine has found that OpenAI’s ChatGPT Health chatbot often underestimates the severity of medical emergencies. In more than half of the cases where physicians would advise an immediate emergency room visit, the chatbot recommended delaying care, raising significant concerns about its reliability in critical health situations.
The study assessed ChatGPT Health’s triage abilities using 60 medical scenarios, comparing its responses with those of three experienced physicians. Each scenario was written in 16 variations that altered demographic details, such as race or gender, to test whether those changes influenced the outcomes. According to the lead author, Dr. Ashwin Ramaswamy, an instructor of urology at The Mount Sinai Hospital in New York City, this approach was meant to verify that emergency classifications remained consistent across demographics.
Researchers discovered that ChatGPT Health “under-triaged” 51.6% of emergency cases. This means that instead of advising an immediate visit to the emergency room, the chatbot often suggested patients see a doctor within 24 to 48 hours. Notable emergencies included a patient suffering from diabetic ketoacidosis, a life-threatening complication, and another facing respiratory failure. Both conditions, if untreated, can lead to death. Dr. Ramaswamy emphasized that any trained medical professional would recognize the need for urgent care in these cases.
Conversely, the chatbot also “over-triaged” 64.8% of nonurgent cases, recommending unnecessary doctor appointments. For example, a patient with a three-day sore throat was told to seek medical attention, even though at-home care would have been sufficient. Dr. Ramaswamy said he was puzzled by the chatbot’s inconsistent recommendations.
A spokesperson for OpenAI acknowledged the importance of research into AI applications in healthcare but noted that the study’s findings do not reflect typical usage of ChatGPT Health. The spokesperson explained that the chatbot is designed for conversational use, in which users ask follow-up questions and supply additional context, rather than for delivering a single one-shot response.
ChatGPT Health is currently available to a limited number of users, as OpenAI is actively working to enhance its safety and reliability. The chatbot is designed to support users who may seek medical advice outside of regular office hours, with a significant portion of health-related inquiries coming from individuals living considerable distances from healthcare facilities.
Despite its potential benefits, experts caution against relying solely on AI for medical decisions. Dr. John Mafi, an associate professor of medicine at UCLA Health, highlighted the necessity for rigorous testing of such chatbots before they can be employed for life-affecting decisions. He noted that while AI can assist in many scenarios, it should not replace professional medical advice.
According to Dr. Ethan Goh, executive director of ARISE, an AI research network, chatbots can offer safe health advice in many instances. However, he reiterated the importance of understanding their limitations and not viewing them as substitutes for qualified healthcare providers.
Concerns have also been raised about the data used to train AI models. Dr. Monica Agrawal from Duke University pointed out that biases in the information provided by users can influence chatbot responses, potentially reinforcing misconceptions. She stated that while AI can be beneficial, it is crucial to approach its use in healthcare with caution.
Dr. Ramaswamy advised against depending on AI during emergencies and emphasized the need for collaboration between the technology and healthcare sectors to create safer, more effective AI products. He believes improved AI capabilities could strengthen the patient-doctor relationship, particularly in rural areas or regions with limited healthcare access.
As the use of AI in healthcare continues to evolve, both patients and professionals are encouraged to remain vigilant and prioritize human oversight in medical decision-making.