Monthly Key Publication Reviews

Publication: Kirk D, van Eijnatten E, Camps G. Comparison of Answers between ChatGPT and Human Dieticians to Common Nutrition Questions. Journal of Nutrition and Metabolism. 2023;2023:e5548684. doi:10.1155/2023/5548684

Reviewer: Bradley R. Salonen, MD, Division of General Internal Medicine, Mayo Clinic, Rochester, MN

Why is This Paper Important: The landscape of medicine is increasingly influenced by technological advancements, among which the emergence of ChatGPT, a versatile chatbot developed by OpenAI, stands out.¹ Launched in November 2022, ChatGPT rapidly gained prominence due to its ability to perform a wide range of tasks, both personal and professional, thanks to its large language model (LLM) foundation. Its user-friendly interface, coupled with the availability of a free plan, led ChatGPT to become the most rapidly adopted consumer software application in history.²

This paper matters because it investigates ChatGPT's proficiency in responding to nutrition-related inquiries, benchmarking its performance against that of human dieticians. As people rely more and more on the internet for nutritional advice, assessing the dependability and accuracy of AI-provided information becomes increasingly vital. The outcomes of this study bear on how individuals seek out nutritional knowledge and on the prospective contribution of AI to public health and nutritional education initiatives.

Summary: Conducted in the Netherlands from February to June 2023, this study leveraged ChatGPT (version 3.0) to compare its efficacy in answering nutrition questions against that of human dieticians. The research involved soliciting common nutrition-related queries from registered dieticians across the Netherlands, resulting in a total of 20 questions submitted by 7 dieticians. The participating dieticians, all of whom were female with a median age of 31 and an age range of 29 to 65, practiced in various settings including private practices and medical centers.

The study focused on eight selected questions, with the remaining 12 excluded due to their specificity to medical conditions. These questions covered a spectrum of themes such as weight loss, taste, carbohydrates, general healthy nutrition, and supplementation.

The responses provided by both ChatGPT and the human dieticians were assessed by a panel of other dieticians and experts in the field of each question. The grading was based on three key criteria: scientific correctness, comprehensibility, and actionability. Each response was graded both as a whole and on each of these individual components. A permutation test was used to detect differences between the two groups of responses and to determine their statistical significance.
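For readers unfamiliar with permutation tests, the idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the review does not state which test statistic or how many permutations the study used, so the difference in mean grades, the two-sided comparison, the function name permutation_test, and the example grades below are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Assumption: the statistic is mean(group_a) - mean(group_b); the study
    itself may have used a different statistic or procedure.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels by shuffling the pooled grades.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count reshuffles at least as extreme as the observed difference
        # (small tolerance guards against floating-point ties).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical grades for one question (made up, not study data).
    chatgpt_grades = [8, 9, 7, 9, 8, 9]
    dietician_grades = [6, 7, 7, 5, 6, 8]
    diff, p = permutation_test(chatgpt_grades, dietician_grades)
    print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

The appeal of this approach for small panels of graders is that it makes no assumption about the distribution of the grades: the p-value is simply the share of random relabelings that produce a difference at least as large as the one observed.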

ChatGPT outperformed human dieticians in several aspects, scoring significantly higher in 5 out of the 8 questions overall. In terms of scientific correctness and comprehensibility, ChatGPT also scored higher in 5 out of the 8 questions. For actionability, it scored higher in 4 out of the 8 questions. Notably, human dieticians did not achieve higher overall scores for any question, nor did they lead in any of the individual grading components.

In summary, this study highlights the potential of AI tools like ChatGPT in providing accurate and understandable nutrition information, challenging the traditional boundaries of dietary consultation and suggesting new frontiers for AI application in the field of nutrition and dietetics.

Commentary: This study evaluates the evolving capabilities of AI tools in the realm of nutritional advice. With the introduction of ChatGPT and the subsequent advance to the GPT-4 LLM, we have witnessed a significant enhancement in the ability of these AI models to handle a variety of medicine-related tasks. A remarkable aspect of these modern LLM chatbots is their extensive breadth of knowledge, attributed to training on vast datasets. There is potential for even greater advancements through domain-specific fine-tuning of these models, tailoring them more closely to specialized fields like nutrition.

However, amidst the surge of enthusiasm surrounding AI, we must maintain a balanced perspective.³ Like human experts, generative models are not infallible. These systems are prone to errors and can produce 'hallucinations,' or inaccuracies, in their responses. This inherent imperfection underscores the continuing necessity for human oversight and intervention in AI applications, especially in areas like healthcare and medicine.⁴

The implications of this study extend beyond the technical achievements of AI. It prompts a broader conversation about the role of AI in supplementing human expertise, particularly in providing accessible and accurate health information to the public. While AI tools like ChatGPT demonstrate promising potential in enhancing public health education, their limitations highlight the indispensable value of human professionals in ensuring the quality and reliability of health-related advice. As AI continues to evolve, the synergy between human knowledge and artificial intelligence will be pivotal in harnessing the full benefits of technological advancements in healthcare.

References:

1. Li R, Kumar A, Chen JH. How Chatbots and Large Language Model Artificial Intelligence Systems Will Reshape Modern Medicine: Fountain of Creativity or Pandora’s Box? JAMA Intern Med. Published online April 28, 2023. doi:10.1001/jamainternmed.2023.1835
2. ChatGPT. In: Wikipedia. 2024. Accessed January 24, 2024. https://en.wikipedia.org/w/index.php?title=ChatGPT&oldid=1198641502
3. Wachter RM, Brynjolfsson E. Will Generative Artificial Intelligence Deliver on Its Promise in Health Care? JAMA. 2024;331(1):65-69. doi:10.1001/jama.2023.25054
4. Mello MM, Guha N. Understanding Liability Risk from Using Health Care Artificial Intelligence Tools. New England Journal of Medicine. 2024;390(3):271-278. doi:10.1056/NEJMhle2308901