Australian Endodontic Journal, 2026 (SCI-Expanded, Scopus)
This study evaluated and compared the accuracy, consistency, readability, and information quality of three LLM-based chatbots, namely ChatGPT-5, Claude AI (Sonnet 4.0), and Perplexity (Mistral Large 2), in addressing questions on traumatic dental injuries. Forty true/false statements were submitted to each chatbot three times at weekly intervals to assess accuracy and consistency. Additionally, chatbot responses to 25 open-ended, case-based questions were evaluated for readability, understandability and actionability, and information reliability and quality. For the true/false questions, Perplexity showed the highest accuracy, followed by Claude and ChatGPT. For the open-ended responses, ChatGPT excelled in readability (FRE: 62.4 ± 7.6), Perplexity in understandability (91.0 ± 4.3) and actionability (93.0 ± 6.4), and Claude in information reliability (mDISCERN total: 61.2; no variability observed). All three chatbots achieved acceptable global quality scores (> 4.4). These findings emphasise the complementary roles of the three chatbots in dental trauma management. Tool selection should be guided by the intended use, and continued human oversight remains essential in clinical decision-making.
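For reference, the readability metric reported above is the standard Flesch Reading Ease (FRE) index, computed as

FRE = 206.835 − 1.015 × (total words / total sentences) − 84.6 × (total syllables / total words)

Higher scores indicate easier text; a mean of 62.4 falls in the 60–70 band, conventionally interpreted as plain English readable by 13- to 15-year-old students.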