Işık V., Şişmanoğlu S., Baybora Kayahan M.
Essentials of Dentistry, vol. 5, pp. 1-7, 2026 (TRDizin)
-
Publication Type:
Article / Full Article
-
Volume:
5
-
Publication Date:
2026
-
DOI:
10.5152/essentdent.2026.0113
-
Journal Name:
Essentials of Dentistry
-
Indexes Covering the Journal:
TR DİZİN (ULAKBİM)
-
Pages:
pp. 1-7
-
Affiliated with Istanbul University-Cerrahpaşa:
Yes
Abstract
Background: This study evaluated the accuracy and consistency of responses generated by 2 large language models (LLMs), ChatGPT and Gemini, regarding the management of deep carious lesions and pulp exposure in endodontics.

Methods: Fifty dichotomous (Yes/No) questions were developed based on the position statement of the European Society of Endodontology and distributed across 5 categories: Diagnosis and Classification, Caries Management, Pulp Exposure Management, Materials and Techniques, and Follow-up and Prognosis. Questions were presented to ChatGPT-4o and Gemini (Flash 2.5) on 3 occasions, 1 week apart. A total of 300 responses were collected and compared with reference answers. Accuracy was measured as the proportion of correct responses, while consistency was assessed using Fleiss’ Kappa across time points. Statistical analyses included Cochran’s Q and McNemar’s test, with significance set at P < .05.

Results: ChatGPT achieved an overall accuracy of 76.7%, while Gemini achieved 86.0%, a statistically significant difference favoring Gemini (P=.034). By category, Gemini showed superior accuracy in Caries Management and Pulp Exposure Management (96.7%), while ChatGPT performed best in Diagnosis and Classification (93.3%). Substantial consistency was observed for both models (Fleiss’ Kappa=0.627 for ChatGPT; 0.723 for Gemini). Gemini’s accuracy varied significantly across weeks (P=.015), whereas ChatGPT’s remained stable (P=.670).

Conclusion: Both LLM-based chatbots demonstrated moderate accuracy and high consistency in endodontics. While results highlight their potential as educational and decision-support tools, current performance remains insufficient for reliable clinical application. Domain-specific training and further refinement are necessary before widespread implementation in endodontic practice.

Cite this article as: Işık V, Sismanoglu S, Kayahan MB. Can large language models support endodontic decision-making? Accuracy and consistency of ChatGPT and Gemini in deep caries and pulp exposure. Essent Dent. 2026, 5, 0113, doi:10.5152/EssentDent.2026.25113.
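The two headline metrics named in the Methods (accuracy as the proportion of correct responses, and Fleiss' Kappa across the 3 time points) can be illustrated with a minimal sketch. This is not the authors' analysis code. The function names `accuracy` and `fleiss_kappa`, and the 4-question toy data, are hypothetical and chosen only for illustration. The sketch assumes each weekly round is treated as one "rater" of the same set of Yes/No questions.

```python
from collections import Counter

def accuracy(rounds, reference):
    """Proportion of responses matching the reference, pooled over all rounds."""
    total = sum(len(r) for r in rounds)
    correct = sum(ans == ref for r in rounds for ans, ref in zip(r, reference))
    return correct / total

def fleiss_kappa(rounds, categories=("Yes", "No")):
    """Fleiss' kappa, treating each round (time point) as one rater."""
    n_raters = len(rounds)
    n_items = len(rounds[0])
    # Per question: how many rounds gave each answer.
    counts = [Counter(r[i] for r in rounds) for i in range(n_items)]
    # Observed agreement per question, then averaged over questions.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall share of each answer.
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        return float("nan")  # undefined when every response is identical
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical toy data: 4 questions asked in 3 weekly rounds (not study data).
reference = ["Yes", "No", "Yes", "No"]
rounds = [
    ["Yes", "No", "Yes", "Yes"],
    ["Yes", "No", "No", "Yes"],
    ["Yes", "No", "Yes", "Yes"],
]

print(f"accuracy = {accuracy(rounds, reference):.3f}")  # 8 of 12 correct -> 0.667
print(f"kappa    = {fleiss_kappa(rounds):.3f}")         # 0.625
```

In the toy data the fourth question is answered incorrectly in every round, which lowers accuracy without lowering agreement. This shows why the two metrics are reported separately: a model can be consistently wrong.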