Accuracy and Temporal Consistency of ChatGPT and Gemini in Responding to Textbook and Patient-Oriented Dental Bleaching Questions: A Multi-Session Comparative Study


ŞİŞMANOĞLU S., Kotan S. S., IŞIK V.

Journal of Esthetic and Restorative Dentistry, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI: 10.1111/jerd.70172
  • Journal Name: Journal of Esthetic and Restorative Dentistry
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, MEDLINE
  • Keywords: artificial intelligence, chatbot accuracy, dental bleaching, patient education, temporal consistency
  • İstanbul Üniversitesi-Cerrahpaşa Affiliated: Yes

Abstract

Objective: This study compared the accuracy and temporal consistency of ChatGPT and Gemini in responding to dental bleaching questions across three weekly sessions.

Materials and Methods: A total of 280 true/false questions were developed, comprising 200 textbook-based and 80 patient-oriented frequently asked questions. Both chatbots were queried weekly under controlled conditions. Accuracy was compared using generalized estimating equations, consistency was assessed using Fleiss' kappa, and weekly stability was evaluated using Cochran's Q test. Open-ended responses were scored for quality and misinformation by two evaluators.

Results: For textbook questions, ChatGPT achieved significantly higher accuracy than Gemini (77.7% versus 70.5%, p = 0.0009). For frequently asked questions, both chatbots performed comparably (92.9% versus 90.8%, p = 0.252). Temporal consistency was only fair for textbook questions but almost perfect for frequently asked questions in both chatbots. Both chatbots showed significant upward trends in textbook accuracy across sessions. Gemini received higher global quality scores for open-ended responses, while misinformation rates were similarly low.

Conclusions: Within the limitations of this study, ChatGPT achieved significantly higher accuracy than Gemini for textbook-based dental bleaching questions, while both chatbots performed comparably for patient-oriented questions. Temporal consistency differed markedly, with almost perfect consistency for patient-oriented questions and only fair consistency for textbook-based questions.

Clinical Significance: Chatbot responses to common patient questions about dental bleaching are generally accurate and consistent, but their reliability drops substantially for specialized academic content, suggesting these tools should complement rather than replace professional clinical judgment.
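The two consistency statistics named in the Materials and Methods (Fleiss' kappa across the three weekly sessions, and Cochran's Q for weekly stability of binary correct/incorrect outcomes) can be sketched in pure Python. This is a minimal illustration with made-up toy inputs, not the study's dataset; a real analysis would typically use a statistics package such as statsmodels.

```python
# Sketch of the consistency statistics named in the Methods section.
# Fleiss' kappa treats the three weekly sessions as three "raters" of each
# question; Cochran's Q tests whether correctness rates differ across
# sessions. Toy data only, for illustration.

def fleiss_kappa(counts):
    """counts: N x k matrix; counts[i][j] = how many of the n sessions
    placed question i in category j (here k = 2: correct / incorrect)."""
    N = len(counts)
    n = sum(counts[0])                       # ratings (sessions) per question
    k = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N                     # observed agreement
    P_e = sum(p * p for p in p_j)            # chance agreement
    return (P_bar - P_e) / (1 - P_e)

def cochrans_q(data):
    """data: N x k binary matrix; data[i][j] = 1 if question i was answered
    correctly in session j. Returns the Q statistic (chi-square, k-1 df)."""
    k = len(data[0])
    col = [sum(row[j] for row in data) for j in range(k)]
    row_tot = [sum(row) for row in data]
    T = sum(row_tot)
    num = (k - 1) * (k * sum(c * c for c in col) - T * T)
    den = k * T - sum(r * r for r in row_tot)
    return num / den

# Perfect session-to-session agreement over 3 sessions -> kappa = 1.0
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))             # → 1.0

# Four questions over 3 sessions with some answers drifting week to week
print(cochrans_q([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))   # → 3.0
```

In this framing, an "almost perfect" kappa (as reported for the patient-oriented questions) means each question received essentially the same verdict in all three sessions, while a "fair" kappa (textbook questions) means verdicts frequently flipped between weeks.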