AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, cilt.169, sa.1, 2026 (SCI-Expanded, Scopus)
Introduction: This study aimed to evaluate the accuracy, reliability, and comprehensibility of information about surgically assisted rapid palatal expansion provided by language models based on artificial intelligence (AI).

Methods: A cross-sectional content analysis was conducted on the responses to surgically assisted rapid palatal expansion-related questions by ChatGPT-4 (OpenAI LLC, San Francisco, Calif), Gemini (Alphabet Inc, Mountain View, Calif), and Copilot (Microsoft, Redmond, Wash). In total, 115 questions (categorized into 11 domains) were created by 3 orthodontists and 1 oral and maxillofacial surgeon. The accuracy of the answers generated by the AI language models was independently evaluated by the same experts via a 5-point Likert scale. Relationships among categorical variables were tested with the Pearson chi-square test when the sample size assumption was met and with Fisher's exact test when it was not. Analyses were performed in SPSS (version 27; IBM, Armonk, NY).

Results: The responses of the AI types showed a generally homogeneous distribution, with no statistically significant difference between the types of AI and the types of responses (P >0.05). Although there were no significant differences, ChatGPT-4 had the highest rate of objectively true answers, Gemini produced answers with more balanced accuracy, and Copilot had the highest number of false answers.

Conclusions: These findings reveal that the accuracy of AI-supported language models in providing medical information may vary according to subject matter.
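As an illustrative sketch only (not the authors' analysis code, which was run in SPSS), the test-selection rule described in the Methods can be expressed in Python. The threshold used here for the sample size assumption (all expected cell counts at least 5) is the conventional rule of thumb and is an assumption, as is the restriction of the Fisher's exact computation to 2 x 2 tables:

```python
import math


def expected_counts(table):
    # Expected cell counts under independence: (row total * column total) / n.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def pearson_chi2(table):
    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(table, exp)
               for o, e in zip(row_o, row_e))


def fisher_exact_p(table):
    # Two-sided Fisher's exact test for a 2 x 2 table: sum the hypergeometric
    # probabilities of all tables (with the same margins) that are no more
    # probable than the observed one.
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def hyper(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = hyper(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(hyper(x) for x in range(lo, hi + 1)
               if hyper(x) <= p_obs + 1e-12)


def choose_test(table):
    # The selection rule from the Methods: Pearson chi-square when the sample
    # size assumption holds (here: every expected count >= 5, a conventional
    # threshold assumed for illustration), Fisher's exact test otherwise.
    exp = expected_counts(table)
    if all(e >= 5 for row in exp for e in row):
        return "pearson", pearson_chi2(table)
    return "fisher", fisher_exact_p(table)
```

For example, `choose_test([[10, 20], [20, 10]])` has all expected counts equal to 15, so the Pearson chi-square statistic is returned, whereas `choose_test([[1, 3], [3, 1]])` has expected counts of 2 and falls back to Fisher's exact test.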