BMC ORAL HEALTH, vol. 25, no. 1, 2025 (SCI-Expanded, Scopus)
Background: The aim of this study was to evaluate the accuracy, reliability and comprehensibility of information about nasoalveolar molding (NAM) provided by artificial intelligence (AI).

Methods: A cross-sectional content analysis was conducted on responses generated by ChatGPT-4 (OpenAI LLC, San Francisco, CA, USA), Gemini (Alphabet Inc., Mountain View, CA, USA) and Copilot (Microsoft Corporation, Redmond, WA, USA). In total, 129 questions across 11 domains were generated, and the answers produced by the AI models were evaluated. Descriptive statistics were applied. Relationships between categorical variables were tested with the Pearson chi-square test when the sample-size assumption was met and with Fisher's exact test when it was not. Analyses were performed with IBM SPSS 27 (IBM Corp., Armonk, NY, USA).

Results: Overall, there was no statistically significant association between AI type and response category (p > 0.05); a significant difference between AI types emerged only in the 'Soft tissues' domain (p = 0.013), where ChatGPT-4 gave exclusively 'Objectively True' answers. When each AI type was evaluated separately, the answers in the 'Knowledge/Information' domain differed significantly from those in the other domains for all three models (ChatGPT-4: p = 0.003; Gemini: p = 0.044; Copilot: p < 0.001): ChatGPT-4 and Copilot mostly produced answers in the 'Selected Facts' category, whereas Gemini mostly produced answers in the 'False' category. In the 'Function' and 'Other' domains, ChatGPT-4 mostly gave 'False' answers. Copilot produced mostly 'Objectively True' answers only for the 'Satisfaction' domain and exclusively 'False' answers for the 'Microbiological/Physiological' domain.

Conclusions: These findings reveal that the accuracy of AI-supported language models in providing medical information may vary with the subject matter.
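The test-selection rule described in the Methods (Pearson chi-square when the sample-size assumption holds, Fisher's exact test otherwise) can be sketched for a 2x2 table. This is an illustrative sketch only, not the authors' analysis or data: the table values are invented, and the common "all expected cell counts ≥ 5" criterion is assumed as the sample-size rule.

```python
# Illustrative sketch (hypothetical data, not from the study):
# choose between Pearson chi-square and Fisher's exact test for a
# 2x2 contingency table, using the common rule that the chi-square
# approximation needs every expected cell count to be at least 5.
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    return [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table: sum the
    hypergeometric probabilities of all tables (with the same
    margins) no more likely than the observed one."""
    a, b = table[0]
    c, d = table[1]
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)

    def p_of(x):  # P(top-left cell == x) with margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = p_of(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(p_of(x) for x in range(lo, hi + 1) if p_of(x) <= p_obs + 1e-12)

# Hypothetical counts of two response categories for two AI models.
table = [[9, 1], [2, 11]]
use_fisher = any(e < 5 for row in expected_counts(table) for e in row)
p = fisher_exact_two_sided(table)  # small expected counts -> exact test
```

With these made-up counts, one expected cell falls below 5, so the rule routes the table to Fisher's exact test rather than the chi-square approximation, mirroring the decision procedure the Methods describe.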