Explainable deep learning framework for brain tumor detection: Integrating LIME, Grad-CAM, and SHAP for enhanced accuracy


AKGÜNDOĞDU A., Çelikbaş Ş.

Medical Engineering and Physics, vol. 144, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 144
  • Publication Date: 2025
  • DOI: 10.1016/j.medengphy.2025.104405
  • Journal Name: Medical Engineering and Physics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, BIOSIS, Biotechnology Research Abstracts, Compendex, INSPEC, MEDLINE
  • Keywords: Explainable artificial intelligence, Gradient-weighted class activation mapping, Local interpretable model-agnostic explanations, Shapley additive explanations
  • Affiliated with Istanbul University-Cerrahpaşa: Yes

Abstract

Deep learning approaches have improved the efficiency of disease diagnosis. However, AI-based decision systems still lack sufficient transparency and interpretability. This study aims to enhance the explainability and training performance of deep learning models for brain tumor detection using explainable artificial intelligence (XAI) techniques. A two-stage training approach combined with XAI methods was implemented. The proposed convolutional neural network achieved 97.20% accuracy, 98.00% sensitivity, 96.40% specificity, and 98.90% ROC-AUC on the BRATS2019 dataset. The model was then analyzed with explainability techniques including Local Interpretable Model-Agnostic Explanations (LIME), Gradient-weighted Class Activation Mapping (Grad-CAM), and Shapley Additive Explanations (SHAP). The masks generated from these analyses enhanced the dataset, yielding 99.40% accuracy, 99.20% sensitivity, 99.60% specificity, 99.60% precision, and 99.90% ROC-AUC in the final stage. The integration of LIME, Grad-CAM, and SHAP thus raised the model's accuracy from 97.20% to 99.40%. Furthermore, the model was evaluated for fidelity, stability, and consistency, and showed reliable and stable results. The same strategy was applied to the BR35H dataset to test the generalizability of the model; accuracy increased from 96.80% to 99.80% on this dataset as well, supporting the effectiveness of the method across different data sources.
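
The abstract does not give implementation details of how the explainability masks are produced, so the following is only a minimal, illustrative sketch of one ingredient of such a pipeline: generating a binary Grad-CAM mask from a trained CNN, which could then be used to emphasize salient regions when preparing the second-stage training data. The backbone (resnet18), the chosen target layer, and the 0.5 binarization threshold are assumptions for illustration, not the authors' actual configuration.

```python
# Illustrative Grad-CAM mask generation (PyTorch). Model, layer, and threshold
# are placeholder assumptions; the paper's own CNN and pipeline are not shown here.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)                      # placeholder backbone
model.fc = torch.nn.Linear(model.fc.in_features, 2)        # assumed 2-class head (tumor / no tumor)
model.eval()

activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["value"] = output.detach()

def bwd_hook(_, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

target_layer = model.layer4                                 # last conv block (assumption)
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam_mask(image, class_idx=None, threshold=0.5):
    """Return a binary Grad-CAM mask for one (1, 3, H, W) input tensor."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    acts = activations["value"]                             # (1, C, h, w) feature maps
    grads = gradients["value"]                              # (1, C, h, w) gradients
    weights = grads.mean(dim=(2, 3), keepdim=True)          # channel importance weights
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True)) # weighted activation map
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return (cam > threshold).float()                        # binary mask of salient regions

# Usage sketch: mask = grad_cam_mask(torch.rand(1, 3, 224, 224))
```

In a two-stage setup of the kind the abstract describes, masks like this (and analogous saliency maps from LIME and SHAP) could be applied to the original MRI slices to highlight tumor-relevant regions before retraining; the exact way the masks "enhanced the dataset" is detailed in the full paper, not in this sketch.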