Prediction of hydrogen and methane yields from gasification of leather waste using machine learning and explainable AI: An original dataset

Cihan, Pınar; Alfarra, Fatma; Kurtulus Ozcan, HÜSEYİN; CİNER, MİRAÇ; ÖNGEN, ATAKAN

doi:10.1016/j.jenvman.2025.126521

Cihan P., Alfarra F., Kurtulus Ozcan H. K., CİNER M. N., ÖNGEN A.

JOURNAL OF ENVIRONMENTAL MANAGEMENT, vol.391, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 391
Publication Date: 2025
DOI: 10.1016/j.jenvman.2025.126521
Journal Name: JOURNAL OF ENVIRONMENTAL MANAGEMENT
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, International Bibliography of Social Sciences, PASCAL, Aerospace Database, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, CAB Abstracts, Communication Abstracts, Environment Index, Geobase, Greenfile, Index Islamicus, Metadex, Pollution Abstracts, Public Affairs Index, Veterinary Science Database, Civil Engineering Abstracts
Keywords: Data generation, Explainable AI, Machine learning, SHAP, Syngas prediction
Affiliated with İstanbul Üniversitesi-Cerrahpaşa: Yes

Abstract

Accurately predicting syngas composition is essential for optimizing energy production and ensuring environmental sustainability. Despite the growing use of machine learning techniques in this field, publicly available datasets remain limited, and existing datasets contain relatively few samples. To bridge this gap, we generated a comprehensive dataset of 3748 samples under controlled laboratory conditions and publicly shared it on Kaggle (https://www.kaggle.com/datasets/miracnurciner/gasification-dataset). This study aims to identify the most successful machine learning model for predicting H₂ and CH₄ gas concentrations by evaluating nine models: Random Forest (RF), Linear Regression (LR), Decision Tree (DT), Support Vector Regression (Linear and RBF), K-Nearest Neighbors (KNN), Gradient Boosting Regressor (GBR), XGBoost, CatBoost, and LightGBM. Model performance was assessed using multiple metrics, including the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and explained variance score (EVS). The Friedman test was applied to evaluate the statistical significance of performance differences among the models. The results show that the KNN model achieved the highest predictive performance for both H₂ (R² = 0.987, RMSE = 1.253) and CH₄ (R² = 0.979, RMSE = 0.920). The Friedman test shows that the performance differences between the models are statistically significant (p < 0.001). Integrating Shapley Additive Explanations (SHAP) into the model clarifies the contribution of each feature to the prediction results. SHAP analysis highlights that temperature and time are the main features affecting H₂ and CH₄ gas concentrations. This study highlights the potential of machine learning techniques for biomass gas prediction and advocates for integrating Explainable AI (XAI) methods, establishing a robust foundation for future research. Furthermore, by providing a large, publicly available dataset, this research significantly advances studies in syngas composition prediction.
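
The evaluation metrics and the Friedman test named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `regression_metrics` and `friedman_statistic` are hypothetical, the numbers in the usage lines are made-up values rather than entries from the Kaggle dataset, and the Friedman statistic is the textbook chi-square form with average ranks for ties and no tie correction. MAPE is returned as a fraction, and the true values are assumed to be non-zero and not all equal.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, MAPE (as a fraction) and EVS for two sequences.

    Assumes y_true contains no zeros (for MAPE) and is not constant.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    ss_err = sum((e - mean_err) ** 2 for e in errors)     # n * variance of residuals
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "EVS": 1.0 - ss_err / ss_tot,
    }


def friedman_statistic(scores):
    """Friedman chi-square statistic for scores[block][model].

    Each row is one block (for example a cross-validation fold) and each
    column one model. Ties receive average ranks; no tie correction is applied.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the run of values tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for m in range(i, j + 1):
                rank_sums[order[m]] += avg_rank
            i = j + 1
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Illustrative values only (not taken from the published dataset).
example_true = [10.0, 20.0, 30.0, 40.0]
example_pred = [12.0, 18.0, 33.0, 39.0]
example_metrics = regression_metrics(example_true, example_pred)

# Hypothetical per-fold errors for three models (columns) over four folds (rows).
example_scores = [[1.0, 2.0, 3.0]] * 4
example_chi2 = friedman_statistic(example_scores)

print(example_metrics)
print(example_chi2)
```

In practice, library routines such as the metric functions in scikit-learn and `scipy.stats.friedmanchisquare` would likely be preferable; the SciPy routine also returns a p-value and may apply a tie correction, so its statistic can differ from this sketch when ties are present.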