Use of Machine Learning Techniques in Soil Classification


Creative Commons License

Aydin Y., IŞIKDAĞ Ü., BEKDAŞ G., NİGDELİ S. M., Geem Z. W.

SUSTAINABILITY, vol.15, no.3, 2023 (SCI-Expanded, SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 15 Issue: 3
  • Publication Date: 2023
  • DOI: 10.3390/su15032374
  • Journal Name: SUSTAINABILITY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, Aerospace Database, CAB Abstracts, Communication Abstracts, Food Science & Technology Abstracts, Geobase, INSPEC, Metadex, Veterinary Science Database, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: soil, machine learning, classification, ensemble learning, BEARING CAPACITY, SHEAR-STRENGTH, GRAINED SOILS, PREDICTION, PARAMETERS, ALGORITHM
  • Open Archive Collection: AVESİS Open Access Collection
  • Affiliated with Istanbul University-Cerrahpaşa: Yes

Abstract

Soil classification is the first step in the design of reliable structures, and it involves costly and time-consuming work, including laboratory tests. Machine learning (ML), which is widely used in many scientific fields, can facilitate soil classification. This study aims to provide a concrete example of the use of ML for soil classification. The dataset comprises 805 soil samples from the soil drillings of the new Gayrettepe-Istanbul Airport metro line construction. The dataset has both missing data and class imbalance. In the data preprocessing stage, data imputation techniques were first applied to deal with the missing data. Two different imputation techniques were tested, and the data were ultimately imputed with the KNN imputer. Class balance was then achieved with the synthetic minority oversampling technique (SMOTE). After preprocessing, a series of ML algorithms were tested with 10-fold cross-validation. Unlike previous studies, this work tested newer gradient-boosting methods such as XGBoost, LightGBM, and CatBoost. Classification accuracy rates exceeding 90% were observed, a significant improvement in prediction accuracy over previous research.