Use of multivariate analysis for Improving Feature Selection on Air Pollutants Dataset

Birdal R. G.

International Conference on Recent Academic Studies ICRAS 2023, Konya, Türkiye, 2-4 May 2023, vol. 1, no. 29802075, p. 85 (Abstract)

Publication Type: Conference Paper / Abstract
Volume: 1
City of Publication: Konya
Country of Publication: Türkiye
Pages: p. 85
Affiliated with İstanbul Üniversitesi-Cerrahpaşa: Yes

Abstract

In this study, air quality measurements were obtained from the Istanbul Metropolitan Municipality Environment Protection and Control Office, a municipal agency that operates ten automatic air quality monitoring stations measuring air pollution in Istanbul's atmosphere. Measurements were recorded at 1-hour intervals. The variables used in the study are Ozone (O3, μg/m3), Sulfur Dioxide (SO2, μg/m3), Nitric Oxide (NO, μg/m3), Nitrogen Dioxide (μg/m3), Dust (μg/m3), Total Hydrocarbon (THC, μg/m3), Outdoor Temperature (OT, °C), Wind Speed (m/s), Solar Irradiance (SI, hour), Cloudiness (0–10), Pressure (mbar), Relative Humidity (%), and Rain (mm). To capture multivariate correlations among these linear-valued features, the features are combined into multivariate groups. This multivariate analysis reveals which variables matter most for obtaining high-quality feature groups for prediction. The analysis shows that multivariate groups containing the O3, THC, and NO features identify air pollution quality best, with an accuracy of 0.7658. OT, SI, and SO2 come next, with an accuracy of 0.7442. This study shows that by selecting only the most important features, we can reduce the complexity of the model and improve its accuracy and generalization ability. Feature ranking can help us understand the relationships between variables in a dataset and identify patterns or trends that may be important for understanding the data. Feature ranking can also improve the interpretability of machine learning models. Unlike previous studies, comparing features as groups rather than individually contributed to a reduction in data size. Computation times also improved by 17%, which underlines the value of the approach.
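
The group-wise comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, the group size, or how the air quality labels were defined, so everything below is an assumption: groups of three features (matching the reported triples), a simple nearest-centroid classifier scored on a hold-out split, and synthetic data in place of the Istanbul measurements. The names `holdout_accuracy` and `rank_feature_groups` are hypothetical and not taken from the study.

```python
import itertools
import random

# Hypothetical subset of the study's variables; the values below are synthetic.
FEATURES = ["O3", "SO2", "NO", "NO2", "THC", "OT"]


def holdout_accuracy(rows, labels, cols, train_frac=0.7):
    """Accuracy of a nearest-centroid classifier that sees only columns `cols`."""
    n_train = int(len(rows) * train_frac)
    sums, counts = {}, {}
    # Accumulate per-class sums over the selected columns of the training part.
    for x, y in zip(rows[:n_train], labels[:n_train]):
        acc = sums.setdefault(y, [0.0] * len(cols))
        for i, c in enumerate(cols):
            acc[i] += x[c]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[y])),
        )

    test = list(zip(rows[n_train:], labels[n_train:]))
    return sum(predict(x) == y for x, y in test) / len(test)


def rank_feature_groups(rows, labels, names, group_size=3):
    """Score every feature group of `group_size` and sort by accuracy, best first."""
    scored = [
        (holdout_accuracy(rows, labels, cols), tuple(names[c] for c in cols))
        for cols in itertools.combinations(range(len(names)), group_size)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Synthetic data: the label depends only on O3, NO and THC (an assumption made
# for this demo, not a result taken from the study).
rng = random.Random(0)
rows = [[rng.random() for _ in FEATURES] for _ in range(600)]
informative = [FEATURES.index(name) for name in ("O3", "NO", "THC")]
labels = [int(sum(x[i] for i in informative) > 1.5) for x in rows]

ranking = rank_feature_groups(rows, labels, FEATURES)
for accuracy, group in ranking[:3]:
    print(f"{group}: {accuracy:.4f}")
```

Scoring whole groups rather than single features lets a group rank highly even when none of its members is strongly predictive alone. The cost is that the number of candidate groups grows combinatorially with the number of features and the group size.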