Application of Stacking Ensemble Learning for Classifying the Health Effects of Air Pollution
Abstract
Air pollution is a serious problem that harms human health. Pollutants such as fine particulate matter, sulfur dioxide, nitrogen oxides, and ozone can cause respiratory disorders, heart disease, lung cancer, and other health problems. Classifying the health effects of air pollution is therefore important for understanding its impact; such classification groups health effects by pollutant type, dose, and exposure duration. This study proposes an ensemble learning classification method to identify the pollutants that affect health and their level of health risk. Ensemble learning is a machine learning technique that combines several models to improve prediction accuracy. Stacking ensemble learning, the method applied here, integrates several base models: Logistic Regression, Decision Tree, K-Nearest Neighbor, Support Vector Machine, and Multi-Layer Perceptron. The results show that the Stacking model achieved the highest performance, with an accuracy of approximately 99.9% on both balanced and imbalanced datasets. The Decision Tree and K-Nearest Neighbor models also performed very well. Training time is an important consideration: K-Nearest Neighbor and Decision Tree trained far faster than the Stacking model.
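As an illustration of the stacking setup described in the abstract, the following is a minimal sketch using scikit-learn. It is not the authors' implementation. The synthetic data from `make_classification`, the class weights, all hyperparameters, the `scaled` helper, and the choice of Logistic Regression as the meta-learner are assumptions made for this example, since the abstract does not specify them. The sketch fits the five base models and the stacked model, then records test accuracy and training time for each so the accuracy/time trade-off can be inspected.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic, mildly imbalanced 3-class data stands in for the
# paper's air-pollution dataset, which is not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def scaled(estimator):
    """Hypothetical helper: standardize features before a scale-sensitive model."""
    return make_pipeline(StandardScaler(), estimator)


# The five base learners named in the abstract (hyperparameters are assumptions).
base_models = [
    ("lr", scaled(LogisticRegression(max_iter=1000))),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", scaled(KNeighborsClassifier(n_neighbors=5))),
    ("svm", scaled(SVC(random_state=0))),
    ("mlp", scaled(MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                 random_state=0))),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
# ASSUMPTION: Logistic Regression as the final estimator, with 5-fold CV.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Fit every model, recording test accuracy and wall-clock training time.
results = {}
for name, model in base_models + [("stacking", stack)]:
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    results[name] = (model.score(X_test, y_test), elapsed)

for name, (acc, secs) in results.items():
    print(f"{name:>8}: accuracy={acc:.3f}  fit_time={secs:.2f}s")
```

Because `StackingClassifier` refits each base learner on every cross-validation fold and once more on the full training set, its training cost is several times the combined cost of the base models, which is consistent with the training-time trade-off noted in the abstract. The numbers printed on synthetic data say nothing about the accuracy reported in the study.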