Performance Evaluation of Machine Learning Models for Soil Fertility Classification Based on the Indian Soil Fertility Dataset
DOI:
https://doi.org/10.15294/edukom.v12i1.10317
Keywords:
Classification, Ensemble Classifier, Machine Learning, Single Classifier, Soil Fertility Classification
Abstract
Rice farming productivity worldwide has been declining because of improper soil management practices, including excessive chemical fertilizer use and irregular irrigation. The main challenge is to classify soil fertility levels accurately enough to support optimal land use and reduce resource waste, especially when the data are imbalanced. This study compares the performance of single classifiers and ensemble classifiers in classifying soil fertility. The single classifiers are K-Nearest Neighbor (KNN), Naive Bayes, Decision Tree, Support Vector Machine (SVM), and Artificial Neural Network (ANN); the ensemble classifiers are Random Forest and XGBoost. The Indian Soil Fertility Dataset, obtained from Kaggle, contains 880 samples with 12 features and one output variable (the class label). The methodology comprised data acquisition, preprocessing, data splitting, standardization, and classification, with performance evaluated from a confusion matrix. The results show that the ensemble classifiers, Random Forest and XGBoost, outperform the single classifiers on this imbalanced dataset, with accuracy, precision, recall, and F1-score values exceeding 92% to 95% across all split scenarios. The findings indicate that Random Forest and XGBoost can serve as reliable models for helping farmers and agricultural experts evaluate soil conditions, minimize unnecessary fertilizer use, and improve rice farming productivity.
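The evaluation step named in the abstract can be illustrated with a minimal sketch that derives accuracy, precision, recall, and F1-score from a confusion matrix. The class names, the toy labels, and the choice of macro averaging are assumptions made for illustration only. They are not taken from the dataset or from the study, which does not state its averaging scheme here.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(matrix):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical, deliberately imbalanced toy labels (not from the dataset).
labels = ["low", "medium", "high"]
y_true = ["low"] * 8 + ["medium"] + ["high"]
y_pred = ["low"] * 10  # a model that always predicts the majority class

matrix = confusion_matrix(y_true, y_pred, labels)
scores = macro_metrics(matrix)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the always-majority model reaches 0.8 accuracy while its macro-averaged recall is only about 0.33, which is why precision, recall, and F1-score are reported alongside accuracy when the classes are imbalanced.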