Optimization Of K-Nearest Neighbor Algorithm Using Information Gain And Hyperparameter Tuning In Adult Male Fertility Classification

Authors

  • Muhammad Zaenal Muttaqin Universitas Negeri Semarang Author
  • Anggyi Trisnawan Putra Universitas Negeri Semarang Author

DOI:

https://doi.org/10.15294/rji.v4i1.14868

Keywords:

Information Gain, Machine Learning, K-Nearest Neighbor, GridSearchCV, feature optimization

Abstract

Male fertility plays an important role in reproductive capability and global population dynamics. Male infertility can be caused by lifestyle, health conditions, and sperm quality. This research develops a male fertility classification model based on a K-Nearest Neighbor (KNN) algorithm optimized with Information Gain feature selection and hyperparameter tuning using GridSearchCV. The main problems addressed are low prediction accuracy and high computational complexity caused by many irrelevant features. The dataset comes from the UCI Machine Learning Repository and consists of 100 records with 10 attributes. KNN was chosen for its simplicity and its ability to classify data with multiple classes and uneven class distribution; however, its accuracy depends heavily on the choice of features and parameters. Information Gain is used to select the features that are most informative about the target variable, which reduces model complexity and computation time. GridSearchCV is used to search for the best combination of hyperparameters. The results show that applying Information Gain and GridSearchCV improved the classification accuracy of KNN: the final model achieved 94% accuracy, compared with 84% for the previous conventional method. This increase indicates that the approach is effective in improving male fertility classification performance. The research is expected to contribute to the development of male fertility diagnostic technology and to the implementation of more accurate prediction models in clinical practice.
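As an illustration of the Information Gain criterion mentioned above, the following is a minimal sketch for a single categorical feature. The function names and the toy values are hypothetical and are not taken from the paper or from the UCI dataset; the paper's exact scoring procedure is not specified here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy, made-up data (NOT from the fertility dataset).
labels = ["normal", "normal", "altered", "altered"]
informative = ["no", "no", "yes", "yes"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]       # tells us nothing about the class

print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

A feature with a higher score removes more uncertainty about the target, so a selection step would typically keep the top-scoring features and drop the rest.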

Purpose: The proposed model builds on previous research that applied the K-Nearest Neighbor (KNN) algorithm and reached a model accuracy of 84%. This study applies feature selection and hyperparameter tuning to the KNN algorithm to improve the accuracy of the male fertility classification model.

Methods/Study design/approach: To improve the accuracy of the male fertility classification model and to optimize the model from previous research, this study applies two types of optimization: feature selection using Information Gain, and hyperparameter tuning using GridSearchCV to find the best parameter combination for the proposed model. The study uses the same fertility dataset as previous studies.
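The two optimization steps can be sketched with scikit-learn as follows. This is an illustrative setup under stated assumptions, not the authors' pipeline: the data is synthetic (100 rows, with an arbitrary feature count and class balance), `mutual_info_classif` stands in for Information Gain, and the scaling step, the parameter grid, and the 5-fold split are all choices made for this example.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 100 rows, imbalanced binary target.
X, y = make_classification(
    n_samples=100, n_features=9, n_informative=4, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)

# Feature selection lives inside the pipeline so that it is refit on each
# training fold and never sees the held-out fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=partial(mutual_info_classif, random_state=0))),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search space; the grid used in the paper is not given here.
param_grid = {
    "select__k": [3, 6, 9],
    "knn__n_neighbors": [3, 5, 7],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Because the data here is synthetic, the printed accuracy says nothing about the 94% reported for the fertility dataset; the sketch only shows how the selection and tuning steps fit together.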

Result/Findings: The proposed model achieved an accuracy of 94%, surpassing the model from the previous study, which had an accuracy of 84% for the classification of fertility levels in men.

Novelty/Originality/Value: The novelty of this research is the addition of hyperparameter tuning to obtain optimal parameters for the fertility classification model. The research also aims to improve on the accuracy of the previous model.

Published

2026-03-31

Article ID

14868

Section

Articles

How to Cite

Optimization Of K-Nearest Neighbor Algorithm Using Information Gain And Hyperparameter Tuning In Adult Male Fertility Classification. (2026). Recursive Journal of Informatics, 4(1), 21-30. https://doi.org/10.15294/rji.v4i1.14868