Implementation of the K-Nearest Neighbor Algorithm (KNN) with Principal Component Analysis to Diagnose Tuberculosis

Authors

  • Yuliana Putri, Universitas Negeri Semarang
  • Alamsyah Alamsyah, Universitas Negeri Semarang

DOI:

https://doi.org/10.15294/rji.v3i2.5235

Keywords:

Machine Learning, Tuberculosis, Data Mining, Disease Diagnosis, KNN

Abstract

Purpose: Tuberculosis (TB) is an infectious disease that primarily attacks the lungs and can also affect organs outside the lungs. Indonesia is one of the largest contributors to TB cases, with around 320,000 new cases every year. Delayed diagnosis can lead to errors in treatment and a higher number of deaths, which makes diagnosing TB as early as possible important. This research aims to implement machine learning techniques to help diagnose TB.

Methods: The research used the K-Nearest Neighbor (KNN) classification algorithm, optimized with Principal Component Analysis (PCA) as a feature selection technique. The dataset consists of 577 records with 12 attributes, each labeled as either a patient with tuberculosis or a patient without tuberculosis.
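The abstract does not give implementation details, so the following is only a minimal sketch of how a KNN classifier can be combined with a PCA projection. It assumes standardized inputs, k = 5 neighbors, three retained components, and a simple ordered train/test split. All function names are hypothetical, and the data is synthetic (12 random attributes), not the 577-record dataset from the study. PCA is computed here with power iteration so that the example needs only the Python standard library.

```python
import math
import random
from collections import Counter


def fit_scaler(rows):
    """Column means and standard deviations, computed on training data only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(rows, means, stds):
    """Standardize rows with previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def fit_pca(rows, n_components, iters=300):
    """Top principal directions of centered rows (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def project(rows, components):
    """Map each row onto the retained principal directions."""
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in rows]


def knn_predict(train_X, train_y, x, k=5):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def make_synthetic(n=200, d=12, seed=42):
    """Synthetic stand-in data: NOT the tuberculosis dataset from the study."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Only the first three attributes carry class signal; the rest are noise.
        X.append([rng.gauss(3.0 * label if j < 3 else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y


X, y = make_synthetic()
train_X, test_X, train_y, test_y = X[:150], X[150:], y[:150], y[150:]
means, stds = fit_scaler(train_X)
train_s, test_s = scale(train_X, means, stds), scale(test_X, means, stds)
components = fit_pca(train_s, n_components=3)
acc_knn = accuracy(train_s, train_y, test_s, test_y)
acc_knn_pca = accuracy(project(train_s, components), train_y,
                       project(test_s, components), test_y)
print(f"KNN only: {acc_knn:.3f}  KNN + PCA: {acc_knn_pca:.3f}")
```

The scaler and the principal directions are fitted on the training rows only and then applied to the test rows, which avoids leaking test information into the model. The accuracies printed for this synthetic data say nothing about the figures reported in the study.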

Result: The model that combines KNN with PCA performs better than the model that uses KNN alone. The KNN-only model achieves an accuracy of 92.528%, while the model that uses KNN and PCA achieves an accuracy of 98.85%. This shows that combining KNN and PCA can produce a good tuberculosis diagnosis model that can be used to assist in the early diagnosis of tuberculosis.

Novelty: Using PCA in the feature selection process can reduce unnecessary attributes. PCA reduces the dimensionality of the data and simplifies the visualization and interpretation of complex data sets. In this study, the use of PCA improved the performance of the KNN algorithm for the detection of tuberculosis.

 

Published

2025-10-17

Article ID

5235

How to Cite

Implementation of the K-Nearest Neighbor Algorithm (KNN) with Principal Component Analysis to Diagnose Tuberculosis. (2025). Recursive Journal of Informatics, 3(2), 108-121. https://doi.org/10.15294/rji.v3i2.5235