Hybrid Feature Selection for Effective Heart Disease Detection: A Multi-Algorithm Machine Learning Approach
DOI:
https://doi.org/10.15294/sji.v13i1.38815
Keywords:
Heart disease, Hybrid feature selection, SMOTEENN, Machine learning, Random forest
Abstract
Purpose: This research aims to develop an effective early detection model for heart disease that combines data balancing with hybrid feature selection. The study seeks to enhance predictive accuracy and minimize errors, providing a robust model for clinical decision support systems.
Methods: The study used the Heart Failure Prediction dataset obtained from Kaggle. A novel hybrid framework was implemented, integrating SMOTEENN (Synthetic Minority Over-sampling Technique + Edited Nearest Neighbors) for data balancing and a Hybrid Feature Selection (HFS) method combining Chi-square and Backward Elimination. Eight machine learning algorithms were evaluated: Logistic Regression, Naïve Bayes, Decision Tree, K-Nearest Neighbors, Random Forest, Gradient Boosting, Support Vector Machine, and XGBoost. Performance was assessed based on accuracy, precision, recall, F1-score, specificity, AUC score, fallout, and miss rate.
Result: The proposed framework significantly improved classification performance across all algorithms. The Random Forest model emerged as the optimal classifier, achieving an accuracy of 99.44%, an AUC score of 99.98%, and a miss rate of 0.92% (reduced from a 10.03% baseline). The HFS method reduced the feature space by 54%, identifying 'ExerciseAngina', 'FastingBS', 'ST_Slope', 'ChestPainType', and 'Sex' as the most critical predictors. The model outperformed standard approaches and recent state-of-the-art benchmarks by over 10% in accuracy.
Novelty: This study introduces a synergistic integration of SMOTEENN with hybrid feature selection. The combination significantly improves model performance in early heart disease detection.
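
The threshold-based evaluation metrics named in the Methods can be illustrated with a minimal Python sketch that uses their standard confusion-matrix definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study. AUC is omitted because it requires ranked prediction scores rather than a single confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    recall = tp / (tp + fn)           # sensitivity / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "fallout": fp / (fp + tn),    # false positive rate, equals 1 - specificity
        "miss_rate": fn / (fn + tp),  # false negative rate, equals 1 - recall
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because miss rate is the complement of recall, and fallout is the complement of specificity, a lower miss rate corresponds directly to fewer undetected heart disease cases, which is why it is reported alongside accuracy.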
