Which Features Matter Most? Evaluating Numerical and Textual Features for Helpfulness Classification in Imbalance Dataset using XGBoost
DOI:
https://doi.org/10.15294/sji.v12i4.33443
Keywords:
Review Helpfulness, Helpful Vote, Time-based Evaluation, Imbalanced Data Handling
Abstract
Purpose: This study aims to develop and realistically evaluate a reliable model for identifying helpful online reviews, particularly in Indonesian-language text, which is often informal and therefore difficult to process.
Methods: This study addresses several key challenges in predicting review helpfulness: the relative effectiveness of numerical features from metadata compared with traditional text representations (TF-IDF, FastText) on noisy data; the impact of severe class imbalance; and the limitations of standard validation compared with time-based validation. To address these challenges, we built an XGBoost model and evaluated various feature combinations. A hybrid approach combining SMOTE and scale_pos_weight was applied to handle class imbalance, and the best configuration was further assessed using time-based validation to better simulate real-world conditions.
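The two evaluation ideas in the Methods can be illustrated with a minimal sketch. It assumes a hypothetical list of review records with a `posted` date and a binary `helpful` label; these names and the toy data are illustrative and do not come from the study. The weight follows the common heuristic for XGBoost's `scale_pos_weight` (negative count divided by positive count). The abstract does not state how this weight was combined with SMOTE, so oversampling is left out here.

```python
from datetime import date


def time_based_split(reviews, cutoff):
    """Train on reviews posted before `cutoff`, test on the rest.

    Unlike a random split, no future review leaks into training.
    """
    train = [r for r in reviews if r["posted"] < cutoff]
    test = [r for r in reviews if r["posted"] >= cutoff]
    return train, test


def scale_pos_weight(labels):
    """Negative-to-positive ratio, a common starting value for
    XGBoost's `scale_pos_weight` parameter."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in the training labels")
    return negatives / positives


# Hypothetical toy data: field names and values are assumptions.
reviews = [
    {"posted": date(2023, 1, 5), "helpful": 0},
    {"posted": date(2023, 2, 11), "helpful": 0},
    {"posted": date(2023, 3, 20), "helpful": 1},
    {"posted": date(2023, 5, 2), "helpful": 0},
    {"posted": date(2023, 8, 14), "helpful": 1},
    {"posted": date(2023, 9, 30), "helpful": 0},
]

train, test = time_based_split(reviews, cutoff=date(2023, 7, 1))
weight = scale_pos_weight([r["helpful"] for r in train])
```

The weight is computed from the training portion only, so the class ratio of the later test period does not influence the model.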
Result: The model based on numerical features consistently outperformed the text-based models, achieving a peak macro F1-score of 0.7214. It also exceeded the IndoBERT baseline (F1-score = 0.6400) and the RCNN FastText baseline (F1-score = 0.5317), which indicates that simpler feature-driven models can provide more reliable predictions on noisy review data. Time-based validation further revealed a performance decline of up to 8.06%, confirming the presence of concept drift and showing that standard validation tends to yield overly optimistic estimates.
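For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below computes it on hypothetical labels and also shows one way to express a drop between two validation schemes as a relative percentage. The abstract does not say whether the reported 8.06% is a relative or an absolute decline, so the `relative_decline` helper is an assumption, and none of the toy values reproduce the study's results.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def relative_decline(standard_score, temporal_score):
    """Percentage drop of the temporal score relative to the standard one."""
    return (standard_score - temporal_score) / standard_score * 100


# Hypothetical labels: 0 = not helpful, 1 = helpful.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

score = macro_f1(y_true, y_pred)
drop = relative_decline(0.8, 0.6)  # illustrative scores, not the paper's
```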
Novelty: The main contribution of this research lies in offering a robust methodology while demonstrating the superiority of metadata-based approaches in this context. By quantifying performance degradation through temporal validation, this study provides a more realistic benchmark for real-world applications and highlights the critical importance of regular model retraining.
