Impact of Feature Engineering on XGBoost Model for Forecasting Cayenne Pepper Prices

Authors

  • Jasman Pardede, Institut Teknologi Nasional (Itenas) Bandung
  • Anisa Putri Setyaningrum, Institut Teknologi Nasional (Itenas) Bandung
  • Muhammad Ilyas Al-Fadhlih, Institut Teknologi Nasional (Itenas) Bandung

DOI:

https://doi.org/10.15294/sji.v12i4.32157

Keywords:

Cayenne Pepper, XGBoost, Feature Engineering, Lag Features, Forecasting

Abstract

Purpose: Cayenne pepper is one of Indonesia’s key horticultural commodities, widely used in household cooking and in the food processing industry. Its market price is highly volatile, driven by weather variability, limited supply, production costs, and inefficient distribution. This instability creates uncertainty that harms farmers, traders, and consumers, so a reliable price forecasting model is needed to support price stabilization and data-driven decisions across the supply chain. This study investigates how far feature engineering can improve the predictive performance of the Extreme Gradient Boosting (XGBoost) algorithm in forecasting cayenne pepper prices. By integrating lag features, moving averages, and seasonal indicators, the proposed model is expected to capture market dynamics more effectively and to serve as a robust analytical tool for stakeholders.

Methods: The forecasting model was built with the XGBoost algorithm combined with several feature engineering methods. The dataset consists of daily price records from Bank Indonesia’s PIHPS (Pusat Informasi Harga Pangan Strategis) and meteorological variables from BMKG (Indonesia’s Meteorology, Climatology, and Geophysics Agency), covering 2021 to 2024. The engineered features include lag variables identified through Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) analyses, Simple Moving Averages (SMA), seasonal indicators, and holiday-related variables designed to capture recurring patterns and event-driven price fluctuations. Hyperparameters were tuned with grid search.
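
The feature types listed above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_features`, the lag set, the SMA windows, the holiday date, and the synthetic price series are all hypothetical placeholders.

```python
import numpy as np
import pandas as pd


def build_features(prices, lags=(1, 2, 7), sma_windows=(7, 30), holidays=()):
    """Build lag, SMA, seasonal, and holiday features from a daily price series.

    `prices` must be a Series with a DatetimeIndex. The default lags and
    windows are illustrative placeholders, not values taken from the study.
    """
    df = pd.DataFrame({"price": prices})

    # Lag features: the price k days earlier.
    for k in lags:
        df[f"lag_{k}"] = df["price"].shift(k)

    # Simple moving averages over past prices only; shift(1) excludes the
    # current day so the target does not leak into its own feature.
    for w in sma_windows:
        df[f"sma_{w}"] = df["price"].shift(1).rolling(w).mean()

    # Seasonal indicators derived from the calendar date.
    df["month"] = df.index.month
    df["day_of_week"] = df.index.dayofweek

    # Holiday flag: 1 if the date appears in the supplied holiday list.
    holiday_index = pd.to_datetime(list(holidays))
    df["is_holiday"] = df.index.isin(holiday_index).astype(int)

    # Drop the warm-up rows where lags or averages are undefined.
    return df.dropna()


# Synthetic daily prices (hypothetical data, for demonstration only).
dates = pd.date_range("2021-01-01", periods=60, freq="D")
prices = pd.Series(np.arange(60, dtype=float) * 100 + 30000, index=dates)
features = build_features(prices, holidays=["2021-02-12"])
```

The resulting frame holds the target in `price` and the engineered predictors in the remaining columns, which could then be passed to any regressor.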

Results: The best-performing model used the following hyperparameter configuration: alpha = 0, gamma = 0.3, lambda = 1, learning_rate = 0.05, max_depth = 3, min_child_weight = 3, n_estimators = 200, and subsample = 0.6. Feature engineering markedly improved the model’s predictive capability, increasing the R² value by 99.10% while reducing the MAE, RMSE, and MAPE by 72.63%, 71.31%, and 72.04%, respectively, which indicates substantially lower forecasting error.
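
For reference, the four reported metrics and a percentage change between two models can be computed as in the sketch below. Several points are assumptions: the abstract does not say how the percentages were derived, so `relative_change` shows only one common reading (change relative to the baseline value); the dictionary keys follow the naming of the xgboost scikit-learn wrapper, where alpha and lambda appear as `reg_alpha` and `reg_lambda`; and the sample numbers are hypothetical.

```python
import numpy as np

# Hyperparameter values reported in the abstract. Key names assume the
# xgboost scikit-learn wrapper (alpha -> reg_alpha, lambda -> reg_lambda).
best_params = {
    "reg_alpha": 0,
    "gamma": 0.3,
    "reg_lambda": 1,
    "learning_rate": 0.05,
    "max_depth": 3,
    "min_child_weight": 3,
    "n_estimators": 200,
    "subsample": 0.6,
}


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R² for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
        "R2": float(1 - ss_res / ss_tot),
    }


def relative_change(before, after):
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / abs(before) * 100


# Hypothetical actual and predicted prices, for demonstration only.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
mae_change = relative_change(20.0, 10.0)  # an error that halves
```

Under this reading, a negative change in MAE, RMSE, or MAPE is an improvement, while a positive change in R² is an improvement.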

Novelty: This study integrates multi-level price data with weather and holiday-related features and uses ACF and PACF analyses, techniques commonly used in statistical modeling, to determine the optimal lag values. This integration improves both the accuracy and the interpretability of the XGBoost model, providing a practical and effective tool for agricultural price forecasting and market planning.
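
One way to pick candidate lags from the ACF is to keep the lags whose sample autocorrelation falls outside the approximate 95% white-noise band of ±1.96/√n. The sketch below illustrates only that idea with NumPy. It is not the authors' procedure, it does not cover the PACF, and the function names and the synthetic weekly series are hypothetical.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    values = [1.0]
    for k in range(1, max_lag + 1):
        # Covariance between the series and itself shifted by k steps.
        values.append(np.sum(x[k:] * x[:-k]) / denom)
    return np.array(values)


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the ±1.96/sqrt(n) band."""
    bound = 1.96 / np.sqrt(len(x))
    r = acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Synthetic series with a spike every 7th day (hypothetical data).
series = np.tile([0, 0, 0, 0, 0, 0, 7], 20)
r = acf(series, max_lag=14)
lags = significant_lags(series, max_lag=14)
```

For a series with a weekly pattern like this one, lag 7 stands out clearly, which is the kind of signal that would motivate a `lag_7` feature.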

Published

19-11-2025

Article ID

32157

Section

Articles

How to Cite

Impact of Feature Engineering on XGBoost Model for Forecasting Cayenne Pepper Prices. (2025). Scientific Journal of Informatics, 12(4). https://doi.org/10.15294/sji.v12i4.32157