Improving Sentiment Analysis with a Context-Aware RoBERTa–BiLSTM and Word2Vec Branch
DOI: https://doi.org/10.15294/sji.v12i4.35918
Keywords: Sentiment analysis, BiLSTM, Word2Vec, RoBERTa
Abstract
Purpose: We improve the accuracy of Twitter/X sentiment analysis with a hybrid model that combines Word2Vec and the Robustly Optimized BERT Pretraining Approach (RoBERTa). Twitter/X text is noisy (slang, out-of-vocabulary terms) and ambiguous, which degrades the performance of pre-trained transformers, while Word2Vec captures only local context; studies integrating the two remain limited. The premise is that Word2Vec, trained on in-domain text, handles slang and novel vocabulary well (distributional semantics), while RoBERTa excels at contextual meaning, so combining them mitigates each other's weaknesses.
Methods: The Sentiment140 dataset contains 1.6 million class-balanced tweets. The train/validation/test split is stratified, and Word2Vec is trained solely on the training data to avoid leakage. The pretrained RoBERTa encoder is frozen in a first stage, and its top layers are fine-tuned in a second. The Word2Vec and RoBERTa vectors are concatenated and processed by a Bidirectional Long Short-Term Memory (BiLSTM) network with a sigmoid output. Training uses TensorFlow and the Adam optimizer, with dropout and early stopping. The decision threshold is optimized on the validation set.
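The fusion step described above can be sketched as a per-token concatenation of the two embedding sequences. This is a minimal NumPy illustration, not the authors' implementation: the dimensions (300 for Word2Vec, 768 for a roberta-base encoder) and the sequence length are assumptions, since the abstract does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 300-d Word2Vec, 768-d RoBERTa (roberta-base),
# a 16-token tweet. Random vectors stand in for the real embeddings.
seq_len, d_w2v, d_roberta = 16, 300, 768

w2v_seq = rng.normal(size=(seq_len, d_w2v))          # distributional embeddings
roberta_seq = rng.normal(size=(seq_len, d_roberta))  # contextual embeddings

# Fusion as described in the abstract: per-token concatenation, yielding a
# (seq_len, 300 + 768) = (16, 1068) sequence that is then fed to the BiLSTM.
fused = np.concatenate([w2v_seq, roberta_seq], axis=-1)
print(fused.shape)  # (16, 1068)
```

The concatenated sequence keeps both views of each token intact and lets the BiLSTM learn how to weight them, rather than forcing the two spaces into a shared dimension up front.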
Result: The hybrid model achieved an accuracy of 88.09%, an F1-score of 88.09%, and an Area Under the Receiver Operating Characteristic Curve (ROC AUC) of approximately 95.19%. No overfitting was observed, and the hybrid model outperformed both single-representation baselines. The confusion matrix and ROC curve corroborate these findings.
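The validation-optimized decision threshold mentioned in the Methods can be sketched as a simple grid search that maximizes F1 over the sigmoid scores. This is an illustrative implementation under assumed details (grid range, F1 as the selection criterion), not the authors' exact procedure.

```python
import numpy as np

def best_threshold(y_true, y_score, grid=None):
    """Pick the decision threshold that maximizes F1 on validation data."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed search grid
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        y_pred = (y_score >= t).astype(int)
        tp = int(np.sum((y_pred == 1) & (y_true == 1)))
        fp = int(np.sum((y_pred == 1) & (y_true == 0)))
        fn = int(np.sum((y_pred == 0) & (y_true == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # keep the first threshold reaching the best F1
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation scores: positives score high, negatives low.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.10, 0.30, 0.52, 0.62, 0.81, 0.90])
t, f1 = best_threshold(y_true, y_score)
```

On the toy data a threshold between 0.52 and 0.62 separates the classes perfectly; in practice this tuning uses the held-out validation split, never the test set.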
Novelty: The contribution lies in fusing distributional and contextual representations through a structured fusion mechanism. Limitations: the approach is computationally demanding, and hyperparameter tuning was not extensive. Future directions: a systematic hyperparameter search and cross-validation on other large sentiment datasets to assess generalization.
