Improving Sentiment Analysis with a Context-Aware RoBERTa–BiLSTM and Word2Vec Branch

Authors

  • Aji Purwinarko

DOI:

https://doi.org/10.15294/sji.v12i4.35918

Keywords:

Sentiment Analysis, RoBERTa, Word2Vec, BiLSTM

Abstract

Purpose: We aim to improve the accuracy of Twitter sentiment analysis with a hybrid model that combines Word to Vector (Word2Vec) and the Robustly Optimized BERT Pretraining Approach (RoBERTa). The rationale is that Word2Vec, which relies on distributional semantics, handles slang and novel vocabulary well, whereas RoBERTa excels at capturing contextual meaning. Combining the two lets each compensate for the other's weaknesses.

Methods/Study design/approach: The Sentiment140 dataset contains 1.6 million class-balanced tweets. The data are split with stratification, and Word2Vec is trained on the training portion only. RoBERTa is used as a pretrained encoder: it is frozen in the first stage, and some of its layers are fine-tuned in the second stage. The Word2Vec and RoBERTa vectors are concatenated and passed to a Bidirectional Long Short-Term Memory (BiLSTM) network with a sigmoid output. Training uses TensorFlow and the Adam optimizer, with dropout and early stopping. The decision threshold is optimized on the validation set. The pipeline supports caching and resuming training.
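Two steps in this pipeline can be sketched without any deep-learning dependency: the stratified split and the validation-based choice of decision threshold. The Python below is a minimal illustration, not the authors' code. The 80/10/10 split ratios, the random seed, and the use of F1 as the quantity being maximized are all assumptions, since the abstract states neither the split ratios nor the metric used to optimize the threshold. The helper names `stratified_split`, `f1_at`, and `tune_threshold` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Split example indices into train/val/test, preserving each class's share.

    The 80/10/10 default ratios and the seed are illustrative assumptions.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class, then slice
        n_val = round(len(idx) * val_frac)
        n_test = round(len(idx) * test_frac)
        val.extend(idx[:n_val])
        test.extend(idx[n_val:n_val + n_test])
        train.extend(idx[n_val + n_test:])
    return train, val, test


def f1_at(probs, labels, threshold):
    """Positive-class F1 when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(val_probs, val_labels, steps=99):
    """Grid-search thresholds 0.01..0.99 on validation data and keep the best F1.

    Maximizing F1 is an assumed criterion; the source does not name one.
    """
    best_t, best_f1 = 0.5, -1.0
    for k in range(1, steps + 1):
        t = k / (steps + 1)
        score = f1_at(val_probs, val_labels, t)
        if score > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    labels = [0] * 50 + [1] * 50
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))  # 80 10 10
    print(tune_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.11, 0.8)
```

Keeping the split indices explicit makes it easy to pass only the training tweets to Word2Vec, for example via a hypothetical `corpus = [tokens[i] for i in train]`. This keeps validation and test vocabulary out of the distributional embeddings. The tuned threshold is then applied unchanged to the test-set probabilities.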

Result/Findings: The hybrid model achieved an accuracy of 88.09%, an F1-score of 88.09%, and an area under the Receiver Operating Characteristic curve (ROC AUC) of approximately 95.19%. No overfitting was observed, and the hybrid model outperformed both single-representation baselines. The confusion matrix and ROC curve corroborate these findings.
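The reported metrics follow standard definitions that can be computed directly from model outputs. The Python below is a minimal pure-Python sketch for illustration. It assumes binary labels 0 (negative) and 1 (positive) and uses sigmoid probabilities as scores; the function names are hypothetical. ROC AUC is computed with the rank-sum (Mann-Whitney U) formulation. This equals the probability that a randomly chosen positive tweet scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 (negative) / 1 (positive)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1  # row = true class, column = predicted class
    return m


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and positive-class F1 derived from the confusion matrix."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both classes present")
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    y_pred = [1 if p >= 0.5 else 0 for p in probs]  # [0, 0, 0, 1]
    print(confusion_matrix(y_true, y_pred))  # [[2, 0], [1, 1]]
    print(accuracy_and_f1(y_true, y_pred))   # (0.75, 0.6666666666666666)
    print(roc_auc(y_true, probs))            # 0.75
```

The sort-based AUC runs in O(n log n) rather than comparing every positive/negative pair, which matters at Sentiment140's scale. In a typical Python workflow, scikit-learn's `accuracy_score`, `f1_score`, `roc_auc_score`, and `confusion_matrix` in `sklearn.metrics` compute the same quantities.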

Novelty/Originality/Value: The novelty lies in fusing distributional and contextual representations with resource-efficient fine-tuning. Limitations: the computational requirements of the model, and hyperparameter tuning that is not yet extensive. Future directions: a systematic hyperparameter search and cross-validation on other large sentiment datasets to assess generalization.

Published

16-01-2026

Article ID

35918

Issue

Vol. 12 No. 4

Section

Articles

How to Cite

Purwinarko, A. (2026). Improving Sentiment Analysis with a Context-Aware RoBERTa–BiLSTM and Word2Vec Branch. Scientific Journal of Informatics, 12(4). https://doi.org/10.15294/sji.v12i4.35918