Comparative Analysis of High School Student and AI-Generated Essays Using IndoBERT and Linguistic Features
DOI:
https://doi.org/10.15294/sji.v12i3.27732
Keywords:
ChatGPT, Essay, IndoBERT, Linguistic features, Semantic similarity, Text classification
Abstract
Purpose: This study addresses the growing challenge of distinguishing human-written essays from AI-generated ones, particularly in the context of high school education in Indonesia. It analyzes the semantic and linguistic differences between student-written and ChatGPT-generated essays in the Indonesian language.
Methods: The study employs an IndoBERT-based semantic model trained with triplet loss to generate paragraph-level embeddings, which allow semantic similarity to be measured within and between essay classes. In addition, linguistic features such as lexical diversity, word count, modal usage, and stopword ratio were extracted to capture stylistic and structural differences. These features are combined and used as input to a neural network classifier.
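To make the linguistic features named in the Methods concrete, the following is a minimal sketch of how they could be computed for one essay. It is not the study's implementation: the function name `linguistic_features`, the tiny `STOPWORDS` and `MODALS` lists, the regex tokenizer, and the use of the type-token ratio as the lexical diversity measure are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# the study's actual stopword and modal lists are not given in the abstract.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan"}
MODALS = {"harus", "dapat", "bisa", "akan", "perlu", "boleh", "mungkin"}


def linguistic_features(text):
    """Return word count, lexical diversity, modal count, and stopword ratio."""
    # Assumed tokenizer: lowercase alphabetic runs.
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    if n == 0:
        return {
            "word_count": 0,
            "lexical_diversity": 0.0,
            "modal_count": 0,
            "stopword_ratio": 0.0,
        }
    return {
        "word_count": n,
        # Assumed measure: type-token ratio (unique tokens / all tokens).
        "lexical_diversity": len(set(tokens)) / n,
        "modal_count": sum(t in MODALS for t in tokens),
        "stopword_ratio": sum(t in STOPWORDS for t in tokens) / n,
    }


if __name__ == "__main__":
    sample = "Siswa harus belajar dan siswa dapat membaca buku di perpustakaan"
    print(linguistic_features(sample))
```

In a pipeline like the one described, a feature vector of this kind would be concatenated with the paragraph embedding before being passed to the classifier; how the study scales or combines them is not stated here.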
Result: The IndoBERT-based semantic model grouped student-written and ChatGPT-generated essays into distinct semantic clusters. Similarity scores within student essays ranged from 0.7 to 0.9, while similarity between classes was mostly negative, with a few outliers; negative values are possible because the cosine similarity metric used in this study ranges from -1 to 1. The classification model achieved 90.55% accuracy and an AUC of 0.9999 when evaluated on the independent test set defined in the Data Preparation stage. Student essays show more linguistic diversity, while ChatGPT essays are more consistent in coherence and formality.
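The similarity ranges reported above follow from how cosine similarity and triplet loss behave. The sketch below illustrates both on plain Python lists. It is an assumption-laden illustration, not the study's training code: the function names, the use of cosine distance (1 minus cosine similarity) inside the loss, and the margin of 0.5 are all hypothetical choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on cosine distance (assumed distance and margin).

    The loss is zero once the anchor is closer to the positive (same class)
    than to the negative (other class) by at least `margin`.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    student_a = [1.0, 0.0]
    student_b = [1.0, 0.0]
    generated = [-1.0, 0.0]
    print(cosine_similarity(student_a, student_b))  # same direction
    print(cosine_similarity(student_a, generated))  # opposite direction
    print(triplet_loss(student_a, student_b, generated))
```

Training with a loss of this shape pulls same-class embeddings together and pushes different-class embeddings apart, which is consistent with high within-class similarity and mostly negative between-class similarity.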
Novelty: This study provides empirical insights into the semantic similarities and linguistic features that differentiate human-written from AI-generated essays in the Indonesian language. It supports academic integrity efforts and highlights the need for further research across different writing models and contexts.
