Can We Trust AI to Assess Writing? An Analysis of Scoring Reliability and Feedback Consistency

Authors

  • Fitriani Fitriani, STAIN Mandailing Natal
  • Puput Zuli Eko Rini, Universitas PGRI Mpu Sindok

DOI:

https://doi.org/10.15294/jpk.v11i1.25849

Keywords:

AI-generated writing assessment; scoring reliability; feedback consistency

Abstract

This study analyzes the scoring reliability and feedback consistency of AI-generated writing assessments produced with ChatGPT. Adopting a mixed-methods approach, the study evaluated 23 student descriptive texts across three assessment rounds. Quantitative findings showed high scoring reliability, with an Intraclass Correlation Coefficient (ICC) of 0.93, indicating excellent consistency across repeated evaluations. Qualitative analysis revealed that ChatGPT consistently addressed five core writing criteria: content, organization, vocabulary, language use, and mechanics. However, the feedback varied in focus and detail across rounds, and the absence of reference to prior feedback limited its support for revision as a recursive process. The findings suggest that although ChatGPT demonstrates reliable scoring and generally stable feedback themes, it lacks the continuity needed to facilitate sustained writing development. To enhance its pedagogical value, AI-based feedback systems should be designed to build on previous responses, thereby enabling more effective support for students' progressive improvement in writing.
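The abstract does not state which ICC variant or software was used. As an illustration only, the Python sketch below computes the two-way consistency form ICC(3,1) for a texts-by-rounds score matrix; the function name, the simulated data, and the choice of ICC form are assumptions, not details reported in the study.

    import numpy as np

    def icc_consistency(scores):
        # scores: array of shape (n_texts, n_rounds), one row per student text,
        # one column per scoring round. Returns ICC(3,1): two-way mixed-effects,
        # consistency, single-rater form.
        n, k = scores.shape
        grand = scores.mean()
        row_means = scores.mean(axis=1)   # per-text means
        col_means = scores.mean(axis=0)   # per-round means

        ss_rows = k * np.sum((row_means - grand) ** 2)   # between-text sum of squares
        ss_cols = n * np.sum((col_means - grand) ** 2)   # between-round sum of squares
        ss_total = np.sum((scores - grand) ** 2)
        ss_err = ss_total - ss_rows - ss_cols            # residual sum of squares

        ms_rows = ss_rows / (n - 1)
        ms_err = ss_err / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

    # Illustrative data: 23 texts scored in 3 rounds (random numbers, not the study's data)
    rng = np.random.default_rng(0)
    quality = rng.normal(75, 10, size=(23, 1))
    scores = quality + rng.normal(0, 3, size=(23, 3))
    print(f"ICC(3,1) = {icc_consistency(scores):.2f}")

A value near 0.9 or above on such a matrix would correspond to the "excellent consistency" interpretation the abstract reports for its ICC of 0.93.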

Published

2025-06-30

Article ID

25849

Issue

Vol. 11 No. 1 (2025)

Section

Articles