7th International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2025), Antalya, Türkiye, 31 October - 2 November 2025, p. 79 (Abstract)
This study extends transformer-based automated grading of open-ended, text-based exam answers by adding a reinforcement learning (RL) mechanism intended to improve scoring accuracy and consistency. Previous work demonstrated that transformer models (e.g., BERT, RoBERTa, DistilBERT, T5) can effectively generate vector-based similarity scores between student responses and reference solutions. However, these models adapted poorly to intermediate score ranges and did not stay consistently aligned with human evaluators. To address these limitations, we introduce an RL-based calibration method. Student responses are first scored on a 1-20 scale using transformer-based similarity, where anchor points (e.g., 4, 8, 12) mark five reference categories, each supported by benchmark answers. For each prediction, the system compares the assigned score with the nearest anchor; if the two differ (e.g., a 9 is assigned instead of 8), the RL agent adjusts the scoring policy through reward-penalty updates. Over successive iterations, the method reduces these discrepancies and moves the scores closer to instructor-defined grading patterns, while preserving the semantic similarity signal produced by the underlying transformer model. Preliminary analysis suggests that the proposed RL-enhanced calibration reduces systematic bias and improves robustness, particularly in borderline cases where conventional similarity-based methods struggle. By adding an adaptive, feedback-driven layer on top of similarity scoring, the approach aims to make automated assessment more reliable and to generalize better across diverse datasets and exam contexts.
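The calibration loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the RL algorithm, so a simple tabular, bandit-style update is assumed here. The anchors 16 and 20, the action set, the linear similarity-to-score mapping, the reward values, and all function names are likewise hypothetical choices made for illustration.

```python
import random

# Assumed anchors: the abstract names 4, 8, 12 as examples of five
# categories; 16 and 20 are added here only to complete the set.
ANCHORS = (4, 8, 12, 16, 20)
ACTIONS = (-2, -1, 0, 1, 2)  # assumed score adjustments available to the agent


def similarity_to_score(similarity: float) -> int:
    """Map a similarity in [0, 1] linearly onto the 1-20 scale (assumed mapping)."""
    clipped = min(max(similarity, 0.0), 1.0)
    return 1 + round(clipped * 19)


def nearest_anchor(score: int) -> int:
    """Return the anchor closest to the score; ties go to the lower anchor."""
    return min(ANCHORS, key=lambda a: (abs(a - score), a))


def train_calibrator(episodes: int = 5000, alpha: float = 0.2,
                     epsilon: float = 0.1, seed: int = 0) -> dict:
    """Learn one adjustment per raw score via reward-penalty updates."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, 21) for a in ACTIONS}
    for _ in range(episodes):
        raw = rng.randint(1, 20)  # stand-in for a transformer-derived score
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(raw, a)])  # exploit
        adjusted = min(max(raw + action, 1), 20)
        # Reward a hit on the nearest anchor, penalize any remaining deviation.
        deviation = abs(adjusted - nearest_anchor(raw))
        reward = 1.0 if deviation == 0 else -float(deviation)
        q[(raw, action)] += alpha * (reward - q[(raw, action)])
    return q


def calibrate(raw: int, q: dict) -> int:
    """Apply the highest-valued learned adjustment to a raw score."""
    best = max(ACTIONS, key=lambda a: q[(raw, a)])
    return min(max(raw + best, 1), 20)
```

In this toy setup, `calibrate(9, train_calibrator())` pulls a raw score of 9 back to the anchor 8, mirroring the example in the abstract. A real system would draw raw scores from the transformer similarity model and would presumably take its reward signal from instructor-graded benchmark answers rather than from the anchor distance alone.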