Improving post-editing of Kazakh translations with fine-tuned large language models: Dataset and evaluation
Abstract
Machine translation for low-resource languages such as Kazakh faces significant challenges due to limited training data, complex morphology, and cultural-linguistic nuances. This paper presents the first comprehensive study on fine-tuning large language models for automated post-editing of Kazakh translations. We introduce KazPE, a systematically annotated dataset containing 10,010 training sentences and 315 test sentences across six domains (medical, scientific, journalistic, oral, fiction, and legal), with detailed error categorization covering 11 linguistic dimensions. Our approach fine-tunes GPT-4.1-mini using supervised learning to improve translation quality through targeted error correction. Human evaluation shows that our fine-tuned model achieves a mean quality score of 0.84 compared to 0.80 for the baseline, an absolute improvement of 0.04 (4 percentage points, roughly 5% relative to the baseline). The largest gains occur in morphological-lexical error handling and in domain-specific contexts, with legal and medical texts showing improvements of +2.8% and +1.6%, respectively. Error analysis reveals that fine-tuning effectively addresses Kazakh’s agglutinative morphology and specialized terminology while maintaining performance on error-free sentences. This work establishes the first systematic evaluation framework for Kazakh translation post-editing and provides insights for improving machine translation systems for morphologically rich, low-resource languages. Our dataset, models, and evaluation framework are made publicly available to support future research in Turkic language processing.
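As a quick check on the headline numbers, the gap between the two mean quality scores can be expressed either as an absolute or as a relative gain. The sketch below is illustrative only: the helper name `improvement` is hypothetical and not part of the released evaluation framework, and it uses the rounded scores quoted in the abstract (0.80 and 0.84), so results based on unrounded scores may differ slightly.

```python
def improvement(baseline: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) of tuned over baseline."""
    absolute = tuned - baseline
    relative_pct = 100.0 * absolute / baseline
    return absolute, relative_pct


# Rounded mean quality scores as quoted in the abstract.
absolute, relative_pct = improvement(baseline=0.80, tuned=0.84)

print(f"absolute gain: {absolute:.2f}")
print(f"relative gain: {relative_pct:.1f}%")
```

With these rounded inputs, the 0.04 difference corresponds to 4 percentage points in absolute terms and about 5% relative to the baseline.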
Authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.