Multilingual thematic modeling: A comparative study of classical and transformational approaches

Aizhan Nazyrova, Aikerim Nasrullayeva, Assel Mukanova, Aigerim Buribayeva, Banu Yergesh

Abstract

This study compares classical and transformer-based sentiment analysis models on Kazakh-Russian bilingual texts, addressing the lack of resource-efficient NLP solutions for low-resource languages. Three models were implemented and evaluated: (1) Word2Vec with a two-layer neural network, (2) BERT (rubert-base-cased), and (3) DistilBERT (distilrubert-tiny). A balanced dataset of 226,000 bilingual comments was used. The models were compared on F1-score, accuracy, computational efficiency, inference speed, model size, and energy consumption. BERT achieved the highest F1-score (0.90), but at significant computational and memory cost. DistilBERT came close (F1 = 0.89) with substantially lower resource requirements, while Word2Vec scored lower (F1 = 0.81) but was faster and more energy-efficient. Error analysis showed that all three models struggled with negation, sarcasm, idiomatic expressions, and code-mixed language. The findings confirm that lightweight transformer models, particularly DistilBERT, offer a favorable trade-off between accuracy and efficiency. Word2Vec remains a viable option for real-time and embedded applications, while BERT, although the most accurate, is less practical in resource-constrained environments. The study supports Green AI principles by demonstrating how efficient sentiment analysis systems can be built for low-resource languages. The proposed dataset and evaluation framework can serve as a benchmark for future Kazakh-Russian NLP research and for practical applications, including mobile services, e-Government platforms, and education technologies.
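The abstract reports both F1-score and accuracy, and the two can diverge even on balanced data. The following is a minimal sketch of how they are computed from predictions. It assumes a binary positive/negative labelling and a positive-class F1; the abstract does not state the label set or the averaging scheme used in the study. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def f1_and_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, f1, accuracy) for a single positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    # Guard every division so empty or degenerate inputs yield 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


# Hypothetical toy labels (1 = positive, 0 = negative); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
precision, recall, f1, accuracy = f1_and_accuracy(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

On this toy input the classifier over-predicts the positive class, so F1 (0.80) and accuracy (0.75) differ, which is why it helps to report them as separate metrics.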

Authors

Aizhan Nazyrova
Aikerim Nasrullayeva
nasrullayevaik@gmail.com (Primary Contact)
Assel Mukanova
Aigerim Buribayeva
Banu Yergesh
Nazyrova, A., Nasrullayeva, A., Mukanova, A., Buribayeva, A., & Yergesh, B. (2025). Multilingual thematic modeling: A comparative study of classical and transformational approaches. International Journal of Innovative Research and Scientific Studies, 8(6), 2787–2799. https://doi.org/10.53894/ijirss.v8i6.10204

Article Details