AI-mediated pronunciation training: Vietnamese EFL learners' perceptions of ELSA Speak

Vuong Thi Hai Yen, Nguyen Thi Thu Huyen

Abstract

This study examines how first-year English majors at Hanoi Metropolitan University, Vietnam, perceive ELSA Speak as a pronunciation learning tool, focusing on learner autonomy, technological affordances, and institutional constraints within mobile-assisted language learning (MALL). Using a convergent parallel mixed-methods design, the study collected quantitative data from 110 participants through a structured survey (Cronbach's alpha = 0.87) and qualitative data from semi-structured interviews with 24 purposively selected participants. Quantitative data were analysed in SPSS 26.0, and qualitative data were analysed thematically following Braun and Clarke [1] with NVivo 12, within a theoretical framework integrating the Technology Acceptance Model and Self-Determination Theory. Overall, 77.3% of participants rated ELSA Speak as effective or highly effective for improving their pronunciation. Learners reported increased confidence (81.8%) and motivation (68.2%), and they rated instant phoneme-level feedback as the most valuable feature (M = 4.2, SD = 0.7). Three main constraints were consistently cited: insufficient contextual practice (40.9%), the cost of premium features (36.4%), and connectivity problems (31.8%). The qualitative analysis identified accent bias in the app's speech recognition as a prevalent concern, particularly because it affected the accuracy of feedback on Vietnamese-accented English. These findings indicate that ELSA Speak supports pronunciation practice and learner autonomy, although accent bias, over-reliance on automated feedback, and limited attention to suprasegmental features remain notable limitations. Educators should therefore integrate ELSA Speak into blended learning environments that combine AI tools with traditional teaching methods, while institutions address accessibility barriers related to cost and infrastructure.
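The survey's reliability is reported as Cronbach's alpha = 0.87. The abstract names SPSS 26.0 but does not describe the computation, so the sketch below is only an illustration of the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), for a scale of k items. It is not the authors' procedure. The function name `cronbach_alpha` and the demo responses are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a k-item scale (illustrative sketch).

    `items` holds one list per survey item. Each list contains that item's
    scores for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")

    # Sum of the per-item sample variances (n - 1 denominator). The ratio
    # below is unchanged if population variances are used consistently instead.
    item_var_sum = sum(variance(col) for col in items)

    # Each respondent's total score across all k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical 5-point Likert responses (NOT the study's data):
    # 3 items, each answered by the same 6 respondents.
    demo = [
        [4, 5, 3, 4, 2, 5],
        [4, 4, 3, 5, 2, 4],
        [5, 5, 2, 4, 3, 5],
    ]
    print(f"alpha = {cronbach_alpha(demo):.2f}")
```

Identical items give alpha = 1, and alpha falls as items covary less with one another. Values around 0.8 or higher, such as the 0.87 reported here, are conventionally read as good internal consistency.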

Authors

Vuong Thi Hai Yen
vthyen@hnmu.edu.vn (Primary Contact)
Nguyen Thi Thu Huyen
Yen, V. T. H., & Huyen, N. T. T. (2026). AI-mediated pronunciation training: Vietnamese EFL learners’ perceptions of ELSA speak. International Journal of Innovative Research and Scientific Studies, 9(3), 1–10. https://doi.org/10.53894/ijirss.v9i3.11309
