An explainable hybrid machine learning model for data-driven assessment and enhancement of program learning outcomes in higher education
Abstract
The purpose of this study is to develop and validate an explainable hybrid machine learning framework for accurately assessing and enhancing Program Learning Outcomes (PLOs) in higher education. The study aims to overcome the limitations of conventional manual or heuristic evaluation methods by leveraging data-driven predictive analytics to identify key factors influencing student achievement. A stacked ensemble learning architecture is proposed, combining multiple gradient boosting and tree-based algorithms (LightGBM, XGBoost, CatBoost, Gradient Boosting, and Decision Tree) under a multinomial Logistic Regression meta-learner. The model was trained and tested on real academic data collected from the University of Tabuk, Saudi Arabia, incorporating academic, behavioral, and demographic variables. Comprehensive preprocessing, stratified k-fold cross-validation, and grid-search optimization were applied to enhance robustness and generalization. SHapley Additive exPlanations (SHAP) were used to interpret model outputs and determine the relative importance of predictors. The hybrid model achieved a micro-average ROC AUC of 0.998, along with consistently high precision, recall, and F1 scores across all grade categories (A–F). SHAP analysis revealed that Total Score, Project Score, and Final Score were the strongest predictors of PLO attainment, offering clear insight into the learning dimensions that contribute most to academic success. The results confirm that the proposed hybrid ensemble outperforms conventional single-model and deep learning approaches in both predictive precision and interpretability. By combining accuracy with transparency, the model serves as a valid analytical tool for institutional quality assurance and outcome-based education. This framework enables educators and program evaluators to make data-driven, evidence-based decisions for early identification of at-risk students, curriculum refinement, and continuous improvement of teaching strategies.
It also provides a replicable methodology for integrating explainable AI into academic performance assessment in higher education institutions.
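The stacking architecture described above can be sketched in a few lines of scikit-learn. This is a minimal illustration on synthetic data, not the authors' implementation: the paper's LightGBM, XGBoost, and CatBoost base learners are replaced here with scikit-learn estimators so the sketch is self-contained, and all hyperparameters, feature counts, and the five-class grade target are assumptions for demonstration only.

```python
# Hedged sketch of a stacked ensemble with a multinomial Logistic Regression
# meta-learner and stratified k-fold cross-validation, loosely mirroring the
# framework in the abstract. Synthetic data stands in for the real academic,
# behavioral, and demographic variables (which are not available here).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 5 classes approximating grade categories A-F.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base_learners = [
    # The paper additionally uses LightGBM, XGBoost, and CatBoost; only
    # scikit-learn estimators are used here to avoid extra dependencies.
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
stack.fit(X_tr, y_tr)

# Micro-average ROC AUC over one-vs-rest binarized labels, the metric
# reported in the abstract (the 0.998 figure applies to the real data only).
proba = stack.predict_proba(X_te)
micro_auc = roc_auc_score(label_binarize(y_te, classes=range(5)),
                          proba, average="micro")
print(f"micro-average ROC AUC: {micro_auc:.3f}")
```

SHAP values for the fitted base learners could then be computed with the `shap` package's `TreeExplainer` to rank feature contributions, as the study does for Total Score, Project Score, and Final Score.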
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.