Predicting University Entrance Examination Ranks by Developing a Stacking-Based Ensemble Machine Learning Algorithm

Mehregan, Mohammad Reza; Rezasoltani, Arman; Khani, Amir Mohammad

doi:10.22059/ijms.2025.385883.677186

Document Type : Research Paper

Authors

Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran

Abstract

Accurately predicting students’ rankings in national university entrance exams, such as Iran’s nationwide examination commonly known as the Konkur, is a key issue for academic planning and consulting. Although machine learning is increasingly used in educational data mining, most existing models show limited accuracy, are inadequately formulated, and are not sufficiently optimized for practical application. This study introduces a novel stacking-based ensemble learning model that uses XGBoost, LightGBM, and CatBoost as base learners and a linear regression model as the meta-learner to improve national rank prediction. The main hyperparameters of each model were tuned with the Optuna optimization framework to enhance its performance. The model was trained and validated on a dataset of more than 73,000 student records from Ghalamchi Institute and evaluated using five-fold cross-validation, with NRMSE and R² as performance measures. The proposed model significantly outperformed baseline models such as Random Forest, Gradient Boosting, and MLP Regressor, achieving an NRMSE of 0.0659 and an R² of 0.7735, a result that could be attributed to the effective integration of advanced learners with systematic hyperparameter optimization. This research provides a practical and scalable predictive tool that can support academic advisors, educators, and policymakers in making informed decisions, promoting equity in education, and guiding students through data-driven interventions. The use of stacking-based ensemble learning and automated hyperparameter optimization via Optuna distinguishes this study from prior research and is a meaningful step forward in the application of predictive analytics in high-stakes educational settings.
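To make the two reported performance measures concrete, the following is a minimal sketch of how NRMSE and R² can be computed and how a five-fold split can be formed. It is not the authors' code. The abstract does not state which normalizer was used for NRMSE, so dividing RMSE by the observed range of the target is an assumption here (dividing by the mean is another common choice). The function names `nrmse`, `r2_score`, and `k_fold_indices`, the unshuffled contiguous folds, and the toy values are all illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nrmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the observed values.

    Assumption: range normalization. The abstract does not specify
    the normalizer, so treat this choice as illustrative only.
    """
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("y_true has zero range; NRMSE is undefined")
    return rmse(y_true, y_pred) / spread


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy values for illustration only; these are not data from the study.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]
print("NRMSE:", nrmse(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
```

In a five-fold evaluation, a model would be fitted on each `train_idx`, scored on the matching `test_idx` with both measures, and the five scores averaged. A lower NRMSE and an R² closer to 1 indicate better rank predictions.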

Main Subjects

Operations and information management

Interdisciplinary Journal of Management Studies

Volume 19, Issue 3
July 2026
Pages 545-566
