Predicting University Entrance Examination Ranks by Developing a Stacking-Based Ensemble Machine Learning Algorithm

Document Type: Research Paper

Authors

Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran

Abstract

Accurately predicting students’ ranks in high-stakes national university entrance examinations, such as Iran’s nationwide examination commonly known as the Konkur, is a key issue for academic planning and counseling. Although machine learning is increasingly used in educational data mining, most existing models show limited accuracy, are inadequately formulated, and are not sufficiently optimized for practical use. This study introduces a novel stacking-based ensemble learning model that uses XGBoost, LightGBM, and CatBoost as base learners and a linear regression model as the meta-learner to improve national rank prediction. The main hyperparameters of the constituent models were tuned with the Optuna optimization framework to improve the performance of each model. The model was trained and validated on a dataset of more than 73,000 student records from Ghalamchi Institute and evaluated using five-fold cross-validation, with normalized root mean square error (NRMSE) and the coefficient of determination (R²) as performance measures. The proposed model significantly outperformed baseline models such as Random Forest, Gradient Boosting, and MLP Regressor, achieving an NRMSE of 0.0659 and an R² of 0.7735. This improvement may be attributed to the effective integration of advanced learners with systematic hyperparameter optimization. The research provides a practical and scalable predictive tool that can support academic advisors, educators, and policymakers in making informed decisions, promoting equity in education, and guiding students through data-driven interventions. The combination of stacking-based ensemble learning and automated hyperparameter optimization via Optuna distinguishes this study from prior research and represents a meaningful step forward in applying predictive analytics to high-stakes educational settings.
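To make the two reported performance measures concrete, the following minimal sketch computes NRMSE and R² in plain Python. The abstract does not state how the RMSE is normalized, so dividing by the range of the observed values is an assumption made here; the function names and the sample ranks are hypothetical and are not taken from the study's data.

```python
import math


def nrmse(y_true, y_pred):
    """Root mean square error divided by the range of y_true.

    Assumption: range normalization. The study may instead normalize
    by the mean or the standard deviation of the target.
    """
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return math.sqrt(mse) / (max(y_true) - min(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted ranks, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0]

print(f"NRMSE = {nrmse(actual, predicted):.4f}")
print(f"R^2   = {r_squared(actual, predicted):.4f}")
```

Under five-fold cross-validation, measures like these would typically be computed on each held-out fold and then averaged. A lower NRMSE and a higher R² both indicate a better fit.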
