Providing a Hybrid Clustering Method as an Auxiliary System in Automatic Labeling to Divide Employee Into Different Levels of Productivity and Their Retention

Document Type: Research Paper


School of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran


Identifying productive employees and analyzing their turnover with data mining tools, without human intervention, is an attractive research field in human resource management. This study develops an innovative auxiliary system for the automatic labeling of numerical data. A hybrid clustering algorithm that combines the K-means and partitioning around medoids (PAM) methods identifies an organization's productive employees and divides them into different productivity levels. The model is evaluated in two ways: by calculating the differences between actual and labeled values (93% labeling accuracy), and by an innovative image-processing criterion applied to the final clusters using the singular value decomposition (SVD) algorithm. Ultimately, the algorithm yields four labels: employees of middle and good productivity who leave the organization, and employees of excellent and weak productivity who stay in the organization. For each cluster, corresponding policies are adopted for retention, productivity improvement, and replacement.
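The abstract does not specify how the K-means and PAM stages are combined. The sketch below assumes one plausible arrangement: K-means produces initial clusters, each cluster is replaced by its medoid (the member record with the smallest total distance to the other members), and PAM's swap step then refines the medoids. The function names (`kmeans`, `total_cost`, `hybrid_kmeans_pam`) and the use of Euclidean distance are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch: the order K-means -> medoids -> PAM swap is an assumption,
# not the paper's published procedure. Distance is Euclidean (also an assumption).

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so K-means has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def total_cost(points, medoids):
    """PAM objective: sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def hybrid_kmeans_pam(points, k, seed=0):
    """Return (medoid indices, cluster index per point)."""
    labels = kmeans(points, k, seed=seed)
    # Seed PAM with the medoid of each K-means cluster.
    medoids = []
    for j in range(k):
        idx = [i for i, c in enumerate(labels) if c == j]
        if idx:
            medoids.append(min(idx, key=lambda i: sum(
                math.dist(points[i], points[t]) for t in idx)))
    # Guard: if K-means left a cluster empty, fill the gap with unused points.
    for i in range(len(points)):
        if len(medoids) == k:
            break
        if i not in medoids:
            medoids.append(i)
    # PAM swap step: accept any medoid/non-medoid swap that lowers the cost.
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Final assignment of every point to its nearest medoid.
    final = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
             for p in points]
    return medoids, final
```

Seeding PAM with K-means medoids rather than arbitrary points can shorten the swap search, and because every cluster centre is an actual record (here, an employee), the resulting clusters may be easier to inspect and label.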



Article Title [Persian]

Presenting a Hybrid Clustering Method as an Auxiliary System in Automatic Labeling for Dividing an Organization's Employees into Different Levels of Productivity and Retaining Them

Authors [Persian]

  • Seyed Alireza Mousavian Anaraki
  • Abdorrahman Haeri
  • Fatemeh Moslehi
M.Sc. Student, School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
Abstract [Persian]

Identifying productive employees and analyzing their turnover with data mining tools, without human intervention, is an attractive research area in human resource management. By presenting a hybrid clustering method of K-means and PAM, this study builds an innovative auxiliary system for the automatic labeling of numerical data that identifies an organization's productive employees, divides them into different productivity levels, and examines their turnover. The results are evaluated using the differences between actual and labeled values (93 percent labeling accuracy) and an innovative image-processing criterion applied to the final clusters using SVD. Running the algorithm labels four clusters: employees of middle and good productivity who leave, and employees of excellent and weak productivity who remain in the organization. For each cluster, corresponding policies for retention, productivity improvement, and replacement are adopted.
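The abstract reports two evaluation measures without giving their formulas. A minimal sketch, assuming (a) labeling accuracy is the share of employees whose automatically assigned label matches the actual label, and (b) the SVD criterion treats each final cluster's data matrix as an image and measures how much of its energy (squared Frobenius norm) a low-rank approximation retains. The helper names `labeling_accuracy`, `svd_energy_ratio`, and `rank_r_approximation` are hypothetical; `numpy.linalg.svd` is the only library routine used.

```python
import numpy as np

# Assumption: these are illustrative readings of the two evaluation measures,
# not the paper's exact definitions.

def labeling_accuracy(actual, assigned):
    """Share of records whose automatically assigned label equals the actual label."""
    actual, assigned = np.asarray(actual), np.asarray(assigned)
    if actual.shape != assigned.shape:
        raise ValueError("label arrays must have the same length")
    return float(np.mean(actual == assigned))

def svd_energy_ratio(cluster_matrix, rank):
    """Fraction of squared Frobenius norm captured by the top-`rank` singular values."""
    s = np.linalg.svd(np.asarray(cluster_matrix, dtype=float), compute_uv=False)
    return float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))

def rank_r_approximation(cluster_matrix, rank):
    """Best rank-`rank` approximation in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(np.asarray(cluster_matrix, dtype=float),
                             full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
```

Under this reading, a reported 93% labeling accuracy would mean `labeling_accuracy` returns 0.93. A cluster whose records are very similar concentrates its energy in few singular values, so a high `svd_energy_ratio` at small rank is one possible signal of a compact cluster.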

Keywords [Persian]

  • Productive employees
  • Employee turnover
  • Hybrid clustering
  • Automatic labeling
  • Image processing