Cancer detection from textual data using a combination of machine learning approaches

Document Type: SI: DBBD-2023

Authors

1 Department of Information Technology Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 Department of Industrial Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

3 Department of Information Technology Management, Tehran North Branch, Islamic Azad University, Tehran, Iran

4 Department of Industrial Engineering, Faculty of Industrial and Mechanical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

5 Department of Industrial Management, Tehran South Branch, Islamic Azad University, Tehran, Iran

DOI: 10.22059/ijms.2023.362252.676037

Abstract

Cancer has become one of the leading causes of death worldwide. Consequently, extensive research in different fields has addressed the prediction and early detection of the disease in patients. Artificial intelligence and data mining are among the approaches that have helped researchers diagnose it. This research presents a machine learning approach for the early and timely diagnosis of cancer. The approach uses logistic regression, Naive Bayes, two versions of Random Forest and Support Vector Machine, which work in parallel with each other. By integrating these techniques, the proposed system achieves higher accuracy and fewer errors than the base methods. The proposed method was evaluated using several criteria and outperformed traditional methods.
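
As an illustration of how several classifiers running in parallel might be integrated, the following minimal sketch combines per-document predictions by majority voting. The abstract does not state the integration rule, so majority voting is an assumption here, as are the function names `majority_vote` and `combine_predictions`, the labels `"cancer"` and `"healthy"`, and the example predictions, which are made up for demonstration and are not results from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first in `votes`.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def combine_predictions(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine predictions from several models, one output label per document.

    `per_model[i][j]` is the label that model i assigns to document j.
    """
    # zip(*per_model) regroups the labels so each tuple holds one document's votes.
    return [majority_vote(doc_votes) for doc_votes in zip(*per_model)]


# Hypothetical predictions from five base classifiers for three documents.
example_predictions = [
    ["cancer", "healthy", "cancer"],    # logistic regression
    ["cancer", "healthy", "healthy"],   # Naive Bayes
    ["healthy", "healthy", "cancer"],   # Random Forest, version 1
    ["cancer", "cancer", "cancer"],     # Random Forest, version 2
    ["healthy", "healthy", "healthy"],  # Support Vector Machine
]

combined = combine_predictions(example_predictions)
# combined == ["cancer", "healthy", "cancer"]
```

With an odd number of base classifiers and two classes, a tie cannot occur, which is one practical reason to combine five models under this voting rule.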
