Volume 16, Issue 4 (December 2024) | Iranian Journal of Blood and Cancer 2024, 16(4): 20-29



Krotha D P, Shaik F. Predictive Modeling and Spatial Analysis of Cervix Uteri and Breast Cancer in India using Machine Learning and Big Data Frameworks. Iranian Journal of Blood and Cancer 2024; 16(4): 20-29
URL: http://ijbc.ir/article-1-1625-en.html
1- Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India, durgapujitha135@gmail.com
2- Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India.
Abstract:
Background: Cancer remains a critical public health issue in India, with rising cases of breast cancer and cervical cancer. Accurate predictions and spatial analysis of cancer incidence are essential for shaping prevention strategies and targeting interventions in high-risk regions.
Methods: This study used a big data framework with machine learning techniques from the SparkML library to predict cancer cases and analyze their spatial distribution across Indian states from 2016 to 2021. Three models, Random Forest Regressor, Gradient Boosting Regressor, and Geographically Weighted Regression (GWR), were applied to the dataset. Spatial autocorrelation was assessed with Moran’s I statistic to identify clustering patterns.
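As a rough illustration of the spatial autocorrelation step, the sketch below computes global Moran's I from its textbook definition in plain Python. It is not the authors' implementation: the function name `morans_i`, the four-region toy data, and the binary contiguity weights are all hypothetical, chosen only to show how the statistic responds to neighbouring regions with similar values.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / S0) * sum_ij(w_ij * (x_i - mean) * (x_j - mean)) / sum_i((x_i - mean)^2)
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# Hypothetical toy data: four regions in a row, each adjacent to its neighbours.
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10.0, 12.0, 30.0, 33.0]    # similar values sit next to each other
alternating = [1.0, 10.0, 1.0, 10.0]    # high and low values alternate

print(morans_i(clustered, contiguity))    # positive: spatial clustering
print(morans_i(alternating, contiguity))  # negative: spatial dispersion
```

Values above the expected value under no autocorrelation, which is -1/(n-1), suggest clustering; values below it suggest dispersion.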
Results: The spatial analysis revealed significant clustering of cancer cases, particularly in 2020, with a z-score of 2.23, a p-value of 0.02, and a Moran’s index of 0.15. Among the machine learning models, GWR achieved a predictive accuracy of 98% for both breast cancer and cervical cancer, while the Random Forest Regressor and Gradient Boosting Regressor achieved 95% and 97% accuracy, respectively, over the six-year period. Gradient Boosting outperformed the other models in identifying key predictors while maintaining high predictive accuracy.
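To show how a z-score such as the one reported relates to a p-value, the sketch below converts z to one-sided and two-sided tail probabilities under a standard normal approximation. The abstract does not say which test variant, or whether a permutation approach, produced the reported p-value, so the normal approximation and the helper names `one_sided_p` and `two_sided_p` are assumptions made for illustration only.

```python
import math


def one_sided_p(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p(z):
    """Two-tailed probability P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_reported = 2.23  # z-score quoted in the Results paragraph
print(round(one_sided_p(z_reported), 4))
print(round(two_sided_p(z_reported), 4))
```

Under this approximation both tail probabilities for z = 2.23 fall below the conventional 0.05 threshold, in line with the clustering being described as significant.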
Conclusions: The findings highlight the efficacy of Gradient Boosting and GWR in predicting cancer incidence and analyzing spatial patterns. These models provide critical insights into cancer clustering and risk factors, supporting the development of targeted prevention strategies and policy interventions for high-risk regions in India. The results emphasize the utility of machine learning techniques in public health research and cancer control.
Original Article | Subject: Methodology
Received: 2024/11/11 | Accepted: 2024/12/25 | Published: 2024/12/30




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
