Time series for predicting infectious disease outbreaks in Latin America

DOI:

https://doi.org/10.70577/9j5qky84

Keywords:

diseases, epidemiology, models, prediction, time series.

Abstract

This study evaluates how well time series models predict infectious disease outbreaks in Latin America, using a data science approach. Two approaches were compared: the seasonal SARIMA model and a hybrid SARIMA + NNAR (autoregressive neural network) model. The results show that SARIMA has limited explanatory power (a negative R², meaning its forecasts fit worse than a constant prediction at the observed mean) but keeps errors at an acceptable level (RMSE = 1.55; MAE = 0.87). The hybrid model performed worse, with higher errors and an even more negative R², indicating that adding a neural network does not necessarily improve predictive capacity. The learning curve of the NNAR model suggests possible undertraining, which reinforces the need for careful calibration when integrating complex models. The study highlights the importance of selecting models according to the structure of the data rather than technical sophistication, and recommends methodological optimizations before hybrid models are deployed in epidemiological surveillance systems. This analysis, based on realistic simulated data, underscores the value of time series methodologies for disease prediction and public health decision-making.
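The abstract reports a negative R² alongside moderate RMSE and MAE values, which can seem contradictory at first. The minimal sketch below shows how the three metrics are conventionally computed and how a forecast with small absolute errors can still score a negative R² when the series itself varies little. The toy series and forecasts are hypothetical values chosen for illustration; they are not the study's data, and the function names are assumptions rather than the authors' code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    The value is negative whenever the forecast's squared error exceeds
    that of a constant prediction at the mean of the observed values.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy series (illustration only, not the study's data):
# a low-variance series and a forecast that is consistently out of phase.
actual = [2.0, 3.0, 2.0, 3.0, 2.0, 3.0]
out_of_phase = [3.0, 2.0, 3.0, 2.0, 3.0, 2.0]
mean_baseline = [sum(actual) / len(actual)] * len(actual)

print("out-of-phase forecast:",
      rmse(actual, out_of_phase),
      mae(actual, out_of_phase),
      r_squared(actual, out_of_phase))
print("mean baseline R²:", r_squared(actual, mean_baseline))
```

In this toy case every forecast is off by exactly one unit, so RMSE and MAE are both 1.0, yet R² is -3.0 because the series only deviates 0.5 from its own mean. The constant mean baseline scores an R² of 0.0, which is the reference point against which a negative value should be read.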

 

Contribution of Individual Authors to the Preparation of a Scientific Article (Ghostwriting Policy)

All authors contributed equally to the development of the article.

Sources of Funding for the Research Presented in the Scientific Article or for the Scientific Article Itself

No funding was received for this study.

Conflict of Interest

The authors declare that they have no conflict of interest relevant to the content of this article.

Creative Commons Attribution 4.0 License (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution 4.0 License.

https://creativecommons.org/licenses/by/4.0/deed.es

Published

2024-03-15

How to Cite

Time series for predicting infectious disease outbreaks in Latin America. (2024). Innovación Integral, 1(1), 1-14. https://doi.org/10.70577/9j5qky84