Methods/Statistics
Comparison of machine learning, Bayesian spatiotemporal model, and substitution approaches to estimate cause-specific mortality rates from sparse data: A simulation study
Ikhan Kim* Ikhan Kim Hyeona
In this study, we compared machine learning, Bayesian spatiotemporal model, and substitution approaches for calculating the suicide mortality rate by education level in 237 districts in Korea. We acquired the population and the number of suicide deaths by year (2005, 2010, 2015), district, education level (middle school or lower, high school, college or higher), and 10-year age group (30-39, …, 80 or older). We set up a hypothetical scenario in which, for each year and education level, the age-specific suicide death rate was the same in all districts in Korea, and calculated the age-standardized suicide mortality rates under this scenario as the reference value. We then randomly drew 1,000 samples, assuming that deaths by year, district, education level, and age group follow a Poisson distribution. For each year, district, and education level, 1,000 age-standardized suicide mortality rates were estimated and compared with the reference value. We used five approaches: (1) using the observed number of deaths as is; (2) substituting the number of deaths using the suicide death rate by year, education level, and degree of urbanization; (3) substituting 0.1 for the number of deaths when the number of suicide deaths was 0; (4) Bayesian spatiotemporal models; and (5) machine learning methods (generalized linear models via penalized maximum likelihood). Root mean squared error (RMSE), mean absolute error (MAE), and mean error (ME) were used for the comparison. Estimates for strata with small populations, such as the middle school or lower group and rural areas, differed significantly from the reference value. The substitution approach produced higher suicide mortality rate estimates for the high-education group than the non-substitution approach did. In contrast, the Bayesian spatiotemporal model and machine learning tended to yield high suicide mortality rate estimates for groups with low levels of education.
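The simulation design described above can be illustrated with a minimal Python sketch for a single district, year, and education level. It is not the authors' code: the age-specific rates, population sizes, and standard-population weights are hypothetical values chosen only to mimic a sparse stratum, the function names are invented for illustration, and applying the 0.1 substitution to each age-group cell is an assumption, since the abstract does not state the level at which zeros are replaced. The sketch covers only approaches (1) and (3); the Bayesian spatiotemporal and penalized-likelihood models are not reproduced here.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Suitable for the small expected counts of sparse data; for very large
    lam, exp(-lam) underflows and another sampler would be needed.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def age_standardized_rate(deaths, population, std_weights):
    """Direct standardization: weighted sum of age-specific rates per 100,000."""
    return 1e5 * sum(w * d / n for d, n, w in zip(deaths, population, std_weights))


def simulate(true_rates, population, std_weights, n_sims=1000, zero_fill=None, seed=0):
    """Compare simulated age-standardized rates with the reference value.

    zero_fill=None uses the drawn deaths as is (approach 1);
    zero_fill=0.1 replaces zero counts with 0.1 (approach 3, per-cell assumption).
    """
    rng = random.Random(seed)
    # Reference value: the standardized rate implied by the assumed true rates.
    reference = 1e5 * sum(w * r for w, r in zip(std_weights, true_rates))
    errors = []
    for _ in range(n_sims):
        deaths = [poisson_draw(rng, r * n) for r, n in zip(true_rates, population)]
        if zero_fill is not None:
            deaths = [zero_fill if d == 0 else d for d in deaths]
        errors.append(age_standardized_rate(deaths, population, std_weights) - reference)
    return {
        "reference": reference,
        "rmse": math.sqrt(sum(e * e for e in errors) / n_sims),
        "mae": sum(abs(e) for e in errors) / n_sims,
        "me": sum(errors) / n_sims,
    }


# Hypothetical inputs for six age groups (30-39, ..., 80 or older).
true_rates = [2.0e-4, 2.5e-4, 3.0e-4, 3.5e-4, 5.0e-4, 8.0e-4]  # deaths per person
population = [400, 350, 300, 200, 100, 50]                      # a small stratum
std_weights = [0.25, 0.25, 0.20, 0.15, 0.10, 0.05]              # sums to 1

raw = simulate(true_rates, population, std_weights)
filled = simulate(true_rates, population, std_weights, zero_fill=0.1)
print("observed as is:", raw)
print("zeros -> 0.1:  ", filled)
```

Because both runs share a seed, they use identical Poisson draws, so any difference in ME comes from the substitution alone. Replacing zeros with 0.1 can only raise each simulated rate, which gives one intuition for why a substitution approach can push estimates upward in sparse strata.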