Development and validation of gestational age estimation algorithms for non-live births in administrative healthcare databases
Accurate algorithms for estimating the gestational age of non-live births are essential for studying the risks associated with these outcomes in administrative healthcare databases. Using the National Health Information Database of South Korea, we linked the gestational week recorded at influenza vaccination in the national vaccination registry to establish a reference standard. A hierarchical algorithm was used to identify pregnancy episodes, and non-live births were stratified into spontaneous/induced abortions and stillbirths. Four approaches were tested: (1) assigning outcome-specific gestational ages, (2) adjusting gestational age based on gestational markers, (3) fitting a regression model, and (4) using a random forest model. Algorithms were evaluated by the proportion of estimates falling within 1–4 weeks of the reference standard and by the mean squared error (MSE). Random forests performed best for predicting gestational age for both spontaneous/induced abortions (MSE 1.51 weeks²) and stillbirths (MSE 2.76 weeks²), with 91.7% (95% CI 90.8–92.5) and 89.2% (95% CI 85.7–91.9) of predictions falling within two weeks of the reference standard, respectively. In an external validation set, the gestational marker-based adjustment approach performed best for spontaneous/induced abortions (MSE 8.97 weeks²), while the random forest model performed best for stillbirths (MSE 13.32 weeks²). The developed algorithms can support pregnancy research in the Korean population.
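The two evaluation metrics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `evaluate_estimates`, the result keys, and the toy values are hypothetical, and treating "within k weeks" as an inclusive absolute difference is an assumption, since the abstract does not specify the boundary.

```python
from typing import Dict, Sequence


def evaluate_estimates(
    estimated_weeks: Sequence[float],
    reference_weeks: Sequence[float],
    tolerances: Sequence[int] = (1, 2, 3, 4),
) -> Dict[str, float]:
    """Return the MSE (weeks squared) and the proportion of estimates
    within each tolerance (in weeks) of the reference standard."""
    n = len(estimated_weeks)
    if n == 0 or n != len(reference_weeks):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error in weeks for each pregnancy episode.
    errors = [est - ref for est, ref in zip(estimated_weeks, reference_weeks)]

    results = {"mse": sum(err ** 2 for err in errors) / n}
    for k in tolerances:
        # Assumption: "within k weeks" means |error| <= k (inclusive).
        results[f"within_{k}w"] = sum(abs(err) <= k for err in errors) / n
    return results


# Toy values for illustration only; these are not data from the study.
estimated = [10.0, 8.0, 22.0, 12.0]
reference = [9.0, 8.0, 25.0, 17.0]
metrics = evaluate_estimates(estimated, reference)
print(metrics)
```

Because the MSE squares each error, a few large misses raise it sharply even when most estimates are close, which is why it is reported alongside the within-k-weeks proportions rather than instead of them.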
2025 Spring Convention