Machine learning-based analysis of regional differences in out-of-hospital cardiopulmonary arrest outcomes and resuscitation interventions in Japan

Study design

We conducted a retrospective study utilizing prospectively recorded Japanese Utstein-style EMS activity records. The Ethics Committee of Nara Medical University approved the study (No. 3353), and the requirement for informed consent was waived owing to the use of anonymized records. This study was conducted in accordance with the tenets of the Declaration of Helsinki.

Study population and data collection

Japan has an aging population: 28.9% of its 130 million people are aged > 65 years¹⁰. The country consists of 47 prefectures with population densities ranging from 65.4 to 6,399.5 individuals/km². EMSs respond to all emergency calls and transport approximately 125,000 patients with OHCA to hospitals annually¹¹. Emergency protocols, based on the Japanese Resuscitation Council’s Resuscitation Guidelines¹² and revised every 5 years, are developed and implemented by 250 regional health managers. Each medical control region is supervised by a council established in each prefecture, which tailors protocols to local conditions¹³,¹⁴,¹⁵. EMS activities are recorded in the Utstein style and verified by the medical control council, and all records are collected annually by the Fire and Disaster Management Agency¹¹. Our analysis included prehospital records of patients with OHCA who were resuscitated by EMS and transported to hospitals in the 47 prefectures between 2015 and 2020. We excluded patients aged < 18 years and those with non-cardiogenic cardiopulmonary arrest to reduce pathology variability.

Investigating Japanese EMS practices

In Japan, EMS is activated via a Communications Command Center upon receiving emergency calls. Bystanders may be instructed to administer cardiopulmonary resuscitation (CPR) over the telephone if cardiac arrest is suspected. Each ambulance includes a team of three, often featuring emergency life-saving technicians capable of advanced airway management and adrenaline administration for OHCA, under online medical control supervision. Additionally, hospital destinations are determined during field operations, and all patients, barring those with evident signs of death, are transported to a hospital.

Data collection and pre-processing

We employed 23 factors and prefecture numbers from the Utstein-style EMS activity records as predictors, including county number, age, year and month of onset, bystander type, initial rhythm, number of defibrillations, number of adrenaline boluses administered, and elapsed time of each activity. The prefecture number was treated as a continuous variable because numbers are allocated sequentially from north to south; this approach aimed to capture potential spatial correlations between adjacent prefectures. We also conducted a similar analysis using one-hot encoding for the prefecture numbers, and the outcomes did not contradict the results obtained when treating the prefecture number as a continuous variable. Categorical data were one-hot encoded. Missing data were not imputed; instead, missingness was coded as a separate category and incorporated into the analysis. Selected continuous variables were standardized using z-score normalization, which benefits machine learning algorithms such as neural networks by aiding gradient descent convergence and mitigating issues related to weight initialization and gradient-related problems. Time factors, which were initially considered continuous variables, were one-hot encoded as categorical data¹⁶ because of their non-linear relationship with prognosis in cardiopulmonary resuscitation. The time factors were measured in minutes and thus represented as 1, 2, 3, 4, … minutes.
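The encoding scheme described above can be illustrated with a minimal sketch. It is not the study's code: the `MISSING` label, the helper names, and the toy values are assumptions made for illustration, and the population standard deviation is used for the z-score because the source does not state which form was applied.

```python
from statistics import mean, pstdev

MISSING = "missing"  # hypothetical label for the separate missingness category


def one_hot(values):
    """One-hot encode a column, treating None as its own 'missing' category."""
    labels = [MISSING if v is None else str(v) for v in values]
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def z_score(values):
    """Standardize a complete numeric column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy values only: a time factor in whole minutes (None = not recorded) and ages.
time_to_defib = [3, 5, None, 3]
ages = [60, 70, 80, 90]

categories, encoded = one_hot(time_to_defib)
scaled_ages = z_score(ages)
```

Treating each whole-minute value as its own category lets a model assign an independent weight to every minute, which is one way to accommodate a non-linear relationship between elapsed time and prognosis.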

Cases in which a specific intervention, such as defibrillation or drug administration, was not performed were also considered. These were coded as “no intervention” and incorporated into the contact-to-intervention column, allowing the model to reflect a comprehensive range of patient experiences. These steps resulted in 249 features (see Supplementary Table S1). Subsequently, we constructed a machine learning model to predict good neurological outcome 1 month after cardiac arrest, based on the cerebral performance category (CPC) score¹⁷ sourced from the Utstein records. The outcome was a binary classification (Yes/No), with CPC1/2 signifying a good neurological outcome and CPC3–5 indicating a poor neurological outcome.
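A minimal sketch of the two coding rules in this paragraph follows. The function names and the category strings are hypothetical, and the sketch assumes that `None` marks an intervention that was never performed (as opposed to a value that is missing from the record).

```python
NO_INTERVENTION = "no intervention"  # category label taken from the text


def contact_to_intervention(minutes):
    """Return the category for a contact-to-intervention time.

    Assumption: `minutes` is None when the intervention was not performed.
    """
    return NO_INTERVENTION if minutes is None else f"{minutes} min"


def good_outcome(cpc):
    """Binary label: 1 for CPC 1-2 (good outcome), 0 for CPC 3-5 (poor outcome)."""
    if cpc not in (1, 2, 3, 4, 5):
        raise ValueError(f"CPC must be an integer from 1 to 5, got {cpc!r}")
    return 1 if cpc <= 2 else 0


# Toy values only.
defib_categories = [contact_to_intervention(m) for m in (4, None, 7)]
outcome_labels = [good_outcome(c) for c in (1, 2, 3, 4, 5)]
```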

Dataset selection and predictive model development

We randomly split the data into training and test datasets at an 8:2 ratio, stratified by CPC1/2 so that the outcome ratio was consistent in both sets. After comparing several machine learning models, we built the prediction model with a neural network, which achieved the best average class sensitivity. The compared methods included logistic regression, support vector machine, decision tree, random forest, and LightGBM⁹. To balance model bias (underfitting) and variance (overfitting), we applied five-fold cross-validation stratified by CPC1/2, along with batch normalization and dropout in each neural network layer. Accuracy plateaued once the number of layers reached five, so we used a five-layer network to limit training cost. The sigmoid function served as the activation function and binary cross-entropy served as the loss function¹⁸. We measured model performance using area under the receiver operating characteristic curve (AUROC) and accuracy during training.
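The stratified 8:2 split can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the tooling used in the study, which the text does not name; `stratified_split`, the seed, and the toy label vector are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while preserving each class's proportion."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels only: 10 records with CPC1/2 (1) and 90 with CPC3-5 (0).
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps the proportion of CPC1/2 cases the same in both subsets, which matters when the positive class is rare.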

Imbalanced datasets significantly affect minority class performance. To limit misclassification in the simulations based on predicted CPC1/2 counts, we applied class weighting during training to balance the two class sensitivities, taking the trade-off between them into account. Our model aimed to maximize the majority class (CPC3–5) sensitivity without excessively reducing minority class (CPC1/2) sensitivity. We set CPC1/2 sensitivity at 80% and tested weights from 1 to 100 in 0.1 increments to optimize CPC3–5 sensitivity.
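The weight search can be sketched as follows. The sketch assumes that the 80% figure acts as a minimum CPC1/2 sensitivity and that the weight giving the highest CPC3–5 sensitivity among qualifying candidates is kept; the source does not spell out the exact selection rule. All function names and the `results` values are hypothetical, and the training step that would produce the sensitivities for each weight is omitted.

```python
def class_sensitivities(y_true, y_pred):
    """Return (sensitivity for class 1, sensitivity for class 0)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sens_pos = sum(1 for p in pos if p == 1) / len(pos)
    sens_neg = sum(1 for p in neg if p == 0) / len(neg)
    return sens_pos, sens_neg


def weight_grid(start=1.0, stop=100.0, step=0.1):
    """Candidate class weights from `start` to `stop` inclusive."""
    n = round((stop - start) / step)
    return [round(start + i * step, 1) for i in range(n + 1)]


def select_weight(results, target=0.80):
    """Pick the weight with the best CPC3-5 sensitivity among those meeting the target.

    `results` maps weight -> (CPC1/2 sensitivity, CPC3-5 sensitivity).
    """
    eligible = {w: s for w, s in results.items() if s[0] >= target}
    if not eligible:
        return None
    return max(eligible, key=lambda w: eligible[w][1])


grid = weight_grid()
# Made-up sensitivities for three candidate weights, for illustration only.
results = {1.0: (0.50, 0.99), 5.0: (0.82, 0.90), 9.0: (0.90, 0.85)}
chosen = select_weight(results)
```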

Additional training parameters included a batch size of 1,024, 100 epochs, a learning rate of 0.001, and the Adam optimizer. We conducted training using Python version 3.8.5 (Python Software Foundation, Beaverton, OR, USA).

Adjusting time parameters in the simulation method

We assessed the association of EMS activity duration with predicted CPC1/2 counts by running simulations with the constructed prediction model on the test dataset (n = 92,108), which included records from all prefectures represented in the training set. The simulation methodology involved three time factors: elapsed time from EMS arrival to hospital arrival (a), EMS arrival to first defibrillation (b), and EMS arrival to first drug administration (c).

Previous studies have shown that these EMS activity times are important prognostic predictors¹⁹,²⁰,²¹,²²,²³,²⁴,²⁵,²⁶. For example, shorter times from EMS arrival to defibrillation¹⁹,²⁵ and from EMS arrival to drug administration²⁰,²¹,²²,²³,²⁴,²⁵ are associated with better survival and improved neurological outcomes in patients with OHCA. The prognostic impact of EMS providers staying on scene and performing their activities has also been reported²⁶. Patients with non-shockable initial rhythm were excluded for (b), and those with EMS-witnessed cardiac arrest were excluded for (c). Time factors were shifted by −5 to +5 min for defibrillation and drug administration, and by −5 to +10 min for EMS arrival to hospital arrival time, in 1-min increments. We created a dataset for each adjustment of each time factor in the test dataset and calculated the average predicted CPC1/2 score using the prediction model. Then, we determined the percentage change in the mean predicted CPC1/2 count relative to the unadjusted data to assess the association with each time increase or decrease. We focused on this percentage change for the prefecture-specific analysis. A heat map was used to visualize and evaluate the relationship between time adjustment and the proportional change in mean predicted CPC1/2 count.
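The time-adjustment procedure can be sketched as a shift, re-predict, and compare loop. Here `toy_model` is a hypothetical stand-in for the trained network, the `transport_min` field name is invented, and clamping shifted times at 0 min is an assumption; the source does not state how adjustments below zero were handled. In the study's pipeline the shifted times would be one-hot encoded again before prediction.

```python
import math


def toy_model(record):
    """Hypothetical stand-in for the trained network: P(CPC1/2) falls with time."""
    return 1.0 / (1.0 + math.exp(0.2 * record["transport_min"] - 1.0))


def shift_time(records, key, delta, minimum=0):
    """Copy the records with one time factor shifted by `delta` minutes."""
    return [{**r, key: max(minimum, r[key] + delta)} for r in records]


def percent_change(records, key, delta, predict):
    """Percentage change in mean predicted probability versus unadjusted data."""
    base = sum(predict(r) for r in records) / len(records)
    shifted = shift_time(records, key, delta)
    adjusted = sum(predict(r) for r in shifted) / len(records)
    return 100.0 * (adjusted - base) / base


# Toy records only: EMS arrival to hospital arrival time in minutes.
records = [{"transport_min": m} for m in (20, 30, 40)]

# -5 to +10 min in 1-min steps, the range used for time factor (a).
changes = {
    delta: percent_change(records, "transport_min", delta, toy_model)
    for delta in range(-5, 11)
}
```

Each entry of `changes` corresponds to one cell of a heat map row; repeating the loop for each prefecture's subset of records gives the prefecture-specific values.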

Comparison of predicted changes of CPC1/2 counts across prefectures

We employed the same time adjustment method to estimate and visualize predicted CPC1/2 counts for the test dataset split by prefecture. We identified the time adjustments most associated with prognosis in each prefecture for the combinations (a) & (b) and (a) & (c), revealing treatment and EMS arrival to hospital arrival time adjustments with the greatest potential to improve predicted prognosis.
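Identifying the most favorable combination of two adjustments amounts to a search over a small grid. The sketch below assumes an exhaustive search over all pairs, which the source does not confirm; `best_adjustment` and the two response functions are hypothetical stand-ins for the mean predicted CPC1/2 value that the model would return for one prefecture.

```python
from itertools import product


def best_adjustment(predict_mean, a_range=range(-5, 11), b_range=range(-5, 6)):
    """Return ((delta_a, delta_b), value) maximizing `predict_mean` over the grid."""
    grid = product(a_range, b_range)
    best = max(grid, key=lambda pair: predict_mean(*pair))
    return best, predict_mean(*best)


# Hypothetical response surfaces for two prefectures; not study results.
def prefecture_a(delta_a, delta_b):
    return 0.10 - 0.002 * delta_a - 0.004 * delta_b


def prefecture_b(delta_a, delta_b):
    return 0.08 - 0.001 * (delta_a - 2) ** 2 - 0.003 * delta_b


surfaces = {"A": prefecture_a, "B": prefecture_b}
best_by_prefecture = {name: best_adjustment(f)[0] for name, f in surfaces.items()}
```

With time factor (a) ranging over −5 to +10 min and (b) or (c) over −5 to +5 min, each grid holds 16 × 11 = 176 combinations per prefecture, so an exhaustive search is inexpensive.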

Statistical analyses

Patient characteristics are summarized as medians and interquartile ranges (IQRs) for continuous variables and counts and percentages for categorical variables. Additionally, the evaluation metric for the five models is expressed as means ± standard deviations. The standard deviations were calculated based on the variations in the evaluation metric across the five-fold cross-validation.
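These summaries can be computed with Python's `statistics` module. The values below are made up for illustration and are not study data. The quartile method (the module's default, `exclusive`) and the use of the sample standard deviation are assumptions, since the source specifies neither.

```python
from statistics import mean, median, quantiles, stdev

# Made-up ages for illustration only.
ages = [55, 62, 70, 78, 85, 90, 93]
q1, _, q3 = quantiles(ages, n=4)  # default method is "exclusive"
age_summary = f"{median(ages)} ({q1:.0f}-{q3:.0f})"

# Made-up AUROC values from five cross-validation folds.
fold_auroc = [0.90, 0.91, 0.89, 0.92, 0.90]
auroc_summary = f"{mean(fold_auroc):.3f} ± {stdev(fold_auroc):.3f}"
```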
