Abstract:
People with diabetes require lifelong access to healthcare services to delay the onset of complications. Managing their disease generates large volumes of data across several domains, from clinical to administrative. Difficulties in accessing and processing these data hinder their secondary use in an institutional setting, even for highly desirable applications such as the prediction of cardiovascular disease, the main driver of excess mortality in diabetes. In the present work, we therefore propose a deep learning model for the prediction of major adverse cardiovascular events (MACE), developed and validated using the administrative claims of 214,676 patients with diabetes in the Veneto region of north-east Italy. Specifically, we use one year of pharmacy and hospitalisation claims, together with basic patient information, to predict the 4P-MACE composite endpoint, i.e., the first occurrence of death, heart failure, myocardial infarction, or stroke, with a variable prediction horizon of 1 to 5 years. To account for the time-to-event nature of this task, we cast the problem as a multi-outcome (4P-MACE and its components), multi-label (1 to 5 years) classification task with a custom loss that accounts for the effect of censoring. Our model, purposefully specified to minimise data preparation costs, exhibits satisfactory performance in predicting 4P-MACE at all prediction horizons: AUROC from 0.812 (C.I.: 0.797 - 0.827) to 0.792 (C.I.: 0.781 - 0.802); C-index from 0.802 (C.I.: 0.788 - 0.816) to 0.770 (C.I.: 0.761 - 0.779). Prediction performance for the individual components is also adequate, ranging from a 1-year AUROC of 0.877 for death to a 5-year AUROC of 0.689 for stroke.
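The abstract does not specify the form of the custom loss. As an illustration only, the sketch below shows one common way to make a multi-horizon classification loss censoring-aware: a binary cross-entropy in which horizons whose outcome is unknown because the patient was censored earlier are masked out. The function names (`horizon_labels`, `masked_bce`), the strict/non-strict boundary handling, and the masking rule itself are assumptions made for this example, not the authors' method.

```python
import math

HORIZONS = (1, 2, 3, 4, 5)  # prediction horizons in years, as in the abstract


def horizon_labels(time, event, horizons=HORIZONS):
    """Build per-horizon binary labels and a validity mask for one patient.

    time  : follow-up time in years (time of event or of censoring)
    event : 1 if the event was observed at `time`, 0 if the patient was censored

    Assumed rule (hypothetical, for illustration):
      - follow-up exceeds the horizon        -> label 0, usable
      - event observed within the horizon    -> label 1, usable
      - censored at or before the horizon    -> outcome unknown, masked out
    """
    labels, mask = [], []
    for h in horizons:
        if time > h:
            labels.append(0)
            mask.append(1)
        elif event:
            labels.append(1)
            mask.append(1)
        else:
            labels.append(0)  # placeholder value; ignored because mask is 0
            mask.append(0)
    return labels, mask


def masked_bce(probs, labels, mask, eps=1e-7):
    """Mean binary cross-entropy over the unmasked horizons only."""
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue  # skip horizons whose outcome is unknown due to censoring
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        count += 1
    return total / count if count else 0.0


# A patient censored after 2.5 years contributes only to the 1- and 2-year labels.
labels, mask = horizon_labels(time=2.5, event=0)
loss = masked_bce([0.5] * 5, labels, mask)
```

Under this assumed scheme, a censored patient still contributes to every horizon up to which they were known to be event-free, so their follow-up is not discarded. For the multi-outcome setting, the same loss could be computed once per outcome (4P-MACE and each component) and summed.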