ePoster

Towards predicting Stroke Etiology from MRI and CT Imaging Data of Ischemic Stroke Patients

Beatrice Guastella, Steffen Tiedt, Hannah Spitzer
Bernstein Conference 2024 (2024)
Goethe University, Frankfurt, Germany

Abstract

Identifying the cause of an ischemic stroke is crucial for defining secondary prevention strategies. Etiology-based classification systems integrate clinical, imaging, and laboratory findings to assign stroke etiology[1-4]. However, this process requires significant time and resources, and these systems still fail to assign an etiology in up to 53% of cases[5]. This project aims to develop a machine learning (ML) model that predicts etiology directly from neuroimaging data of ischemic stroke patients, in order to automate and expedite etiology assignment and, potentially, to outperform existing systems by identifying the etiology in undetermined cases.
Data for our project were collected through the PROMISE study, which includes neuroimaging (T1w MRI or NCCT), clinical data, laboratory findings, and etiology assignments (based on the TOAST classification system) for 504 ischemic stroke patients. We extracted two types of features from manually segmented stroke lesions: radiomics features (with pyradiomics) and lesion location features (crafted using published human brain atlases)[6-11]. In total, we extracted 1047 features, which were used to train a Random Forest (RF) model (with scikit-learn[12]). This model is designed to classify stroke cases with known etiology and, specifically, to distinguish cardioembolic cases (75% of samples) from all others. We selected cases with known etiology (55% of samples) and split the resulting dataset into a learning set and a test set (70% / 30%) using stratified random sampling. Both sets were median-centered and scaled by the 5th-95th percentile range, with the median and percentiles calculated on the learning set only to prevent data leakage. We used Out-Of-Bag (OOB) estimates on the learning set for hyperparameter optimization, focusing on the number of trees in the forest and the number of features considered at each split. The highest balanced accuracy was achieved with 500 trees and the square root of the total number of features.
With these settings, the model was evaluated on the learning set using stratified 10-fold cross-validation, achieving a mean balanced accuracy of 0.56 ± 0.1. To improve these results, we plan to examine the influence of other hyperparameters on model performance and to consider feature selection or dimensionality reduction before training. Additionally, we may explore other classifiers and implement an unsupervised model to investigate latent structures within our data.
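
The workflow described in the abstract could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification` stands in for the radiomics and lesion location features), the hyperparameter grid is hypothetical and kept small for speed, expressing the scaling step as `RobustScaler(quantile_range=(5.0, 95.0))` is an assumption, and so is refitting the scaler inside each cross-validation fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the feature table (assumption: NOT the PROMISE data).
# Class 1 plays the role of the majority (cardioembolic) class.
X, y = make_classification(n_samples=280, n_features=40, n_informative=5,
                           weights=[0.25, 0.75], random_state=0)

# 70% / 30% split into learning and test sets with stratified random sampling.
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Median-centre and divide by the 5th-95th percentile range.
# The statistics come from the learning set only, to avoid data leakage.
scaler = RobustScaler(quantile_range=(5.0, 95.0)).fit(X_learn)
X_learn_s = scaler.transform(X_learn)
X_test_s = scaler.transform(X_test)


def oob_balanced_accuracy(n_trees, max_features):
    """Fit a forest on the learning set and score its out-of-bag predictions."""
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=max_features,
                                oob_score=True, random_state=0)
    rf.fit(X_learn_s, y_learn)
    # oob_decision_function_ holds per-sample class probabilities estimated
    # only from the trees that did not see that sample during training.
    oob_pred = rf.classes_[np.argmax(rf.oob_decision_function_, axis=1)]
    return balanced_accuracy_score(y_learn, oob_pred)


# Hypothetical small grid over number of trees and features per split.
grid = [(n, m) for n in (100, 200) for m in ("sqrt", "log2")]
oob_scores = {params: oob_balanced_accuracy(*params) for params in grid}
best_trees, best_max_features = max(oob_scores, key=oob_scores.get)

# Stratified 10-fold cross-validation on the learning set with the best settings.
# The pipeline refits the scaler within each fold (an assumption of this sketch).
model = make_pipeline(
    RobustScaler(quantile_range=(5.0, 95.0)),
    RandomForestClassifier(n_estimators=best_trees,
                           max_features=best_max_features, random_state=0))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_learn, y_learn, cv=cv,
                            scoring="balanced_accuracy")
print(f"balanced accuracy: {cv_scores.mean():.2f} ± {cv_scores.std():.2f}")
```

The printed score reflects the synthetic data only and is not comparable to the 0.56 ± 0.1 reported above. The held-out test set (`X_test_s`, `y_test`) is deliberately left untouched here, since the abstract reports results on the learning set only.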

Unique ID: bernstein-24/towards-predicting-stroke-etiology-3b7d5f36