Cardiac MRI Image Segmentation for Left Ventricle and Right Ventricle using Deep Learning

Faculty Advisor: Mai Nguyen <mhnguyen@ucsd.edu>
Domain Experts/Project Advisors: Marcus Bobar, Tony Reina
Contributors: Bosung Seo, Daniel Mariano, John Beckfield, Vinay Madenur, Yuming Hu

Project Description

The goal of this project is to use magnetic resonance imaging (MRI) data to provide an end-to-end analytics pipeline for left and right ventricle (LV & RV) segmentation. The previous cohort successfully segmented the left ventricle endocardium. Using that as a foundation to build upon, we ran experiments segmenting the left ventricle epicardium as well as the right ventricle endocardium. Another aim of the project was to find a model that would be generalizable across medical imaging datasets. In the experiments detailed later in this paper, we used a variety of models, datasets, and tests to determine which model is best suited to this purpose.

For the left ventricle, we mapped both the epicardium and endocardium surfaces. The endocardium segment is the area of the left ventricle contained by the inside of the left ventricle wall. The epicardium segment is the area contained by the outside of the wall; once the contours of the inside and outside of the wall are known, the contours of the myocardium, the actual wall, can be calculated. The contours of the myocardium are used in assessing the severity of damage done by heart attacks. For the right ventricle, we mapped the endocardium surface.

In this work, we implemented three models: 2-D Unet, 3-D Unet, and Densenet. While maintaining a consistent preprocessing strategy, we tested each model's performance both when trained on data from the same dataset as the test data and when trained on data from a different dataset. Data augmentation was also used to increase the adaptability of the models. The results were compared to assess performance and generalizability. Overall, the 2-D Unet proved to be the fastest to train and the most generalizable, while the 3-D Unet had the best overall performance. We also found that models trained on the Automated Cardiac Diagnosis Challenge (ACDC) dataset, which is quite large and has very high quality images, performed better when tested on the smaller Sunnybrook and Miccai RV datasets than models trained on those smaller datasets themselves.

Khened Model

The Densenet Khened model and associated code can be found in the densenet_khened directory. To reproduce the results from our paper, the relevant files to change in the densenet_khened directory are estimators/preprocess.py, estimators/test.py, and estimators/config.py. To first produce preprocessed data from the raw datasets we used (ACDC, LV-2011, Miccai RV, Sunnybrook), change the preprocess_data_path variable in preprocess.py to the path containing the raw data. Next, in config.py, set save_preprocessed_files to True and only_preprocessing to True. Then run preprocess.py; the preprocessed output files will be written to the path given by the output_dir variable in config.py. Currently, our preprocessing performs resampling to 1x1 pixel spacing, center cropping to 176x176, and z-score contrast normalization.
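As a rough illustration of those three preprocessing steps (this is a minimal sketch, not the exact code in estimators/preprocess.py; the function name and arguments below are placeholders), the per-slice preprocessing can be written as:

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_slice(img, pixel_spacing, target_spacing=(1.0, 1.0), size=176):
    """Resample to 1x1 pixel spacing, center-crop (or pad) to size x size,
    then apply z-score contrast normalization."""
    # Resample so that each pixel covers target_spacing units.
    scale = (pixel_spacing[0] / target_spacing[0],
             pixel_spacing[1] / target_spacing[1])
    img = zoom(img, scale, order=1)

    # Center crop, zero-padding if the resampled slice is smaller than size x size.
    out = np.zeros((size, size), dtype=np.float32)
    h, w = img.shape
    src_y, src_x = max((h - size) // 2, 0), max((w - size) // 2, 0)
    dst_y, dst_x = max((size - h) // 2, 0), max((size - w) // 2, 0)
    hh, ww = min(h, size), min(w, size)
    out[dst_y:dst_y + hh, dst_x:dst_x + ww] = img[src_y:src_y + hh, src_x:src_x + ww]

    # Z-score contrast normalization.
    return (out - out.mean()) / (out.std() + 1e-8)
```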
After the processed data is created, move it to a directory of one's choosing and create train_set and validation_set subdirectories within it. Place the processed npy files to train on in the train_set directory, and likewise place the appropriate files in the validation_set directory. Once that is done, run train.py, which will, upon successful completion, write the generated model files to the path given by the output_dir config variable.

For test.py, the variable to change is final_test_data_path, which should point to the data path containing the processed numpy files. Additionally, the variable gt_available should be set to True/False depending on whether ground truth data exists in the data path being pointed to.

For config.py, the following variables should be modified accordingly:
* data_path - directory containing the train_set and validation_set subdirectories where the processed numpy files for training are placed
* output_dir - path where the output preprocessed files or trained model files are written, depending on whether one is running preprocessing or predictions
* run_name - represents the dataset being predicted on, Ex) 'FCRD_ACDC'
* using_only_ED_ES - represents whether ED/ES frame knowledge is known for the dataset being predicted on
* save_preprocessed_files - saves the output preprocessed numpy files in the directory output_dir + run_name
* using_outside_preprocessing - indicates whether preprocessing has already happened for the input files or whether preprocessing must be done for the current run
* save_predictions - saves the output prediction numpy files in the directory output_dir + run_name
* only_preprocessing - set to True when only doing preprocessing of files, with the processed output files saved to data_path

To run an existing model, run python test.py. Upon successfully running the model for predictions, a report is generated with averages and standard deviations for the evaluation metrics (Dice, Hausdorff), broken down by the relevant heart structures and by ED/ES where applicable.

For three of the datasets we evaluated (LV-2011, Miccai RV, Sunnybrook), the high-level process to train a model and predict (keeping in mind to change config.py appropriately between each step) is as follows:
1. Run preprocess.py
2. Move the data into a new directory containing train_set and validation_set subdirectories
3. Run train.py
4. Run test.py

For ACDC, the process is a little different since its data is prepared differently:
1. Create a folder outside the project named ACDC_DataSet and copy the dataset into it.
2. From the project folder, open the file data_preprocess/acdc_data_preparation.py.
3. In the file, set the complete_data_path variable to your path to the ACDC training dataset, Ex) complete_data_path = '../../ACDC_DataSet/training'.
4. Run acdc_data_preparation.py.
5. The prepared data for training is generated outside the project folder in a directory named processed_acdc_dataset.

From there, one can point preprocess.py at that generated data and then follow the same process as above:
1. Run preprocess.py
2. Move the data into a new directory containing train_set and validation_set subdirectories
3. Run train.py
4. Run test.py
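To make the configuration concrete, here is a sketch of what a typical prediction run might set (the variable names are the ones described above from config.py and test.py; the paths and values below are placeholders, not values shipped with the project):

```python
# estimators/config.py -- illustrative values only
data_path = '/data/processed/acdc'      # contains train_set/ and validation_set/
output_dir = '/data/output'             # where preprocessed files or predictions are written
run_name = 'FCRD_ACDC'                  # dataset being predicted on
using_only_ED_ES = True                 # ED/ES frame knowledge is available for this dataset
save_preprocessed_files = False         # preprocessing was already done in an earlier step
using_outside_preprocessing = True      # input files are already preprocessed
save_predictions = True                 # write prediction .npy files under output_dir + run_name
only_preprocessing = False              # this run predicts rather than only preprocessing

# estimators/test.py -- illustrative values only
final_test_data_path = '/data/processed/acdc/test'  # placeholder path to processed numpy files
gt_available = True                     # ground truth exists at final_test_data_path
```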
2-D Unet

The 2-D Unet is found in the 2DUnet folder. The relevant files are C_Training2D.py, C_Predict-2D.py, and get_metrics.py. Model_config.py is also used in support of the training and prediction.

The first step is to get the numpy files from the Khened preprocessing step and split them into training, validation, and test sets. If you are duplicating our project exactly, the split for ACDC is: patients 007, 009, 018, 023, 034, 035, 041, 042, 052, 067, 071, 075, 084, 094, and 096 are in the test set; patients 004, 012, 020, 022, 024, 037, 049, 058, 059, 065, 068, 073, 088, 095, and 099 are in the validation set; and everything else is in the training set. If you are using this as a jumping-off point for something else, you do not need this exact split, but for ACDC we recommend stratified sampling, as patients 1-20 have one pathology, 21-40 have another, and so on. For the LV Segmentation Challenge, the test set is patients 0000801, 0002501, 0003501, 0003801, 0004301, 0004901, 0005401, 0005601, 0006001, 0006501, 0014101, 0015401, 0042601, 0044601, and 0045301; the validation set is patients 0000101, 0001301, 0003201, 0003701, 0004201, 0004801, 0005101, 0005801, 0006201, 0007101, 0014201, 0024401, 0043201, and 0044801; and the rest are in the training set. For Miccai RV, the test set is patients 04, 12, and 16; the validation set is patients 09 and 14; and the rest are in the training set. Finally, for Sunnybrook, patients HF-I-10, HF-I-8, HF-NI-14, HF-NI-15, HYP-10, HYP-3, N-10, and N-9 are in the test set; patients HF-I-9, HF-NI-12, HF-NI-34, HYP-38, HYP-9, and N-7 are in the validation set; and the rest are in the training set. Our project used the Khened preprocessing, but the 2-D Unet only depends on the data being in 176x176 numpy matrix files, if you want to try other preprocessing steps.

Once the train-validation-test split is created, the next step is to point the model at the data files. The model expects the training data to be in a folder called train_set, the validation data in validation_set, and the test data in test_set. For example, suppose we have ACDC data in //User/data/acdc/train_set/patient001, //User/data/acdc/train_set/patient002, //User/data/acdc/validation_set/patient003, and //User/data/acdc/test_set/patient004. To tell the model to use this data, open model_config.py and set acdc_data to //User/data/acdc. Similarly, miccai_data should point to wherever you put your Miccai RV train_set, validation_set, and test_set folders (//User/data/miccai in our example), and cap_lv_data and sunnybrook_data should point to the data from the LV Segmentation Challenge and Sunnybrook, respectively.

You are almost ready to train your model. Open C_Training2D.py and find config["datasets"] = ["ACDC"] on line 47. Change "ACDC" to the dataset that you want to train on; "MICCAI", "SUNNYBROOK", and "CAP_LV" are the other options. If you want to train on multiple datasets, add the other dataset to the list, for example config["datasets"] = ["ACDC", "MICCAI"]. Install the packages needed, such as keras, nibabel, simpleITK, nilearn, sklearn, and others. Finally, run python C_Training2D.py. Early stopping should end training after roughly 180 epochs, and the trained model is saved as Cardiac_Unet2D_2017_model.h5.

Once training finishes, open C_Predict-2D.py and modify line 22 so that config["datasets"] specifies the dataset whose test_set you want to predict. If you are testing without ground truth, comment out lines 284 and 285. Afterwards, run python C_Predict-2D.py. It will create a predictions folder containing three files per original file: a truth file, a prediction file, and a data file (there is no truth file if you lack ground truth labels and commented out lines 284 and 285).

If you do have the truth labels, get_metrics.py will calculate the Dice score and Hausdorff values comparing the truth to the predictions, and write all of the results to accuracy_metrics.txt. The only step needed before running get_metrics.py is to make sure medpy is installed.
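For reference, here is a minimal sketch of how Dice and Hausdorff can be computed with medpy for a single truth/prediction pair (the file names are placeholders; get_metrics.py itself iterates over the predictions folder and writes accuracy_metrics.txt):

```python
import numpy as np
from medpy.metric.binary import dc, hd

# Load one predicted mask and its matching ground-truth mask (placeholder file names).
prediction = np.load('predictions/patient004_prediction.npy')
truth = np.load('predictions/patient004_truth.npy')

# Binarize in case the arrays store label values rather than 0/1.
prediction = (prediction > 0).astype(np.uint8)
truth = (truth > 0).astype(np.uint8)

dice = dc(prediction, truth)       # Dice overlap; 1.0 is a perfect match
hausdorff = hd(prediction, truth)  # Hausdorff distance in pixels, assuming 1x1 spacing

print(f'Dice: {dice:.4f}  Hausdorff: {hausdorff:.2f}')
```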
3-D Unet

The 3-D Unet is found in the 3DUnet folder. The relevant files for each dataset are found under the folder for that dataset; there are separate files for preprocessing, training, and prediction. First execute the preprocessing script to generate preprocessed data, and then copy that data to the path specified in the config["data_path"] variable of the training file. Once this is complete, execute the training script to train the model, and then execute the prediction script. For example, to train and test on ACDC data, execute in this order:
1. 3DUnet/acdc/C_Preprocessing_acdc.py
2. 3DUnet/acdc/C_Training_3D_acdc.py
3. 3DUnet/acdc/C_Predict_3D_acdc.py
The prediction results are written to a folder named 'prediction_3D' (Nifti format) and a folder named 'prediction_3D_npy' (npy format).

In C_Training_3D_acdc.py, modify config['training_patients'], config['validation_patients'], and config['test_patients'] to change the split between the training, validation, and test sets; '001' corresponds to 'patient001'. Training and testing on the other datasets works similarly. If you want to use random splitting, set config['RandomSelectData'] = True. By default, all image spacing is normalized to 1.0; if this needs to change, modify config["image_spacing"] to the desired value. Except for the ACDC dataset, all other datasets first need the images under the same patient folder combined into a 3-D Nifti-format image, as shown in the sketch below. Then run C_Preprocessing_acdc.py for the normal preprocessing.
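As a minimal sketch of that combining step (the folder layout, file extension, and identity affine below are assumptions for illustration, not the project's actual conversion code), per-slice arrays can be stacked into a 3-D Nifti volume with nibabel:

```python
import glob
import numpy as np
import nibabel as nib

def combine_patient_slices(patient_dir, out_path):
    """Stack per-slice .npy images from one patient folder into a single
    3-D Nifti volume, with slice order taken from the sorted file names."""
    slice_files = sorted(glob.glob(f'{patient_dir}/*.npy'))  # assumed layout
    volume = np.stack([np.load(f) for f in slice_files], axis=-1)

    # Identity affine as a placeholder; real spacing and orientation should be
    # carried over from the original image headers when available.
    nifti = nib.Nifti1Image(volume.astype(np.float32), affine=np.eye(4))
    nib.save(nifti, out_path)

# Placeholder paths for one patient of a non-ACDC dataset.
combine_patient_slices('/data/miccai_rv/patient04', '/data/miccai_rv/patient04.nii.gz')
```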