COURSE: Capstone Project for Data Science & Engineering Master's Program at UC San Diego (UCSD), 2019

PROJECT NAME: Electron Microscopic Data Analysis

PROJECT DESCRIPTION:
The "Electron Microscopic Data Analysis" project supports the processing, analysis, and dissemination of large-scale 3D electron microscopy (EM) data derived from a remarkable collection of legacy biopsy brain samples from patients with Alzheimer's disease (AD). Alzheimer's disease is a progressive neurodegenerative disease leading to dementia, accompanied by several structural changes in the patient's brain. The project facilitates the processing and downstream analysis of complete whole-cell reconstructions of neurons from unique biopsy image samples of cerebral cortex taken from AD cases by Robert Terry in the 1960s. It focuses on early-onset cases, in which cells affected to differing extents neighbor cells without the AD-associated paired helical filaments (PHFs), now known to be largely composed of tau proteins. These samples were screened and preliminarily reported on by Ellisman, Masliah, and Terry (Ellisman et al., 1987) and show near-perfect preservation of ultrastructure, PHF and amyloid accumulations, as well as modifications to subcellular organelles and the cytoskeletons of cell bodies and axonal and dendritic processes.

The raw data comes from the Cell Image Library (CIL: http://cellimagelibrary.org/cdeep3m). The Cell Image Library is a public, easily accessible database of images, videos, and animations of cells, capturing a wide diversity of organisms, cell types, and cellular processes. The purpose of the database is to advance research on cellular activity, with the ultimate goal of improving human health. The data is downloaded manually from the website and copied to local disk for processing. It consists of serial block-face scanning electron microscopy (SBEM) 3D data volumes.
Serial block-face scanning electron microscopy (SBEM) is a technique for obtaining high-resolution 3D images of a sample. It is particularly well suited to imaging large fields of view in all three dimensions (X, Y, Z) at nanometer resolution. The data consists of brain image samples from patients with Alzheimer's disease. The images are of very high definition: a typical volume in these datasets is on the order of 16000 x 10000 x 400 pixels.

TECHNICAL DETAILS:
This project folder contains the following files/folders:
Readme: This file, describing the project and scripts
cdeep3m_py.zip: Source files for the 1st model, CDeep3M
MultiResUnet_scripts.zip: Source files for the 2nd model, MultiResUnet
TrainingImages_Mitos_1_80.zip: Input data containing the Train, Validation, and Test images
DeepEM3D.zip: CDeep3M model output predicted on the test images
MultiResUnet.zip: MultiResUnet model output predicted on the test images
Electron Microscopic Data Analysis.pdf: Detailed project report

SCRIPTS:

MODEL: CDeep3M
The folder cdeep3m_py.zip contains scripts for all 3 models: 1fm, 3fm, and 5fm.
To run:
Preprocess training data: python3 PreprocessTrainingData.py
Preprocess validation data: python3 PreprocessValidation.py
Training: runtraining.sh --validation_dir --numiterations 50000
Prediction: runprediction.sh writes output to the 1fm, 3fm, and 5fm folders, corresponding to the 1fm, 3fm, and 5fm models, plus the final merged output in the ensembled folder.

MODEL: MultiResUnet
The folder MultiResUnet_scripts.zip contains scripts for training all 3 models (1fm, 3fm, and 5fm) and for their corresponding predictions, including the ensembled prediction.
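A volume of this size is too large to hold casually in memory, which is why the data is split into small training/validation/test images in the first place. A quick back-of-the-envelope calculation (a sketch only, assuming 8-bit grayscale voxels; the actual bit depth of the source volumes may differ) shows the scale involved:

```python
# Back-of-the-envelope memory estimate for a typical SBEM volume
# (assumes 8-bit grayscale, i.e. 1 byte per voxel -- an assumption,
# not stated in this README).

def volume_bytes(x, y, z, bytes_per_voxel=1):
    """Raw uncompressed size in bytes of an x * y * z voxel volume."""
    return x * y * z * bytes_per_voxel

size = volume_bytes(16000, 10000, 400)
print(f"{size / 1e9:.0f} GB")  # prints "64 GB" for one uncompressed volume
```

At roughly 64 GB per uncompressed volume, working on per-image PNG tiles is the practical option.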
To run:
Preprocessing (from python_training):
Preprocess training data: python3 PreprocessTrainingData.py
Preprocess validation data: python3 PreprocessValidation.py
Training 1fm (from python_training): python3 run_python_training.py --training_data --validation_data --batch_size 32 --nOfItr 2400
Training 3fm (from python_training_3D): python3 run_python_training.py --training_data --validation_data --batch_size 32 --nOfItr 2400
Training 5fm (from python_training_5D): python3 run_python_training.py --training_data --validation_data --batch_size 32 --nOfItr 2400
Prediction 1fm (from python_prediction): runprediction.sh <1fm_trainedNetworkPath> --models "1fm"
Prediction 3fm (from python_prediction_3D): runprediction.sh <3fm_trainedNetworkPath> --models "3fm"
Prediction 5fm (from python_prediction_5D): runprediction.sh <5fm_trainedNetworkPath> --models "5fm"
Ensembled prediction (from python_prediction_ensembled): runprediction.sh <1fm_trainedNetworkPath> writes output to the 1fm, 3fm, and 5fm folders, corresponding to the 1fm, 3fm, and 5fm models, plus the final merged output in the ensembled folder.

RAW DATA:
TrainingImages_Mitos_1_80.zip contains data for training, validation, and testing:
training_images: 80 PNG images
training_labels: 80 PNG label images corresponding to the 80 training images
validation_images: 15 PNG images
validation_labels: 15 PNG label images corresponding to the 15 validation images
test_images: 5 PNG images
test_labels: 5 PNG label images corresponding to the 5 test images; not used by the scripts, provided only so the end user can calculate test accuracy

OUTPUT DATA:

MODEL: CDeep3M
The folder DeepEM3D.zip contains the trained model and the final predicted (segmented) images of the test data:
EroMito_trnet: Trained net files for each of 1fm, 3fm, and 5fm
prediction: Output in the 1fm, 3fm, and 5fm folders, corresponding to the 1fm, 3fm, and 5fm models, plus the final merged output in the ensembled folder.
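Since test_labels are left to the end user for scoring, the comparison between predicted masks and labels has to be written by hand. The sketch below shows two common segmentation metrics, assuming the PNGs are loaded as binary numpy masks (the choice of metric and the loading code are not prescribed by this project):

```python
import numpy as np

def pixel_accuracy(pred, label):
    """Fraction of pixels where prediction and label agree."""
    return float(np.mean(pred == label))

def dice_score(pred, label):
    """Dice coefficient for binary masks (1.0 = perfect overlap)."""
    pred, label = pred.astype(bool), label.astype(bool)
    intersection = np.logical_and(pred, label).sum()
    total = pred.sum() + label.sum()
    # Both masks empty counts as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

In practice each pair of images would be read (e.g. with skimage.io.imread) from the prediction output folder and from test_labels, thresholded to binary, and scored per image.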
Processed test images are inside the augimages folder.

MODEL: MultiResUnet
The folder MultiResUnet.zip contains the trained model and the final predicted (segmented) images of the test data:
python_training_dump: Trained net h5 file for the 1fm model
python_training_dump_3d: Trained net h5 file for the 3fm model
python_training_dump_5d: Trained net h5 file for the 5fm model
python_prediction: Output in the 1fm, 3fm, and 5fm folders, corresponding to the 1fm, 3fm, and 5fm models, plus the final merged output in the ensembled folder. Processed test images are inside the augimages folder.

REQUIREMENTS:
Python 3 with numpy, skimage (scikit-image), PIL (Pillow), h5py, keras, tensorflow, joblib, cv2 (OpenCV), matplotlib (shutil, sys, os, multiprocessing, and json are part of the standard library)
parallel utility on Unix
At least 1 GPU
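Both pipelines merge the 1fm, 3fm, and 5fm predictions into the ensembled folder. This README does not spell out the merge rule, so the sketch below shows one common approach for illustration only (averaging the per-model probability maps and thresholding), not necessarily what these scripts do internally:

```python
import numpy as np

def ensemble(prob_maps, threshold=0.5):
    """Merge per-model probability maps into one binary mask.

    prob_maps: list of same-shape float arrays with values in [0, 1],
    one per model (here: the 1fm, 3fm, and 5fm predictions).
    Averages the maps pixel-wise, then thresholds to 0/1.
    """
    mean = np.mean(np.stack(prob_maps), axis=0)
    return (mean >= threshold).astype(np.uint8)
```

Averaging is a simple way to let the three frame-window models vote; a majority vote over already-thresholded masks would be an equally plausible alternative.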