Project Resources

Below, we have collected resources that we hope you will find useful as you work on your final projects. This list will be updated throughout the semester.

Datasets
PhysioNet

PhysioNet is a repository of medical research data. We especially draw your attention to the following:

  • MIMIC-IV - comprehensive clinical information on hospital stays for patients admitted to a tertiary academic medical center in Boston, MA. (Helpful tutorials on accessing and working with MIMIC-IV: publication, analysis tutorial, data tutorial).
  • HiRID - critical care dataset with data from almost 34,000 patient admissions to the Department of Intensive Care Medicine at Bern University Hospital.
  • MIMIC-CXR - dataset of chest radiographs in DICOM format with free-text radiology reports.
MMRF

Data from a 10-year observational study of 1,000 newly diagnosed multiple myeloma patients receiving various standard approved treatments. Access instructions here.

  • ML-MMRF is a GitHub repository that processes the MMRF CoMMpass dataset so that researchers can use it for machine learning. It provides code to parse the raw MMRF files into tensors (stored as `numpy` arrays), clean and normalize the tensors, and validate the procedure.
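The normalize-and-validate steps described for ML-MMRF can be illustrated with a minimal NumPy sketch. This is not code from the ML-MMRF repository: it assumes a hypothetical `(patients, time, features)` tensor in which missing measurements are stored as NaN, and the helper names `normalize_tensor` and `validate_tensor` are illustrative only. Each feature is z-scored using observed values alone, and missing entries are zero-filled alongside a boolean mask so that a downstream model can tell them apart from real zeros.

```python
import numpy as np


def normalize_tensor(x: np.ndarray, eps: float = 1e-8):
    """Z-score each feature of a hypothetical (patients, time, features) tensor.

    Missing values are encoded as NaN (an assumption of this sketch).
    Per-feature mean and standard deviation are computed over observed
    entries only. Returns the normalized tensor, with missing entries
    set to 0.0, and a boolean mask that is True where a value was observed.
    """
    mask = ~np.isnan(x)
    mean = np.nanmean(x, axis=(0, 1), keepdims=True)
    std = np.nanstd(x, axis=(0, 1), keepdims=True)
    z = np.where(mask, (x - mean) / (std + eps), 0.0)
    return z, mask


def validate_tensor(z: np.ndarray, mask: np.ndarray, tol: float = 1e-6) -> None:
    """Basic sanity checks on the output of normalize_tensor."""
    assert z.shape == mask.shape
    assert np.all(np.isfinite(z)), "normalized tensor contains NaN or inf"
    assert np.all(z[~mask] == 0.0), "missing entries should be zero-filled"
    # Every feature's observed values should be centered at (about) zero.
    for f in range(z.shape[-1]):
        observed = z[..., f][mask[..., f]]
        if observed.size:
            assert abs(observed.mean()) < tol, f"feature {f} is not centered"


# Usage: a toy tensor with 4 patients, 3 time steps, 2 features.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(4, 3, 2))
x[0, 1, 0] = np.nan  # simulate one missing lab measurement
z, mask = normalize_tensor(x)
validate_tensor(z, mask)
```

Computing the statistics with `np.nanmean`/`np.nanstd` keeps missing values from biasing the mean and scale; the real repository may use a different normalization or imputation scheme.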
BraTS (Brain Tumor Segmentation)

Routine, clinically acquired, pre-operative multimodal MRI scans from multiple institutions of glioblastoma (GBM/HGG) and lower-grade glioma (LGG), with pathologically confirmed diagnoses.

eICU

Data on patients treated as part of the Philips eICU program across intensive care units in the United States, covering the clinical care these ICU patients received. Access instructions here.

PPMI

Data from a longitudinal study to define and measure biomarkers associated with Parkinson's disease. Access instructions here.

ISIC 2019 Challenge Dataset

25,331 dermoscopic images for classification among nine diagnostic categories.

COUGHVID

Crowdsourced cough recordings from subjects spanning a wide range of ages, genders, geographic locations, and COVID-19 statuses.

COVID-19 Clinical Trials dataset

Database of COVID-19-related clinical studies being conducted worldwide.