Glioblastoma brain tumor segmentation - Part 3 - Setup the deep learning framework
Setting up the data folders required for our deep learning framework
In our previous post, we downloaded the UPenn Glioblastoma MRI dataset from TCIA and then uploaded the files to our Google Drive.
In the images_structural folder, you can see that there are 671 folders belonging to 630 patients. Some have an _21 suffix, indicating a second scan for that patient. Each folder contains 4 MRI files, one per MRI sequence: T1, T1-GD, T2, and FLAIR.
Our goal is to train a Deep Learning model that can segment the Glioblastoma tumor regions. We will compare the segmentation generated by our model to the ground truth segmentation that was created by expert radiologists. However, only 147 of the 630 patients have the expert ground truth images in the images_segm folder. We’ll need to address this prior to model training.
We are going to use nnU-Net to perform the automated segmentation.
nnU-Net Overview
From the nnU-Net website: “nnU-Net is a semantic segmentation method that automatically adapts to a given dataset. It will analyze the provided training cases and automatically configure a matching U-Net-based segmentation pipeline. No expertise required on your end! You can simply train the models and use them for your application.”
I highly recommend you read through the nnU-Net website in detail to get an idea of its scope and capabilities. Here is an overview of the nnU-Net configuration -
The entry point to nnU-Net is the nnUNet_raw_data_base folder
Each project is stored as a separate 'Task'.
Tasks are associated with a task ID, a three-digit integer; IDs starting at 500 are recommended so they don’t conflict with existing pre-trained models. Some examples are shown below -
```
nnUNet_raw_data_base/nnUNet_raw_data/
├── Task501_Glioblastoma
├── Task505_Heart
├── Task510_Meningioma
```
Each task folder will further have the following sub-folders and files
```
Task501_Glioblastoma/
├── dataset.json
├── imagesTr
├── (imagesTs)
└── labelsTr
```
imagesTr contains the images belonging to the training cases. nnU-Net will run pipeline configuration, training with cross-validation, as well as finding postprocessing and the best ensemble on this data.
imagesTs (optional) contains the images belonging to the test cases.
labelsTr contains the ground truth segmentation maps for the training cases. Do not include labels for the test dataset here, or you will run into issues.
dataset.json contains metadata of the dataset.
All images, including label files, MUST be in the NIfTI format (.nii.gz)
Each patient may have multiple MRI sequences (T1, T1-CE/GD, T2, FLAIR)
The label files must contain segmentation maps that contain consecutive integers, starting with 0 (0, 1, 2, 3, ... n). 0 is considered background.
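As a quick sanity check, the consecutive-integer label requirement can be validated in Python. This is an illustrative sketch using a toy NumPy array; for a real label file you would first load the NIfTI volume with a library such as nibabel (not shown here):

```python
import numpy as np

def labels_are_valid(seg: np.ndarray) -> bool:
    """Check that a segmentation map contains consecutive
    integers starting at 0 (0 = background)."""
    values = np.unique(seg)  # sorted unique label values
    return np.array_equal(values, np.arange(values.size))

# Toy 2D "segmentation" with labels 0, 1, 2 -> valid
good = np.array([[0, 0, 1], [1, 2, 2]])
# Labels 0 and 2 with 1 missing -> not consecutive, invalid
bad = np.array([[0, 0, 2], [2, 2, 0]])
```

Running a check like this on each file in labelsTr before training can save you a failed nnU-Net preprocessing run later.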
Setting up the folders in Google Drive
Thankfully, Google Colab runs on Linux virtual machines (VMs) and allows us to enter Linux commands to make things much easier. You can do this by prefixing the commands with an exclamation mark (!) in the Colab cell, which indicates that it is an operating system command. We will run all our commands through the Colab interface when working with Google Drive folders. The Task ID I have chosen for this project is 501.
Open a new Colab notebook - 00_t501_glio_folder_setup.ipynb
Mount your Google Drive. This will give you access to your “MyDrive” folder
```python
from google.colab import drive
drive.mount('/content/drive')
```
You can expand the File icon in the left frame of Colab to see what has been mounted.
Here’s the high level folder structure I want to create
MyDrive/TCIA/nnUNet - All my nnUNet work stored here
MyDrive/TCIA/nnUNet/nnUNet_raw_data_base - Base folder required by nnU-Net
MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data - All tasks stored here
MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_Glioblastoma - This is my GBM segmentation project (task) folder
Run the “mkdir” Linux commands in your Colab cell to create this folder structure. The “-p” argument creates any missing parent directories and doesn’t complain if a folder already exists, which is nice. Remember, these are OS commands, so prefix them with an !
```
!mkdir -p /content/drive/MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data
!mkdir -p /content/drive/MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_Glioblastoma
```
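If you prefer to stay in Python, os.makedirs does the same job: like mkdir -p, it creates any missing intermediate directories and, with exist_ok=True, tolerates folders that already exist. This sketch uses a placeholder base path so it runs anywhere; on Colab you would set base to "/content/drive/MyDrive/TCIA":

```python
import os

# Placeholder base path; on Colab this would be "/content/drive/MyDrive/TCIA"
base = "demo_tcia"

task_dir = os.path.join(
    base, "nnUNet", "nnUNet_raw_data_base", "nnUNet_raw_data", "Task501_Glioblastoma"
)
# exist_ok=True mirrors mkdir -p: no error if the folders already exist
os.makedirs(task_dir, exist_ok=True)

# nnU-Net expects these sub-folders inside each task folder
for sub in ("imagesTr", "imagesTs", "labelsTr"):
    os.makedirs(os.path.join(task_dir, sub), exist_ok=True)
```

Either approach gets you the same folder tree; the shell commands are just more concise in a Colab cell.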
And here’s how that looks in my Google Drive. You may have to refresh the page to see the new folder.
We have now set up the basic folder structure required for nnU-Net to run on our Glioblastoma dataset. We need to make some big picture decisions before we proceed to the next step.
Train-Test Split - Unfortunately, only 147 patients out of the 630 have manually segmented ground truth files for us to compare with. We can take two approaches to our model development -
Split the 147 patients in an 80:20 train:test proportion, or
Set aside a few patients (10 is as good a number as any) as your test cohort and train the model on the remaining patients, giving your model more training data
In this project, we will split the 147 patients into training and test subsets since the goal is to show you how to build the model. It’s also easier for me to train the model on my free Colab account. You can try either approach.
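The 80:20 split can be sketched with Python’s random module. The patient IDs below are synthetic placeholders (the real IDs come from the folders that have ground truth in images_segm); with 147 patients, an 80:20 split yields 117 training and 30 test cases, and a fixed seed keeps the split reproducible:

```python
import random

# Placeholder IDs standing in for the 147 patients with ground truth labels
patients = [f"patient_{i:03d}" for i in range(1, 148)]

random.seed(42)          # fixed seed -> reproducible split
random.shuffle(patients)

n_train = int(0.8 * len(patients))   # 80% of 147 -> 117
train_ids = patients[:n_train]
test_ids = patients[n_train:]        # remaining 30 held out for testing
```

The train_ids then map to imagesTr/labelsTr and the test_ids to imagesTs.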
Validation subset - nnU-Net uses a 5-fold cross-validation technique, which means we don’t need a separate validation dataset.
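For intuition, 5-fold cross-validation partitions the training cases into five folds and trains five models, each holding out a different fold for validation. nnU-Net handles this internally, so the sketch below is purely illustrative:

```python
def five_folds(case_ids):
    """Illustrative 5-fold split: each case lands in exactly one
    validation fold; nnU-Net does this for you automatically."""
    folds = [case_ids[i::5] for i in range(5)]  # round-robin assignment
    splits = []
    for i in range(5):
        val = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        splits.append((train, val))
    return splits

cases = [f"case_{i:03d}" for i in range(10)]
splits = five_folds(cases)  # 5 (train, val) pairs
```

Because every training case serves as validation data in exactly one fold, no cases are wasted on a standalone validation set.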
In our next post, we will see how to split the image dataset and create the metadata file, dataset.json, for nnU-Net to begin its processing. I hope you find this interesting and will follow along.
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211. https://github.com/MIC-DKFZ/nnUNet