Glioblastoma brain tumor segmentation - Part 3 - Setup the deep learning framework

Setting up the data folders required for our deep learning framework

In our previous post, we downloaded the UPenn Glioblastoma MRI dataset from TCIA and then uploaded the files to our Google Drive.

In the images_structural folder, you can see that there are 671 folders belonging to 630 patients; folders with an _21 suffix are second scans. Each folder has four MRI files, one per sequence: T1, T1-GD, T2, and FLAIR.

Our goal is to train a Deep Learning model that can segment the Glioblastoma tumor regions. We will compare the segmentation generated by our model to the ground truth segmentation that was created by expert radiologists. However, only 147 of the 630 patients have the expert ground truth images in the images_segm folder. We’ll need to address this prior to model training.
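Before training, it helps to work out programmatically which patients actually have ground truth. Here is a minimal sketch of that matching step; the folder and file names below are hypothetical placeholders following the dataset's ID-plus-scan-suffix pattern, so adjust the parsing to your actual file names.

```python
# Sketch: given the folder names under images_structural and the file
# names under images_segm, find which cases have expert ground truth.
# The "_segm" suffix and the exact ID format are assumptions; check a
# few real file names from your Drive before relying on this.

def patients_with_ground_truth(structural_folders, segm_files):
    """Return the structural folder names that have a matching
    segmentation file (matched on the folder-name prefix)."""
    segm_ids = {name.split(".")[0].replace("_segm", "") for name in segm_files}
    return sorted(f for f in structural_folders if f in segm_ids)

# Hypothetical example names, not taken from the real dataset:
structural = ["UPENN-GBM-00001_11", "UPENN-GBM-00002_11", "UPENN-GBM-00002_21"]
segm = ["UPENN-GBM-00001_11_segm.nii.gz"]
print(patients_with_ground_truth(structural, segm))  # ['UPENN-GBM-00001_11']
```

In the Colab notebook, you would build the two input lists with `os.listdir` on the mounted Drive folders.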

We are going to use nnU-Net to perform the automated segmentation.

nnU-Net Overview

From the nnU-Net website - “nnU-Net¹ is a semantic segmentation method that automatically adapts to a given dataset. It will analyze the provided training cases and automatically configure a matching U-Net-based segmentation pipeline. No expertise required on your end! You can simply train the models and use them for your application.”

I highly recommend you read through the nnU-Net website in detail to get an idea of its scope and capabilities. Here is an overview of the nnU-Net configuration -

  • The entry point to nnU-Net is the nnUNet_raw_data_base folder

  • Each project is stored as a separate 'Task'.

  • Each task has a three-digit integer task ID. IDs of 500 and above are recommended so they don’t conflict with nnU-Net’s existing pre-trained models. Some examples are shown below -

    nnUNet_raw_data_base/nnUNet_raw_data/
    ├── Task501_Glioblastoma
    ├── Task505_Heart
    ├── Task510_Meningioma
  • Each task folder will further have the following sub-folders and files

    Task501_Glioblastoma/
    ├── dataset.json
    ├── imagesTr
    ├── (imagesTs)
    └── labelsTr
    
    • imagesTr contains the images belonging to the training cases. nnU-Net uses this data for pipeline configuration, training with cross-validation, and for finding the postprocessing strategy and the best ensemble.

    • imagesTs (optional) contains the images belonging to the test cases.

    • labelsTr contains the ground truth segmentation maps for the training cases. Do not include labels for the test dataset here, or you will run into issues.

    • dataset.json contains metadata of the dataset.

  • All images, including label files, MUST be in the NIfTI format (.nii.gz)

  • Each patient may have multiple MRI sequences (T1, T1-CE/GD, T2, FLAIR)

  • The label files must contain segmentation maps that contain consecutive integers, starting with 0 (0, 1, 2, 3, ... n). 0 is considered background.
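The consecutive-integer rule is easy to violate silently, so it is worth checking each label map before training. Below is a small sketch of that check; in practice you would load the real segmentation array with nibabel (e.g. `nib.load(path).get_fdata()`), but a toy NumPy array keeps the logic clear here.

```python
import numpy as np

# Sketch: verify that a segmentation map satisfies nnU-Net's label rule,
# i.e. its values are exactly the consecutive integers 0, 1, ..., n.

def labels_are_valid(seg):
    values = np.unique(seg)  # sorted unique label values
    return bool(np.array_equal(values, np.arange(len(values))))

good = np.array([[0, 1], [2, 2]])
bad = np.array([[0, 1], [4, 4]])   # label 4 skips 2 and 3
print(labels_are_valid(good), labels_are_valid(bad))  # True False
```

A map that starts at 1 instead of 0, or skips a label value, will fail this check and should be remapped before you hand it to nnU-Net.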

Setting up the folders in Google Drive

Thankfully, Google Colab runs on Linux virtual machines (VMs) and lets us enter Linux commands, which makes things much easier. Prefix a command with an exclamation mark (!) in a Colab cell to indicate that it is an operating system command. We will run all our commands through the Colab interface when working with Google Drive folders. The Task ID I have chosen for this project is 501.

  1. Open a new Colab notebook - 00_t501_glio_folder_setup.ipynb

  2. Mount your Google Drive. This will give you access to your “MyDrive” folder

    from google.colab import drive
    drive.mount('/content/drive')

    You can expand the File icon in the left frame of Colab to see what has been mounted.

  3. Here’s the high level folder structure I want to create

    1. MyDrive/TCIA/nnUNet - All my nnUNet work stored here

    2. MyDrive/TCIA/nnUNet/nnUNet_raw_data_base - Base folder required by nnU-Net

    3. MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data - All tasks stored here

    4. MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_Glioblastoma - This is my GBM segmentation project (task) folder

  4. Run the “mkdir” Linux commands in your Colab cell to create this folder structure. The “-p” argument creates any missing parent directories and doesn’t raise an error if a folder already exists, which is nice. Remember, these are OS commands, so prefix them with an !

    !mkdir -p /content/drive/MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data
    
    !mkdir -p /content/drive/MyDrive/TCIA/nnUNet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_Glioblastoma
    

And here’s how that looks in my Google Drive. You may have to refresh the page to see the new folder.
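If you prefer to stay in Python rather than shell out, `os.makedirs` with `exist_ok=True` behaves like `mkdir -p`. The sketch below demonstrates this against a temporary directory rather than the mounted Drive path, but the same call works on `/content/drive/MyDrive/...` once the drive is mounted.

```python
import os
import tempfile

# Build the nested nnU-Net folder structure in one call, mirroring
# "mkdir -p". A temporary directory stands in for the Drive mount here.
base = os.path.join(tempfile.mkdtemp(), "TCIA", "nnUNet",
                    "nnUNet_raw_data_base", "nnUNet_raw_data")
task = os.path.join(base, "Task501_Glioblastoma")

os.makedirs(task, exist_ok=True)  # creates all missing parents
os.makedirs(task, exist_ok=True)  # idempotent, no error on re-run
print(os.path.isdir(task))  # True
```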

We have now set up the basic folder structure required for nnU-Net to run on our Glioblastoma dataset. Before proceeding to the next step, we need to make some big-picture decisions.

  • Train-Test Split - Unfortunately, only 147 patients out of the 630 have manually segmented ground truth files for us to compare with. We can take two approaches to our model development -

    • Split the 147 patients in an 80:20 train:test proportion, or

    • Keep aside a few patients (10 is as good a number as any) as your test cohort and train the model on the remaining patients to provide more training data to your model

      In this project, we will split the 147 patients into training and test subsets since the goal is to show you how to build the model. It’s also easier for me to train the model on my free Colab account. You can try either approach.

  • Validation subset - nnU-Net uses a 5-fold cross-validation technique, which means we don’t need a separate validation dataset.
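The 80:20 split we chose can be sketched in a few lines. The patient IDs below are placeholders (in the notebook you would use the folder names from images_segm), and fixing the random seed keeps the split reproducible across runs; the seed value itself is an arbitrary choice.

```python
import random

# Sketch of the 80:20 train-test split on the 147 labeled patients.
patients = [f"patient_{i:03d}" for i in range(147)]  # hypothetical IDs
random.Random(42).shuffle(patients)  # seeded for reproducibility

n_train = int(0.8 * len(patients))
train, test = patients[:n_train], patients[n_train:]
print(len(train), len(test))  # 117 30
```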

In our next post, we will see how to split the image dataset and create the metadata file, dataset.json, for nnU-Net to begin its processing. I hope you find this interesting and will follow along.

  1. Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. https://github.com/MIC-DKFZ/nnUNet