QuNex quick start using a Docker container#

Quick start on deploying the QuNex suite, from raw data through to launching the HCP pipelines, using a container (Docker or Singularity/Apptainer).

Requirements#

Software requirements:

  • Docker or Singularity/Apptainer installed on your system to run the QuNex container.

Hardware requirements:

  • At least 16 GB RAM.

  • ~100 GB of storage space for the container, the imaging data and results.

Step 1: Getting access to the container#

If you do not have access to the QuNex container registry, please follow the instructions at https://qunex.yale.edu/registration.

Step 2: Download and prepare the QuNex container and the qunex_container script#

To start, open your console or terminal app. This quick start assumes that you will be working in the ${HOME}/qunex directory. Note that the ${HOME} path is user dependent; for example, if your username is john_doe, your home path will typically be /home/john_doe. The bash code below moves into your home directory and creates a qunex subfolder where you will do your work:

# -- Go to your HOME FOLDER
cd $HOME

# -- Create the qunex subfolder
mkdir qunex

# -- Go into the newly created folder
cd qunex

Next, you have to download the container image onto your machine.

To use the QuNex Docker container image, execute:

# -- Pull the latest stable docker image
docker pull qunex/qunex_suite:<VERSION>

We advise you to use the latest stable container tag. The easiest way to find it is on our official forums (https://forum.qunex.yale.edu/) under release notes. The latest version when this document was prepared was 1.1.1, so you would run:

# -- If the latest stable tag is 1.1.1 you would execute
docker pull qunex/qunex_suite:1.1.1

The above instructions will pull the Docker container to your local machine. If you need the Singularity/Apptainer container instead, download it from our storage:

wget --show-progress -O qunex_suite-1.1.1.sif 'https://jd.mblab.si/qunex/qunex_suite-1.1.1.sif'

When the download finishes, the Singularity/Apptainer container image will be in the local file system; the image should be around 20 GB in size. You can check this by running

ls -lh

Once the QuNex Docker (or Singularity/Apptainer) container is downloaded, you should download the qunex_container script. This script lets you execute and schedule QuNex commands in a user friendly fashion inside the container selected by the --container parameter. To use the script, add it to the PATH variable (you can also copy it into a folder that is already in PATH, e.g. /usr/bin) and make it executable. So you do not need to add it to PATH every time you restart your computer, you can also amend your .profile:

# -- Download the script
wget http://jd.mblab.si/qunex/qunex_container

# -- Add to path
export PATH=${HOME}/qunex:${PATH}

# -- Add to profile to make it persistent when you logout
echo 'export PATH=${HOME}/qunex:${PATH}' >> ~/.profile

# -- Make executable
chmod a+x ${HOME}/qunex/qunex_container

To test if the script is working, type qunex_container into the console. If everything is OK, the script's help will be printed out:
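
# -- Print the script's help to verify the setup
qunex_container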

Step 3: Download the example data#

Now we can download the example data, which we will put into the data subfolder inside our ${HOME}/qunex folder. The data consists of three files:

  • The imaging data zip file (HCPA001.zip) contains the actual imaging recordings.

  • The batch file (HCPA001_parameters.txt) contains a number of parameters to be used in preprocessing and analysis commands. These parameters are typically stable and do not change between commands. For details see the Batch files Wiki page.

  • The mapping specification file (HCPA001_mapping.txt) is an essential element of running QuNex; it ensures that the raw NIfTI data is onboarded (mapped to the defined HCP naming convention) correctly. An illustrative excerpt is shown below; for details see the Mapping specification files Wiki page.
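
For orientation, a mapping file pairs each raw sequence name with its HCP target using a => syntax. The lines below are only a sketch with hypothetical sequence names; the downloaded HCPA001_mapping.txt is the authoritative version for this dataset:

# -- Illustrative mapping lines (hypothetical sequence names)
T1w_MPR       => T1w
T2w_SPC       => T2w
rfMRI_REST_AP => bold:rest

Now download the three files into the data subfolder: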

# -- Create the data dir
mkdir data

# -- Go into the data dir
cd data

# -- Download the imaging data
wget http://jd.mblab.si/qunex/HCPA001.zip

# -- Download the parameters file
wget http://jd.mblab.si/qunex/HCPA001_parameters.txt

# -- Download the mapping specification file
wget http://jd.mblab.si/qunex/HCPA001_mapping.txt

If the data is properly prepared, the commands below should produce the output shown:

# -- Check our location
pwd

# -- Output should look like this:
# ${HOME}/qunex/data

# -- Inspect the folder structure
tree

# -- Output should look like this:
# .
# ├── HCPA001_parameters.txt
# ├── HCPA001_mapping.txt
# └── HCPA001.zip

Note that the example data was acquired by our team using the HCP acquisition protocol on a Siemens 3T Prisma scanner. If your own data was acquired with a different protocol or on a different scanner, you will most likely need to adjust some of the values in the HCPA001_parameters.txt and HCPA001_mapping.txt files.
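
As a rough illustration of what such adjustments look like, batch parameter files hold underscore-prefixed key : value pairs. The parameter names and values below are examples only; consult the Batch files Wiki page and your acquisition details before editing:

# -- Illustrative batch parameter lines (example values only)
_hcp_processing_mode : HCPStyleData
_hcp_folderstructure : hcpls
_hcp_brainsize       : 150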

Step 4: Onboard the data and run the HCP minimal preprocessing pipeline through QuNex#

There are two options now:

  • Option A: You can onboard and process the data in a "turnkey" style, with a single QuNex command called run_recipe. This option requires less work but also gives you less control, as it automatically triggers the next command in line once the previous one finishes.

  • Option B: You can execute every command as an independent step. This gives you more control, as you can inspect the outputs of each step once it completes before continuing with the next.

Option A: Execution of a chain of commands through a single QuNex command#

First, we will download the recipe file, which includes the chain of commands that we plan to execute. In our case, we will import the downloaded DICOM data, prepare it for HCP processing and run it through the HCP minimal preprocessing pipelines.

# -- Go into the data dir
cd data

# -- Download the recipe file
wget http://jd.mblab.si/qunex/recipe.yaml

# -- Inspect the folder structure
tree
# -- Output should now look like this:
# .
# ├── HCPA001_parameters.txt
# ├── HCPA001_mapping.txt
# ├── HCPA001.zip
# └── recipe.yaml

Next, we will prepare some parameters/variables that we will use throughout processing.

# -- Set the study folder
STUDY_FOLDER="${HOME}/qunex/quickstart"

# -- Location of the data
RAW_DATA="${HOME}/qunex/data"

# -- Bash post command to be executed once processing enters the container
# -- We need to add this so the paths used in the recipe are properly set in the container
BASH_POST="export STUDY_FOLDER=${STUDY_FOLDER};export RAW_DATA=${RAW_DATA}"

# -- Specify the container
# -- For Docker use the container name and tag:
QUNEX_CONTAINER="qunex/qunex_suite:<VERSION>"

# -- For Singularity/Apptainer define an absolute path to the image
# QUNEX_CONTAINER=${HOME}/qunex/qunex_suite-<VERSION>.sif

The BASH_POST part deserves a bit of explanation. When entering the container, we enter a new operating system that is encapsulated within the container, meaning that the variables we set on our current system will not be available in the system within the container. For this purpose, QuNex offers the --bash_post parameter, which executes bash code after entering the container but before executing the QuNex command. This way, we can run custom preparatory operations in the container before QuNex itself is run. In our case, we need to set the STUDY_FOLDER and RAW_DATA variables inside the container so they can be properly used by run_recipe.
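
If you want to see this isolation for yourself, the minimal plain-Docker sketch below (not a QuNex command; it assumes the 1.1.1 image has been pulled) shows that a host variable is invisible inside the container unless it is set after entering, which is exactly what --bash_post does:

# -- A variable set on the host ...
FOO="bar"

# -- ... is not visible inside the container (prints an empty value)
docker run --rm --entrypoint bash qunex/qunex_suite:1.1.1 -c 'echo "FOO=${FOO}"'

# -- Setting it after entering the container makes it available
docker run --rm --entrypoint bash qunex/qunex_suite:1.1.1 -c 'export FOO=bar; echo "FOO=${FOO}"'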

The contents of the recipe file look like this:

global_parameters:
    studyfolder     : "{{$STUDY_FOLDER}}"
    sessionsfolder  : "{{$STUDY_FOLDER}}/sessions"
    sessions        : "HCPA001"
    batchfile       : "{{$STUDY_FOLDER}}/processing/batch.txt"

recipes:
    quick_start:
        commands:
            - create_study
            - import_dicom:
                masterinbox : "{{$RAW_DATA}}"
            - create_session_info:
                mapping : "{{$RAW_DATA}}/HCPA001_mapping.txt"
            - setup_hcp:
                sourcefolder : "{{$STUDY_FOLDER}}/sessions/HCPA001"
            - create_batch:
                targetfile  : "{{$STUDY_FOLDER}}/processing/batch.txt"
                paramfile   : "{{$RAW_DATA}}/HCPA001_parameters.txt"
            - hcp_pre_freesurfer
            - hcp_freesurfer
            - hcp_post_freesurfer
            - hcp_fmri_volume
            - hcp_fmri_surface

At the top we have global parameters that apply to all commands, followed by the list of commands that will be executed in the provided order. Each command can also have command specific parameters (e.g., masterinbox for import_dicom). You can also provide additional parameters on the command line when invoking the run_recipe command. Parameters provided there have the highest priority, followed by command specific parameters and then global parameters. This means that if the same parameter is provided both in global_parameters and at the execution call, the one from the execution call will be used.

Note the special {{ }} markers, which are used to inject system variables into the recipe. For example, {{$STUDY_FOLDER}} will be replaced with the STUDY_FOLDER variable that we set in the previous step.

All that is left now for us to do is to execute the recipe:

qunex_container run_recipe \
  --recipe_file="${RAW_DATA}/recipe.yaml" \
  --recipe="quick_start" \
  --bash_post="${BASH_POST}" \
  --bind="${HOME}:${HOME}" \
  --container="${QUNEX_CONTAINER}"

We need to provide the path to the recipe file, the recipe name as listed under recipes in the recipe file (quick_start in our case) and the container. That is it! This will run the test session all the way through the HCP minimal preprocessing pipeline. The --bind parameter gives the system inside the container access to your home folder, where the data and the study are.
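
If you need to override one of the global parameters at call time, you can simply append it to the call. The hypothetical variant below pins the sessions value explicitly; as described above, the value given at the execution call takes precedence over the one in the recipe:

# -- Hypothetical override: the sessions value given here wins over
# -- the one in global_parameters
qunex_container run_recipe \
  --recipe_file="${RAW_DATA}/recipe.yaml" \
  --recipe="quick_start" \
  --sessions="HCPA001" \
  --bash_post="${BASH_POST}" \
  --bind="${HOME}:${HOME}" \
  --container="${QUNEX_CONTAINER}"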

You can track the progress of execution inside the QuNex log folder, in this case at ${STUDY_FOLDER}/processing/logs. There are two types of logs there: the top level (overview) logs called runlogs and the more detailed logs called comlogs.
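
To peek at progress from another terminal, you can list and follow the logs. The sketch below assumes the usual runlogs and comlogs subfolders inside the logs directory:

# -- List the overview and detailed logs
ls "${STUDY_FOLDER}/processing/logs/runlogs"
ls "${STUDY_FOLDER}/processing/logs/comlogs"

# -- Follow the most recently modified detailed log
tail -f "$(ls -t ${STUDY_FOLDER}/processing/logs/comlogs/* | head -n 1)"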

Scheduling the execution#

Most high performance compute (HPC) systems do not allow running long jobs on the node that you log into. Instead, commands should be scheduled for execution. The qunex_container script allows easy scheduling via SLURM, PBS and GridEngine systems. Below is an example of how you can schedule the run_recipe command we just prepared using SLURM. The example reserves 2 CPUs with 16 GB of memory per CPU for 2 days.

qunex_container run_recipe \
  --recipe_file="${RAW_DATA}/recipe.yaml" \
  --recipe="quick_start" \
  --bash_post="${BASH_POST}" \
  --bind="${HOME}:${HOME}" \
  --container="${QUNEX_CONTAINER}" \
  --scheduler="SLURM,time=02-00,cpus-per-task=2,mem-per-cpu=16G,jobname=qx_recipe"

Option B: Step-by-step execution#

Parameter preparation#

First, we will prepare some parameters/variables that we will use throughout processing.

# -- Set the study folder
STUDY_FOLDER="${HOME}/qunex/quickstart"

# -- Location of the data
RAW_DATA="${HOME}/qunex/data"

# -- Batch parameters file
INPUT_BATCH_FILE="${RAW_DATA}/HCPA001_parameters.txt"

# -- Mapping file
INPUT_MAPPING_FILE="${RAW_DATA}/HCPA001_mapping.txt"

# -- Sessions to run
SESSIONS="HCPA001"

# -- Specify the container
# -- For Docker use the container name and tag:
QUNEX_CONTAINER="qunex/qunex_suite:<VERSION>"

# -- For Singularity/Apptainer define an absolute path to the image
# QUNEX_CONTAINER=${HOME}/containers/qunex_suite-<VERSION>.sif

Execution of commands#

Below is the chain of commands that onboards the sample data, executes the preparatory steps and runs the data through the HCP minimal preprocessing pipeline. If you are running this on a high performance compute cluster, you can add the scheduler parameter and the execution of a command will be scheduled as a job on that system.

First, we need to create the QuNex study folder structure.

qunex_container create_study \
  --studyfolder="${STUDY_FOLDER}" \
  --container="${QUNEX_CONTAINER}"

Next, we need to onboard our data. Here we specify the location of sessions in the study folder, the sessions we are onboarding and the location of the raw data (masterinbox).

# onboard the data; the masterinbox parameter defines the location of the raw data
qunex_container import_dicom \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --sessions="${SESSIONS}" \
  --masterinbox="${RAW_DATA}" \
  --container="${QUNEX_CONTAINER}"

After the data is onboarded, we need to prepare the folder structure and extract all the information required by the HCP pipelines. Finally, we prepare a QuNex batch file; this file stores the values of processing parameters as well as information about all sessions. Batch files are used extensively by most QuNex commands to ensure that the same parameters are used throughout all commands and that the same imaging data gets used everywhere. This is done through the set of commands below.

You can track the progress of a command's execution inside the QuNex log folder, in this case at ${STUDY_FOLDER}/processing/logs. There are two types of logs there: the top level (overview) logs called runlogs and the more detailed logs called comlogs.

# create a session info file with all relevant data for HCP
qunex_container create_session_info \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --sessions="${SESSIONS}" \
  --mapping="${INPUT_MAPPING_FILE}" \
  --container="${QUNEX_CONTAINER}"

# create the QuNex batch file
qunex_container create_batch \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --sessions="${SESSIONS}" \
  --targetfile="${STUDY_FOLDER}/processing/batch.txt" \
  --paramfile="${INPUT_BATCH_FILE}" \
  --container="${QUNEX_CONTAINER}"

# map the session from QuNex to HCP folder structure
qunex_container setup_hcp \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --container="${QUNEX_CONTAINER}"

Once this is completed, we can start executing the HCP preprocessing pipeline in a step-by-step fashion.

# hcp_pre_freesurfer
qunex_container hcp_pre_freesurfer \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --container="${QUNEX_CONTAINER}"

# hcp_freesurfer
qunex_container hcp_freesurfer \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --container="${QUNEX_CONTAINER}"

# hcp_post_freesurfer
qunex_container hcp_post_freesurfer \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --container="${QUNEX_CONTAINER}"

# hcp_fmri_volume
# note the parelements parameter, which makes QuNex process all 4 BOLD runs in parallel
qunex_container hcp_fmri_volume \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --parelements=4 \
  --container="${QUNEX_CONTAINER}"

# hcp_fmri_surface
qunex_container hcp_fmri_surface \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --parelements=4 \
  --container="${QUNEX_CONTAINER}"

Scheduling of commands#

Just as with run_recipe, you can schedule any of the above commands through a scheduler on a high performance compute (HPC) system. For example:

qunex_container hcp_fmri_surface \
  --sessionsfolder="${STUDY_FOLDER}/sessions" \
  --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
  --parelements=4 \
  --container="${QUNEX_CONTAINER}" \
  --scheduler="SLURM,time=00-12:00:00,cpus-per-task=2,mem-per-cpu=16G,jobname=qx_hcp_fmri_surface"

The above command would reserve a processing node with 2 CPUs and 16 GB of memory per CPU for 12 hours.