QuNex quick start using a Docker container#

This quick start shows how to deploy the QuNex suite and go from raw data to launching HCP pipelines in under 30 minutes using a Docker container.

Requirements#

Software requirements:

  • Docker (or Singularity/Apptainer, if you prefer that container format).

  • wget, used below to download the qunex_container script and the example data.

  • Git with Git LFS (https://git-lfs.github.com/), only if you opt for the Singularity/Apptainer image.

Hardware requirements:

  • At least 8 GB RAM.

  • 20 GB storage space for imaging data (processed).

  • ~50 GB storage space for the container image.

Step 1: Getting access to the QuNex container registry#

If you do not have access to the QuNex container registry at https://gitlab.qunex.yale.edu/, you first need to register for it at https://qunex.yale.edu/qunex-registration/.

Step 2: Download and prepare the QuNex container and the qunex_container script#

To start, open your console or terminal app. This quick start assumes that you will be working in the ${HOME}/qunex directory. Note that the ${HOME} path is user dependent; for example, if your username is JohnDoe then your home path will typically be /home/JohnDoe. If you are not already in your home directory, go there now and create a qunex subfolder where you will do your work:

# -- Go to your HOME FOLDER
cd $HOME

# -- Create the qunex subfolder
mkdir qunex

# -- Go into the newly created folder
cd qunex

# -- Log in to the Docker registry for QuNex. Replace <username> with your username;
# -- you can view or change it at https://gitlab.qunex.yale.edu/-/profile.
docker login gitlab.qunex.yale.edu:5002 -u <username>

Next, download the Docker container image from QuNex GitLab onto your machine by executing:

# -- Pull the latest stable docker image
docker pull gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:<stable_container_tag>

We advise you to use the latest stable container tag. You can find it (along with older released tags) in the QuNex README file. For example:

# -- If the latest stable tag is 0.90.6 you would execute
docker login gitlab.qunex.yale.edu:5002 -u jdemsar
docker pull gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.90.6

The above instructions will pull the Docker container to your local machine. If you need the Singularity/Apptainer container instead, you should clone the https://gitlab.qunex.yale.edu/qunex/qunexcontainer Git repository. In this case you will need to install Git LFS (https://git-lfs.github.com/) so that your Git installation supports large files. Note that cloning this repository will take some time, as the Singularity image is about 17 GB. Git does not show a progress bar when cloning large files, but you can check progress manually by inspecting the size of the folder you are cloning into; when done, the Singularity container image will be in the local repository and the folder's size should be around 17 GB. To clone the repository run:

git lfs clone https://gitlab.qunex.yale.edu/qunex/qunexcontainer.git
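
While the clone is running, you can check progress from a second terminal by inspecting the size of the folder being cloned into, for example:

# -- Check how much of the repository has been downloaded so far
# -- (run this from the folder you started the clone in)
du -sh qunexcontainer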

Once the QuNex Docker (or Singularity) container is downloaded, you should download the qunex_container script. This script allows executing and scheduling QuNex commands via the previously downloaded container in a user-friendly fashion. With qunex_container you can execute QuNex commands the same way you would if QuNex were installed from source; the only difference is that instead of qunex you use the qunex_container command and provide the --container parameter, which points to the container you want to use. To use the script, add it to the PATH variable (you can also copy it into a folder that is already in PATH, e.g. /usr/bin) and make it executable:

# -- Download the script
wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1wdWgKvr67yX5J8pVUa6tBGXNAg3fssWs' -O qunex_container

# -- Add to path
PATH=${HOME}/qunex:${PATH}

# -- Make executable
chmod a+x ${HOME}/qunex/qunex_container

To test that the script is working, run it without any arguments. If everything is set up correctly, the script's help will be printed out:
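
# -- Running the script without arguments should print its help
qunex_container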

Step 3: Download the example data#

Now we can download the example data. We will put the data into the data subfolder inside our ${HOME}/qunex folder. The data consists of three files:

  • The imaging data zip file (HCPA001.zip) contains the actual fMRI recordings.

  • The batch file (HCPA001_parameters.txt) contains a number of parameters to be used in preprocessing and analysis commands. These parameters are typically stable and do not change between commands. For details see the Batch files Wiki page.

  • The mapping specification file (HCPA001_mapping.txt) is an essential element of running QuNex and ensures that the raw nii data is onboarded (mapped to the defined HCP naming convention) correctly. For details see the Mapping specification files Wiki page.

# -- Create data dir
mkdir data

# -- Go into the data dir
cd data

# -- Download imaging data
wget "https://docs.google.com/uc?export=download&id=1CbN9dtOQk3PwUeqnBdNeYmWizay2gSy7&confirm=t" -O HCPA001.zip

# -- Download the parameters file
wget --no-check-certificate -r 'https://drive.google.com/uc?id=16FePg7JoQo2jqWTYoI8-sZPEmPaCzZNd&export=download' -O HCPA001_parameters.txt

# -- Download the mapping specification file
wget --no-check-certificate -r 'https://drive.google.com/uc?id=1HtIm0IR7aQc8iJxf29JKW846VO_CnGUC&export=download' -O HCPA001_mapping.txt

If the data is properly prepared, the commands below should give you the output shown.

# -- Check our location
pwd

# -- Output should look like this:
# ${HOME}/qunex/data

# -- Inspect the folder structure
tree

# -- Output should look like this:
# .
# ├── HCPA001_parameters.txt
# ├── HCPA001_mapping.txt
# └── HCPA001.zip

Step 4: Onboard the data and run the HCP minimal preprocessing pipeline through QuNex#

There are two options now:

  • Option A: You can onboard and process the data in "turnkey" style, via a single QuNex command. This option requires less work but also gives you less control, as processing automatically continues with the next command once the previous one finishes.

  • Option B: You can execute every command as an independent step. This gives you more control, as you can inspect the outputs of each step once it completes before continuing with the next one.

Option A: The run_turnkey command#

Parameter preparation#

The code below sets and exports the parameters required for processing the example data. In this example we will use the QuNex run_turnkey command, which runs a list of specified commands in sequence (when a command in the list finishes successfully, QuNex executes the next one).

# -- Set the name of the study
export STUDY_NAME="quickstart"

# -- Set your working directory
export WORK_DIR="${HOME}/qunex"

# -- Specify the container
# -- For Docker use the container name and tag:
export QUNEX_CONTAINER="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:[VERSION]"

# -- For Apptainer (Singularity) define an absolute path to the image
# export QUNEX_CONTAINER=${WORK_DIR}/container/qunex_suite-[VERSION].sif

# -- Location of previously prepared data
export RAW_DATA="${WORK_DIR}/data"

# -- Batch parameters file
export INPUT_BATCH_FILE="${RAW_DATA}/HCPA001_parameters.txt"

# -- Mapping file
export INPUT_MAPPING_FILE="${RAW_DATA}/HCPA001_mapping.txt"

# -- Sessions to run
export SESSIONS="HCPA001"

# -- You will run everything on the local file system as opposed to pulling data from a database (e.g. XNAT system)
export RUNTURNKEY_TYPE="local"

# -- List the processing steps (QuNex commands) you want to run
# -- The sequence below first prepares the data 
# -- and then executes the whole HCP minimal preprocessing pipeline
export RUNTURNKEY_STEPS="create_study,map_raw_data,import_dicom,create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer,hcp_fmri_volume,hcp_fmri_surface"

Note that if your input data and files are not located in your home folder, the container might not be able to access them. To overcome this, please consult the Binding/mapping external folders or setting additional container parameters section of the Running commands against a container using qunex_container Wiki page.

Command execution#

We are almost done; all we have to do now is execute what we prepared. There are two options for doing this: we can run the commands directly, or we can schedule them for execution. If you are not sure what to do here, you should probably use the option without scheduling. Scheduling is used in high performance computing environments; if you need it here, you most likely already know what scheduling is.

You can track the progress of processing through the logs in the study folder. If you used the parameter values provided in this quick start, the logs will be in the ${HOME}/qunex/quickstart/processing/logs folder. Details about which logs are created and what you can find in them are available at Logging. In principle, each command (processing step) will create a runlog and a comlog: runlogs provide a more general overview of what is going on, while comlogs provide a detailed description of processing progress. If a comlog is prefixed with tmp_, that command is still running; if it is prefixed with done_, the command finished successfully; and if it is prefixed with error_, there was an error during processing.
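
For example, with the default paths used in this quick start you can list the logs and follow a command that is currently running (the comlogs subfolder used below is an assumption based on the default QuNex logging layout; see the Logging page for specifics):

# -- List the processing logs
ls ${HOME}/qunex/quickstart/processing/logs

# -- Follow the comlog of a command that is currently running;
# -- the exact file name will differ on your system
tail -f ${HOME}/qunex/quickstart/processing/logs/comlogs/tmp_*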

For the generated outputs, please consult the QuNex data hierarchy document (Data Hierarchy). For a detailed description of all the commands used and their outputs, consult the usage document of each command; you can find those at List of all commands under User guides.

Run the commands without a scheduler#

Now that all the parameters are prepared, we can execute the run_turnkey command.

qunex_container run_turnkey \
    --rawdatainput="${RAW_DATA}" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --mappingfile="${INPUT_MAPPING_FILE}" \
    --workingdir="${WORK_DIR}" \
    --projectname="${STUDY_NAME}" \
    --path="${WORK_DIR}/${STUDY_NAME}" \
    --sessions="${SESSIONS}" \
    --sessionsfoldername="sessions" \
    --turnkeytype="${RUNTURNKEY_TYPE}" \
    --container="${QUNEX_CONTAINER}" \
    --turnkeysteps="${RUNTURNKEY_STEPS}"

Schedule the commands#

Most HPC (high performance computing) systems do not allow long-running commands on the node that you log into. Instead, commands should be scheduled for execution. The qunex_container script allows easy scheduling via SLURM and PBS systems. Below is an example of how you can schedule the command from this example using SLURM. The example reserves a compute node for a single task on a single CPU with 16 GB of memory for 4 days.

qunex_container run_turnkey \
    --rawdatainput="${RAW_DATA}" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --mappingfile="${INPUT_MAPPING_FILE}" \
    --workingdir="${WORK_DIR}" \
    --projectname="${STUDY_NAME}" \
    --path="${WORK_DIR}/${STUDY_NAME}" \
    --sessions="${SESSIONS}" \
    --sessionsfoldername="sessions" \
    --turnkeytype="${RUNTURNKEY_TYPE}" \
    --container="${QUNEX_CONTAINER}" \
    --turnkeysteps="${RUNTURNKEY_STEPS}" \
    --scheduler="SLURM,time=04-00:00:00,cpus-per-task=1,mem-per-cpu=16000,jobname=qx_quickstart"

Option B: Step-by-step execution#

Parameter preparation#

First, we will prepare some parameters/variables that we will use throughout processing.

# -- Set the name of the study
export STUDY_FOLDER="${HOME}/qunex/quickstart"

# -- Specify the container
# -- For Docker use the container name and tag:
export QUNEX_CONTAINER="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:[VERSION]"

# -- For Apptainer (Singularity) define an absolute path to the image
# export QUNEX_CONTAINER=${HOME}/qunex/container/qunex_suite-[VERSION].sif

# -- Location of previously prepared data
export RAW_DATA="${HOME}/qunex/data"

# -- Batch parameters file
export INPUT_BATCH_FILE="${RAW_DATA}/HCPA001_parameters.txt"

# -- Mapping file
export INPUT_MAPPING_FILE="${RAW_DATA}/HCPA001_mapping.txt"

# -- Sessions to run
export SESSIONS="HCPA001"

Execution of commands#

Below is the chain of commands that achieves the same result as the run_turnkey command above. Here, we execute each step independently, giving us more control over the whole process. As in the run_turnkey example, you can add the scheduler parameter for execution on HPC systems.

First, we need to create the QuNex study folder structure.

qunex_container create_study \
    --studyfolder="${STUDY_FOLDER}" \
    --container="${QUNEX_CONTAINER}"
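
To verify that the study was created, you can inspect the top level of the new study folder; you should see subfolders such as sessions and processing (for the full layout consult the QuNex Data Hierarchy document):

# -- Inspect the top level of the newly created study folder
tree -L 1 ${STUDY_FOLDER}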

Next, we need to onboard our data. Here we specify the location of sessions in the study folder, the sessions we are onboarding, and the location of the raw data (masterinbox). To leave the raw data archive untouched, we set the archive parameter to leave.

# onboard the data; the masterinbox parameter defines the location of the raw data
qunex_container import_dicom \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --masterinbox="${RAW_DATA}" \
    --archive="leave" \
    --container="${QUNEX_CONTAINER}"
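
As a quick sanity check, you can list the session folder; after onboarding it should exist and contain the imported imaging data (the exact contents depend on the QuNex version):

# -- The session folder should now contain the onboarded data
ls ${STUDY_FOLDER}/sessions/${SESSIONS}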

After the data is onboarded, we need to prepare the folder structure and extract all the information required for the HCP pipelines. Finally, we prepare a QuNex batch file; this file stores the values of processing parameters as well as information about all sessions. Batch files are used extensively by most QuNex commands to ensure that the same parameters are used throughout all commands and that the same imaging data gets used everywhere. This is done through the set of commands below.

# create a session info file with all relevant data for HCP
qunex_container create_session_info \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --mapping="${INPUT_MAPPING_FILE}" \
    --container="${QUNEX_CONTAINER}"

# map the session from QuNex to HCP folder structure
qunex_container setup_hcp \
    --sourcefolder="${STUDY_FOLDER}/sessions/${SESSIONS}" \
    --container="${QUNEX_CONTAINER}"
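
To confirm the mapping, you can check the session folder again; setup_hcp should have created an HCP-style folder structure inside it (the hcp subfolder name below is an assumption; consult the setup_hcp usage document for the authoritative layout):

# -- Check that the HCP-style folder structure was created
ls ${STUDY_FOLDER}/sessions/${SESSIONS}/hcp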

# create the QuNex batch file
qunex_container create_batch \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --targetfile="${STUDY_FOLDER}/processing/batch.txt" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --container="${QUNEX_CONTAINER}"
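
Before moving on, you can confirm that the batch file was written to the target location and take a look at its contents:

# -- Inspect the beginning of the generated batch file
head ${STUDY_FOLDER}/processing/batch.txt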

Once this is completed, we can start executing the HCP minimal processing pipeline in a step-by-step fashion.

# hcp_pre_freesurfer
qunex_container hcp_pre_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_freesurfer
qunex_container hcp_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_post_freesurfer
qunex_container hcp_post_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_fmri_volume
# note the parelements parameter, which makes QuNex process all 4 BOLDs in parallel
qunex_container hcp_fmri_volume \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --parelements=4 \
    --container="${QUNEX_CONTAINER}"

# hcp_fmri_surface
qunex_container hcp_fmri_surface \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --parelements=4 \
    --container="${QUNEX_CONTAINER}"