# QuNex quick start using a Docker container

Quick start on deploying the QuNex suite, starting from raw data to launching HCP pipelines in under 30 minutes, using a Docker container.

## Requirements

Software requirements:

* A machine with a Unix-based OS.
* Python installed: [https://www.python.org/downloads/](https://www.python.org/downloads/).
* Docker installed: [https://docs.docker.com/get-docker/](https://docs.docker.com/get-docker/).
* Access to the QuNex container registry on [https://gitlab.qunex.yale.edu/](https://gitlab.qunex.yale.edu/); you can get access via [https://qunex.yale.edu/qunex-registration/](https://qunex.yale.edu/qunex-registration/).

Hardware requirements:

* At least 8 GB RAM.
* 20 GB storage space for imaging data (processed).
* ~50 GB storage space for the container image.

## Step 1: Getting access to the QuNex container registry

If you do not have access to the QuNex container registry on [https://gitlab.qunex.yale.edu/](https://gitlab.qunex.yale.edu/), you first need to register for it at [https://qunex.yale.edu/qunex-registration/](https://qunex.yale.edu/qunex-registration/).

## Step 2: Download and prepare the QuNex container and the `qunex_container` script

To start, open your console or terminal app. This quick start assumes that you will be working in the `${HOME}/qunex` directory. Note here that the `${HOME}` path is user dependent; for example, if your username is `JohnDoe` then your home path will typically be `/home/JohnDoe`. If you are not already in your home directory, go there now and create a `qunex` subfolder where you will do your work:

``` bash
# -- Go to your HOME folder
cd $HOME

# -- Create the qunex subfolder
mkdir qunex

# -- Go into the newly created folder
cd qunex

# -- Log into the Docker repository for QuNex, replacing <username> with your username.
# -- You can view or change your username at https://gitlab.qunex.yale.edu/-/profile/account.
docker login gitlab.qunex.yale.edu:5002 -u <username>
```

Next, you have to download the Docker container image from QuNex GitLab onto your machine. To do this execute:

``` bash
# -- Pull the Docker image, replacing <tag> with the container tag
docker pull gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:<tag>
```

We advise you to use the latest stable container tag. You can find it (along with older released tags) in the QuNex `README` file. For example:

``` bash
# -- If the latest stable tag is 0.90.6 you would execute
docker login gitlab.qunex.yale.edu:5002 -u jdemsar
docker pull gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.90.6
```

The above instructions will pull the Docker container to your local machine. If you need the Singularity/Apptainer container, you should clone the https://gitlab.qunex.yale.edu/qunex/qunexcontainer git repository. In this case you will need to install `git lfs` (https://git-lfs.github.com/) so your Git supports large files. Note that cloning this repository will take some time, as the Singularity image is about 17 GB. When cloning large files Git does not show a progress bar; you can check the progress manually by inspecting the size of the folder you are cloning into. When done, the Singularity container image will be in the local repository and the folder's size should be around 17 GB. To clone the repository run:

```shell
git lfs clone https://gitlab.qunex.yale.edu/qunex/qunexcontainer.git
```
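Before moving on, you can verify that the download succeeded. The snippet below is an optional sanity check; the exact `.sif` file name inside the cloned repository may differ between releases, so the Singularity path is illustrative.

``` bash
# -- Check that the Docker image is available locally
docker image ls gitlab.qunex.yale.edu:5002/qunex/qunexcontainer

# -- For Singularity/Apptainer, check the size of the cloned image
# -- (the exact .sif file name depends on the release)
ls -lh qunexcontainer/*.sif
```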
Once the QuNex Docker (or Singularity) container is downloaded, you should also download the `qunex_container` script. This script allows executing and scheduling QuNex commands via the downloaded container in a user-friendly fashion. With `qunex_container` you can execute QuNex commands the same way as you would if QuNex were installed from source. The only difference is that instead of `qunex` you use the `qunex_container` command and provide the `--container` parameter, which points to the container you want to use. To use the script, add it to the `PATH` variable (you can also copy it into a folder that is already in `PATH`, e.g. `/usr/bin`) and make it executable:

``` bash
# -- Download the script
wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1wdWgKvr67yX5J8pVUa6tBGXNAg3fssWs' -O qunex_container

# -- Add to path
PATH=${HOME}/qunex:${PATH}

# -- Make executable
chmod a+x ${HOME}/qunex/qunex_container
```

To test if the script is working, type `qunex_container` into the console. If everything is OK, the script's help will be printed out.

## Step 3: Download the example data

Now we can download the example data; we will put it into the `data` subfolder inside our `${HOME}/qunex` folder. The data consists of three files:

* The imaging data zip file (`HCPA001.zip`) contains the actual fMRI recordings.
* The batch file (`HCPA001_parameters.txt`) contains a number of parameters to be used in preprocessing and analysis commands. These parameters are typically stable and do not change between various commands. For details see the [Batch files](../Overview/file_batch_txt) Wiki page.
* The mapping specification file (`HCPA001_mapping.txt`) is an essential element of running QuNex and ensures that the raw `nii` data is onboarded (mapped to the defined HCP naming convention) correctly. For details see the [Mapping specification files](../Overview/file_mapping) Wiki page.

``` bash
# -- Create data dir
mkdir data

# -- Go into the data dir
cd data

# -- Download imaging data
wget "https://docs.google.com/uc?export=download&id=1CbN9dtOQk3PwUeqnBdNeYmWizay2gSy7&confirm=t" -O HCPA001.zip

# -- Download the parameters file
wget --no-check-certificate -r 'https://drive.google.com/uc?id=16FePg7JoQo2jqWTYoI8-sZPEmPaCzZNd&export=download' -O HCPA001_parameters.txt

# -- Download the mapping specification file
wget --no-check-certificate -r 'https://drive.google.com/uc?id=1HtIm0IR7aQc8iJxf29JKW846VO_CnGUC&export=download' -O HCPA001_mapping.txt
```

If the data is properly prepared, the commands below should give you the output shown.

``` bash
# -- Check our location
pwd

# -- Output should look like this:
# ${HOME}/qunex/data

# -- Inspect the folder structure
tree

# -- Output should look like this:
# .
# ├── HCPA001_mapping.txt
# ├── HCPA001_parameters.txt
# └── HCPA001.zip
```
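If you want to take a closer look before onboarding, you can peek inside the downloaded files. This is an optional check; it only assumes that `unzip` is installed and makes no claims about the exact archive contents.

``` bash
# -- Preview the parameters and mapping files
head HCPA001_parameters.txt HCPA001_mapping.txt

# -- List the contents of the imaging data archive without extracting it
unzip -l HCPA001.zip
```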
## Step 4: Onboard the data and run the HCP minimal preprocessing pipeline through QuNex

There are two options now:

- Option A: You can onboard the data and process in "turnkey" style, via a single QuNex command. This option requires less work but also gives you less control, as it will automatically continue with processing once one of the commands finishes.
- Option B: You can execute every command as an independent step. This gives you more control, as you can inspect the outputs of each step once it completes before continuing with the next step.

### Option A: The `run_turnkey` command

#### Parameter preparation

The code below sets and exports the parameters required for processing the example data. In this example we will use the QuNex [`run_turnkey`](../../api/gmri/run_turnkey.rst) command, which runs a list of specified commands in sequence (when a command in the list finishes successfully, QuNex executes the next one).

``` bash
# -- Set the name of the study
export STUDY_NAME="quickstart"

# -- Set your working directory
export WORK_DIR="${HOME}/qunex"

# -- Specify the container
# -- For Docker use the container name and tag:
export QUNEX_CONTAINER="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:[VERSION]"
# -- For Apptainer (Singularity) define an absolute path to the image
# export QUNEX_CONTAINER=${WORK_DIR}/container/qunex_suite-[VERSION].sif

# -- Location of previously prepared data
export RAW_DATA="${WORK_DIR}/data"

# -- Batch parameters file
export INPUT_BATCH_FILE="${RAW_DATA}/HCPA001_parameters.txt"

# -- Mapping file
export INPUT_MAPPING_FILE="${RAW_DATA}/HCPA001_mapping.txt"

# -- Sessions to run
export SESSIONS="HCPA001"

# -- You will run everything on the local file system as opposed to pulling data from a database (e.g. an XNAT system)
export RUNTURNKEY_TYPE="local"

# -- List the processing steps (QuNex commands) you want to run
# -- The sequence below first prepares the data
# -- and then executes the whole HCP minimal preprocessing pipeline
export RUNTURNKEY_STEPS="create_study,map_raw_data,import_dicom,create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer,hcp_fmri_volume,hcp_fmri_surface"
```

Note here that if your input data and files are not located in your home folder, the container might not be able to access them. To overcome this, please consult the "Binding/mapping external folders or setting additional container parameters" section of the [Running commands against a container using qunex_container](../UsageDocs/Running-Containerized-QuNex-with-qunex_container) Wiki page.

#### Command execution

We are almost done; all we have to do now is execute what we prepared. We have two options for this: we can just run the commands, or we can schedule the prepared execution. If you are not sure what to do here, you should probably use the option without scheduling. Scheduling is used in high performance computing environments, and if you need it here, you probably already know what scheduling is.

You can track the progress of processing inside the logs in the study folder. If you used the parameter values provided in this quick start, the logs will be in the `${HOME}/qunex/quickstart/processing/logs` folder. Details about what logs are created and what you can find in them are available at [Logging](../Overview/Logging). In principle, each command (processing step) will create a `runlog` and a `comlog`. `runlogs` provide a more general overview of what is going on, while `comlogs` provide a detailed description of processing progress. If a `comlog` is prefixed with `tmp_`, that command is running; if it is prefixed with `done_`, the command finished successfully; and if it is prefixed with `error_`, there was an error during processing.

For generated outputs, please consult the QuNex data hierarchy document ([Data Hierarchy](../Overview/DataHierarchy)). For a detailed description of all used commands and their outputs, consult the usage document of each command; you can find those at [List of all commands](../../api/gmri.rst) under User guides.
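As a quick way to check on a run, you can scan the log folder for the prefixes described above. This is a minimal sketch, assuming the default log location from this quick start and that `comlogs` are written to a `comlogs` subfolder; adjust the paths if your logs live elsewhere.

``` bash
# -- List logs for commands that are still running
ls ${HOME}/qunex/quickstart/processing/logs/comlogs/tmp_* 2>/dev/null

# -- List logs for commands that finished successfully
ls ${HOME}/qunex/quickstart/processing/logs/comlogs/done_* 2>/dev/null

# -- List logs for commands that failed
ls ${HOME}/qunex/quickstart/processing/logs/comlogs/error_* 2>/dev/null
```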
##### Run the commands without a scheduler

Now that all the parameters are prepared, we can execute the `run_turnkey` command.

``` bash
qunex_container run_turnkey \
    --rawdatainput="${RAW_DATA}" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --mappingfile="${INPUT_MAPPING_FILE}" \
    --workingdir="${WORK_DIR}" \
    --projectname="${STUDY_NAME}" \
    --path="${WORK_DIR}/${STUDY_NAME}" \
    --sessions="${SESSIONS}" \
    --sessionsfoldername="sessions" \
    --turnkeytype="${RUNTURNKEY_TYPE}" \
    --container="${QUNEX_CONTAINER}" \
    --turnkeysteps="${RUNTURNKEY_STEPS}"
```

##### Schedule the commands

Most HPC (high performance computing) systems do not allow running long commands on the login node. Instead, commands should be scheduled for execution. The `qunex_container` script allows easy scheduling via SLURM and PBS systems. Below is an example of how you can schedule the command from this example using SLURM. The example reserves a compute node for a single task on a single CPU with 16 GB memory for 4 days.

``` bash
qunex_container run_turnkey \
    --rawdatainput="${RAW_DATA}" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --mappingfile="${INPUT_MAPPING_FILE}" \
    --workingdir="${WORK_DIR}" \
    --projectname="${STUDY_NAME}" \
    --path="${WORK_DIR}/${STUDY_NAME}" \
    --sessions="${SESSIONS}" \
    --sessionsfoldername="sessions" \
    --turnkeytype="${RUNTURNKEY_TYPE}" \
    --container="${QUNEX_CONTAINER}" \
    --turnkeysteps="${RUNTURNKEY_STEPS}" \
    --scheduler="SLURM,time=04-00:00:00,cpus-per-task=1,mem-per-cpu=16000,jobname=qx_quickstart"
```

### Option B: Step-by-step execution

#### Parameter preparation

First, we will prepare some parameters/variables that we will use throughout processing.

``` bash
# -- Set the study folder
export STUDY_FOLDER="${HOME}/qunex/quickstart"

# -- Specify the container
# -- For Docker use the container name and tag:
export QUNEX_CONTAINER="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:[VERSION]"
# -- For Apptainer (Singularity) define an absolute path to the image
# export QUNEX_CONTAINER=${HOME}/qunex/container/qunex_suite-[VERSION].sif

# -- Location of previously prepared data
export RAW_DATA="${HOME}/qunex/data"

# -- Batch parameters file
export INPUT_BATCH_FILE="${RAW_DATA}/HCPA001_parameters.txt"

# -- Mapping file
export INPUT_MAPPING_FILE="${RAW_DATA}/HCPA001_mapping.txt"

# -- Sessions to run
export SESSIONS="HCPA001"
```

#### Execution of commands

Below is the chain of commands that achieves the same as the `run_turnkey` command above. Here, we execute each step independently, giving us more control over the whole process. As in the `run_turnkey` example, you can use the `--scheduler` parameter for execution on HPCs.

First, we need to create the QuNex study folder structure.

``` bash
qunex_container create_study \
    --studyfolder="${STUDY_FOLDER}" \
    --container="${QUNEX_CONTAINER}"
```

Next, we need to onboard our data. Here we specify the location of the sessions folder in the study, the sessions we are onboarding, and the location of the raw data (`masterinbox`). To leave the raw data archive intact, we set the `archive` parameter to `leave`.

``` bash
# onboard the data; the masterinbox parameter defines the location of the raw data
qunex_container import_dicom \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --masterinbox="${RAW_DATA}" \
    --archive="leave" \
    --container="${QUNEX_CONTAINER}"
```
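To confirm that onboarding worked, you can inspect the newly created session folder. This is only an illustrative check; it assumes nothing about the exact files created beyond the session folder itself.

``` bash
# -- The onboarded session should now have its own folder
# -- in the study's sessions directory
ls -l ${STUDY_FOLDER}/sessions/HCPA001
```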
After the data is onboarded, we need to prepare the folder structure and extract all the information required for HCP pipelines. Finally, we prepare a QuNex batch file; this file stores the values of processing parameters as well as information about all sessions. Batch files are then used extensively by most QuNex commands to ensure that the same parameters are used throughout all commands and that the same imaging data gets used everywhere. This is done through the set of commands below.

``` bash
# create a session info file with all relevant data for HCP
qunex_container create_session_info \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --mapping="${INPUT_MAPPING_FILE}" \
    --container="${QUNEX_CONTAINER}"

# map the session from the QuNex to the HCP folder structure
qunex_container setup_hcp \
    --sourcefolder="${STUDY_FOLDER}/sessions/${SESSIONS}" \
    --container="${QUNEX_CONTAINER}"

# create the QuNex batch file
qunex_container create_batch \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --sessions="${SESSIONS}" \
    --targetfile="${STUDY_FOLDER}/processing/batch.txt" \
    --paramfile="${INPUT_BATCH_FILE}" \
    --container="${QUNEX_CONTAINER}"
```

Once this is completed, we can start executing the HCP minimal preprocessing pipeline in a step-by-step fashion.

``` bash
# hcp_pre_freesurfer
qunex_container hcp_pre_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_freesurfer
qunex_container hcp_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_post_freesurfer
qunex_container hcp_post_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}"

# hcp_fmri_volume
# note the parelements parameter, which makes QuNex process all 4 BOLDs in parallel
qunex_container hcp_fmri_volume \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --parelements=4 \
    --container="${QUNEX_CONTAINER}"

# hcp_fmri_surface
qunex_container hcp_fmri_surface \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --parelements=4 \
    --container="${QUNEX_CONTAINER}"
```
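If you are running these steps on an HPC, each of them can be scheduled just like the `run_turnkey` example above. Below is an illustrative sketch for the first pipeline step; the scheduler string mirrors the SLURM settings used earlier, and you should adjust the requested resources to your cluster.

``` bash
# -- Schedule hcp_pre_freesurfer via SLURM instead of running it directly
qunex_container hcp_pre_freesurfer \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --container="${QUNEX_CONTAINER}" \
    --scheduler="SLURM,time=04-00:00:00,cpus-per-task=1,mem-per-cpu=16000,jobname=qx_quickstart"
```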