General information on running the HCP pipeline#

This part of the guide first presents general information on how to run the HCP pipeline steps. For detailed descriptions of the individual steps, please refer to Running the HCP pipeline.

General HCP processing settings#

HCP processing mode#

HCP processing can be run in two modes, specified using the --hcp_processing_mode parameter; a usage example follows the list below. The two modes are:

  • HCPStyleData

    When the acquired data meets the requirements defined by the HCP (specifically, the presence of high-resolution T1w and T2w (or FLAIR) structural images, field map images for the processing of functional images, multiband functional images, and diffusion images acquired with opposite phase encoding directions), processing can and should follow the steps described in the Glasser et al. (2016) paper. In this case the HCPStyleData processing mode should be used.

  • LegacyStyleData

    When any of the HCP acquisition requirements are not met (e.g. the lack of a high-resolution T2w image), or when processing options incompatible with the HCP specification described in the Glasser et al. (2016) paper are to be used (e.g. slice timing correction of single-band functional images), the LegacyStyleData processing mode can be used to enable the extended options.
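
For example, the processing mode can be passed on the command line (a minimal sketch; the remaining parameters required by the command are omitted):

qunex hcp_pre_freesurfer \
    --hcp_processing_mode="LegacyStyleData"

or set at the start of the batch.txt file as:

_hcp_processing_mode : LegacyStyleData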

HCP folder structure#

QuNex supports two folder structures for organizing and naming input files, specified using the hcp_folderstructure parameter; an example follows the list below. The two options are:

  • hcpya

    This option follows the folder structure used in the initial HCP Young Adult study. Specifically, the source files are stored in individual folders within the main hcp folder, in parallel with the working folders and the MNINonLinear results folder. In addition, folders and files are specified using fncb and strc tags in the filename, for functional BOLD images and structural images, respectively.

  • hcpls

    This option follows the folder structure used in the HCP Lifespan study. Specifically, the source files are all stored within individual subfolders located in the joint unprocessed folder in the main hcp folder, parallel to the working folders and the MNINonLinear folder. This is the default option used by QuNex.
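
For example, to explicitly request the HCP Young Adult folder structure, the following line can be added to the batch.txt file (hcpls is the default and does not need to be set):

_hcp_folderstructure : hcpya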

HCP file naming#

QuNex supports two ways of naming the source and result files, defined using the hcp_filename parameter; an example follows the list below. The two options are:

  • automated

    In this case, all the image types are named automatically. T1w files are named T1w_MPR; T2w files are named T2w_SPC; GE field maps are named FieldMap_GE; magnitude and phase field map images are named FieldMap_Magnitude and FieldMap_Phase, respectively; functional images and their reference images are named BOLD_[N] and BOLD_[N]_SBRef, respectively; spin echo pairs are named BOLD_<LR/RL/AP/PA>_SB_SE; diffusion weighted images are named DWI. This is the default option used by QuNex.

  • userdefined

    In this case, images are named using their user defined names, provided these are specified in the session_hcp.txt files and in the batch.txt file using the filename specification in the relevant sequence specification line, e.g.: 20: bold3:EMOTION : tfMRI_EMOTION_PA : se(2) : phenc(PA) : EchoSpacing(0.0005800090) : filename(tfMRI_EMOTION_PA).
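
For example, to use the user defined names provided through the filename() specifications, the following line can be added to the batch.txt file (automated is the default):

_hcp_filename : userdefined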

General information on running HCP preprocessing steps#

To enable efficient HCP preprocessing, QuNex utilizes a processing engine. As noted before, to successfully run the HCP preprocessing steps, the following has to be accomplished first:

  1. The files need to be mapped to the right folder structure.

  2. All the information on sessions and their data needs to be compiled into a batch file.

  3. All the relevant parameters need to be compiled and specified either in a batch file or as command line arguments.

  4. The command needs to be run with the right scheduler parameters or executed locally from the command line.

The steps for mapping the data and compiling the batch.txt file are described in the sections above. The relevant image parameters need to be added to the start of the batch.txt file, either manually after the information has been compiled, or by writing them into a sessions/specs/parameters.txt file to be automatically prepended when the create_batch command is used. In both cases the parameters are specified in the same manner (see the batch file specification for a general description); briefly, each parameter is added on a separate line as:

_<parameter name> : <parameter value>
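
For illustration, the general settings described above might be set at the start of a batch.txt file as follows (the values shown are examples only):

_hcp_processing_mode : HCPStyleData
_hcp_folderstructure : hcpls
_hcp_filename : automated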

Specific examples for each of the steps are provided below. If parameters are passed on the command line, the format is:

qunex <command> \
    --<parameter name>="<parameter value>" \
    --<parameter name>="<parameter value>"

Do take care to put parameter values in double quotes if they include whitespace characters.
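
For instance, the general settings described above could be combined in a single call (a sketch; the remaining parameters the command requires are omitted):

qunex hcp_fmri_volume \
    --hcp_processing_mode="LegacyStyleData" \
    --hcp_folderstructure="hcpls" \
    --hcp_filename="automated"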

The last element to consider is the scheduler. The commands can be either run locally (via command line) or passed to cluster nodes using a batch scheduler. By default, commands are run locally. If a scheduler is to be used, then a scheduler settings string needs to be specified using the --scheduler flag/parameter.

Running commands locally#

When a command is run locally, QuNex will use a pool of processors to run a number of sessions in parallel. How many sessions are run concurrently is specified using the --parsessions flag/parameter (the default value is 1). Specifically, if parsessions is set to 5, QuNex will start processing the first five sessions listed in the batch.txt file concurrently. As soon as a session is processed and the relevant process is freed, the next session listed in the batch.txt file will start processing, until either all the sessions in the batch.txt file have been processed or the number of sessions specified using the --nprocess parameter has been processed. --nprocess is set to 0 by default, in which case all the sessions listed in the batch.txt file are processed.

The hcp_fmri_volume and hcp_fmri_surface commands additionally allow parallel processing of BOLD images within a session using the --parelements flag/parameter: if a session has several BOLD images that need processing, and the settings allow such processing, these images can be processed in parallel.
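
As a sketch, the following call would process four sessions concurrently, with up to two BOLD images per session running in parallel (the remaining required parameters are omitted):

qunex hcp_fmri_volume \
    --parsessions="4" \
    --parelements="2"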

Running commands using a scheduler#

QuNex currently supports the PBS and SLURM schedulers. For more specific information about the scheduler settings, please consult the relevant instructions for PBS (e.g. the PBS User's Guide or the qsub manual page) or SLURM (the sbatch command). The easiest way to learn how to format the scheduler string is to run qunex schedule. Briefly, the string starts with the name of the scheduler to use, followed by a comma separated list of arguments and their values:

--scheduler="<scheduler name>,<parameter>=<value>,<parameter>=<value>,<parameter>=<value>"

A specific scheduler string example for SLURM is:

--scheduler="SLURM,jobname=hcp_pre_freesurfer,time=24:00:00,cpus-per-task=2,mem-per-cpu=1500,partition=day"

When running a command with the --scheduler parameter, the parsessions parameter specifies how many sessions will be scheduled to run on each node in parallel. If parsessions is set to 5, the QuNex scheduler engine will take the first five sessions listed in the batch.txt file and submit a job to spawn itself on a node with those five sessions. It will then take the next five sessions specified in the batch.txt file and submit another job to spawn itself on another node with those five sessions, and so on, until either the list is exhausted or nprocess sessions have been submitted for processing. In this way, the scheduler functionality in QuNex allows for flexible and massively parallel runs of sessions across multiple nodes, while utilizing the available CPU cores according to user specifications.

Just as with local execution, the hcp_fmri_volume and hcp_fmri_surface commands allow parallel processing of BOLD images for a particular session using the parelements parameter/flag.
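
Putting this together, a hypothetical scheduled run might look as follows, with each submitted job processing two sessions and up to two BOLD images per session in parallel (the SLURM settings are illustrative and cluster specific; the remaining required parameters are omitted):

qunex hcp_fmri_volume \
    --parsessions="2" \
    --parelements="2" \
    --scheduler="SLURM,jobname=hcp_fmri_volume,time=24:00:00,cpus-per-task=4,mem-per-cpu=2500,partition=day"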

Completion testing#

Once the command ends, its success is validated by running a command completion test, i.e. testing for the presence of the last file that should be generated by the command. The completion check is always run.

Logging of HCP pipeline functions#

Multiple log files are created when a command is run. The progress of the command execution is both printed to the terminal and saved to a runlog file. This log lists the exact command that was run. It then lists, for each session, the relevant information on the files and settings, and reports the success or failure of processing that session. Last, it prints a short summary of the processing outcome for each session, one session per line. By default, runlogs are stored in processing/logs/runlogs. They are given unique names compiled using the following specification:

Log-<command name>-<date>_<hour>.<minute>.<microsecond>.log

In addition to runlog files, detailed information about the processing of each session is stored in comlog files. These files are generated each time a process is started. Comlog files are saved in the processing/logs/comlogs folder. When the command is started, the files are named using the following specification:

tmp_<command_name>_<session code>_<date>_<hour>.<minute>.<microsecond>.log

If commands are run for individual files separately (e.g. each BOLD image is run separately), then the specification is:

tmp_<command_name>_<file name>_<session code>_<date>_<hour>.<minute>.<microsecond>.log
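
As a hypothetical illustration, a run of hcp_fmri_volume for a session with code OP386, with each BOLD image run separately, might produce log files named along these lines (the exact date and time formatting may differ):

processing/logs/runlogs/Log-hcp_fmri_volume-2024-03-03_14.27.358211.log
processing/logs/comlogs/tmp_hcp_fmri_volume_BOLD_3_OP386_2024-03-03_14.27.358211.log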

After the command has completed, the outcome is reflected in the log files and depends on (a) the level of checking indicated when running the command (the hcp_<short process name>_check parameter described above), and (b) the log parameter. The possible results are:

  • error

    The command specific test file (the last file that would be generated by the command) is not present, indicating that the command has not completed successfully. In this case the start of the log filename will be changed from tmp to error.

  • done

    All the required tests have completed successfully. If the log parameter is set to 'keep', the start of the log filename will be renamed from tmp to done. If the log parameter is set to 'remove', the log file will be removed.

  • incomplete

    The test for the final file to be generated indicates that the command has completed; however, not all files specified in the file check list have been found. In this case the start of the log filename will be changed from tmp to incomplete.

It is advisable to always keep comlog files. This can be set as a default by specifying the log parameter in the batch.txt file. The log parameter specifies whether to remove ('remove') the comlog files after successful completion or to keep them in one or several locations ('keep', 'study', 'session', 'hcp').

One can also use the logfolder parameter to specify a location for the runlog and comlog files other than the default. Note that if a command is run within the processing folder or its subfolders, the folder where the command is run is used as the logfolder location.
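
For example, to keep comlog files after successful completion and to direct all logs to a dedicated folder, the following lines could be added to the batch.txt file (the folder path is illustrative):

_log : keep
_logfolder : /data/studies/myStudy/processing/logs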

Please see Logging and log files for more information.

Executing a test run#

Prior to running each of the above commands, the user can append the --test flag/parameter, which will execute a 'dry' test run. If this parameter is set, each of the commands performs a number of tests to check whether the command might have already been run and successfully completed, and whether all the required files are available. These tests are reported in runlogs and printed to standard output in the terminal. Before running any of the above commands, it is advisable to test-run the command in this way, i.e. to do all the checks without actually running the command. If all the tested prerequisites are met, the session is reported as ready to run both in the individual session report and in the final report section of the runlog. Examples are provided in the Examples section of the hcp_pre_freesurfer command reference.
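
A sketch of such a dry run, with the command otherwise invoked exactly as it would be for real processing (the remaining parameters are omitted):

qunex hcp_pre_freesurfer \
    --<parameter name>="<parameter value>" \
    --test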

Running multiple variants of HCP preprocessing#

Sometimes the user might want to run the HCP pipelines with different settings and keep the results in parallel folders. To achieve this, use the hcp_suffix parameter both when setting up the data and when running the HCP minimal preprocessing pipeline steps. If the hcp_suffix parameter is specified, processing will be run on the data in the <session id>/hcp/<session id><hcp_suffix> folder.
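
For example, with a hypothetical suffix of _V2, the data would be set up and processed in the <session id>/hcp/<session id>_V2 folder:

qunex hcp_pre_freesurfer \
    --hcp_suffix="_V2"

or, in the batch.txt file:

_hcp_suffix : _V2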