General information on running the HCP pipeline
This part of the guide presents general information on how to run the steps. For the individual HCP pipeline steps, please refer to Running the HCP pipeline.
General HCP processing settings

HCP processing mode
HCP processing can be run in two modes, which are specified using the --hcp_processing_mode parameter. The two modes are:

HCPStyleData
When the acquired data meet the requirements defined by the HCP (specifically, the presence of high-resolution T1w and T2w (or FLAIR) structural images, the presence of field map images for the processing of functional images, multiband functional images, and diffusion images acquired with opposite phase encoding directions), processing can and should follow the steps described in the Glasser et al. (2016) paper. In this case the HCPStyleData processing mode should be used.

LegacyStyleData
When any of the HCP acquisition requirements are not met (e.g. lack of a high-resolution T2w image), or when processing options incompatible with the HCP specification as described in the Glasser et al. (2016) paper are to be used (e.g. slice timing correction of single-band functional images), the LegacyStyleData processing mode can be indicated to enable the extended options.
HCP folder structure

QuNex supports two folder structures for organizing and naming input files, specified using the hcp_folderstructure parameter. The two options are:

hcpya
The folder structure used in the initial HCP Young Adults study. Specifically, the source files are stored in individual folders within the main hcp folder, in parallel with the working folders and the MNINonLinear folder with results. In addition, folders and files are specified using fncb and strc tags in the filename, for functional BOLD images and structural images, respectively.

hcpls
The folder structure used in the HCP Lifespan study. Specifically, the source files are all stored within their individual subfolders located in the joint unprocessed folder in the main hcp folder, parallel to the working folders and the MNINonLinear folder. This is the default option used by QuNex.
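As a sketch of the hcpls layout, a single session's hcp folder might look roughly as follows; the session id OP101 and the exact set of working folders shown are illustrative, not a definitive listing:

```
OP101/hcp/OP101/
├── unprocessed/     # joint folder with source files in individual subfolders
├── T1w/             # working folders
└── MNINonLinear/    # results folder
```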
HCP file naming

QuNex supports two ways of naming the source and results files, which are defined using the hcp_filename parameter. The two options are:

automated
All the image types are named automatically: T1w files are named T1w_MPR; T2w files are named T2W_SPC; GE field maps are named FieldMap_GE; magnitude and phase field map images are named FieldMap_Magnitude and FieldMap_Phase, respectively; functional images and their reference images are named BOLD_[N] and BOLD_[N]_SBRef, respectively; spin echo pairs are named BOLD_<LR/RL/AP/PA>_SB_SE; and diffusion weighted images are named DWI. This is the default option used by QuNex.

userdefined
Images are named using their user-defined names, provided in the session_hcp.txt files and in the batch.txt file using the filename specification in the relevant sequence specification line, e.g.:

20: bold3:EMOTION : tfMRI_EMOTION_PA : se(2) : phenc(PA) : EchoSpacing(0.0005800090) : filename(tfMRI_EMOTION_PA)
General information on running HCP preprocessing steps
To enable efficient HCP preprocessing QuNex utilizes a processing engine. As noted before, to successfully run the HCP preprocessing steps, the following has to be accomplished first:
The files need to be mapped to the right folder structure.
All the information on sessions and their data needs to be compiled into a batch file.
All the relevant parameters need to be compiled and specified either in a batch file or as command line arguments.
The command needs to be run with the right scheduler parameters or executed locally from the command line.
The steps for mapping the data and compiling the batch.txt file are described in the sections above. The relevant image parameters need to be added to the start of the batch.txt file, either manually after the information has been compiled, or by writing them into a sessions/specs/parameters.txt file to be automatically prepended when the create_batch command is used. In both cases the parameters are specified in the same manner (see the batch file specification for a general description); briefly, each parameter is added on a separate line as:
_<parameter name> : <parameter value>
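As an illustrative sketch, a sessions/specs/parameters.txt file (or the parameter header of a batch.txt file) that sets the general options described above might contain lines such as the following; the specific values chosen here are examples, not recommendations:

```
_hcp_processing_mode  : HCPStyleData
_hcp_folderstructure  : hcpls
_hcp_filename         : automated
_log                  : keep
```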
Specific examples for each of the steps are provided below. If passed on the command line, the format is:
qunex <command> \
--<parameter name>="<parameter value>" \
--<parameter name>="<parameter value>"
Do take care to put parameter values in double quotes if they include whitespace characters.
The last element to consider is the scheduler. The commands can be either run locally (via command line) or passed to cluster nodes using a batch scheduler. By default, commands are run locally. If a scheduler is to be used, then a scheduler settings string needs to be specified using the --scheduler
flag/parameter.
Running commands locally#
When a command is run locally, QuNex will use a pool of processors to run a number of sessions in parallel. How many sessions are run concurrently is specified using the --parsessions flag/parameter (the default value is 1). Specifically, if parsessions is set to 5, QuNex will start processing the first five sessions listed in the batch.txt file concurrently. As soon as a session is processed and the relevant process is freed, the next session listed in the batch.txt file will start processing, until either all the sessions in the batch.txt file have been processed or the number of sessions specified using the --nprocess parameter has been processed. --nprocess is set to 0 by default, in which case all the sessions listed in the batch.txt file are processed.
The hcp_fmri_volume and hcp_fmri_surface commands additionally allow parallel processing of BOLD images within a single session using the --parelements flag/parameter: if there are several BOLD images that need processing for a session, and the settings allow such processing, these images can be processed in parallel.
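For illustration, a local run that processes two sessions concurrently, with two BOLD images per session processed in parallel, might be invoked roughly as follows; the way the batch file is passed (here via a --sessions parameter) and its path are assumptions made for the sake of the example:

```
qunex hcp_fmri_volume \
    --sessions="processing/batch.txt" \
    --parsessions=2 \
    --parelements=2
```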
Running commands using a scheduler
QuNex currently supports PBS and SLURM schedulers. For more specific information about the scheduler settings, please consult the relevant instructions for PBS (e.g. the PBS User's Guide or the qsub manual page) or SLURM (the sbatch command). The scheduler is set through the --scheduler parameter; the parameter value starts with the name of the scheduler to use, followed by a comma-separated list of arguments and their values:
--scheduler="<scheduler name>,<parameter>=<value>,<parameter>=<value>,<parameter>=<value>"
A specific scheduler string example for SLURM is:
--scheduler="SLURM,jobname=hcp_pre_freesurfer,time=24:00:00,cpus-per-task=2,mem-per-cpu=1500,partition=day"
When running a command using the --scheduler parameter, the parsessions parameter specifies how many sessions will be scheduled to run on each node in parallel. If parsessions is set to 5, the QuNex scheduler engine will take the first five sessions listed in the batch.txt file and submit a job to spawn itself on a node with those five sessions. It will then take the next five sessions specified in the batch.txt file and submit another job to spawn itself on another node with these five sessions, and so on, until either the list is exhausted or nprocess sessions have been submitted for processing. In this way, the scheduler functionality in QuNex allows for flexible and massively parallel runs of sessions across multiple nodes, while at the same time utilizing all the CPU cores per user specifications.
Just as with local execution, the hcp_fmri_volume and hcp_fmri_surface commands allow parallel processing of BOLD images for a particular session using the --parelements parameter/flag.
Completion testing
Once the command ends, its success is validated by running a command completion test, i.e. testing for the presence of the last file that should be generated by the command. The completion check is always run.
Logging of HCP pipeline functions
Multiple log files are created when a command is run. The progress of the command execution is both printed to the terminal and saved to a runlog file. This log will list the exact command that was run. In turn, it will list, for each session, the relevant information about the files and settings, and report the success or failure of processing that session. Finally, it will print a short summary of the success in processing each session, one session per line. By default, runlogs are stored in processing/logs/runlogs. They are given unique names compiled using the following specification:
Log-<command name>-<date>_<hour>.<minute>.<microsecond>.log
In addition to runlog files, detailed information about processing of each session is stored in comlog files. These files are generated each time a process is started. Comlog files are saved in processing/logs/comlogs
folder. When the command is started the files are named using the following specification:
tmp_<command_name>_<session code>_<date>_<hour>.<minute>.<microsecond>.log
If commands are run for individual files separately (e.g. each BOLD is run separately) then the specification is:
tmp_<command_name>_<file name>_<session code>_<date>_<hour>.<minute>.<microsecond>.log
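For example, a comlog for a command started on a single BOLD image might initially be named roughly as follows; the command, file name, session code, and timestamp shown here are purely hypothetical illustrations of the specification above:

```
tmp_hcp_fmri_volume_bold1_OP101_2023-06-01_14.05.123456.log
```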
After the command has completed, the outcome is reflected in the log files and depends on a) the level of checking indicated when running the command (the hcp_<short process name>_check parameter described above), and b) the log parameter. The possible results are:

error
The command-specific test file (the last file that would be generated by the command) is not present, indicating that the command has not completed successfully. In this case the start of the log filename will be changed from tmp to error.

done
All the required tests have completed successfully. If the log parameter is set to 'keep', the start of the log filename will be renamed from tmp to done. If the log parameter is set to 'remove', then the log file will be removed.

incomplete
The test for the final file to be generated indicates that the command has completed; however, not all the files specified in the file check list have been found. In this case the start of the log filename will be changed from tmp to incomplete.
It is advisable to always save comlog files. This can be achieved as a default setting by specifying the log
parameter in batch.txt
file. The log
parameter specifies whether to remove ('remove') the comlog files after
successful completion or to keep them in one or several locations ('keep', 'study', 'session', 'hcp').
One can also use the logfolder parameter to specify the location where the runlog and comlog files are to be saved, if the desired location is other than the default. Note that if one runs a command within the processing folder or its subfolders, the folder where the command is run is used as the logfolder location.
Please see Logging and log files for more information.
Executing a test run
Prior to running each of the above commands, the user can append the --test flag/parameter, which will execute a 'dry' test run. If this parameter is set, each of the commands performs a number of tests to see whether the command might have already been run and successfully completed, and whether all the required files are available. These tests are reported in runlogs and printed to the standard output in the terminal. Before running any of the above commands it is advisable to test-run the command in this way (i.e. to do all the checks without actually running the command). If all the tested prerequisites are met, the session is reported as ready to run, both in the individual session report and in the final report section of the runlog. Examples are provided in the Examples section of the hcp_pre_freesurfer command reference.
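As a rough sketch, a dry test run of hcp_pre_freesurfer might look like the following; the way the batch file is passed (here via a --sessions parameter) and its path are assumptions for illustration:

```
qunex hcp_pre_freesurfer \
    --sessions="processing/batch.txt" \
    --test
```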
Running multiple variants of HCP preprocessing

Sometimes the user might want to run the HCP pipelines with different settings and keep the results in parallel folders. To achieve this, use the hcp_suffix parameter when setting up the data and when running the HCP minimal processing pipeline steps. If the hcp_suffix parameter is specified, the processing will be run on the data in the <session id>/hcp/<session id><hcp_suffix> folder.
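For example, for a hypothetical session OP101, running one variant without a suffix and a second variant with hcp_suffix set to "_test" would keep the two sets of results in parallel folders:

```
OP101/hcp/OP101        # default run
OP101/hcp/OP101_test   # run with hcp_suffix set to "_test"
```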