Running lists of QuNex commands
Disclaimer
This functionality is still in development and has not yet been fully tested and debugged. For the time being, we recommend that users process their data with other QuNex approaches. You are free to experiment with it; if you do, please let us know of any bugs you encounter.
Often preprocessing and analysis of data in a study progresses through a number of steps using commands that are executed in a standard sequence and with a specific set of parameters. QuNex allows such sets of commands to be grouped in text files that contain named lists of commands. Each of these lists can combine multiple commands with common and/or specific parameters. This setup allows one to design a predefined set of steps that can be run with a single command. This allows careful design and explicitly documented processing and analysis of all steps in a study.
The QuNex command that enables running lists of commands is run_list. It is invoked using the following call:
qunex run_list \
--listfile=<path to the file with lists> \
--runlists=<names of the lists to run> \
[--logfolder=None] \
[--verbose=no] \
[<extra arguments>]
The run_list command takes two required parameters:
--listfile - a text file that contains named lists of commands,
--runlists - comma separated names of the lists in the listfile to run.
This way the run_list command will execute all commands defined in each of the specified runlists in sequence. If multiple lists are specified, run_list will execute them in the specified order.
run_list parameters#
Core parameters#
run_list is executed using the following list of core parameters:
--listfile ... The listfile containing lists to run and their parameters.
--runlists ... A comma or pipe separated list of lists from the specified listfile to run.
--logfolder ... The folder within which to save the log.
--verbose ... Whether to record in a log a full verbose report of the output of each command that was run ('yes') or only a summary success report of each command that was run ('no'). ['no']
Parameter injection#
Inside the listfile you can define placeholder parameter labels which can then be dynamically injected from the command call or from the system environment. To do this, enclose a placeholder parameter value in curly braces:
qunex_parameter : {parameter_label}
Now we can set the {parameter_label} by using the mapvalues parameter of the run_list command:
qunex run_list \
...
--mapvalues="parameter_label:<some_value>|parameter_label_2:<another_value>"
We can also set the {parameter_label} via the OS environment variable parameter_label:
export parameter_label=<some_value>
# once the variable is set we can execute run_list
qunex run_list
...
In both cases above the {parameter_label} in the listfile will be replaced with <some_value> before run_list is executed.
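The substitution described above can be illustrated with a small Python sketch. This is not QuNex's implementation: the function names are hypothetical, and two behaviours are assumptions that the text does not state, namely that a value from mapvalues wins over an environment variable of the same name and that unknown placeholders are left untouched.

```python
import os
import re


def parse_mapvalues(mapvalues: str) -> dict:
    """Parse a 'label:value|label2:value2' string into a dict."""
    mapping = {}
    for pair in filter(None, mapvalues.split("|")):
        label, _, value = pair.partition(":")  # split on the first colon only
        mapping[label.strip()] = value.strip()
    return mapping


def inject(text: str, mapvalues: str = "", environ=None) -> str:
    """Replace {label} placeholders from mapvalues, then from the environment."""
    environ = os.environ if environ is None else environ
    mapping = parse_mapvalues(mapvalues)

    def substitute(match):
        label = match.group(1)
        if label in mapping:  # assumption: mapvalues wins over the environment
            return mapping[label]
        # assumption: placeholders without a value are left as they are
        return environ.get(label, match.group(0))

    return re.sub(r"\{(\w+)\}", substitute, text)


line = "sessions : {sessions_var}"
print(inject(line, mapvalues="sessions_var:/data/testStudy/processing/batch_baseline.txt"))
print(inject(line, environ={"sessions_var": "/data/testStudy/processing/batch_baseline.txt"}))
```

Both calls print the same line, with the placeholder replaced by the batch file path.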
Parameters allowing parallel processing#
The following set of parameters allows spreading the processing of multiple sessions across multiple parallel run_list invocations:
--sessions ... Either a string with a pipe (|) or comma separated list of sessions (session ids) to be processed (use of grep patterns is possible), e.g. "OP128,OP139,ER*", or a path to a batch.txt or a *.list file with a list of session ids on which processing is to be run.
--sperlist ... An optional parameter specifying how many sessions to run per individual run_list invocation. If not specified, all sessions will be run through the same run_list invocation.
--runinpar ... If multiple run_list invocations are to be run, how many should be run in parallel. The default is 1.
--scheduler ... An optional scheduler settings description string. If provided, each run_list invocation will be scheduled to run on a separate cluster node. For details about the settings string specification see the inline help for the schedule command.
If these parameters are provided, the processing of the sessions will be split so that sperlist sessions will be processed by each separate run_list invocation. If scheduler is specified, each run_list invocation will be scheduled as a separate job on a cluster.
When processing is spread across multiple run_list invocations, the sperlist parameter will be passed forward as the parsessions parameter on each separate invocation (see the next section). Similarly, sessionids will be passed on, adjusted for the sessions to be run with the specific run_list invocation (see the next section).
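A minimal Python sketch of how a session list might be divided into groups of sperlist sessions, one group per run_list invocation. The function name is hypothetical, the session ids beyond OP128 and OP139 are made up for illustration, and the sketch handles neither grep patterns nor batch or list files.

```python
import re


def split_sessions(sessions: str, sperlist=None):
    """Split a pipe- or comma-separated session string into per-invocation groups."""
    ids = [s.strip() for s in re.split(r"[|,]", sessions) if s.strip()]
    if not sperlist:  # no sperlist: all sessions go to a single invocation
        return [ids]
    return [ids[i:i + sperlist] for i in range(0, len(ids), sperlist)]


groups = split_sessions("OP128,OP139,OP140|OP141,OP142", sperlist=2)
for number, group in enumerate(groups, start=1):
    # Each group stands for the sessions handed to one run_list invocation.
    print(number, ",".join(group))
```

With five sessions and sperlist=2 this yields three groups, the last one holding a single session.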
Please note that if the run_list command is run using a scheduler, any scheduler specification within the listfile will be ignored. This avoids attempts to spawn new cluster jobs when the run_list instance is already running on a cluster node.
Importantly, if scheduler is specified in the listfile, bear in mind that all the commands in the list will be scheduled at the same time rather than in succession, as run_list cannot track the execution of jobs on individual cluster nodes.
Parameters to pass on or ignore#
Sometimes the parameters specified in the listfile need to be adjusted in a run_list invocation. If the following parameters are listed, they will take precedence over parameters specified within the listfile:
--parsessions ... An optional parameter specifying how many sessions to run in parallel within a run_list invocation. If the parsessions parameter is already specified within the listfile, then the lower value will take precedence.
--parelements ... An optional parameter specifying how many elements (e.g. bold images) to run in parallel within each of the parallel jobs, the number of which is defined by the parsessions parameter, in a run_list invocation. If the parelements parameter is already specified within the listfile, then the lower value will take precedence.
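The "lower value takes precedence" rule for parsessions and parelements can be sketched in Python as follows. The function name is hypothetical, and returning None when neither value is set is an assumption.

```python
def effective_parallelism(listfile_value=None, cli_value=None):
    """Return the lower of the two settings, ignoring whichever is unset."""
    values = [int(v) for v in (listfile_value, cli_value) if v is not None]
    return min(values) if values else None


print(effective_parallelism(listfile_value=6, cli_value=4))  # 4
print(effective_parallelism(listfile_value=1, cli_value=4))  # 1
print(effective_parallelism(cli_value=4))                    # 4
```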
Sometimes one would wish to ignore a parameter specified in a list. The parameters to ignore can be specified using:
--ignore ... An optional comma or pipe separated list of parameters to ignore when running any of the specified lists.
Logging#
The log of the commands that were run is by default stored in <study>/processing/logs/runlogs, stamped with the date and time at which the log was started. If a study folder has not yet been created, please provide a valid folder to save the logs to. If the log cannot be created, the run_list command will exit with a failure.
Individual commands that are run can generate their own logs. The presence and location of those logs depend on the specific command and the settings specified in the listfile.
Failures#
run_list checks for the successful completion of the commands that it runs. If any command fails to complete successfully, execution of the remaining commands stops and the failure is reported both in stdout and in the log.
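The stop-on-first-failure behaviour can be pictured with the following Python sketch. It assumes that success is judged by the exit status of each command, which the text above does not specify. The step names are stand-ins rather than QuNex commands.

```python
import subprocess
import sys


def run_in_sequence(commands):
    """Run commands in order; stop at the first non-zero exit status."""
    completed = []
    for name, argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"FAILED: {name} (exit status {result.returncode})")
            return completed, name
        print(f"ok: {name}")
        completed.append(name)
    return completed, None


# Stand-in steps: each one simply exits with the given status.
steps = [
    ("step_one", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("step_two", [sys.executable, "-c", "raise SystemExit(3)"]),
    ("step_three", [sys.executable, "-c", "raise SystemExit(0)"]),
]
done, failed = run_in_sequence(steps)
```

Here step_three is never started because step_two exits with a non-zero status.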
The listfile#
The commands to run and the parameters to use when running them are specified using a listfile. This section describes the format of the file.
At the top of the listfile, global settings are defined in the form of <parameter>: <value> pairs. These are the settings that will be used as defaults throughout the lists and the individual commands defined in the rest of the listfile.
Each list starts with a line that consists of three dashes "---" only. The next line should define the name of the list by specifying: list: <listname>. These list names are referenced in the run_list command via the runlists parameter. After the definition of the list, the default parameters for the list can be specified as <parameter>:<value> pairs. These values will be taken as the defaults for the list. They have higher priority than parameter definitions located at the beginning of the listfile, which means that values defined within a specific list will override values defined at the beginning of the listfile. For readability, it is recommended that the content of the list be indented by four (or two) spaces.
Each list then consists of commands. Each command is defined by a command: <command name> line, where <command name> is a valid QuNex command. The commands within a list will be executed in the order in which they are listed.
Each command can list additional parameters to be provided to the command in the form of <parameter>:<value> pairs. The values provided here have higher priority than parameter definitions located at the beginning of the listfile and at the beginning of each list. This means that values defined within a specific command will override values defined at the beginning of a specific list and at the beginning of the listfile. For readability, it is advised that the <parameter>:<value> pairs be indented by an additional four (or two) spaces.
Parameter values specified in the command call itself (qunex run_list) have the highest priority and will override all parameter values set inside the listfile.
If you do not want to use a parameter specified at a higher level when running a command (or list), you can prefix that parameter at the lower level with a dash/minus sign. For example, if you ran the QuNex command specifying sessionids=OP394, you can tell a particular command (or a list) inside the listfile to ignore that parameter by listing -sessionids for it inside the listfile.
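The priority rules described in this section can be summarised with a short Python sketch. It is an illustration under assumptions rather than the QuNex implementation: the function name is hypothetical, parameters are represented as plain dicts, and a dash-prefixed name is assumed to discard the value from all outer levels, including the command call, unless a deeper level sets the parameter again.

```python
def resolve(global_params, list_params, command_params, cli_params):
    """Merge parameter levels from lowest to highest priority."""
    merged, ignored = {}, set()
    # Inside the listfile: global < list < command.
    for level in (global_params, list_params, command_params):
        for key, value in level.items():
            if key.startswith("-"):  # e.g. "-sessionids": drop the outer value
                ignored.add(key[1:])
                merged.pop(key[1:], None)
            else:
                ignored.discard(key)
                merged[key] = value
    # The command call overrides everything that was not explicitly ignored.
    for key, value in cli_params.items():
        if key not in ignored:
            merged[key] = value
    return merged


params = resolve(
    global_params={"overwrite": "yes", "bolds": "all"},
    list_params={"bolds": "rest"},
    command_params={"overwrite": "no", "-sessionids": None},
    cli_params={"sessionids": "OP394", "image_target": "cifti"},
)
print(params)
```

In this example the command ends up with overwrite=no, bolds=rest and image_target=cifti, while sessionids from the command call is dropped.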
Example#
Here is an example of a list file:
# global settings
sessionsfolder : /data/testStudy/sessions
overwrite : yes
sessions : *_baseline
---
list: dataImport
    command: import_bids
        inbox : /data/datalake/EMBARC/inbox/BIDS
        archive : leave
---
list: prepareHCP
    command: create_session_info
    command: create_batch
        targetfile : /data/testStudy/processing/batch_baseline.txt
    command: setup_hcp
---
list: doHCP
    sessions : /data/testStudy/processing/batch_baseline.txt
    parsessions : 4
    command: hcp_pre_freesurfer
    command: hcp_freesurfer
    command: hcp_post_freesurfer
    command: hcp_fmri_volume
        parsessions : 1
        parelements : 4
    command: hcp_fmri_surface
        parsessions : 1
        parelements : 4
---
list: prepareFCPreprocessing
    parsessions : 6
    sessions : /data/testStudy/processing/batch_baseline.txt
    bolds : all
    command: map_hcp_data
    command: create_bold_brain_masks
    command: compute_bold_stats
        log : remove
    command: create_stats_report
        parsessions : 1
    command: extract_nuisance_signal
---
list: runFCPreprocessing
    parsessions : 6
    sessions : /data/testStudy/processing/batch_baseline.txt
    scheduler : "SLURM,jobname=doHCP,time=00-02:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"
    command: preprocess_bold
        bold_actions : shrc
        glm_residuals : save
        bold_nuisance : m,V,WM,WB,1d
        pignore : hipass=linear|regress=spline|lopass=linear
        overwrite : yes
        bolds : rest
        image_target : cifti
        hcp_cifti_tail : _Atlas
---
list: doPreFS
    sessions : {sessions_var}
    parsessions : 4
    command: hcp_pre_freesurfer
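To make the structure of such a file concrete, here is a minimal Python sketch of a parser for the format described above. It is an illustration only and does not reproduce how QuNex reads the file: the function name and the returned data layout are hypothetical, only whole-line comments are handled, and surrounding double quotes are stripped from values as an assumption.

```python
def parse_listfile(text):
    """Parse listfile text into (global_params, lists)."""
    global_params, lists = {}, {}
    current_list = current_command = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line == "---":  # three dashes start a new list
            current_list = current_command = None
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip().strip('"')
        if key == "list":
            current_list = {"params": {}, "commands": []}
            lists[value] = current_list
            current_command = None
        elif key == "command":
            if current_list is None:
                raise ValueError(f"command outside of a list: {value}")
            current_command = {"name": value, "params": {}}
            current_list["commands"].append(current_command)
        elif current_command is not None:
            current_command["params"][key] = value  # command-level parameter
        elif current_list is not None:
            current_list["params"][key] = value  # list-level default
        else:
            global_params[key] = value  # global setting
    return global_params, lists


sample = """\
# global settings
sessionsfolder : /data/testStudy/sessions
overwrite : yes
---
list: doHCP
    parsessions : 4
    command: hcp_pre_freesurfer
    command: hcp_fmri_volume
        parsessions : 1
        parelements : 4
"""
global_params, lists = parse_listfile(sample)
print(global_params)
for command in lists["doHCP"]["commands"]:
    print(command["name"], command["params"])
```

Note that the sketch assigns parameters by position (after a list line or after a command line) and ignores indentation, which the format description presents as a readability aid.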
Examples#
Here are a few examples of running lists of commands:
qunex run_list \
--listfile="/data/settings/runlist.txt" \
--runlists="dataImport,prepareHCP"

qunex run_list \
--listfile="/data/settings/runlist.txt" \
--runlists="doHCP" \
--batchfile="/data/testStudy/processing/batch_baseline.txt" \
--sperlist=4 \
--scheduler="SLURM,jobname=doHCP,time=04-00:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"

qunex run_list \
--listfile="/data/settings/runlist.txt" \
--runlists="prepareFCPreprocessing" \
--batchfile="/data/testStudy/processing/batch_baseline.txt" \
--sperlist=4 \
--scheduler="SLURM,jobname=doHCP,time=00-08:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"

qunex run_list \
--listfile="/data/settings/runlist.txt" \
--runlists="runFCPreprocessing"

qunex run_list \
--listfile="/data/settings/runlist.txt" \
--runlists="doPreFS" \
--mapvalues="sessions_var:/data/testStudy/processing/batch_baseline.txt"
The first call will execute all the commands in lists dataImport and prepareHCP locally.
The second call will execute all the steps of the HCP preprocessing pipeline in sequence. Execution will be spread across the nodes, with each run_list instance processing four sessions at a time. Based on the settings in the listfile, the first three HCP steps will be executed with four sessions running in parallel, whereas in the last two fMRI steps the sessions will be processed serially, with four BOLDs from each session being processed in parallel.
The third call will again schedule multiple run_list invocations, each processing four sessions at a time (the lower of the sperlist and parsessions values). In this call, the initial steps will be performed on all BOLD images.
The fourth call will start a single run_list instance locally. However, this instance will submit the listed preprocess_bold command as a job to be run with six sessions per node in parallel. The command will be run only on BOLD images tagged as rest.
The last, fifth call will execute hcp_pre_freesurfer. The value of the sessions parameter is set to the placeholder variable sessions_var, and the actual value is then injected from the command call by using the mapvalues parameter. Alternatively, the value could be injected by setting the environment variable sessions_var.