Running lists of QuNex commands#

Disclaimer

This functionality is still in development and has not yet been fully tested and debugged. For the time being, we recommend that users process their data with other QuNex approaches. You are free to test it; if you do, please let us know of any bugs you encounter.

The preprocessing and analysis of data in a study often progresses through a number of steps, using commands that are executed in a standard sequence and with a specific set of parameters. QuNex allows such sets of commands to be grouped in text files that contain named lists of commands. Each list can combine multiple commands with common and/or command-specific parameters. A predefined set of steps can then be run with a single command, which makes it possible to carefully design and explicitly document all processing and analysis steps in a study.

The QuNex command that enables running lists of commands is run_list. It is invoked using the following call:

qunex run_list \
    --listfile=<path to the file with lists> \
    --runlists=<names of the lists to run> \
    [--logfolder=None] \
    [--verbose=no] \
    [<extra arguments>]

The run_list command takes two required parameters:

  • --listfile - a text file that contains named lists of commands,

  • --runlists - comma or pipe separated names of the lists in the listfile to run.

This way the run_list command will execute all commands defined in each of the specified runlists in sequence. If multiple lists are specified run_list will execute them in the specified order.

run_list parameters#

Core parameters#

run_list is executed using the following list of core parameters:

  • --listfile

    ... The listfile containing lists to run and their parameters.

  • --runlists

    ... A comma or pipe separated list of lists from the specified listfile to run.

  • --logfolder

    ... The folder within which to save the log.

  • --verbose

    ... Whether to record in a log a full verbose report of the output of each command that was run ('yes') or only a summary success report of each command that was run. ['no']

Parameter injection#

Inside the listfile you can define placeholder parameter labels which can then be dynamically injected from the command call or from the system environment. To do this, encapsulate a placeholder parameter value with curly braces:

qunex_parameter : {parameter_label}

Now we can set the {parameter_label} by using the mapvalues parameter of the run_list command:

qunex run_list \
  ...
  --mapvalues="parameter_label:<some_value>|parameter_label_2:<another_value>"

We can also set the {parameter_label} via the OS environment variable parameter_label:

export parameter_label=<some_value>

# once the variable is set we can execute run_list
qunex run_list \
  ...

In both cases above the {parameter_label} in the listfile will be replaced with <some_value> before the execution of the run_list.
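The substitution described above can be illustrated with a minimal Python sketch. It is not the QuNex implementation. The function names (parse_mapvalues, inject) are hypothetical, and two behaviours are assumptions: the sketch accepts both the single-brace form shown above and the double-brace form used in the example listfile later in this document, and it gives mapvalues priority over the environment when both define the same label.

```python
import os
import re

# Matches {label} or {{label}}; both spellings appear in this document.
PLACEHOLDER = re.compile(r"\{\{?\s*(\w+)\s*\}?\}")


def parse_mapvalues(mapvalues):
    """Turn "a:1|b:2" into {"a": "1", "b": "2"}."""
    pairs = {}
    for item in filter(None, mapvalues.split("|")):
        # Split on the first colon only, so values may contain colons.
        label, _, value = item.partition(":")
        pairs[label.strip()] = value.strip()
    return pairs


def inject(text, mapvalues="", environ=None):
    """Replace placeholders in text using mapvalues, then the environment."""
    environ = os.environ if environ is None else environ
    mapped = parse_mapvalues(mapvalues)

    def replace(match):
        label = match.group(1)
        if label in mapped:      # assumption: mapvalues wins over the environment
            return mapped[label]
        if label in environ:
            return environ[label]
        return match.group(0)    # unknown placeholders are left untouched

    return PLACEHOLDER.sub(replace, text)
```

For example, inject("sessions : {{sessions_var}}", "sessions_var:/data/batch.txt") would yield a line whose value is /data/batch.txt.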

Parameters allowing parallel processing#

The following parameters allow the processing of multiple sessions to be spread across multiple parallel run_list invocations:

  • --sessions

    ... Either a string with a pipe (|) or comma separated list of sessions (session ids) to be processed (use of grep patterns is possible), e.g. "OP128,OP139,ER*", or a path to a batch.txt or a *.list file with a list of session ids on which processing is to be run.

  • --sperlist

    ... An optional parameter specifying how many sessions to run per individual run_list invocation. If not specified, all sessions will be run through the same run_list invocation.

  • --runinpar

    ... If multiple run_list invocations are to be run, how many should be run in parallel. The default is 1.

  • --scheduler

    ... An optional scheduler settings description string. If provided, each run_list invocation will be scheduled to run on a separate cluster node. For details about the settings string specification see the inline help for the schedule command.

If these parameters are provided, the processing of the sessions will be split so that sperlist sessions will be processed by each separate run_list invocation. If scheduler is specified, each run_list invocation will be scheduled as a separate job on a cluster.

When processing is spread across multiple run_list invocations, the sperlist parameter will be passed forward as the parsessions parameter on each separate invocation (see the next section). Similarly, sessionids will be passed on, adjusted for the sessions to be run with the specific run_list invocation (see the next section).
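The splitting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not QuNex code: the function name plan_invocations and the dictionary layout are hypothetical, and the sketch assumes that sessions are assigned to invocations in the order given.

```python
def plan_invocations(sessions, sperlist=None):
    """Group sessions into chunks of sperlist, one chunk per invocation.

    Returns one dict per planned run_list invocation. Each dict holds the
    adjusted sessionids and, when sperlist is given, the parsessions value
    that is passed forward.
    """
    sessions = list(sessions)
    if sperlist is None:
        # No sperlist: all sessions go through a single invocation.
        return [{"sessionids": ",".join(sessions)}]
    if sperlist < 1:
        raise ValueError("sperlist must be a positive integer")

    plans = []
    for start in range(0, len(sessions), sperlist):
        chunk = sessions[start:start + sperlist]
        plans.append({"sessionids": ",".join(chunk), "parsessions": sperlist})
    return plans
```

With five sessions and sperlist=2, the sketch plans three invocations, the last of which processes a single session.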

Please note that if the run_list command is run using a scheduler, any scheduler specification within the listfile will be ignored. This avoids attempts to spawn new cluster jobs when a run_list instance is already running on a cluster node.

Importantly, if a scheduler is specified in the listfile, bear in mind that all the commands in the list will be scheduled at the same time rather than in succession, because run_list cannot track the execution of jobs on individual cluster nodes.

Parameters to pass on or ignore#

Sometimes the parameters specified in the listfile need to be adjusted in a run_list invocation. If the following parameters are listed, they will take precedence over parameters specified within the listfile:

  • --parsessions

    ... An optional parameter specifying how many sessions to run in parallel within a run_list invocation. If parsessions parameter is already specified within the listfile, then the lower value will take precedence.

  • --parelements

    ... An optional parameter specifying how many elements (e.g. bold images) to run in parallel within each of the parallel jobs, the number of which is defined by the parsessions parameter, in a run_list invocation. If the parelements parameter is already specified within the listfile, the lower value will take precedence.
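The "lower value takes precedence" rule for parsessions and parelements can be sketched as follows. The function name effective_parallelism is hypothetical, and the handling of a missing value (use whichever one is present) is an assumption.

```python
def effective_parallelism(listfile_value=None, invocation_value=None):
    """Return the lower of the two values, ignoring any that are not set."""
    values = [v for v in (listfile_value, invocation_value) if v is not None]
    return min(values) if values else None
```

For instance, a list that sets parsessions to 6 combined with an invocation value of 4 would run four sessions in parallel.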

Sometimes one would wish to ignore a parameter specified in a list. The parameters to ignore can be specified using:

  • --ignore

    ... An optional comma or pipe separated list of parameters to ignore when running any of the specified lists.

Logging#

By default, the log of the commands that were run is stored in <study>/processing/logs/runlogs, stamped with the date and time at which the log was started. If a study folder has not yet been created, please provide a valid folder to save the logs to using the logfolder parameter. If the log cannot be created, the run_list command will exit with a failure.

Individual commands that are run can generate their own logs. The presence and location of those logs depend on the specific command and on the settings specified in the listfile.

Failures#

run_list checks for the successful completion of each command that it runs. If any command fails to complete successfully, execution of the remaining commands will stop and the failure will be reported both in stdout and in the log.
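The stop-on-first-failure behaviour can be pictured with a small Python sketch. It is only an illustration: run_commands is a hypothetical name, each command is represented by a callable that returns an exit code (0 meaning success), and the log is a plain list rather than a file.

```python
def run_commands(commands, log):
    """Run (name, callable) pairs in order and stop at the first failure.

    Each callable stands in for one command and returns an exit code,
    with 0 meaning success. Returns True only if every command succeeded.
    """
    for name, run in commands:
        exit_code = run()
        if exit_code != 0:
            log.append(f"{name}: failed (exit code {exit_code})")
            return False  # later commands are not started
        log.append(f"{name}: ok")
    return True
```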

The listfile#

The commands to run and the parameters to use when running them are specified using a listfile. This section describes the format of the file.

At the top of the listfile global settings are defined in the form of <parameter>: <value> pairs. These are the settings that will be used as defaults throughout the list and individual commands defined in the rest of the listfile.

Each list starts with a line that consists only of three dashes ("---"). The next line should define the name of the list by specifying list: <listname>. These list names are referenced in the run_list command via the runlists parameter. After the definition of the list, the default parameters for the list can be specified as <parameter>:<value> pairs. These values will be taken as the defaults for the list. They have higher priority than parameter definitions located at the beginning of the listfile, so values defined within a specific list will override values defined at the beginning of the listfile. For readability, it is recommended that the content of the list be indented by four (or two) spaces.

Each list then consists of commands. A command is defined by a command: <command name> line, where <command name> is a valid QuNex command. The commands within a list will be executed in the order in which they are listed.

Each command can list additional parameters to be provided to the command in the form of <parameter>:<value> pairs. The values provided here have higher priority than parameter definitions located at the beginning of the listfile and at the beginning of each list, so values defined within a specific command will override values defined at the beginning of a specific list and at the beginning of the listfile. For readability, it is advised that the <parameter>:<value> pairs be indented by an additional four (or two) spaces.

Parameter values specified in the command call itself (qunex run_list) have the highest priority and will override all parameter values set inside the listfile.

If you do not want to use a parameter specified at a higher level when running a command (or list), you can prefix that parameter at the lower level with a dash/minus sign. For example, if you ran the QuNex command specifying sessionids=OP394 then you can tell a particular command (or a list) inside the listfile to ignore that parameter by using (-sessionids) inside the listfile.
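The priority order described in this section can be sketched in Python as a series of dictionary merges. This is a simplified illustration, not the QuNex implementation. The function name resolve_parameters is hypothetical, the way a dash-prefixed name interacts with values from the command call is an assumption, and the lower-value rule for parsessions and parelements described earlier is not modeled.

```python
def resolve_parameters(global_params, list_params, command_params, cli_params):
    """Merge parameters from lowest to highest priority.

    Order: listfile globals < list defaults < command parameters < command
    call. A name prefixed with "-" at the list or command level drops any
    value inherited from a higher level, including the command call.
    """
    resolved = {}
    ignored = set()

    for level in (global_params, list_params, command_params):
        for name, value in level.items():
            if name.startswith("-"):
                ignored.add(name[1:])
                resolved.pop(name[1:], None)  # forget the inherited value
            else:
                resolved[name] = value

    # Values from the command call win, unless the listfile asked to ignore them.
    for name, value in cli_params.items():
        if name not in ignored:
            resolved[name] = value
    return resolved
```

For example, with sessionids=OP394 given in the command call and -sessionids listed under a command, the resolved parameters for that command would contain no sessionids entry.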

Example#

Here is an example of a list file:

# global settings
sessionsfolder : /data/testStudy/sessions
overwrite      : yes
sessions       : *_baseline


---
list: dataImport

    command: import_bids
        inbox   : /data/datalake/EMBARC/inbox/BIDS
        archive : leave

---
list: prepareHCP

    command: create_session_info

    command: create_batch
        targetfile : /data/testStudy/processing/batch_baseline.txt

    command: setup_hcp

---
list: doHCP

    sessions     : /data/testStudy/processing/batch_baseline.txt
    parsessions : 4

    command: hcp_pre_freesurfer

    command: hcp_freesurfer

    command: hcp_post_freesurfer

    command: hcp_fmri_volume
        parsessions : 1
        parelements : 4

    command: hcp_fmri_surface
        parsessions : 1
        parelements  : 4

---
list: prepareFCPreprocessing
    parsessions : 6
    sessions     : /data/testStudy/processing/batch_baseline.txt
    bolds        : all

    command: map_hcp_data

    command: create_bold_brain_masks

    command: compute_bold_stats
        log : remove

    command : create_stats_report
        parsessions : 1

    command: extract_nuisance_signal

---
list: runFCPreprocessing

    parsessions : 6
    sessions     : /data/testStudy/processing/batch_baseline.txt
    scheduler    : "SLURM,jobname=doHCP,time=00-02:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"

    command: preprocess_bold
        bold_actions     : shrc
        glm_residuals    : save
        bold_nuisance    : m,V,WM,WB,1d
        pignore          : hipass=linear|regress=spline|lopass=linear
        overwrite        : yes
        bolds            : rest
        image_target     : cifti
        hcp_cifti_tail   : _Atlas

---
list: doPreFS
    sessions     : {{sessions_var}}
    parsessions : 4

    command: hcp_pre_freesurfer
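To make the structure of the listfile above concrete, here is a minimal Python sketch of a reader for this format. It is not the parser QuNex uses. The function name parse_listfile and the returned dictionary layout are hypothetical, and the sketch assumes that indentation is purely cosmetic, that lines starting with # are comments, and that values are kept as raw strings (quotes included).

```python
def parse_listfile(text):
    """Parse listfile text into global settings and named lists of commands."""
    parsed = {"globals": {}, "lists": {}}
    target = parsed["globals"]  # where the next "parameter : value" pair goes
    current_list = None

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "---":
            current_list = None  # a "list:" line must follow
            target = None
            continue

        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected '<parameter> : <value>': {raw!r}")
        key, value = key.strip(), value.strip()

        if key == "list":
            current_list = {"params": {}, "commands": []}
            parsed["lists"][value] = current_list
            target = current_list["params"]
        elif key == "command":
            if current_list is None:
                raise ValueError(f"command outside of a list: {raw!r}")
            command = {"name": value, "params": {}}
            current_list["commands"].append(command)
            target = command["params"]
        else:
            if target is None:
                raise ValueError(f"parameter before a list name: {raw!r}")
            target[key] = value
    return parsed
```

Applied to the example above, parsed["lists"]["doHCP"]["commands"] would hold five entries in the order in which they are listed, with the last two carrying their own parsessions and parelements values.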

Examples#

Here are a few examples of running lists of commands:

qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="dataImport,prepareHCP"

qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="doHCP" \
    --batchfile="/data/testStudy/processing/batch_baseline.txt" \
    --sperlist=4 \
    --scheduler="SLURM,jobname=doHCP,time=04-00:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"

qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="prepareFCPreprocessing" \
    --batchfile="/data/testStudy/processing/batch_baseline.txt" \
    --sperlist=4 \
    --scheduler="SLURM,jobname=doHCP,time=00-08:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"

qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="runFCPreprocessing"

qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="doPreFS" \
    --mapvalues="sessions_var:/data/testStudy/processing/batch_baseline.txt"

The first call will execute all the commands in lists dataImport and prepareHCP locally.

The second call will execute all the steps of the HCP preprocessing pipeline in sequence. Execution will be spread across cluster nodes, with each run_list instance processing four sessions at a time. Based on the settings in the listfile, the first three HCP steps will be executed with four sessions running in parallel, whereas in the last two fMRI steps the sessions will be processed serially, with four BOLDs from each session being processed in parallel.

The third call will again schedule multiple run_list invocations, each processing four sessions at a time (the lower number of sperlist and parsessions). In this call, the initial steps will be performed on all BOLD images.

The fourth call will start a single run_list instance locally. This instance will, however, submit the listed preprocess_bold command as jobs to be run with six sessions per node in parallel. The command will be run only on BOLD images tagged as rest.

The fifth and last call will execute hcp_pre_freesurfer. Here the value of the sessions parameter is set to the placeholder variable sessions_var, and the actual value is injected from the command call using the mapvalues parameter. Alternatively, the value could be injected by setting the environment variable sessions_var.