# Running lists of QuNex commands

````{admonition} Disclaimer
This functionality is still in development and has not yet been fully tested and debugged. For the time being, we recommend that users process their data with other QuNex approaches. You are, of course, free to experiment with it; if you do, please let us know about any bugs you encounter.
````

Often, preprocessing and analysis of data in a study progresses through a number of steps, using commands that are executed in a standard sequence and with a specific set of parameters. QuNex allows such sets of commands to be grouped in text files that contain named lists of commands. Each of these lists can combine multiple commands with common and/or command-specific parameters. This makes it possible to design a predefined set of steps that can be run with a single command, so that the processing and analysis of all steps in a study is carefully designed and explicitly documented.

The QuNex command that enables running lists of commands is `run_list`. It is invoked using the following call:

``` bash
qunex run_list \
    --listfile=<listfile> \
    --runlists=<list names> \
    [--logfolder=None] \
    [--verbose=no] \
    [<other parameters>]
```

The `run_list` command requires two parameters:

* `--listfile` - a text file that contains named lists of commands,
* `--runlists` - comma-separated names of the lists in the `listfile` to run.

The `run_list` command will then execute all commands defined in each of the specified `runlists` in sequence. If multiple lists are specified, `run_list` will execute them in the specified order.

## run_list parameters

### Core parameters

`run_list` is executed using the following core parameters:

* `--listfile` ... The listfile containing the lists to run and their parameters.
* `--runlists` ... A comma- or pipe-separated list of lists from the specified `listfile` to run.
* `--logfolder` ... The folder within which to save the log.
* `--verbose` ...
  Whether to record in the log a full verbose report of the output of each command that was run ('yes') or only a summary success report of each command that was run ('no'). ['no']

### Parameter injection

Inside the `listfile` you can define placeholder parameter labels, whose values can then be dynamically injected from the command call or from the system environment. To do this, enclose a placeholder parameter value in curly braces:

```
qunex_parameter : {parameter_label}
```

The value of `{parameter_label}` can then be set using the `mapvalues` parameter of the `run_list` command:

``` bash
qunex run_list \
    ... \
    --mapvalues="parameter_label:<value>|parameter_label_2:<value_2>"
```

Alternatively, `{parameter_label}` can be set via the OS environment variable `parameter_label`:

``` bash
export parameter_label=<value>

# once the variable is set, run_list can be executed
qunex run_list ...
```

In both cases, `{parameter_label}` in the `listfile` will be replaced with `<value>` before `run_list` executes the commands.

### Parameters allowing parallel processing

The following set of parameters allows spreading the processing of multiple sessions across multiple parallel `run_list` invocations:

* `--sessions` ... Either a string with a pipe- (`|`) or comma-separated list of sessions (session ids) to be processed (use of grep patterns is possible), e.g. `"OP128,OP139,ER*"`, or a path to a batch.txt or a \*.list file with a list of session ids on which processing is to be run.
* `--sperlist` ... An optional parameter specifying how many sessions to run per individual `run_list` invocation. If not specified, all sessions will be run through the same `run_list` invocation.
* `--runinpar` ... If multiple `run_list` invocations are to be run, how many of them should be run in parallel. The default is 1.
* `--scheduler` ... An optional scheduler settings description string. If provided, each `run_list` invocation will be scheduled to run on a separate cluster node.
  For details about the settings string specification, see the inline help for the `schedule` command.

If these parameters are provided, the processing of the sessions will be split so that `sperlist` sessions are processed by each separate `run_list` invocation. If `scheduler` is specified, each `run_list` invocation will be scheduled as a separate job on a cluster.

When processing is spread across multiple `run_list` invocations, the `sperlist` parameter will be passed forward as the `parsessions` parameter to each separate invocation (see the next section). Similarly, `sessionids` will be passed on, adjusted to the sessions to be run by the specific `run_list` invocation (see the next section).

Please note that if the `run_list` command is run using a scheduler, any scheduler specification within the `listfile` will be ignored, to avoid attempts to spawn new cluster jobs when a `run_list` instance is already running on a cluster node.

Importantly, if `scheduler` is specified in the `listfile`, bear in mind that all the commands in the list will be scheduled at the same time, not in succession, as `run_list` cannot track the execution of jobs on individual cluster nodes.

### Parameters to pass on or ignore

Sometimes the parameters specified in the `listfile` need to be adjusted in a `run_list` invocation. If the following parameters are provided, they will take precedence over the parameters specified within the `listfile`:

* `--parsessions` ... An optional parameter specifying how many sessions to run in parallel within a `run_list` invocation. If the `parsessions` parameter is already specified within the `listfile`, the lower value will take precedence.
* `--parelements` ... An optional parameter specifying how many elements (e.g. BOLD images) to run in parallel within each of the parallel jobs (whose number is defined by the `parsessions` parameter) in a `run_list` invocation.
  If the `parelements` parameter is already specified within the `listfile`, the lower value will take precedence.

Sometimes one may wish to ignore a parameter specified in a list. The parameters to ignore can be specified using:

* `--ignore` ... An optional comma- or pipe-separated list of parameters to ignore when running any of the specified lists.

## Logging

By default, the log of the commands that were run is stored in `/processing/logs/runlogs`, stamped with the date and time at which the log was started. If a study folder has not yet been created, please provide a valid folder to save the logs to (using `--logfolder`). If the log cannot be created, the `run_list` command will exit with a failure.

Individual commands that are run can generate their own logs; the presence and location of those logs depend on the specific command and on the settings specified in the `listfile`.

## Failures

`run_list` checks whether each of the commands it runs has completed successfully. If any of the commands fails to complete successfully, the execution of the commands will stop and the failure will be reported both in stdout and in the log.

## The listfile

The commands to run and the parameters to use when running them are specified in a `listfile`. This section describes the format of the file.

At the top of the `listfile`, global settings are defined in the form of `<parameter> : <value>` pairs. These are the settings that will be used as defaults throughout the lists and the individual commands defined in the rest of the `listfile`.

Each list starts with a line that consists of three dashes (`---`) only. The next line should define the name of the list by specifying `list: <list name>`. These list names are referenced in the `run_list` command via the `runlists` parameter.

After the definition of the list, the default parameters for the list can be specified as `<parameter> : <value>` pairs. These values will be taken as the defaults for the list. They have higher priority than the parameter definitions located at the beginning of the `listfile`.
This means that values defined within a specific list will override values defined at the beginning of the `listfile`. For readability, it is recommended to indent the content of the list by four (or two) spaces.

Each list then consists of commands. Commands are defined by a `command: <command name>` line, where `<command name>` is a valid QuNex command. The commands within a list will be executed in the order in which they are listed.

Each command can list additional parameters to be provided to the command in the form of `<parameter> : <value>` pairs. The values provided here have higher priority than the parameter definitions located at the beginning of the `listfile` and at the beginning of each list. This means that values defined within a specific command will override values defined at the beginning of the specific list and at the beginning of the `listfile`. For readability, it is advised to indent the `<parameter> : <value>` pairs by an additional four (or two) spaces.

Parameter values specified in the command call itself (`qunex run_list`) have the highest priority and will override all parameter values set inside the `listfile`.

If you do not want to use a parameter specified at a higher level when running a command (or list), you can prefix that parameter at the lower level with a dash/minus sign. For example, if you ran the QuNex command specifying `sessionids=OP394`, you can tell a particular command (or list) inside the `listfile` to ignore that parameter by using `-sessionids` inside the `listfile`.
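The precedence rules above can be illustrated with a short Bash sketch. It only mimics the documented behaviour and is not QuNex's implementation: the associative arrays `GLOBAL`, `LIST`, `COMMAND` and `CALL` are hypothetical stand-ins for the global settings, the list defaults, a command's own parameters and the `qunex run_list` call. A dash-prefixed key is assumed to remove that parameter no matter which level set it, which is how the `sessionids=OP394` example reads. Namerefs (`declare -n`) require Bash 4.3 or newer.

``` bash
#!/usr/bin/env bash
# Sketch only: mimics the documented run_list parameter precedence.
# GLOBAL, LIST, COMMAND and CALL are hypothetical stand-ins, not QuNex code.
# Requires Bash >= 4.3 (associative arrays and namerefs).

# Parameters at each level, from lowest to highest priority.
declare -A GLOBAL=([sessionsfolder]=/data/testStudy/sessions [overwrite]=yes [sessions]='*_baseline')
declare -A LIST=([sessions]=/data/testStudy/processing/batch_baseline.txt [parsessions]=4)
declare -A COMMAND=([parsessions]=1 [parelements]=4 [-sessionids]='')  # assumed: -key drops key
declare -A CALL=([sessionids]=OP394 [overwrite]=no)                    # values given to qunex run_list

declare -A MERGED=() DROP=()

for level in GLOBAL LIST COMMAND CALL; do
    declare -n params=$level              # nameref to the current level's array
    for key in "${!params[@]}"; do
        if [[ $key == -* ]]; then
            DROP[${key#-}]=1              # remember parameters to ignore
        else
            MERGED[$key]=${params[$key]}  # later (higher-priority) levels win
        fi
    done
    unset -n params                       # drop the nameref before re-pointing it
done

# Remove every parameter that was dash-prefixed at some level.
for key in "${!DROP[@]}"; do
    unset "MERGED[$key]"
done

# Print the resolved parameters in a stable order.
for key in "${!MERGED[@]}"; do
    printf '%s : %s\n' "$key" "${MERGED[$key]}"
done | LC_ALL=C sort
```

Run as is, the sketch prints `overwrite : no`, `parelements : 4`, `parsessions : 1`, `sessions : /data/testStudy/processing/batch_baseline.txt` and `sessionsfolder : /data/testStudy/sessions`: the call overrides `overwrite`, the command overrides the list's `parsessions`, the list overrides the global `sessions`, and `sessionids` disappears because of `-sessionids`. The special case in which the lower of two `parsessions` or `parelements` values wins is deliberately not modelled.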
### Example

Here is an example of a listfile:

```
# global settings
sessionsfolder : /data/testStudy/sessions
overwrite : yes
sessions : *_baseline

---
list: dataImport
    command: import_bids
        inbox : /data/datalake/EMBARC/inbox/BIDS
        archive : leave

---
list: prepareHCP
    command: create_session_info
    command: create_batch
        targetfile : /data/testStudy/processing/batch_baseline.txt
    command: setup_hcp

---
list: doHCP
    sessions : /data/testStudy/processing/batch_baseline.txt
    parsessions : 4
    command: hcp_pre_freesurfer
    command: hcp_freesurfer
    command: hcp_post_freesurfer
    command: hcp_fmri_volume
        parsessions : 1
        parelements : 4
    command: hcp_fmri_surface
        parsessions : 1
        parelements : 4

---
list: prepareFCPreprocessing
    parsessions : 6
    sessions : /data/testStudy/processing/batch_baseline.txt
    bolds : all
    command: map_hcp_data
    command: create_bold_brain_masks
    command: compute_bold_stats
        log : remove
    command: create_stats_report
        parsessions : 1
    command: extract_nuisance_signal

---
list: runFCPreprocessing
    parsessions : 6
    sessions : /data/testStudy/processing/batch_baseline.txt
    scheduler : "SLURM,jobname=doHCP,time=00-02:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"
    command: preprocess_bold
        bold_actions : shrc
        glm_residuals : save
        bold_nuisance : m,V,WM,WB,1d
        pignore : hipass=linear|regress=spline|lopass=linear
        overwrite : yes
        bolds : rest
        image_target : cifti
        hcp_cifti_tail : _Atlas

---
list: doPreFS
    sessions : {sessions_var}
    parsessions : 4
    command: hcp_pre_freesurfer
```

## Examples

Here are a few examples of running lists of commands:

``` bash
qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="dataImport,prepareHCP"
```

``` bash
qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="doHCP" \
    --batchfile="/data/testStudy/processing/batch_baseline.txt" \
    --sperlist=4 \
    --scheduler="SLURM,jobname=doHCP,time=04-00:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"
```

``` bash
qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="prepareFCPreprocessing" \
    --batchfile="/data/testStudy/processing/batch_baseline.txt" \
    --sperlist=4 \
    --scheduler="SLURM,jobname=doHCP,time=00-08:00:00,cpus-per-task=2,mem-per-cpu=20000,partition=day"
```

``` bash
qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="runFCPreprocessing"
```

``` bash
qunex run_list \
    --listfile="/data/settings/runlist.txt" \
    --runlists="doPreFS" \
    --mapvalues="sessions_var:/data/testStudy/processing/batch_baseline.txt"
```

The first call will execute all the commands in the lists `dataImport` and `prepareHCP` locally.

The second call will execute all the steps of the HCP preprocessing pipeline in sequence. Execution will be spread across cluster nodes, with each `run_list` instance processing four sessions at a time. Based on the settings in the `listfile`, the first three HCP steps will be executed with four sessions running in parallel, whereas in the last two fMRI steps the sessions will be processed serially, with four BOLDs from each session being processed in parallel.

The third call will again schedule multiple `run_list` invocations, each processing four sessions at a time (the lower of the `sperlist` and `parsessions` values). In this call, the initial steps will be performed on all BOLD images.

The fourth call will start a single `run_list` instance locally; however, this instance will submit the listed [`preprocess_bold`](../../api/gmri/preprocess_bold.rst) command as a job to be run with six sessions per node in parallel. This command will be run only on BOLD images tagged as `rest`.

The fifth and last call will execute `hcp_pre_freesurfer`. The value of the `sessions` parameter in the `doPreFS` list is set to the placeholder variable `sessions_var`, whose value is injected from the command call using the `mapvalues` parameter. Alternatively, the value could be injected by setting the environment variable `sessions_var`.
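The placeholder substitution used by the last call can be pictured with the following Bash sketch. It is an illustration under stated assumptions, not QuNex's code: `inject_placeholders` is a hypothetical helper that replaces single-brace `{label}` placeholders, as described in the Parameter injection section, first from a `--mapvalues`-style string and then from exported environment variables. The documentation does not say which source wins when both are set, so giving `mapvalues` priority is an assumption of the sketch, as is the `n_parallel` label in the usage part.

``` bash
#!/usr/bin/env bash
# Sketch only: illustrates run_list-style placeholder injection.
# inject_placeholders is a hypothetical helper, not part of QuNex.

# inject_placeholders FILE MAPVALUES
#   FILE      listfile containing {label} placeholders
#   MAPVALUES "label:value|label_2:value_2", as given to --mapvalues
# Prints the listfile with placeholders replaced. Pairs from MAPVALUES are
# applied first (assumed priority); remaining placeholders are filled from
# exported environment variables of the same name; others are left as-is.
inject_placeholders() {
    local file=$1 mapvalues=$2 text rest pair label value
    local -a pairs=()
    text=$(<"$file")

    # 1) Explicit label:value pairs, separated by '|'.
    IFS='|' read -r -a pairs <<< "$mapvalues"
    for pair in "${pairs[@]}"; do
        [[ $pair == *:* ]] || continue
        label=${pair%%:*}
        value=${pair#*:}
        text=${text//"{$label}"/"$value"}
    done

    # 2) Any placeholder still present: look up an exported variable.
    rest=$text
    while [[ $rest =~ \{([A-Za-z_][A-Za-z0-9_]*)\} ]]; do
        label=${BASH_REMATCH[1]}
        rest=${rest#*"${BASH_REMATCH[0]}"}   # continue scanning after this match
        if value=$(printenv "$label"); then
            text=${text//"{$label}"/"$value"}
        fi
    done

    printf '%s\n' "$text"
}

# Usage: a tiny listfile with two placeholders (n_parallel is hypothetical).
listfile=$(mktemp)
cat > "$listfile" <<'EOF'
list: doPreFS
    sessions : {sessions_var}
    parsessions : {n_parallel}
    command: hcp_pre_freesurfer
EOF

export n_parallel=4
inject_placeholders "$listfile" "sessions_var:/data/testStudy/processing/batch_baseline.txt"
rm -f "$listfile"
```

Run as is, it prints the `doPreFS` list with `{sessions_var}` replaced by the batch file path from the mapvalues-style string and `{n_parallel}` replaced by `4` from the environment. Placeholders with no matching value are left untouched; that fallback is a choice made for the sketch, not documented QuNex behaviour.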