# Running 1st and 2nd level BOLD task activation analyses

Task activation analysis in QuNex uses a standard general linear model (GLM) based approach. Notably, QuNex leverages the same GLM 'engine' when computing regression for functional connectivity preprocessing, described in the section on [BOLD task-evoked and resting-state functional connectivity preprocessing](../UsageDocs/BOLDFunctionalConnectivity).

We recommend that the initial steps of creating BOLD image brain masks, computing image statistics, and creating image statistics reports described above are performed before conducting task-based BOLD activation analyses. One reason is that they allow inspection of data quality and identification of sessions that might need to be excluded due to excessive movement. Another reason is that you might want to exclude bad frames from GLM computation and/or regress movement parameters to improve results.

This is a short overview of the steps and commands available for BOLD task-based activation analysis. It is divided into two parts:

1. The first part covers first-level, single-session analysis, in which one obtains session-specific GLM estimates of task-evoked responses.
2. The second part covers second-level, group analysis, in which specific hypotheses are tested at the group level.

## First level (single session) analysis

### Creating conc and fidl files

Prior to first-level analysis, two pieces of information need to be prepared:

1. Which BOLD images were acquired during task performance, and in what order. This information is provided using `.conc` files.
2. The exact timing and duration of the events that are to be modeled in the task GLM analysis. This information is provided in `.fidl` files.

`.conc` and `.fidl` files have to be created manually and for each session separately.
For the specific format of both files see the specification of [fidl](../Overview/file_fidl) and [conc](../Overview/file_conc) files. In short, `.conc` files should contain a list of BOLD image files that are to be included in the analysis, in the order in which they are to be concatenated.

#### Creating conc and fidl files - examples

Example of a `.conc` file:

``` bash
number_of_files: 4
file:/Volumes/tigr/MBLab/fMRI/sWM/sessions/OP108/images/functional/bold1_Atlas.dtseries.nii
file:/Volumes/tigr/MBLab/fMRI/sWM/sessions/OP108/images/functional/bold2_Atlas.dtseries.nii
file:/Volumes/tigr/MBLab/fMRI/sWM/sessions/OP108/images/functional/bold3_Atlas.dtseries.nii
file:/Volumes/tigr/MBLab/fMRI/sWM/sessions/OP108/images/functional/bold4_Atlas.dtseries.nii
```

This is a `.conc` file for one session. The file indicates that four BOLD images were acquired during task performance and are to be used in the analysis. The full paths to the BOLD images are provided after the `file:` notation, listed in the order in which they were acquired.

`.fidl` files start with information on the TR with which the BOLD images were acquired and the list of events that are to be modeled. Each subsequent line then specifies the time, event type index, and event duration for each of the events to be modeled. Following the event duration, one can also specify the values of additional behavioral (or other) coregressors. Note, it is essential that the sequence of BOLD image files in `.conc` files matches the timing information in the `.fidl` files. Specifically, the timing information in `.fidl` files is given as if the task BOLD images formed a single uninterrupted scan.

---

Example of a `.fidl` file:

``` bash
2.5 encoding delay response
12.5 0 2.5
15.5 1 9.5
25 2 3
40 0 2.5
43 1 9.5
52.5 2 3
70 0 2.5
73 1 9.5
82.5 2 3
...
```

This is a snippet of a `.fidl` file for one session. The first line of the file gives the TR (i.e. 2.5) and the names of all events (i.e. 'encoding', 'delay', 'response'). The following lines are divided into three columns: the first column gives the start of the event (in seconds, aligned with TR), the second column gives the code of the event (the index of the event in the events list from the first line, starting with 0), and the third column gives the duration of the event (in seconds).

To make use of `.conc` and `.fidl` files it is best to put them in the `<study>/sessions/inbox/events` folder, where they will be found by the processing commands and copied to the individual's `<study>/sessions/<session id>/images/functional/concs` and `<study>/sessions/<session id>/images/functional/events` folders for `.conc` and `.fidl` files, respectively. To correctly identify which file belongs to which participant or session, the following naming convention needs to be used (angle brackets denote placeholders):

* `.conc` files: `<session id>_<task name>.conc`
* `.fidl` files: `<session id>_<task name>_<model name>.fidl`

### Running GLM analysis

The first step in task-evoked BOLD analysis is a single-session or first-level statistical analysis. This is most commonly accomplished with regression analysis using the general linear modeling (GLM) framework. The GLM is used to obtain activation (beta) estimates while (optionally) also accounting for nuisance signals (e.g. motion, white matter signal, ventricle signal, etc.). Before computing GLMs, additional preprocessing steps can also be performed, such as spatial smoothing and temporal filtering. These analysis steps can be done with either the [`preprocess_bold`](../../api/gmri/preprocess_bold.rst) or the [`preprocess_conc`](../../api/gmri/preprocess_conc.rst) QuNex command. `preprocess_bold` can be used when the task spanned only a single BOLD image. More commonly, though, a task spans a series of BOLD images specified in a `.conc` file, in which case `preprocess_conc` should be used. Do note that, as tasks usually span multiple BOLD images, existing functionality is enhanced and novel functionality added to the `preprocess_conc` command first.
Going forward, the two commands might be combined into a single one.

`preprocess_conc` is a complex command that enables multiple preprocessing steps: GLM computation, removal of nuisance signal, spatial smoothing, temporal filtering (high-pass, low-pass) and motion scrubbing. Detailed information on the command inputs and options can be found in the command help documentation and on the [BOLD Preprocessing](../UsageDocs/BOLDPreprocessing) page. The command produces several different outputs, listed below (angle brackets denote placeholders; the exact names depend on the file names and options used):

* residual image: `<root>_res-<glm name>.<extension>`
* GLM image: `<root>_conc_<fidl name>_res-<glm name>_Bcoeff.<extension>`
* text GLM regressor matrix: `<root>_GLM-X_<fidl name>_res-<glm name>.txt`
* image of the regressor matrix: `<root>_GLM-X_<fidl name>_res-<glm name>.png`

While the residual and GLM images are stored in `<study>/sessions/<session id>/images/functional`, the text file and image of the GLM regressor matrix can be found in `<study>/sessions/<session id>/images/functional/glm`.

#### Running GLM analysis - example

Example use of `preprocess_conc`:

``` bash
qunex preprocess_conc \
    --batchfile="batch.txt" \
    --sessionsfolder="sessions" \
    --overwrite="yes" \
    --parsessions="6" \
    --bold_actions="sr" \
    --bold_nuisance="e" \
    --glm_name="_A" \
    --glm_results="all" \
    --event_string="encoding:boynton|delay:boynton|response:boynton" \
    --bold_preprocess="sWM" \
    --event_file="sWM_A" \
    --nprocess="0" \
    --image_target=cifti \
    --hcp_cifti_tail="_Atlas" \
    --conc_use="absolute"
```

The breakdown of the parameters is as follows:

* `batchfile` Path to the batch file
* `sessionsfolder` Path to the study sessions folder
* `overwrite` Whether to overwrite existing files (yes) or not (no)
* `parsessions` How many sessions to process in parallel
* `bold_actions` Which actions to perform, e.g. 'sr' for smoothing and regression
* `bold_nuisance` Which nuisance signals to model, at least 'e' for event modeling
* `glm_name` The name of the resulting GLM file
* `glm_results` Which results from GLM regression to save, e.g. 'cr' for beta coefficients and GLM residuals
* `event_string` A string specifying the regressors to be used in GLM modeling
* `bold_preprocess` The root name of the conc files
* `event_file` The root name of the fidl files
* `nprocess` How many sessions to process, e.g. 0 for all
* `image_target` Which file format to process
* `hcp_cifti_tail` The tail of the CIFTI files
* `conc_use` 'relative' (extract and use only the bold number), 'absolute' (use the whole path)

### Preparing the event string parameter

In this section we cover the event string in more detail. The event string directly relates to events as they are reported in fidl files. Each of the event types listed in the fidl file can be modeled using the Boynton HRF, the SPM double gaussian HRF, a block response, unassumed modeling, or a combination of these.

#### Assumed HRF modeling

Assumed modeling is specified by the fidl event code, the type of hemodynamic response function (HRF), and optionally the length of the event, in the format `<fidl code>:<hrf function>[-run|-uni][:<length>]`. Here are a few examples:

* `target:boynton` ... models all target events using the Boynton HRF with the duration specified in the fidl file
* `response:spm:2` ... models all response events using the SPM double gaussian HRF with an assumed event duration of 2 seconds (irrespective of what is specified in the fidl file)

For each event specified in the string, a single regressor is created by generating a time vector of zeros, placing 1 at all timepoints for the duration of the event, and convolving it with the specified HRF function.

The optional `-run` or `-uni` extension of the HRF specification determines how the resulting regressor time series is normalized. In the case of `-run`, e.g. `boynton-run` or `SPM-run`, the resulting regressor is normalized to an amplitude of 1 within each BOLD run separately. This can result in different scaling of regressors with different durations, and of the same regressor across different runs. Scaling, in this case, is performed after the signal is weighted, so in effect the scaling of weights (e.g. behavioral coregressors) can differ across BOLD runs. In the case of `-uni`, e.g. `boynton-uni` or `SPM-uni`, the HRF function is normalized to have an area under the curve of 1. This ensures uniform and universal scaling of the resulting regressor across all event lengths. In addition, the scaling is not affected by weights (e.g. behavioral coregressors), which in turn ensures that the weights themselves are not scaled. The flag can be abbreviated to `-r` and `-u`. If not specified, `-uni` is assumed. (The default changed from `-run` in QuNex version 0.93.4.)

The optional length parameter, if set, overrides the event duration information provided in the fidl file.

#### Block modeling

Block modeling is specified by the fidl event code followed by the keyword `block` and two numbers that specify block onset and offset, as durations in frames from the event onset and offset specified in the fidl file. For example:

* `movement:block:1:2` ... would create a block regressor with onset 1 frame after the start of the movement event as recorded in the fidl file, and offset 2 frames after the end of the event according to the information in the fidl file

#### Unassumed HRF modeling

In comparison to assumed and block modeling, in which a single regressor is created for the specific event type of interest, unassumed modeling enables one to estimate the response time course across a specified number of frames. Specifically, for each event type, N regressors are created (N equals the number of frames over which to model the event). All the regressors are initially set to zero; then the first regressor is set to 1 for all first frames of the specified event of interest, the second regressor is set to 1 for all the second frames of the specified event of interest, and so on.
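This construction can be sketched in a few lines of Python. This is a simplified illustration, not QuNex code; the event onsets are given in frames and are purely hypothetical:

``` python
def unassumed_regressors(onset_frames, n_frames, run_length):
    """Build unassumed-modeling regressors: one indicator column per
    modeled frame, shifted 0..n_frames-1 from each event onset."""
    X = [[0] * n_frames for _ in range(run_length)]
    for onset in onset_frames:
        for k in range(n_frames):
            if onset + k < run_length:
                X[onset + k][k] = 1  # k-th regressor marks the (k+1)-th frame
    return X

# hypothetical run of 30 frames; events of the modeled type start at
# frames 4, 7, 18 and 25
X = unassumed_regressors([4, 7, 18, 25], n_frames=5, run_length=30)
t1 = [row[0] for row in X]  # first regressor: 1 at each event's first frame
```

Each of the five columns of `X` is then entered into the GLM as a separate regressor, so the estimated betas trace out the response time course frame by frame.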
For example, let's assume that we would like to model event T with an unassumed regressor of length 5. In the event string, we would specify that using the format `<fidl code>[:u]:<length>`, as `T:u:5` or just `T:5`. We can represent the resulting regressors in the following timeline, with `Event` coding the onsets of events A, B and T, and lines T1 to T5 coding the actual values of the regressors at each frame:

``` bash
Event: A . . . T . . T . . . . B . . A . . T . . . B . . T . . . . .
T1:    0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
T2:    0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
T3:    0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
T4:    0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0
T5:    0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
```

#### Naming of regressors

By default, regressors take the name of the fidl event code to which they relate; this name can, however, be changed. It is especially prudent to name regressors when a regressor relates to more than one fidl code, or when a behavioral (co)regressor is to be created. In the first case, let's assume that you would like to create a single Boynton assumed regressor for both events A and B. This can be specified as `A,B:boynton>AB`. To parse that string: `A,B` specifies that events A and B are to be treated as a single event type, `boynton` specifies that they are modeled using the Boynton HRF, and `>` denotes additional information, of which the first element, in this case `AB`, is the name of the created regressor.

#### Behavioral regressors

Sometimes one needs to create a behavioral regressor, a regressor whose amplitude depends on a specific behavioral variable, e.g. accuracy or reaction time. The values for behavioral regressors first need to be encoded in the `.fidl` file in the form of additional columns for each specified event. For instance:

``` bash
2.0 congruent incongruent
6.0 0 1 342
10.0 1 1 425
14.0 1 1 462
20.0 0 1 367
22.0 1 1 418
26.0 0 1 332
...
```

The header of this `.fidl` file specifies a TR of 2 seconds and two event types, congruent and incongruent trials. In the remaining lines, the first column gives the time of the trial onset, the second column the code of the event (0 referring to congruent, 1 to incongruent), the third column the event duration, and the fourth column the reaction time of the response.

If we would like to investigate which brain regions' activity correlates with trial-to-trial variability in task performance, we can generate the following two behavioral regressors:

* `congruent:9>congruent_rt:1:within:z`
* `incongruent:9>incongruent_rt:1:within:z`

The first specification creates 9 regressors, modeling congruent trials via the unassumed HRF method. The set of regressors is named `congruent_rt` and their values are based on the first additional column in the fidl file (the fourth column, the one with the reaction times in the example above). To create the regressors, the reaction times are standardized to z-scores within the congruent events, and all the 1 values across the 9 unassumed regressors that pertain to a specific event are multiplied by the z-score for that event. The second set of unassumed regressors is created in the same way for the incongruent trials.

Do note that since the behavioral regressor(s) are de-meaned (i.e. the mean is removed) within each event type, adding these behavioral regressors does not change the estimates of the other regressors. They only help establish which brain regions' activation correlates with reaction times, in this case for each time frame separately, as we are using unassumed HRF modeling.

The full formal specification for a regressor is (angle brackets denote placeholders):

``` bash
<fidl code>:<hrf>:<length>[><name>[:<column>[:<normalization span>[:<normalization method>]]]]
```

For a behavioral regressor, `<normalization span>` can be set to:

* `within` ... the behavioral values are normalized only within the specified event type
* `across` ... the behavioral values are normalized across all events.

`<normalization method>` can be set to:

* `z` ... compute a Z-score
* `01` ... normalize to the fixed range 0 to 1
* `-11` ... normalize to the fixed range -1 to 1

#### Full event string

All the regressors that are to be specified are joined in a single string, separated by the pipe (`|`) character. For example:

``` bash
event_string="congruent:boynton|incongruent:boynton|congruent:boynton>congruent_rt:1:within:z|incongruent:boynton>incongruent_rt:1:within:z"
```

The string specifies a model with two assumed regressors modeling the responses to congruent and incongruent trials using the Boynton HRF, and two additional behavioral regressors based on z-scored values from the first extra column in the fidl file.

## Second level (group) analysis

### Whole-brain second-level statistical analysis

The second step in task-evoked BOLD analysis is a group- or second-level statistical analysis. In this step, the activation estimates obtained in the previous step are subjected to statistical testing to address questions such as: which regions were activated or deactivated during task performance, what were the differences in (de)activation between two conditions or two groups, and so on.

Whole-brain second-level analysis is broadly organized into three steps:

1. The relevant effect estimates are extracted from the individual GLM files and combined into group files.
2. The desired statistics are computed. QuNex has adopted the excellent FSL [PALM](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM) package, which uses permutation resampling of complex uni- and multivariate GLM models for statistical inference. QuNex provides convenience functions that simplify generating input files and running analyses.
3. The obtained results can be thresholded, masked, and combined into final results.
As steps two and three are shared with other types of whole-brain analyses, please refer to [Group-level thresholding and mapping of results](../UsageDocs/GroupLevelStats) for more information. Again, more detailed information can be obtained by consulting each respective command's help documentation and the related user guide documents.

### Creating list files and extracting GLM estimates

Before running second-level analysis in PALM, we need to extract the GLM estimates of the effects of interest, and organize and save them in the desired manner. This can be achieved with the `qunex` command [`general_extract_glm_volumes`](../../api/gmri/general_extract_glm_volumes.rst). The command uses a `.list` file as input. A `.list` file should contain the `session id` of each session included in the analysis, followed by the path to the file with the GLM estimates for that session. A detailed description of list files is provided in the specification of [list](../Overview/file_list) files.

### Examples

Example of a `.list` file:

``` bash
session id: OP109
glm: /Volumes/tigr/MBLab/fMRI/sWM/sessions/OP109/images/functional/bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii
session id: OP110
glm: /Volumes/tigr/MBLab/fMRI/sWM/sessions/OP110/images/functional/bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii
session id: OP136
glm: /Volumes/tigr/MBLab/fMRI/sWM/sessions/OP136/images/functional/bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii
session id: OP137
glm: /Volumes/tigr/MBLab/fMRI/sWM/sessions/OP137/images/functional/bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii
session id: OP140
glm: /Volumes/tigr/MBLab/fMRI/sWM/sessions/OP140/images/functional/bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii
```

This is a `.list` file for a group of five sessions. For each session, it gives the session id and, after the `glm:` notation, the path to the file with the GLM estimates.
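For larger groups, a `.list` file like the one above can also be generated with a short script. The sketch below is not QuNex functionality; the session ids, study path and GLM file name are hypothetical placeholders to adapt to your study:

``` python
# Sketch: write a .list file, one "session id:" line followed by its
# "glm:" line per session (hypothetical ids, path and file name).
sessions = ["OP109", "OP110", "OP136", "OP137", "OP140"]
study = "/Volumes/tigr/MBLab/fMRI/sWM/sessions"
glm_file = "bold_Atlas_conc_sWM_A_s_res-e_A_Bcoeff.dtseries.nii"

with open("sWM_A_n5.list", "w") as f:
    for sid in sessions:
        f.write(f"session id: {sid}\n")
        f.write(f"glm: {study}/{sid}/images/functional/{glm_file}\n")
```

Generating the file programmatically also makes it easy to drop sessions flagged for excessive movement before the group analysis.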
To extract the estimates of interest use:

* [`general_extract_glm_volumes`](../../api/gmri/general_extract_glm_volumes.rst)

  This command, which can be run either from MATLAB or the command line, extracts GLM estimates of the effects of interest, and organizes and saves them in the desired manner.

---

Example use of `general_extract_glm_volumes`:

``` bash
qunex general_extract_glm_volumes \
    --flist=sWM_A_n25.list \
    --outf=sWM_A_n25 \
    --effects="encoding, delay, response" \
    --saveoption="by subject" \
    --verbose=true
```

### Running PALM

QuNex uses the wrapper command [`run_palm`](../../api/gmri/run_palm.rst) to run second-level statistical analysis in PALM. PALM supports various types of statistical tests, which are defined by different input parameters and should be carefully checked against the [PALM User Guide](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM/UsageDocs#Using_PALM). PALM requires several inputs:

1. The file with the extracted group GLM estimates prepared above, on which statistical testing is performed.
2. Separate specification files with information on the design matrix, t-contrasts, f-contrasts and exchangeability blocks, respectively.

These specification files can be easily created with the [`create_ws_palm_design`](../../api/gmri/create_ws_palm_design.rst) convenience command:

* [`create_ws_palm_design`](../../api/gmri/create_ws_palm_design.rst)

  This command prepares the design, t-contrast, f-contrast and exchangeability block files for single-group, within-session PALM designs. It supports full factorial designs with up to four factors, each with an arbitrary number of levels. The command assumes the data to be organized by session and the first specified factor to be the slowest varying one. It is recommended that the output files are carefully examined to ensure that the information matches the data and that the design is appropriate for the type of second-level statistical analysis you want to perform.
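As an illustration of the kind of information these specification files encode, consider the simplest within-subject case: one factor with two levels, i.e. a paired t-test. The sketch below shows one conventional layout; it is not the exact output of `create_ws_palm_design`, so inspect the files the command actually produces:

``` python
import numpy as np

# Sketch of a paired (within-subject, two-level) PALM design, hypothetical
# layout: one condition-effect column plus one indicator column per subject.
n_subjects = 4

# condition effect: +1 for level one, -1 for level two, for each subject
condition = np.tile([1, -1], n_subjects).reshape(-1, 1)
# subject indicator columns (subject modeled as a random effect)
subjects = np.kron(np.eye(n_subjects), np.ones((2, 1)))
design = np.hstack([condition, subjects])   # shape (2*n_subjects, 1+n_subjects)

# t-contrast testing the condition effect, zeros over the subject columns
tcontrast = [1] + [0] * n_subjects

# exchangeability blocks: permutations allowed only within each subject
blocks = [s for s in range(1, n_subjects + 1) for _ in (0, 1)]
```

The same logic generalizes to factorial designs: extra columns code additional main effects and interactions, and the exchangeability blocks keep permutations within subjects.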
More details on design matrices, t-contrasts, f-contrasts and exchangeability blocks appropriate for specific statistical analyses (e.g. independent t-test, paired t-test, one-way ANOVA, etc.) can be found on the [FSL wiki](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM).

### PALM examples

Example use of `create_ws_palm_design`:

``` bash
qunex create_ws_palm_design \
    --factors="2,3" \
    --nsubjects=25 \
    --root=sWM_A_n25
```

The example above creates four `.csv` files, one each for the design matrix, t-contrasts, f-contrasts and exchangeability blocks, named according to the `root` parameter. The example uses two factors, the first with two levels and the second with three. The design matrix consists of numbers (usually ones and zeros) that code for the main effects of the factors and their interaction. It also contains coding for subjects as random effects; in the case above, 25 subjects would be added to the design matrix. The t-contrasts and f-contrasts are prepared in the order: main effect of the first factor, main effect of the second factor, interaction between the factors, subject effects. The exchangeability blocks file in the case above contains one column of sequential numbers (starting from 1), where each number codes for one specific subject.

Once all specification files are prepared, second-level statistical testing can be run in PALM with the following command:

* [`run_palm`](../../api/gmri/run_palm.rst)

  This command runs second-level analysis using PALM permutation resampling. The command can be run on grayordinate (CIFTI) or volume (NIFTI) images. The output of this command is a number of different image files that contain the results for all t- or f-contrasts provided via the `--design` parameter, with the methods for multiple-testing correction defined by the `palm_args` parameter.
---

Example use of `run_palm`:

``` bash
qunex run_palm \
    --image=sWM_A_n25.dtseries.nii \
    --design="name:sWM_A_n25|f:none" \
    --palm_args="n:500|fdr|T|T2DHEC:2:0.5:26|C:3.1|accel:tail" \
    --root=PALM/sWM_A_n25
```

#### Masking and joining maps

Finally, the output results of PALM can be thresholded or masked with the `mask_map` command and combined with the `join_maps` command. `mask_map` and `join_maps` are both convenience wrapper commands that use `wb_command` to modify CIFTI files.

* `mask_map`

  This command enables easy masking of CIFTI images (e.g. a ztstat image from PALM) using the provided list of mask files (e.g. p-value images from PALM) and thresholds. More than one mask can be used, in which case they are combined using a logical OR or AND operator.

* `join_maps`

  This command concatenates the listed CIFTI images and names the individual volumes, if names are provided.

##### `mask_map` example

``` bash
qunex mask_map \
    --image="PALM/sWM_A_n25_reg_ztstat_C2.dscalar.nii" \
    --masks="PALM/sWM_A_n25_clustere_ztstat_fwep_C1.dscalar.nii, \
             PALM/sWM_A_n25_clustere_ztstat_fwep_C2.dscalar.nii" \
    --output="PALM/sWM_A_n25_reg_ztstat_cefwep_C1C2_0.017.dscalar.nii" \
    --maxv="0.017"
```

##### `join_maps` example

``` bash
qunex join_maps \
    --images="PALM/sWM_encoding_A_n25_reg_ztstat_cefwep_C1C2_0.017.dscalar.nii, \
              PALM/sWM_delay_A_n25_reg_ztstat_cefwep_C1C2_0.017.dscalar.nii, \
              PALM/sWM_response_A_n25_reg_ztstat_cefwep_C1C2_0.017.dscalar.nii" \
    --names="Encoding [CE FWE p0.017], Delay [CE FWE p0.017], Response [CE FWE p0.017]" \
    --output="Maps/sWM_n25_Z_CEFWE_p0.017.dscalar.nii"
```

### Region of interest (ROI) second-level analysis

Either as a way to provide more detailed ROI-level results, or to conduct analyses on independently or *a priori* defined ROIs instead of whole-brain analyses, QuNex provides a set of commands that enable identification of ROIs from activation images and extraction of the relevant GLM values for the specified ROIs.
Further analysis can then be done using other statistical and visualization tools, such as [R](https://www.r-project.org), or others suitable for your specific research focus. Currently the following commands are provided:

* [`general_find_peaks`](../../api/gmri/general_find_peaks.rst)

  This command, which can be run from MATLAB or the command line, identifies peaks in the provided input image and defines ROIs using a watershed algorithm that grows regions from peaks. The command can perform initial spatial smoothing and masking of the input image, and allows specification of minimum and maximum ROI size. ROIs smaller than the specified minimum size are either joined with a neighbouring ROI or, when isolated, rejected. ROIs larger than the specified maximum size are re-flooded until the maximum size is reached.

* [`general_extract_roi_glm_values`](../../api/gmri/general_extract_roi_glm_values.rst)

  This command, run either from MATLAB or the command line, extracts mean activation (beta) estimates of the specified effects for the specified ROIs from volume or CIFTI GLM files. The resulting mean beta estimates or mean percent signal change values can be saved as simple text files in either long or wide format, which can be easily imported for further processing, analysis and visualization.
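A long-format text file of this kind is straightforward to load for further analysis. The sketch below uses only the Python standard library; the column names (`roi`, `session`, `event`, `value`) and the sample rows are hypothetical, so check the actual header of your output file:

``` python
import csv
import io

def load_roi_values(source):
    """Read a tab-separated long-format file with a header row
    into a list of dictionaries, one per row."""
    return list(csv.DictReader(source, delimiter="\t"))

# hypothetical snippet of a long-format output file
sample = io.StringIO(
    "roi\tsession\tevent\tvalue\n"
    "DLPFC\tOP109\tdelay\t1.2\n"
    "DLPFC\tOP110\tdelay\t0.8\n"
)
rows = load_roi_values(sample)

# e.g. average the delay-period estimate for one ROI across sessions
values = [float(r["value"]) for r in rows if r["roi"] == "DLPFC"]
mean_beta = sum(values) / len(values)
```

In practice you would pass an open file handle instead of the `io.StringIO` sample, or load the same file with pandas or R for group-level plotting and statistics.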