# Preparing a study-level mapping file for QuNex workflows

## Mapping File Specification

For detailed information regarding the mapping file specification, please refer to the [Mapping specification files](../Overview/file_mapping) page.
Briefly, the purpose of the mapping file (typically called `<pipeline>_mapping.txt`) is to support the following workflow:
* The user obtains new sequence data for a given session.
* The user wants to map these sequence data from the QuNex `nii` format into a data format that supports a given pipeline (e.g. the HCP pipeline).

To enable this, the user has to define a mapping file that is used by the [`create_session_info`](../../api/gmri/create_session_info.rst) command:

```bash
qunex create_session_info sessions=<sessions specification> [pipelines=hcp] [sessionsfolder=.] [sourcefile=session.txt] [targetfile=session_<pipeline>.txt] [mapping=specs/<pipeline>_mapping.txt] [filter=None] [overwrite=no]
```

Use the `pipelines` parameter to specify a comma-separated list of pipelines for which the session info file will be prepared.

The `create_session_info` command takes two plain-text input files:

`--mapping` --> *Mapping Specification File* for a given study [`<pipeline>_mapping.txt`] based on the following convention:

```bash
<sequence_number>|<sequence_name> => <user_defined_sequence_descriptor_tag>:[<sequence_info>:]
```

`--sourcefile` --> *Session Acquisition Information File* [`session.txt`] contains the sequence names and numbers after dcm2nii conversion, based on the following convention:

```bash
<sequence_number>: <sequence_name>
```

It produces the following output file:

`--targetfile` --> *Session Pipeline Information File* [`session_<pipeline>.txt`] contains the original sequence names and numbers, together with the sequence descriptor tags and additional sequence info obtained from the mapping file, based on the following convention:

```bash
<sequence_number>:<user_defined_sequence_descriptor_tag>:[<sequence_info>:]<sequence_name>
```
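
The image lines of a `session_<pipeline>.txt` file can be split into their parts with a few lines of Python. This is a minimal sketch, not the parser used by QuNex. It assumes `:` is the only field separator and that sequence names contain no `:`. Because descriptor tags such as `bold1:BLINK` contain a colon themselves, the sketch returns the tag and info fields as a single list instead of trying to tell them apart. The names `ImageLine` and `parse_image_line` are hypothetical.

```python
from typing import NamedTuple


class ImageLine(NamedTuple):
    number: str        # sequence number, e.g. "07"
    fields: list[str]  # descriptor tag and optional info fields, in order
    name: str          # original sequence name


def parse_image_line(line: str) -> ImageLine:
    """Split '<number>:<tag>:[<info>:]<name>' into its parts (hypothetical helper)."""
    parts = [p.strip() for p in line.split(":")]
    if len(parts) < 2:
        raise ValueError(f"not an image line: {line!r}")
    number, *middle, name = parts
    # Drop empty fields, e.g. the blank tag of an unmapped 'Survey' image.
    return ImageLine(number, [f for f in middle if f], name)


print(parse_image_line("07: bold1:BLINK       :se(1):BOLD BLINK 3mm 48 2.5s"))
# -> ImageLine(number='07', fields=['bold1', 'BLINK', 'se(1)'], name='BOLD BLINK 3mm 48 2.5s')
```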

## General Mapping File Guidelines
* The file should be a plain text file in which individual mappings are specified one per line: `T1w 0.7mm => T1w`
* Lines that include `=>` will be interpreted as mapping descriptions with the convention `<string_a> => <string_b>`
* The `<string_a>` before the `=>` is mapped to `<string_b>`.
* Any spaces before or after each string are ignored.
* Lines that do not include `=>`, and lines starting with `#`, are ignored.
* Multiple sources for the same target can be specified.
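
The guidelines above translate into a small parser. The sketch below is illustrative only: `parse_mapping` is a hypothetical helper, not part of the QuNex API, and the rule that a repeated source keeps its last target is an assumption.

```python
def parse_mapping(text: str) -> dict[str, str]:
    """Return {source: target} pairs from mapping-file text (hypothetical helper)."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Comments and lines without '=>' are not mappings.
        if line.startswith("#") or "=>" not in line:
            continue
        source, target = line.split("=>", 1)
        # Spaces around each side are ignored; several sources may share a target.
        # Assumption: if a source is repeated, the last line wins.
        mapping[source.strip()] = target.strip()
    return mapping


example = """
# -- Structural sequences
T1w 0.7mm      => T1w
T1w 0.7mm N1   => T1w
10             => bold:rest
"""
print(parse_mapping(example))
# -> {'T1w 0.7mm': 'T1w', 'T1w 0.7mm N1': 'T1w', '10': 'bold:rest'}
```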

### Mapping Sequence Names to User Defined Sequence Descriptor Tags

A sequence can be mapped to a unique user-defined sequence descriptor tag by specifying its unique sequence name, as listed in the `session.txt` file:

```bash
<sequence_name> => <user_defined_sequence_descriptor_tag>:[<sequence_info>:]
```

* `T1w 0.7mm => T1w` specifies that the sequence named `T1w 0.7mm` will be mapped to the descriptor tag `T1w`.

* Note that when using the `<sequence_name>` specification in the mapping file, the sequence must always have the same name throughout the study, across all sessions and subjects.

### Mapping Sequence Numbers to User Defined Sequence Descriptor Tags

Alternatively, a sequence can be identified by its unique number (often its order of acquisition), as listed in the `session.txt` file, and assigned a unique user-defined sequence descriptor tag:

```bash
<sequence_number> => <user_defined_sequence_descriptor_tag>:[<user_defined_sequence_info>:]
```

* `10 => T1w` specifies that the acquisition sequence associated with the number `10` will always be mapped to the sequence descriptor tag `T1w`, across all sessions.

* `11 => bold:rest` specifies that the acquisition sequence associated with the number `11` will always be mapped to `bold:rest`, across all sessions, where `bold` is the `<user_defined_sequence_descriptor_tag>` and `rest` is the optional user-defined `<sequence_info>`.

* Note that when using the `<sequence_number>` specification in the mapping file, the sequence must always have the same number throughout the study, across all sessions and subjects.
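
Name-based and number-based entries can coexist in one mapping file (the example mapping file further down uses both). The following sketch looks up the descriptor for a single `session.txt` image in a `{source: target}` dictionary built from the mapping file. `resolve_tag` is hypothetical, and two behaviours are assumptions rather than documented QuNex behaviour: number-based entries are checked before name-based ones, and numbers are compared as integers so that `07` matches `7`.

```python
from typing import Optional


def resolve_tag(number: str, name: str, mapping: dict[str, str]) -> Optional[str]:
    """Return the mapped descriptor for one session.txt image, or None.

    Hypothetical helper. Assumptions: number-based entries are checked before
    name-based ones, and numbers are compared as integers ('07' matches '7').
    """
    if number.strip().isdigit():
        for source, target in mapping.items():
            if source.isdigit() and int(source) == int(number):
                return target
    return mapping.get(name.strip())


rules = {"T1w 0.7mm N1": "T1w", "10": "bold:rest"}
print(resolve_tag("02", "T1w 0.7mm N1", rules))        # -> T1w
print(resolve_tag("10", "RSBOLD 3mm 48 2.5s", rules))  # -> bold:rest
print(resolve_tag("01", "Survey", rules))              # -> None
```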

### Guide for Automated `session.txt` to `session_<pipeline>.txt` Mapping via the Mapping File

* QuNex provides a command to support scenarios where specific acquisition sequences can be consistently assigned the same tag, either via consistent acquisition sequence names or acquisition sequence numbers.

* Specifically, the `create_session_info` command will process `session.txt` files and write out the associated `session_<pipeline>.txt` files.

* This functionality is supported by a study-level mapping file. By default, the file is expected at `<study_folder>/sessions/specs/<pipeline>_mapping.txt`.

* If the location or name is different from the default, then the exact path to the file needs to be provided when calling `create_session_info`.

* The format of the mapping file is described in the [format specification pages](../Overview/file_mapping). Briefly, the mapping file is a regular text file that describes the mapping between sequence acquisition names (e.g. `T1w 0.7mm N1`) and the appropriate user defined sequence descriptor tags for a given pipeline (e.g. `T1w`).

* As noted, each mapping is given on a separate line as:

```bash
<sequence_name> => <user_defined_sequence_descriptor_tag>:[<user_defined_sequence_info>:]
```

or

```bash
<sequence_number> => <user_defined_sequence_descriptor_tag>:[<user_defined_sequence_info>:]
```

* For functional BOLD images, the mapping should also specify the task information after the descriptor tag (e.g. `bold:rest`).
  The number of the BOLD image should not be provided, as it is added automatically.
  The number can be explicitly specified under certain conditions (see the [section below](#explicitly-assigning-bold-numbers)).

* The mapping file can also handle `SpinEcho` field maps used with BOLD images. When running `create_session_info`, the `se(<pair number>)` descriptor is added automatically using the following logic:

  * the first spin-echo pair will be assigned number 1,
  * the next spin-echo pair will be assigned number 2, etc.
  * structural and functional images will be assigned the number of the immediately preceding spin-echo pair
  * if no spin-echo pair was present before the structural or functional image, it will be assigned the number of the first following spin-echo pair.

* Similarly, if EPI field maps are present (for Siemens, a pair of images tagged as `FM-Magnitude` and `FM-Phase`, or a single `FM-GE` image), these images
will be assigned an `fm(<pair number>)` descriptor, and structural and functional images will receive the same descriptor using the logic described above.
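
The spin-echo numbering rules above can be sketched as follows; `fm` numbering would follow the same logic with its own counter. This is a simplified illustration, not QuNex code: `assign_se_numbers` is hypothetical, tags are assumed to be already mapped, spin-echo images are assumed to appear as clean consecutive pairs, and only tags starting with `T1w`, `T2w`, or `bold` are assumed to receive an `se(<N>)` descriptor.

```python
from typing import Optional

SE_TAGS = {"SE-FM-AP", "SE-FM-PA", "SE-FM-LR", "SE-FM-RL"}
RECEIVERS = ("T1w", "T2w", "bold")  # assumed prefixes of images that get se(<N>)


def assign_se_numbers(tags: list[str]) -> list[Optional[int]]:
    """Return the se(<N>) number for each image in acquisition order, or None."""
    numbers: list[Optional[int]] = [None] * len(tags)
    pair_starts: list[int] = []  # index of the first image of each pair

    # Spin-echo pairs are numbered 1, 2, ... in acquisition order.
    i = 0
    while i + 1 < len(tags):
        if tags[i] in SE_TAGS and tags[i + 1] in SE_TAGS:
            pair_starts.append(i)
            numbers[i] = numbers[i + 1] = len(pair_starts)
            i += 2
        else:
            i += 1
    if not pair_starts:
        return numbers

    # Structural/functional images take the immediately preceding pair,
    # or the first following pair (pair 1) if none precedes them.
    for idx, tag in enumerate(tags):
        if tag.startswith(RECEIVERS):
            preceding = [n for n, start in enumerate(pair_starts, 1) if start < idx]
            numbers[idx] = preceding[-1] if preceding else 1
    return numbers


session = ["", "T1w", "T2w", "", "SE-FM-AP", "SE-FM-PA", "bold:BLINK",
           "bold:rest", "SE-FM-AP", "SE-FM-PA", "bold:BLINK"]
print(assign_se_numbers(session))
# -> [None, 1, 1, None, 1, 1, 1, 1, 2, 2, 2]
```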

**Please note** that the logic for associating spin-echo and EPI sequence-based field map images
depends on the images being listed in the order they were acquired. This might not be the case if
the `session.txt` file was not generated by QuNex when importing DICOM or PAR/REC files. For
instance, if the `session.txt` file was generated when importing images from BIDS datasets, the order
of the files is determined by image modalities, image types, and BIDS labels, since sequence numbers
might not be available or cannot be reliably extracted. In this case, the automatic assignment feature
should not be used when there are multiple field map images. The automatic assignment of
`fm([N])` and `se([N])` can, however, be supplemented by providing that information in the `session.txt`
file or the mapping file; the exact semantics are described in [this section](#explicitly-associating-field-maps).


#### Example

```bash
# -- This is the mapping file for a QuNex study

# -- Structural sequences
T1w 0.7mm      => T1w
T1w 0.7mm N1   => T1w
T1w 0.7mm N2   => T1w
T2w 0.7mm      => T2w
T2w 0.7mm N1   => T2w
T2w 0.7mm N2   => T2w

# -- Spin Echo pairs
C-BOLD 3mm 48 2.5s FS-P  => SE-FM-AP
C-BOLD 3mm 48 2.5s FS-A  => SE-FM-PA

# -- BOLD sequences
BOLD BLINK 3mm 48 2.5s     => bold:BLINK
BOLD FLANKER 3mm 48 2.5s   => bold:FLANKER
BOLD EC 3mm 48 2.5s        => bold:EC
10                         => bold:rest
```

Source `session.txt` file:

```bash
session: s03
subject: s03
dicom: /data/mystudy/sessions/s03/dicom
raw_data: /data/mystudy/sessions/s03/nii
hcp: /data/mystudy/sessions/s03/hcp

01: Survey
02: T1w 0.7mm N1
03: T2w 0.7mm N1
04: Survey
05: C-BOLD 3mm 48 2.5s FS-P
06: C-BOLD 3mm 48 2.5s FS-A
07: BOLD BLINK 3mm 48 2.5s
08: BOLD FLANKER 3mm 48 2.5s
09: BOLD EC 3mm 48 2.5s
10: RSBOLD 3mm 48 2.5s
11: C-BOLD 3mm 48 2.5s FS-P
12: C-BOLD 3mm 48 2.5s FS-A
13: BOLD BLINK 3mm 48 2.5s
14: BOLD BLINK 3mm 48 2.5s
15: BOLD FLANKER 3mm 48 2.5s
16: BOLD FLANKER 3mm 48 2.5s
17: BOLD EC 3mm 48 2.5s
```

Resulting `session_<pipeline>.txt` file:

```bash
session: s03
subject: s03
dicom: /data/mystudy/sessions/s03/dicom
raw_data: /data/mystudy/sessions/s03/nii
hcp: /data/mystudy/sessions/s03/hcp

hcpready: true

01:                   :Survey
02: T1w               :se(1):T1w 0.7mm N1
03: T2w               :se(1):T2w 0.7mm N1
04:                   :Survey
05: SE-FM-AP          :se(1):C-BOLD 3mm 48 2.5s FS-P
06: SE-FM-PA          :se(1):C-BOLD 3mm 48 2.5s FS-A
07: bold1:BLINK       :se(1):BOLD BLINK 3mm 48 2.5s
08: bold2:FLANKER     :se(1):BOLD FLANKER 3mm 48 2.5s
09: bold3:EC          :se(1):BOLD EC 3mm 48 2.5s
10: bold4:rest        :se(1):RSBOLD 3mm 48 2.5s
11: SE-FM-AP          :se(2):C-BOLD 3mm 48 2.5s FS-P
12: SE-FM-PA          :se(2):C-BOLD 3mm 48 2.5s FS-A
13: bold5:BLINK       :se(2):BOLD BLINK 3mm 48 2.5s
14: bold6:BLINK       :se(2):BOLD BLINK 3mm 48 2.5s
15: bold7:FLANKER     :se(2):BOLD FLANKER 3mm 48 2.5s
16: bold8:FLANKER     :se(2):BOLD FLANKER 3mm 48 2.5s
17: bold9:EC          :se(2):BOLD EC 3mm 48 2.5s
```

#### Explicitly assigning bold numbers

By default, QuNex sequentially assigns a bold number to each BOLD image, starting from 1.
Internally, QuNex uses the bold tag (e.g., in `bold1:rest`, `rest` is the bold tag) to select and
filter BOLD scans for processing. The sequential numbering can, however, be overridden by defining the
`bold_num(<N>)` tag for specific BOLD scans in the mapping file. Note that this only works if the
bold number can be **uniquely** assigned to a single BOLD scan. BOLD scans mapped without the
`bold_num` tag are still numbered sequentially starting from 1, skipping any number used in a
`bold_num(<N>)` tag.

For example, consider the following three sessions:

```bash
session: sess01
...
01: TASK1
02: TASK2
03: REST
04: REST
```

```bash
session: sess02
...
01: TASK2
02: REST
```

```bash
session: sess03
...
01: TASK2
02: REST
03: REST
04: REST
```

By defining the `bold_num` tag in the following mapping file, it is possible to achieve more 
consistent bold numbering across sessions.

```bash
TASK1 => bold:task1 : bold_num(3)
TASK2 => bold:task2 : bold_num(4)
REST  => bold:rest
```

With this mapping file, `TASK1` scans will always get the bold number 3 and `TASK2` scans will 
always get the bold number 4.

```bash
session: sess01
...
01: bold3:task1  : TASK1
02: bold4:task2  : TASK2
03: bold1:rest   : REST
04: bold2:rest   : REST
```

```bash
session: sess02
...
01: bold4:task2  : TASK2
02: bold1:rest   : REST
```

In `sess03`, image `04` gets `bold5` because `bold3` is reserved for `TASK1` and `bold4` for `TASK2` in the mapping file, even though `sess03` contains no `TASK1` scan.

```bash
session: sess03
...
01: bold4:task2  : TASK2
02: bold1:rest   : REST
03: bold2:rest   : REST
04: bold5:rest   : REST
```
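
The reservation rule can be expressed as a short sketch. It assumes the BOLD scans of a session are given in acquisition order by their source sequence name, and that the `bold_num(<N>)` values have been collected into a `{name: N}` dictionary. `assign_bold_numbers` is a hypothetical helper, not the QuNex implementation. Run on the three sessions above, it reproduces the numbering shown.

```python
def assign_bold_numbers(scans: list[str], reserved: dict[str, int]) -> list[int]:
    """Number BOLD scans given in acquisition order (hypothetical helper).

    scans    -- source sequence name of each BOLD scan, e.g. "REST"
    reserved -- source name -> N taken from bold_num(<N>) in the mapping file
    """
    taken = set(reserved.values())  # reserved even if that scan is absent
    used_reserved: set[str] = set()
    numbers: list[int] = []
    next_free = 1
    for name in scans:
        if name in reserved:
            # bold_num only works if it identifies a single scan.
            if name in used_reserved:
                raise ValueError(f"bold_num({reserved[name]}) matches several scans")
            used_reserved.add(name)
            numbers.append(reserved[name])
            continue
        while next_free in taken:  # skip reserved and already used numbers
            next_free += 1
        numbers.append(next_free)
        taken.add(next_free)
    return numbers


bold_nums = {"TASK1": 3, "TASK2": 4}
print(assign_bold_numbers(["TASK1", "TASK2", "REST", "REST"], bold_nums))  # -> [3, 4, 1, 2]
print(assign_bold_numbers(["TASK2", "REST"], bold_nums))                   # -> [4, 1]
print(assign_bold_numbers(["TASK2", "REST", "REST", "REST"], bold_nums))   # -> [4, 1, 2, 5]
```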

#### Explicitly associating field maps

The algorithm that automatically associates field maps with structural and functional images has two
phases. The first phase identifies valid pairs of consecutive field map images; when an image could
belong to two overlapping pairs, the pair appearing later is preferred. For instance, in the following
example, both 01/02 and 02/03 could be considered a valid pair. Here the latter is chosen, since this
arrangement is more common. Field map pairs are numbered sequentially, starting from 1, and spin-echo
(`se`) and EPI (`fm`) field maps are numbered separately.

```bash
# mapping
SpinEcho_AP => SE-FM-AP
SpinEcho_PA => SE-FM-PA

# session_hcp
session: sess01
...
01:              : SpinEcho_PA
02: SE-FM-AP     : SpinEcho_AP : se(1)
03: SE-FM-PA     : SpinEcho_PA : se(1)
```
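
One possible reading of this first phase, as a sketch: scan the images from the last one backwards and greedily pair neighbours, so that the later of two overlapping candidate pairs wins. It assumes two neighbouring images form a valid pair when both carry spin-echo tags that differ (e.g. `SE-FM-AP` next to `SE-FM-PA`). `find_pairs` is hypothetical and is not the QuNex implementation.

```python
def find_pairs(tags: list[str]) -> list[tuple[int, int]]:
    """Return (first, second) indices of spin-echo pairs in acquisition order.

    Hypothetical sketch: scanning backwards makes the later of two overlapping
    candidate pairs win; two neighbours count as a pair when both are spin-echo
    tags and differ (assumed to mean opposite phase-encoding directions).
    """
    pairs: list[tuple[int, int]] = []
    i = len(tags) - 1
    while i > 0:
        a, b = tags[i - 1], tags[i]
        if a.startswith("SE-FM-") and b.startswith("SE-FM-") and a != b:
            pairs.append((i - 1, i))
            i -= 2  # both images are now used
        else:
            i -= 1
    pairs.reverse()  # pair k (0-based) becomes se(k + 1)
    return pairs


# Mapped tags of images 01-03 in the example above (0-based indices).
print(find_pairs(["SE-FM-PA", "SE-FM-AP", "SE-FM-PA"]))  # -> [(1, 2)], i.e. 02/03
```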

The second phase of the algorithm associates field map pairs identified in the previous step with
other images with the logic described in previous sections.

The `se`/`fm` descriptors can also be added to session and mapping files. User-defined `se`/`fm` descriptors
are treated differently depending on whether they are defined on field map images or on structural and
functional images. If any `se`/`fm` descriptors are defined on spin-echo or EPI field map images, the
first phase of the algorithm is skipped, and only the user-defined `se`/`fm` descriptors are used. The
`se`/`fm` descriptors defined on structural/functional images do not affect the first phase of the
algorithm in any way.

During the second phase, if `se`/`fm` descriptors are defined on structural and functional images,
we verify that the referenced field map pair exists. For images without user-defined descriptors, the
same rules described previously apply. Because custom field map pairing is allowed, the images in a
pair are no longer guaranteed to be consecutive. The relative order between a field map pair and other
images is therefore determined solely by the position of the **first** image in the pair.
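
The second phase, including user-defined descriptors on structural and functional images, might look like the sketch below. It rests on stated assumptions: each pair is described only by the index of its first image, a reference to a missing pair raises an error (the text above only says the reference is verified), and `associate` is a hypothetical helper rather than QuNex code.

```python
def associate(
    pair_starts: dict[int, int],
    images: list[int],
    user_defined: dict[int, int],
) -> dict[int, int]:
    """Return {image index: pair number} for structural/functional images.

    pair_starts  -- pair number -> index of the FIRST image of that pair
    images       -- indices of structural/functional images
    user_defined -- image index -> pair number written as se(<N>)/fm(<N>)
    """
    result: dict[int, int] = {}
    for idx in images:
        if idx in user_defined:
            pair = user_defined[idx]
            if pair not in pair_starts:  # assumed: a missing pair is an error
                raise ValueError(f"image {idx} refers to missing pair {pair}")
            result[idx] = pair
        elif pair_starts:
            # Only the first image of each pair decides the relative order.
            before = [p for p, start in pair_starts.items() if start < idx]
            if before:
                result[idx] = max(before, key=pair_starts.__getitem__)
            else:
                result[idx] = min(pair_starts, key=pair_starts.__getitem__)
    return result


# Pair 1 starts at index 1, pair 2 at index 6; image 4 is pinned to pair 2.
print(associate({1: 1, 2: 6}, [0, 4, 8], {4: 2}))  # -> {0: 1, 4: 2, 8: 2}
```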

---

### Guide for Manual `session.txt` to `session_<pipeline>.txt` Mapping

* As noted, after the acquired imaging data are onboarded and organized into the QuNex folder structure, the acquisition details will be listed in the `session.txt` file.

* For a full specification please refer to the [Sessions files](../Overview/file_session_txt) page.

* `session.txt` will contain the sequence number, the name of the sequence, and additional sequence-specific information extracted from the JSON sidecar file.

* Here is an example of a `session.txt` file that only contains sequence names as extracted from `dicom` files:

```bash
session: s03
subject: s03
dicom: /data/mystudy/sessions/s03/dicom
raw_data: /data/mystudy/sessions/s03/nii
hcp: /data/mystudy/sessions/s03/hcp

## --> <sequence_number>: <sequence_name>
01: Survey
02: T1w 0.7mm N1
03: T2w 0.7mm N1
04: Survey
05: C-BOLD 3mm 48 2.5s FS-P
06: C-BOLD 3mm 48 2.5s FS-A
07: BOLD BLINK 3mm 48 2.5s
08: BOLD FLANKER 3mm 48 2.5s
09: BOLD EC 3mm 48 2.5s
10: RSBOLD 3mm 48 2.5s
11: C-BOLD 3mm 48 2.5s FS-P
12: C-BOLD 3mm 48 2.5s FS-A
13: BOLD BLINK 3mm 48 2.5s
14: BOLD BLINK 3mm 48 2.5s
15: BOLD FLANKER 3mm 48 2.5s
16: BOLD FLANKER 3mm 48 2.5s
17: BOLD EC 3mm 48 2.5s
```

* To provide the information required for further pipeline execution, the above `session.txt` file has to be converted into a `session_<pipeline>.txt` file. An example for the HCP pipeline (the `session_hcp.txt` file) is provided below.

* In the `session_<pipeline>.txt` file each pipeline supported sequence has an appropriate tag (e.g. `T1w`, `bold4`).

* Note that functional images carry additional (required) information about the type of BOLD data (e.g. `rest` for resting state, `FLANKER` for the flanker task).

* For each BOLD sequence (and, in the example below, each structural sequence), the number of the relevant spin-echo field map pair is provided in the form `se(<pair number>)`. This information allows precise matching of spin-echo field map pairs to the sequences they pertain to.

* The HCP sequence descriptor tags that are currently supported are:

  * `T1w` ... T1 weighted high resolution structural image
  * `T2w` ... T2 weighted high resolution structural image
  * `FM-GE` ... Gradient echo field map image used for distortion correction
  * `FM-Magnitude` ... Field mapping magnitude image used for distortion correction
  * `FM-Phase` ... Field mapping phase image used for distortion correction
  * `boldref<N>` ... Reference image for the following (multiband) BOLD image
  * `bold<N>` ... BOLD image
  * `SE-FM-AP` ... Spin-echo fieldmap image recorded using the A-to-P frequency readout direction
  * `SE-FM-PA` ... Spin-echo fieldmap image recorded using the P-to-A frequency readout direction
  * `SE-FM-LR` ... Spin-echo fieldmap image recorded using the L-to-R frequency readout direction
  * `SE-FM-RL` ... Spin-echo fieldmap image recorded using the R-to-L frequency readout direction
  * `DWI` ... Diffusion weighted images, usually in pairs of phase-encoding reversed acquisition, followed by `dir<number_of_directions>_<readout_direction>` specification (e.g. `dir90_PA`)
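
To catch typos in a manually edited `session_hcp.txt`, the descriptor tags listed above can be checked with a regular expression. This sketch only knows the tags in that list and checks the bare tag (e.g. `bold4` taken from `bold4:rest`); `is_supported_hcp_tag` is hypothetical, and QuNex itself may accept or require more than this.

```python
import re

# Built only from the tags listed above; <N> is a number.
HCP_TAG = re.compile(
    r"T1w|T2w|FM-GE|FM-Magnitude|FM-Phase"
    r"|boldref\d+|bold\d+"
    r"|SE-FM-(?:AP|PA|LR|RL)|DWI"
)


def is_supported_hcp_tag(tag: str) -> bool:
    """Check a bare descriptor tag, e.g. 'bold4' taken from 'bold4:rest'."""
    return HCP_TAG.fullmatch(tag.strip()) is not None


for tag in ["T1w", "bold4", "boldref4", "SE-FM-PA", "bold", "SE-FM-XY"]:
    print(tag, is_supported_hcp_tag(tag))
```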

* Here is an example of a `session_hcp.txt` file that contains the correctly mapped sequence descriptor tags and info for HCP folder structure mapping and pipeline execution. Note that `hcpready: true` denotes that the user-specified HCP-style descriptor tags are ready to be used.

```bash
session: s03                                # -- Header
subject: s03                                # -- Header
dicom: /data/mystudy/sessions/s03/dicom     # -- Header
raw_data: /data/mystudy/sessions/s03/nii    # -- Header
hcp: /data/mystudy/sessions/s03/hcp         # -- Header

hcpready: true  # -- Denotes that the user-specified HCP-style descriptor tags are ready to be used.

# -- <sequence_number>:<user_defined_sequence_descriptor_tag>:[<sequence_info>:]<sequence_name>

01:                   :Survey
02: T1w               :se(1):T1w 0.7mm N1
03: T2w               :se(1):T2w 0.7mm N1
04:                   :Survey
05: SE-FM-AP          :se(1):C-BOLD 3mm 48 2.5s FS-P
06: SE-FM-PA          :se(1):C-BOLD 3mm 48 2.5s FS-A
07: bold1:BLINK       :se(1):BOLD BLINK 3mm 48 2.5s
08: bold2:FLANKER     :se(1):BOLD FLANKER 3mm 48 2.5s
09: bold3:EC          :se(1):BOLD EC 3mm 48 2.5s
10: bold4:rest        :se(1):RSBOLD 3mm 48 2.5s
11: SE-FM-AP          :se(2):C-BOLD 3mm 48 2.5s FS-P
12: SE-FM-PA          :se(2):C-BOLD 3mm 48 2.5s FS-A
13: bold5:BLINK       :se(2):BOLD BLINK 3mm 48 2.5s
14: bold6:BLINK       :se(2):BOLD BLINK 3mm 48 2.5s
15: bold7:FLANKER     :se(2):BOLD FLANKER 3mm 48 2.5s
16: bold8:FLANKER     :se(2):BOLD FLANKER 3mm 48 2.5s
17: bold9:EC          :se(2):BOLD EC 3mm 48 2.5s
```