# Preparing data for the HCP pipeline

````{admonition} Prerequisites
* [Onboarding new data into QuNex](../UsageDocs/OnboardingNewData)
````

For an overview on how to prepare data and run the HCP preprocessing steps, see [Overview of steps for running the HCP pipeline](../UsageDocs/HCPPreprocessing).

Running the modified HCP pipeline has several requirements:

1. The data needs to be mapped to the expected HCP compliant folder structure.
2. All relevant image data parameters need to be compiled into a batch file and provided to the HCP pipeline.
3. All the information about the session-specific data needs to be compiled into the same batch file that will be used by the HCP functions in QuNex (see [batch file specification](../Overview/file_batch_txt) for format description).

## Mapping raw data into the HCP folder structure

### Preparing session-specific information files

First, the raw data in NIfTI format residing in the session's `nii` folder (see [Preparing the study](../UsageDocs/PreparingStudy) and [Onboarding new data](../UsageDocs/OnboardingNewData)) needs to be mapped to the folder structure required by HCP.

For this to occur correctly, the imaging data needs to be described in a session-specific information file (these files are typically named [session_hcp.txt](../Overview/file_session_txt)). Specifically, the description of each image in the session information file needs to include the relevant sequence acquisition details.

Below is an example of a single session information file after data import, sorting of DICOMs and `nii` folder organization.

Note that the naming here represents placeholders that would be replaced by the scanner-specific sequence names generated during the conversion of DICOMs to nii format.

``` bash
session: <session_id>
subject: <subject_id>
dicom: <path_to_study_folder>/sessions/<session_id>/dicom
raw_data: <path_to_study_folder>/sessions/<session_id>/nii
hcp: <path_to_study_folder>/sessions/<session_id>/hcp

01: SequenceName_for_Localizer
02: SequenceName_for_T1_Scan
03: SequenceName_for_T2_Scan
04: SequenceName_for_SpinEchoDirection_AnteriorPosterior
05: SequenceName_for_SpinEchoDirection_PosteriorAnterior
06: SequenceName_for_BOLD_Task_SingleBandReference
07: SequenceName_for_BOLD_Task
08: SequenceName_for_BOLD_Rest_SingleBandReference
09: SequenceName_for_BOLD_Rest
10: SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_90directions
11: SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_90directions
12: SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_91directions
13: SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_91directions

```

Each image that is to be processed via the HCP pipeline needs to be described using a standard name, which is used for proper HCP folder mapping.

These are:

* `T1w` ... T1 weighted high resolution structural image
* `T2w` ... T2 weighted high resolution structural image
* `FM-GE` ... Gradient echo field map image used for distortion correction
* `FM-Magnitude` ... Field mapping magnitude image used for distortion correction
* `FM-Phase` ... Field mapping phase image used for distortion correction
* `boldref<N>` ... Reference image for the following (multiband) BOLD image
* `bold<N>` ... BOLD image
* `SE-FM-AP` ... Spin-echo fieldmap image recorded using the A-to-P frequency readout direction
* `SE-FM-PA` ... Spin-echo fieldmap image recorded using the P-to-A frequency readout direction
* `SE-FM-LR` ... Spin-echo fieldmap image recorded using the L-to-R frequency readout direction
* `SE-FM-RL` ... Spin-echo fieldmap image recorded using the R-to-L frequency readout direction
* `DWI` ... Diffusion weighted images, usually acquired in pairs with reversed phase encoding, followed by a `dir<number_of_directions>_<readout_direction>` specification (e.g. `dir90_PA`)

These descriptors need to be added to the relevant image number in the `session_hcp.txt` file. The original sequence names can be left in the file, following the descriptor and a `:`.

Therefore, the example above would be changed to:

``` bash
session: <session_id>
subject: <subject_id>
dicom: <path_to_study_folder>/sessions/<session_id>/dicom
raw_data: <path_to_study_folder>/sessions/<session_id>/nii
hcp: <path_to_study_folder>/sessions/<session_id>/hcp

hcpready: true
01:                 :SequenceName_for_Localizer
02: T1w             :SequenceName_for_T1_Scan
03: T2w             :SequenceName_for_T2_Scan
04: SE-FM-AP        :SequenceName_for_SpinEchoDirection_AnteriorPosterior
05: SE-FM-PA        :SequenceName_for_SpinEchoDirection_PosteriorAnterior
06: boldref1:task   :SequenceName_for_BOLD_Task_SingleBandReference
07: bold1:task      :SequenceName_for_BOLD_Task
08: boldref2:rest   :SequenceName_for_BOLD_Rest_SingleBandReference
09: bold2:rest      :SequenceName_for_BOLD_Rest
10: DWI:dir90_AP    :SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_90directions
11: DWI:dir90_PA    :SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_90directions
12: DWI:dir91_AP    :SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_91directions
13: DWI:dir91_PA    :SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_91directions
```

Note that BOLD files have a sequential number appended to the `bold` specification. This is the number that will be used in many of the QuNex commands and functions. Each BOLD needs to have a unique number, preferably starting with 1 and increasing in sequential order. In addition, each BOLD file has a descriptor specifying the kind of BOLD that was recorded (`task` and `rest` above). These descriptors can be used in the QuNex commands to specify which BOLD files to run specific steps on. Finally, note that for DWI data there is a specification that denotes the number of directions and the phase-encoding direction (e.g. `dir90_PA`).
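As a rough illustration of how these lines are structured, the following Python sketch splits a session file into header fields and numbered image entries, and then selects BOLD numbers by their descriptor. It is not QuNex code: `parse_session_file` and `bold_numbers` are hypothetical helper names, the session and subject ids in the sample are made up, and the parser assumes that every image line carries the original sequence name after a final `:`, as in the example above.

```python
import re

# Made-up sample in the layout shown above (ids are placeholders).
SAMPLE = """\
session: S001
subject: P001
hcpready: true
01:                 :SequenceName_for_Localizer
02: T1w             :SequenceName_for_T1_Scan
06: boldref1:task   :SequenceName_for_BOLD_Task_SingleBandReference
07: bold1:task      :SequenceName_for_BOLD_Task
09: bold2:rest      :SequenceName_for_BOLD_Rest
10: DWI:dir90_AP    :SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_90directions
"""


def parse_session_file(text):
    """Return (header, images) parsed from session file text (hypothetical helper)."""
    header, images = {}, {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or line without a "key:" part
        key = key.strip()
        if key.isdigit():
            # Assumed layout: "<number>: <descriptor> :<original sequence name>"
            descriptor, _, original = value.rpartition(":")
            name, _, tag = descriptor.strip().partition(":")
            images[int(key)] = {"name": name, "tag": tag, "original": original.strip()}
        else:
            header[key] = value.strip()
    return header, images


def bold_numbers(images, tag):
    """Return the sorted numbers of all bold<N> images carrying the given descriptor."""
    numbers = []
    for image in images.values():
        match = re.fullmatch(r"bold(\d+)", image["name"])
        if match and image["tag"] == tag:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


header, images = parse_session_file(SAMPLE)
print(bold_numbers(images, "rest"))
```

Here the image on line `07` is parsed as name `bold1` with descriptor `task`, while the localizer on line `01` ends up with an empty name because no HCP descriptor was assigned to it.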

### Automatically generating session-specific mapping files for HCP data

The above information can be added manually. However, if a mapping is available that uniquely pairs the original sequence names with HCP image descriptors, QuNex can add the information automatically through the [`create_session_info`](../../api/gmri/create_session_info.rst) command. To use `create_session_info`, first prepare a file describing the sequence-to-HCP descriptor mapping. When running `create_session_info` for the HCP pipelines, you do not need to set the `pipelines` parameter, as its default value is already `hcp`.

* While the example below provides an overview of the mapping file specification, the full specification can be found on the [Preparing a study-level mapping file for QuNex workflows](../UsageDocs/PreparingMappingFile) and [Mapping specification files](../Overview/file_mapping) pages.

* The default location of this mapping file is `/<path_to_sessions_folder>/sessions/specs/`. When the initial QuNex study hierarchy is generated, an example mapping text file is created here to allow users to proceed.

* If not specified otherwise, the command will look for the file `/<path_to_sessions_folder>/sessions/specs/<pipeline>_mapping.txt`.

* Lines that include `=>` will be interpreted as mapping descriptions.

* The string before the `=>` is to be mapped to a string following the `=>`.

* Any lines not including `=>` are ignored.

* The mapping file should specify a unique mapping such that only a specific file maps to a specific string. In other words, you cannot map two different files into the same string or the result will be overwritten by the last specified string mapping.

``` bash
SequenceName_for_SpinEchoDirection_AnteriorPosterior => SE-FM-AP
SequenceName_for_SpinEchoDirection_PosteriorAnterior => SE-FM-PA
SequenceName_for_T1_Scan                             => T1w
SequenceName_for_T2_Scan                             => T2w

SequenceName_for_BOLD_Task_SingleBandReference       => boldref:task
SequenceName_for_BOLD_Task                           => bold:task
SequenceName_for_BOLD_Rest_SingleBandReference       => boldref:rest
SequenceName_for_BOLD_Rest                           => bold:rest

SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_90directions => DWI:dir90_AP
SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_90directions => DWI:dir90_PA
SequenceName_for_DiffusionSequenceDirection_AnteriorPosterior_91directions => DWI:dir91_AP
SequenceName_for_DiffusionSequenceDirection_PosteriorAnterior_91directions => DWI:dir91_PA
```

Note that the `bold` specification is listed without a number. When processing a `session.txt` file, `create_session_info` will automatically add the numbers in sequential order, starting with 1.
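The mapping rules and the automatic numbering can be sketched in Python as follows. This is only an illustration under stated assumptions, not the `create_session_info` implementation: `parse_mapping` and `apply_mapping` are hypothetical helpers, sequence names are assumed to match the mapping exactly, and a `boldref` image is assumed to receive the number of the BOLD that follows it (as in the `session_hcp.txt` example earlier on this page).

```python
MAPPING = """\
# lines without an arrow, such as this one, are ignored
SequenceName_for_T1_Scan                       => T1w
SequenceName_for_BOLD_Task_SingleBandReference => boldref:task
SequenceName_for_BOLD_Task                     => bold:task
SequenceName_for_BOLD_Rest                     => bold:rest
"""


def parse_mapping(text):
    """Collect 'sequence name => descriptor' rules (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # not a mapping description
        source, _, target = line.partition("=>")
        # In this sketch a repeated sequence name keeps only its last rule.
        mapping[source.strip()] = target.strip()
    return mapping


def apply_mapping(sequences, mapping):
    """Build session_hcp.txt-style image lines from (number, sequence name) pairs."""
    bold_count = 0
    lines = []
    for number, sequence in sequences:
        descriptor = mapping.get(sequence, "")  # unmapped images get no descriptor
        name, sep, tag = descriptor.partition(":")
        if name == "bold":
            bold_count += 1
            name = f"bold{bold_count}"
        elif name == "boldref":
            # Assumption: the reference image shares the number of the next BOLD.
            name = f"boldref{bold_count + 1}"
        lines.append(f"{number:02d}: {name + sep + tag:<16}:{sequence}")
    return lines


sequences = [
    (1, "SequenceName_for_Localizer"),
    (2, "SequenceName_for_T1_Scan"),
    (6, "SequenceName_for_BOLD_Task_SingleBandReference"),
    (7, "SequenceName_for_BOLD_Task"),
    (9, "SequenceName_for_BOLD_Rest"),
]
for line in apply_mapping(sequences, parse_mapping(MAPPING)):
    print(line)
```

In this sketch the localizer, which has no rule in the mapping, keeps an empty descriptor, and the two BOLD images are numbered in the order in which they appear in the session file.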

Run the command from the `<study_folder>/sessions` folder. It will process the `session.txt` file of each session and save the resulting remapping information as a `session_hcp.txt` file.

For examples of [`create_session_info`](../../api/gmri/create_session_info.rst) command usage, please refer to the _Examples_ section of the command's online reference or invoke help by running `create_session_info` in the terminal.

### Mapping the files into the HCP file structure

To map the files after the relevant `session_hcp.txt` files have been generated, use the [`setup_hcp`](../../api/gmri/setup_hcp.rst) command. This command performs HCP mapping for an individual session or loops through a set of sessions. Based on the information provided in the `session_hcp.txt` file, it maps the relevant NIfTI images to the HCP folder structure. To save space, hard links are created instead of copies whenever possible. For details and additional options, see the command help. The command prints a detailed report of which files were mapped and how.

The command can optionally prepare slice timing files to be used with `slicetimer` when processing BOLD files using the fMRIVolume HCP pipeline. To prepare the slice timing files, set the `slice_timing_info` parameter to `yes`. In this case, the `setup_hcp` command inspects the BOLD JSON sidecars for slice timing information and prepares a custom slice timing file that can be used to align all the slices in time to the middle of the TR.

The `setup_hcp` command maps the data to the session-specific `hcp` folder, located at `<sessions folder>/<session id>/hcp/<session id>`. To set up data for a new run of HCP processing while keeping existing results, use the `hcp_suffix` parameter. In this case the data will be prepared in the `<sessions folder>/<session id>/hcp/<session id><hcp_suffix>` folder.
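The folder naming described above can be expressed as a small Python sketch. The helper name `hcp_session_folder`, the example paths, and the `_v2` suffix are hypothetical and serve only to illustrate how the suffix is appended to the session id; they are not part of QuNex.

```python
from pathlib import Path


def hcp_session_folder(sessions_folder, session_id, hcp_suffix=""):
    """Return <sessions folder>/<session id>/hcp/<session id><hcp_suffix>."""
    return Path(sessions_folder) / session_id / "hcp" / f"{session_id}{hcp_suffix}"


# Default location, and a second location that leaves earlier results untouched.
print(hcp_session_folder("/data/study/sessions", "S001").as_posix())
print(hcp_session_folder("/data/study/sessions", "S001", "_v2").as_posix())
```

Because the suffix only changes the innermost folder name, results of different processing runs for the same session sit side by side under the session's `hcp` folder.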

For additional parameters, examples and details about running [`setup_hcp`](../../api/gmri/setup_hcp.rst) on multiple sessions in parallel consult the online command reference or invoke help by running `qunex setup_hcp` at the terminal.

### Group batch file

To execute the pipelines, the user needs to generate a single group `batch` file (see the full specification in [Generating a Batch File](../UsageDocs/GeneratingBatchFiles)). These files (commonly named `batch.txt`) list information for all the sessions, as well as study parameters, so that they do not have to be specified with each invocation of the preprocessing commands. Batch files therefore speed up re-processing and the specification of new parameters, even within a single study. For details on the group batch file format, see the [batch file specification](../Overview/file_batch_txt) page.

To help with the compilation of batch files, QuNex provides the convenience command [`create_batch`](../../api/gmri/create_batch.rst). See its online command reference or run `qunex ?create_batch` for more detailed information on the use of the command.