Quality Assurance in QuNex

Quality Assurance is an important but highly tedious step in most MRI preprocessing workflows. The run_qa command helps ease this process and supports the following QA types:

Raw Data QA (--datatype=raw_data),
Config File QA (--datatype=config).

Quality Assurance (QA) is not to be confused with Quality Control (QC) and its command, run_qc. In short, QA evaluates processing efficiency and completion, whereas QC evaluates the results of the processing.

Using the `run_qa` command

qunex run_qa \
    --datatype=<Type of QA> \
    --sessionsfolder=<QuNex sessions folder> \
    --sessions=<Sessions to QA> \
    --configfile=<QA config file> \
    --tag=<Output identifier> \
    --overwrite=<Overwrite, yes or no>

This command runs QA on all specified sessions according to a highly customizable, user-created configuration YAML file. Usually, this entails checking whether specified files exist and whether certain parameters have the expected values.

Once completed, run_qa outputs lists of the sessions that passed and failed the declared QA, as well as human- and machine-readable reports that detail why and how the failing sessions failed.

The QA performed depends heavily on two flags: --configfile and --datatype. The --configfile flag should point to a configuration file, and the --datatype flag must be a string naming the type of data on which you want to run QA. This page focuses primarily on configuring these two parameters.

For more detail on the other flags and on actually running the command, see the command's reference page.

Inputs

Command inputs vary significantly depending on the QA specified, but at the bare minimum the command requires a configuration YAML file and a folder for each session, found within the --sessionsfolder directory.

Outputs

The run_qa command generates four files: two lists containing the sessions that passed and failed QA, and two reports (one human-readable and one machine-readable, containing the same information). These are generated inside processing/lists and processing/reports, respectively.

`processing/lists/QA_pass_{datatype}{tag/config}.list`

The first output is a file containing all sessions that have passed the specified QA. It is formatted as a QuNex .list file. This means each line corresponds to an individual session, with the format:

session id: {session_1}
session id: {session_2}
...
session id: {session_N}

As a result, this file can be passed directly to QuNex commands through the --sessions parameter:

--sessions="QA_pass_raw_data.list"

The goal is that users can use this list to continue processing without including problematic sessions or sessions that would require different processing. List files also offer some additional functionality; see the QuNex documentation on list files for more information.
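Because every entry follows the same "session id: <id>" pattern, a .list file is easy to read back in a script, for example when feeding the passing sessions into your own tooling. The Python sketch below is a minimal illustration and not part of QuNex: the function name read_session_ids and the session IDs are hypothetical, and it ignores any line that is not a session id entry, so it does not cover the additional list-file features.

```python
def read_session_ids(list_text):
    """Return the session IDs from the 'session id:' lines of a .list file."""
    session_ids = []
    for line in list_text.splitlines():
        # Split "session id: sub01" into the key and the value
        key, sep, value = line.partition(":")
        if sep and key.strip() == "session id":
            session_ids.append(value.strip())
    return session_ids


example = """session id: sub01
session id: sub02
session id: sub05
"""
print(read_session_ids(example))  # ['sub01', 'sub02', 'sub05']
```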

`processing/lists/QA_fail_{datatype}{tag/config}.list`

This output file has the same .list format as above, but instead contains all sessions that have failed QA. Similarly, this file can be input directly into QuNex commands with the --sessions parameter, even into run_qa itself if you wish to investigate the data further.

`processing/reports/QA_report{datatype}{tag/config}.txt`

This file will contain a human-readable report of the QA outcomes, particularly for sessions that have failed the QA. What exactly is contained within the report is highly dependent on the QA run, but will typically explain why sessions failed the QA and what precisely went wrong.

`processing/reports/QA_report{datatype}{tag/config}.yml`

This file contains the same information as the text report, but in machine-readable YAML format (with either the .yml or .yaml extension). It also contains internal variables that may be useful to those developing pipelines on top of the run_qa outputs.

The Configuration file

Because QA requirements vary greatly across datasets and studies, run_qa is designed to be highly customizable, controlled through a user-created configuration YAML file. If you're unfamiliar with the YAML format, see the YAML documentation.

In this file, you define nested parameter-value pairs and sequences describing your data. In short, it tells run_qa what to check in your data and what you expect the values to be.

The contents will be quite different depending on the QA type you're running and your data, but it should follow this basic format:

datatypes:
    <Specified Data-type 1>:
        <param>: <value>
        <param>:
            <sub-param>: <value>

    <Specified Data-type 2>:
        - <sequence param>:
            <sub-param>: <value>
            <sub-param>:
                <sub-sub-param>: <value>

config:
    <Additional config options>

Parameters and sub-parameters must be placed within the scope of their corresponding data type or parameter. Depending on the data type, they are specified either directly as key-value pairs or as YAML sequence items starting with -. See below for data type-specific parameters and config creation.

Data Types

Only the below data types are currently supported.

Raw Data QA (`--datatype=raw_data`)

Raw Data QA checks whether the scans found are in line with the scan protocol, as defined by the user in the supplied config. It is usually run after import_<datatype>. When run, it executes various checks to ensure the data are valid and consistent with the acquisition protocol before processing. The main goal is to identify problematic sessions before you start processing, saving time and resources. It should also spare users from having to identify missing or misordered scans manually.

To specify Raw Data QA in your config, it must be added underneath datatypes as raw_data:

datatypes:
    raw_data:

`- scan`

For each scan/image the user wishes to QA, they must add a corresponding scan-config in their configuration file with the tag - scan. Each scan-config must have the series_description parameter, which is used to identify which image (as labeled in the session.txt file) you are attempting to QA.

datatypes:
    raw_data:
        - scan:
            series_description: T1w

        - scan:
            series_description: BOLD1

        - scan:
            series_description: BOLD2

Note: the series_description field also accepts wildcards (*) and multiple acceptable scan names separated by |:

        - scan:
            series_description: T1w run-1|T1w run-2

        - scan:
            series_description: BOLD1*
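To make the pattern rules concrete, here is a hedged Python sketch of how a series_description could be matched, assuming | separates alternative names and * behaves like a shell-style wildcard. It uses the standard library's fnmatch.fnmatchcase, which also treats ? and [...] as special characters; run_qa's own matching may differ in such details, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_series_description(pattern, description):
    """Return True if description matches any "|"-separated alternative in pattern.

    Each alternative is treated as a case-sensitive, shell-style wildcard pattern.
    """
    return any(fnmatchcase(description, alternative) for alternative in pattern.split("|"))


print(matches_series_description("BOLD1*", "BOLD1_run2"))              # True
print(matches_series_description("T1w run-1|T1w run-2", "T1w run-2"))  # True
print(matches_series_description("T1w run-1|T1w run-2", "T1w run-3"))  # False
```

One consequence of shell-style matching worth keeping in mind: BOLD1* also matches BOLD10, so a wildcard can pick up more scans than intended.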

This is all you need to run a basic QA: run_qa will simply check whether each session has scans that explicitly match the specified series_description. One possible use case for a config like this is verifying the mapping file for create_session_info.

To run more advanced QA, you can add a combination of parameters and sub-parameters. Aside from series_description, all are optional, though depending on the data not all of them are practical.

Here are all potential parameters that can be specified at the scan level:

        - scan:
            series_description: --> Scan identifier, looks in session.txt
            required:           --> Whether scan must be present for a session to pass QA
            dicoms:             --> The number of dicoms before NIfTI conversion (from import_dicom)
            session:            --> Contains sub-parameters related to the session.txt file
                <sub-params>
            json:               --> Contains sub-parameters related to the sidecar .json file
                <sub-params>
            nii:                --> Contains sub-parameters related to the NIfTI file header
                <sub-params>

`session`

As you likely know by now, the main identifier for scans in the QuNex session hierarchy is the session.txt file. The first level of QA, under the session key, pertains to the contents of this file.

Here are the potential parameters that can be specified at the session level, all of which are optional:

            session:
                image_count:    --> Number of expected images
                image_number:   --> Associated image number
                scan_index:     --> The scan's index, if multiple images found
                acquisition:    --> The scan's acquisition number, if split into multiple

The goal for these parameters is to help in the scenario that there are multiple images with the same series_description: these allow users to narrow down which images they actually want to QA.
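A hedged sketch of this narrowing step, assuming a session's images are available as (image_number, series_description) pairs: image_count checks how many images share the description, and image_number keeps only the requested image(s). The function select_images and the image number 61 are hypothetical; 41 and 51 mirror the T2w example further down this page. scan_index and acquisition are not modelled here, and names are matched exactly rather than with wildcards.

```python
def select_images(images, series_description, image_number=None, image_count=None):
    """Narrow a session's images down to the ones a - scan entry refers to.

    images: list of (image_number, series_description) pairs.
    Returns (selected image numbers, whether the checks passed).
    """
    # Exact name match for simplicity (no wildcards)
    matching = [num for num, desc in images if desc == series_description]

    # image_count: how many images may share this series_description
    if image_count is not None and len(matching) != image_count:
        return matching, False

    # image_number: keep only the requested image number(s)
    if image_number is not None:
        wanted = image_number if isinstance(image_number, list) else [image_number]
        selected = [num for num in matching if num in wanted]
        return selected, len(selected) == len(wanted)

    # No narrowing requested: pass if at least one image matched
    return matching, bool(matching)


# Hypothetical session: two T2w images and one spin-echo field map
session_images = [(41, "T2w_SPC"), (51, "T2w_SPC"), (61, "SpinEchoFieldMap_AP")]
print(select_images(session_images, "T2w_SPC", image_number=51))            # ([51], True)
print(select_images(session_images, "SpinEchoFieldMap_AP", image_count=1))  # ([61], True)
print(select_images(session_images, "T2w_SPC", image_count=1))              # ([41, 51], False)
```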

`json`

In QuNex (and most modern neuroimaging formats), raw NIfTI image files are paired with a JSON file containing processing-related information and metadata, known as a 'sidecar JSON' file. These are located alongside their NIfTI images in sessions/<session_id>/nii.

To allow in-depth customization, JSON QA lets you specify any key, as long as it corresponds to an actual key in the sidecar JSON. The only exception is the key normalized, which is an easier way to require image normalization than using the associated sidecar key. Below are some examples, but these are not exhaustive. We recommend checking the .json files of a pilot subject or a similar dataset to find the keys associated with your protocol.

            json:
                normalized:             --> Whether or not it is a normalized image
                RepetitionTime:         --> EXAMPLE
                DwellTime:              --> EXAMPLE
                PhaseEncodingDirection: --> EXAMPLE
                EffectiveEchoSpacing:   --> EXAMPLE

Note: values are converted to strings (text) before comparison, which may cause mismatches for numbers written in a different notation (e.g. scientific notation).
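The following Python sketch shows why text-based comparison is sensitive to notation, using the EchoTime and DwellTime values from the T1w example further down this page. It assumes values are converted with Python's str(), which may not match run_qa exactly; whether a config value arrives as a number or as text also depends on how the YAML is parsed. Copying values exactly as they are written in a pilot subject's sidecar JSON is a simple way to reduce these surprises. The helper name is hypothetical.

```python
import json

# Sidecar values as they would be parsed from a .json file (illustrative numbers)
sidecar = json.loads('{"EchoTime": 0.00207, "DwellTime": 6.5e-06}')


def matches_as_string(expected, actual):
    """Compare an expected and an actual value after converting both to text."""
    return str(expected) == str(actual)


print(matches_as_string(0.00207, sidecar["EchoTime"]))     # True
print(matches_as_string("2.07e-3", sidecar["EchoTime"]))   # False: same number, different text
print(matches_as_string(0.0000065, sidecar["DwellTime"]))  # True: both become '6.5e-06'
```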

`nii`

Though sidecar JSON files contain a lot of useful information about NIfTI files, some data is only available directly in the NIfTI image's header. Therefore, run_qa can also read NIfTI image headers for key-value validation. As with JSON QA, there are no requirements on keys other than that they exist in the data. The only exception is the data_shape key, which is not a header key itself but gives the shape of the acquired data.

This is more advanced than JSON QA: NIfTI images are not simple text files you can print, so you will need other software to read their headers yourself. There are many options for this, for example fslhd from FSL.

            nii:
                data_shape:             --> Data shape of the acquired data, specified as an array
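data_shape is derived rather than read from a single named header key. As a hedged, standard-library illustration of where that information lives, the sketch below reads the dim field of a NIfTI-1 header (eight 16-bit integers at byte offset 40, where dim[0] is the number of dimensions). It assumes NIfTI-1 files; NIfTI-2 headers use a different layout. In practice, a library such as nibabel (nibabel.load(path).shape) is the more robust choice. The function names are hypothetical, and the sample shape mirrors the Resting_AP example below.

```python
import gzip
import struct


def nifti1_data_shape(header_bytes):
    """Return the data shape stored in the dim field of a NIfTI-1 header."""
    # sizeof_hdr (bytes 0-3) is 348 for NIfTI-1; use it to detect the byte order
    for endian in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(endian + "i", header_bytes, 0)
        if sizeof_hdr == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")

    # dim: eight 16-bit integers at byte offset 40; dim[0] = number of dimensions
    dim = struct.unpack_from(endian + "8h", header_bytes, 40)
    return list(dim[1:dim[0] + 1])


def read_nifti1_shape(path):
    """Read the 348-byte header of a .nii or .nii.gz file and return its shape."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return nifti1_data_shape(handle.read(348))


# Build a minimal little-endian header in memory to try it out
header = bytearray(348)
struct.pack_into("<i", header, 0, 348)
struct.pack_into("<8h", header, 40, 4, 90, 90, 60, 333, 1, 1, 1)
print(nifti1_data_shape(bytes(header)))  # [90, 90, 60, 333]
```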

`- scan` Example QA

Here we will run an example raw_data - scan QA. Though you can use this as a base, the keys and values must be adjusted to your data and analysis.

Below is our session.txt file after initial import (after running import_dicom for example).

  Localizer [1/3]
  Localizer [2/3]
  Localizer [3/3]
  T1w_MPR
  T1w_MPR
  T2w_SPC
  T2w_SPC
  SpinEchoFieldMap_AP
  SpinEchoFieldMap_PA
  Resting_AP
  Resting_AP_SBRef
  Resting_PA
  Resting_PA_SBRef

And here is our mapping file. A core use case for raw_data QA is validating that the HCP image mapping will work correctly, so the config should be set up with the mapping file in mind.

31                   => T1w
51                   => T2w
SpinEchoFieldMap_AP  => SE-FM-AP
SpinEchoFieldMap_PA  => SE-FM-PA
Resting_AP           => bold:rest
Resting_AP_SBRef     => boldref:rest
Resting_PA           => bold:rest
Resting_PA_SBRef     => boldref:rest

Let's start with the anatomical data, namely the T1w and T2w structural images.

You'll notice that we have two images for each structural scan: the original image and the normalized image. Saving both is fairly common in many protocols, and it serves as a good example. In the mapping, we specify the second (normalized) image of each pair by its image number from session.txt, as this is the one we want to use in preprocessing. If we mapped both, the pipeline would attempt to average them, which we do not want. Therefore, our QA config must ensure that this image exists, using the normalized and image_number keys.

We may also want to ensure the data matches our defined protocol. Therefore, we will also check the data matrix shape, Repetition Time, Echo Time, and Dwell Time.

Furthermore, this example dataset contains sessions from both GE and Siemens scanners, which use different protocols. We should therefore check the device manufacturer as well, to help determine the required processing.

datatypes:
  raw_data:
    - scan:
        series_description: T1w_MPR
        session:
          image_number: 31
        nifti:
          data_shape: [208, 300, 320]
        json:
          RepetitionTime: 2.5
          EchoTime: 0.00207
          Manufacturer: "Siemens"
          DwellTime: 6.5e-06
          normalized: True

Above, we have only included the normalized image as that's the one we need. However, if you want to QA multiple images that share the same name, you can do so by specifying parameters as a list:

    - scan:
        series_description: T2w_SPC
        session:
          image_number: [41, 51]
        nifti:
          data_shape: [[208, 300, 320],[208, 300, 320]]
        json:
          EchoTime: 0.00207
          normalized: [False, True]

Here, for the T2w scan, we now require both the original and the normalized images. Again, to ensure our mapping is correct, we use the image_number key to require that the normalized image is the second scan. As the two images are otherwise identical, we don't need to differentiate them for any of the other parameters: you can either keep specifying values as a list with identical entries (as for data_shape) or give a single value (as for EchoTime). run_qa will extrapolate single values to multiple scans where possible, but a mismatch between multiple values (e.g. three values for EchoTime but only two for normalized) will cause an error.
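The extrapolation rule can be pictured with a small Python sketch: scalar values are repeated for every image, list values supply one entry per image, and lists of different lengths raise an error. This is an assumption-laden illustration, not run_qa's code: the function expand_params is hypothetical, it flattens the session, nifti, and json sub-configs into a single mapping, and it treats data_shape specially because a single shape is itself a list.

```python
def expand_params(params, list_valued=("data_shape",)):
    """Expand parameters so that every image gets its own value for each key."""

    def is_multi(key, value):
        # Decide whether a value holds one entry per image
        if not isinstance(value, list):
            return False
        if key in list_valued:
            # One shape is a list; several shapes form a list of lists
            return len(value) > 0 and all(isinstance(v, list) for v in value)
        return True

    lengths = {len(v) for k, v in params.items() if is_multi(k, v)}
    if len(lengths) > 1:
        raise ValueError(f"mismatched number of values: {sorted(lengths)}")
    n_images = lengths.pop() if lengths else 1
    return [
        {k: (v[i] if is_multi(k, v) else v) for k, v in params.items()}
        for i in range(n_images)
    ]


t2_params = {
    "image_number": [41, 51],
    "data_shape": [[208, 300, 320], [208, 300, 320]],
    "EchoTime": 0.00207,
    "normalized": [False, True],
}
for image in expand_params(t2_params):
    print(image)
# {'image_number': 41, 'data_shape': [208, 300, 320], 'EchoTime': 0.00207, 'normalized': False}
# {'image_number': 51, 'data_shape': [208, 300, 320], 'EchoTime': 0.00207, 'normalized': True}

try:
    expand_params({"EchoTime": [0.1, 0.2, 0.3], "normalized": [False, True]})
except ValueError as err:
    print(err)  # mismatched number of values: [2, 3]
```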

Now let's evaluate the field maps. We want to ensure there is a single spin-echo field map for each direction, but as we do not care about the specific image number, we will use the image_count key instead.

In addition to our basic parameter validation, we also want to make sure that the direction (AP/PA) actually matches how the scan is labelled using the PhaseEncodingDirection key.

    - scan:
        series_description: SpinEchoFieldMap_AP
        session:
          image_count: 1
        nifti:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 6.2
          EchoTime: 0.06
          PhaseEncodingDirection: j-
    
    - scan:
        series_description: SpinEchoFieldMap_PA
        session:
          image_count: 1
        nifti:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 6.2
          EchoTime: 0.06
          PhaseEncodingDirection: j

Finally, we have our two resting-state functional images, one AP and one PA, and their reference images.

In this example, we know that some sessions were acquired with more than two resting-state BOLD images. However, as this will not negatively impact our preprocessing, we won't make it a condition for failing QA. Instead, to showcase the possibilities, we will simply require that sessions have at least one AP and one PA resting-state BOLD image. You can do this by omitting the session sub-config:

    - scan:
        series_description: Resting_AP
        nifti:
          data_shape: [90, 90, 60, 333]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j-

    - scan:
        series_description: Resting_AP_SBRef
        nifti:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j-

    - scan:
        series_description: Resting_PA
        nifti:
          data_shape: [90, 90, 60, 333]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j

    - scan:
        series_description: Resting_PA_SBRef
        nifti:
          data_shape: [90, 90, 60]
        json:
          RepetitionTime: 0.9
          EchoTime: 0.035
          PhaseEncodingDirection: j

`- other`

In addition to MRI images, run_qa lets users validate almost any file, as long as the data are stored as plain text and not encoded. A common use case is validating that each MR session has corresponding behavioral task data. Note that this is an advanced feature and will probably require some trial and error, depending on how your study is structured.

To identify the file, - other requires the file_name parameter, containing the path to the file relative to each session's folder.

datatypes:
    raw_data:
        - other:
            file_name: behavior/pass.txt

        - other:
            file_name: behavior/mid.csv

        - other:
            file_name: behavior/nback.yml

With only file_name specified, run_qa will simply validate if the files exist and move on.

Validating the data inside these files is possible through a combination of parameters, with varying difficulty. Here is a list of all parameters. Only file_name is required; many of the other parameters are set automatically based on the detected file type, but can be changed if desired.

        - other:
            file_name:        --> Path of the file to validate, relative to the session folder
            file_extension:   --> One of [.json,.yaml,.yml,.tsv,.csv,.txt]. If unset, will be taken from the file_name
            deliminator:      --> String to use as the delimiter. If unset, will be inferred from the file_extension for delimited files (e.g. .csv); otherwise it will not be set.
            header:           --> Line index of the header/column names. If unset, the first line is used. Only used if the data has a delimiter.
            data_column:      --> Name of the column to validate (string). If unset, all columns are used. Only used if the data has a delimiter.
            index_column:     --> Name of the column to use as the index (string). If unset, the line index is used. Only used if the data has a delimiter.
            required:         --> Boolean True/False. Whether the file must exist to pass QA. Default True.
            values:           --> Data values to validate against, specified as key/value pairs.

Aside from values, most of the above parameters determine which data are extracted for validation. Similar to parameters like json in - scan validation, values accepts key-value pairs below it, with no strict requirements other than that they exist in the data. However, it is not explicitly stratified by file type, which makes it more complex. See the example below for practical details on how to fill out this field.

`- other` Example QA

In this example, we will verify sessions have three (fake) task-related files in a subfolder called behavior.

pass.txt

The first is a simple text file that lists, for a series of tasks, whether the subject passed enough trials to be included in the analysis. We want to validate that performance on the MID (line 3) and nBACK (line 4) tasks was sufficient before we process them.

pass
pass
pass
pass
pass
pass
fail
pass

As such, we need to ensure that lines 3 and 4 have the string value 'pass'. Because this data is line-separated with no delimiter, we can parse it without any extra parameters.

datatypes:
    raw_data:
        - other:
            file_name: behavior/pass.txt
            values:
                2: 'pass'
                3: 'pass'

Note: line indexing starts at 0, so lines 3 and 4 will have line indices 2 and 3 respectively.
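A minimal Python sketch of the 0-indexed line lookup, using the pass.txt contents above. The function check_lines is hypothetical and compares each line as plain text; it reports each failing index with the expected and found values (None when the file has too few lines).

```python
def check_lines(text, expected):
    """Compare 0-indexed lines of a plain-text file against expected values.

    Returns a dict mapping each failing line index to (expected, found).
    """
    lines = text.splitlines()
    failures = {}
    for index, value in expected.items():
        found = lines[index] if index < len(lines) else None
        if found != value:
            failures[index] = (value, found)
    return failures


pass_txt = "pass\npass\npass\npass\npass\npass\nfail\npass\n"
print(check_lines(pass_txt, {2: "pass", 3: "pass"}))  # {}
print(check_lines(pass_txt, {6: "pass"}))             # {6: ('pass', 'fail')}
```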

mid.csv

The second is a CSV file that lists basic information for each run for that subject. We want to ensure that the fourth run of each block has 40 trials.

id,run,pass,interrupted,ntrials
subj0,block1-run1,True,False,40
subj0,block1-run2,True,False,39
subj0,block1-run3,True,True,40
subj0,block1-run4,True,False,40
subj0,block2-run1,True,True,38
subj0,block2-run2,False,False,40
subj0,block2-run3,True,False,40
subj0,block2-run4,True,False,40

Because this data is delimited, we can specify the column we're interested in (ntrials). Additionally, we can specify run as the index column to ensure we validate the correct rows, rather than relying on the line index.

        - other:
            file_name: behavior/mid.csv
            data_column: ntrials
            index_column: run
            values:
                block1-run4: 40
                block2-run4: 40

Note: above we could have specified the file_extension, deliminator, and header parameters to match the file like so:

            file_extension: csv
            deliminator: ","
            header: 0

However, these are advanced parameters, only needed for mislabeled, unusual, or non-standard files. Here, the input is a basic CSV file that has been set up correctly, so the default settings for these parameters work fine.
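The index_column/data_column lookup behaves much like a dictionary keyed on the run column. The Python sketch below illustrates this with the standard csv module, treating the first line as the header (as with header: 0) and comparing values as text; check_column is a hypothetical helper, not part of QuNex.

```python
import csv
import io


def check_column(text, data_column, index_column, expected, delimiter=","):
    """Compare values of data_column, looked up by index_column, as text.

    Returns {index value: (expected, found)} for every mismatch.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Map each index value (e.g. "block1-run4") to its data value (e.g. "40")
    table = {row[index_column]: row[data_column] for row in reader}
    failures = {}
    for key, value in expected.items():
        found = table.get(key)
        if found != str(value):
            failures[key] = (str(value), found)
    return failures


mid_csv = """id,run,pass,interrupted,ntrials
subj0,block1-run1,True,False,40
subj0,block1-run2,True,False,39
subj0,block1-run3,True,True,40
subj0,block1-run4,True,False,40
subj0,block2-run1,True,True,38
subj0,block2-run2,False,False,40
subj0,block2-run3,True,False,40
subj0,block2-run4,True,False,40
"""
print(check_column(mid_csv, "ntrials", "run", {"block1-run4": 40, "block2-run4": 40}))  # {}
print(check_column(mid_csv, "ntrials", "run", {"block1-run2": 40}))  # {'block1-run2': ('40', '39')}
```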

nback.yml

Finally, we want to check that the subject passed the first trial of the nBACK task, that it wasn't interrupted, and that it had all 50 runs.

nback:
    trial-1:
        pass: True
        interrupted: False
        nruns: 50

Though this may seem complicated, as it is a YAML file with nested parameters, it is actually the simplest to implement:

        - other:
            file_name: behavior/nback.yml
            values:
                trial-1:
                    pass: True
                    interrupted: False
                    nruns: 50

The values specification is quite flexible. Though not shown, values (even data and index columns) can be specified as lists as well.
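Nested values amount to a recursive "is this mapping contained in the file?" check. The sketch below assumes nback.yml has already been loaded into Python dictionaries (for example with a YAML library) and starts from the mapping under the top-level nback key; contains is a hypothetical helper that compares leaf values as text and does not handle list-valued specifications.

```python
def contains(expected, actual):
    """Return True if every key/value in expected is present in actual.

    Nested mappings are compared recursively; other values are compared as text.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and contains(value, actual[key])
            for key, value in expected.items()
        )
    return str(expected) == str(actual)


# nback.yml as a YAML loader would return it
nback = {"nback": {"trial-1": {"pass": True, "interrupted": False, "nruns": 50}}}
expected = {"trial-1": {"pass": True, "interrupted": False, "nruns": 50}}

print(contains(expected, nback["nback"]))                    # True
print(contains({"trial-1": {"nruns": 49}}, nback["nback"]))  # False
```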

Config QA (`--datatype=config`)

A utility feature, config QA lets users check that their configuration files parse without errors, without running any actual QA on the real data. It is recommended, though not required, that users run config QA before other data types to validate their config.
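As a rough picture of the structural problems such a check can catch, here is a hedged Python sketch that inspects an already parsed config for two requirements stated on this page: every - scan needs series_description and every - other needs file_name. check_config and REQUIRED_KEYS are hypothetical and far less thorough than run_qa's own config QA; for instance, a - scan entry whose settings are not indented under scan ends up flagged by one of the two checks.

```python
REQUIRED_KEYS = {"scan": "series_description", "other": "file_name"}


def check_config(config):
    """Return a list of structural problems in a parsed raw_data config."""
    datatypes = config.get("datatypes")
    if not isinstance(datatypes, dict):
        return ["missing 'datatypes' mapping"]

    problems = []
    for position, entry in enumerate(datatypes.get("raw_data") or []):
        # Each sequence item should be a single '- scan' or '- other' mapping
        if not isinstance(entry, dict) or len(entry) != 1:
            problems.append(f"raw_data item {position}: expected a single scan or other entry")
            continue
        kind, settings = next(iter(entry.items()))
        required = REQUIRED_KEYS.get(kind)
        if required is None:
            problems.append(f"raw_data item {position}: unknown entry '{kind}'")
        elif not isinstance(settings, dict) or required not in settings:
            problems.append(f"raw_data item {position}: '{kind}' is missing {required}")
    return problems


# A parsed config as a YAML loader would return it (hypothetical content)
config = {
    "datatypes": {
        "raw_data": [
            {"scan": {"series_description": "T1w"}},
            {"scan": {"image_count": 1}},
            {"other": {"file_name": "behavior/pass.txt"}},
        ]
    }
}
print(check_config(config))  # ["raw_data item 1: 'scan' is missing series_description"]
```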
