import_dicom

qx_utilities.general.dicom.import_dicom(sessionsfolder=None, sessions=None, masterinbox=None, check='any', pattern=None, nameformat=None, tool='auto', parelements=1, logfile=None, archive='move', add_image_type=0, add_json_info='', unzip='yes', gzip='folder', verbose='yes', overwrite='no', test=False)

import_dicom [sessionsfolder=.] [sessions=""] [masterinbox=<sessionsfolder>/inbox/MR] [check=any] [pattern="(?P<packet_name>.*?)(?:\.zip$|\.tar$|\.tgz$|\.tar\..*$|$)"] [nameformat='(?P<subject_id>.*)'] [tool=auto] [parelements=1] [logfile=""] [archive=move] [add_image_type=0] [add_json_info=""] [unzip="yes"] [gzip="folder"] [verbose=yes] [overwrite="no"]

Automatically processes packets with individual sessions' DICOM or PAR/REC files all the way to, and including, generation of NIfTI files.

Parameters

--sessionsfolder (str, default '.'):

The base study sessions folder (e.g. WM44/sessions) where the inbox and individual session folders are. If not specified, the current working folder will be taken as the location of the sessionsfolder.

--sessions (str, default ''):

A comma-delimited string that lists the sessions to process. If a master inbox folder is used, the parameter is optional and can include regex patterns. In this case, only those sessions identified by the pattern parameter that also match any of the patterns in the sessions list will be processed. If masterinbox is set to "none", the list specifies the session folders to process and can include glob patterns.

--masterinbox (str, default <sessionsfolder>/inbox/MR):

The master inbox folder with packages to process. By default masterinbox is in sessions folder: <sessionsfolder>/inbox/MR. If the packages are elsewhere, the location can be specified here. If set to "none", the data is assumed to already exist in the individual sessions' inbox folder: <studyfolder>/<sessionsfolder>/<session id>/inbox.

--check (str, default 'any'):

The type of check to perform when packages or session folders are identified. The possible values are:

  • 'no' ... report and continue w/o additional checks

  • 'any' ... continue if any packages are ready to process, report an error otherwise.

--pattern (str, default '(?P<packet_name>.*?)(?:\.zip$|\.tar$|\.tgz$|\.tar\..*$|$)'):

The regex pattern to use to find the packages and to extract the packet name.

--nameformat (str, default '(?P<subject_id>.*)'):

The regex pattern to use to extract subject id and (optionally) the session name from the session or packet name.

--tool (str, default 'auto'):

What tool to use for the conversion. It can be one of:

  • 'auto' (determine best tool based on heuristics)

  • 'dcm2niix'

  • 'dcm2nii'

  • 'dicm2nii'.

--parelements (int, default 1):

The number of parallel processes to use when converting DICOM images to NIfTI files. If specified as 'all', all available resources will be utilized.

--logfile (str, default ''):

A string specifying the location of the log file and the columns in which packet name, subject id and session name information are stored. The string should specify: "path:<path to the log file>|packet_name:<the column with the packet name extracted by the pattern>|subject_id:<the column with subject id information>[|session_name:<the column with session name information>]".

--archive (str, default 'move'):

What to do with a processed package. Options are:

  • 'move' ... move the package to the default archive folder

  • 'copy' ... copy the package to the default archive folder

  • 'leave' ... keep the package in the session or master inbox folder

  • 'delete' ... delete the package after it has been processed.

In case of processing data from a sessions folder, the archive parameter is only valid for compressed packages.

--add_image_type (int, default 0):

Adds image type information to the sequence name (Siemens scanners). The value should specify how many image type labels, counted from the end of the image type list, to add.

--add_json_info (str, default ''):

What sequence information to extract from JSON sidecar files and add to session.txt file. Specify a comma separated list of fields or 'all'. See list in session.txt file description below.

--unzip (str, default 'yes'):

Whether to unzip individual DICOM files that are gzipped. Valid options are 'yes', 'no'.

--gzip (str, default 'folder'):

Whether to gzip individual DICOM files after they were processed ('file'), gzip a DICOM sequence or acquisition as a tar.gz archive ('folder'), or leave them ungzipped ('no'). Valid options are 'folder', 'file', 'no'.

--verbose (str, default 'yes'):

Whether to also provide a detailed report of packets that could not be identified and/or could not be matched with the log file.

--overwrite (str, default 'no'):

Whether to remove existing data in the dicom and nii folders.

Notes

The command is used to automatically process packets with individual sessions' DICOM or PAR/REC files all the way to, and including, generation of NIfTI files. A packet can be a zip file, a tar archive or a folder that contains DICOM or PAR/REC files.

The command can import packets either from a dedicated masterinbox folder and create the necessary session folders within --sessionsfolder, or it can process the data already present in the session specific folders.

The next sections will describe the two use cases in more detail.

Processing data from a dedicated inbox folder:

This is the default operation. In this mode of operation:

  • The candidate packages are identified by the pattern parameter, which also specifies how to extract a packet name.

  • The packets found are optionally filtered using the sessions parameter.

  • Subject id and (optionally) session name are either extracted from the packet name using the nameformat parameter or looked up in a log file.

  • A report of packets identified is generated.

  • Session folders are created and packet data is moved or copied to the session's inbox folder.

  • DICOM data is sorted into folders, each holding information from a single scan.

  • Images are converted to NIfTI format.

  • session.txt files are generated.

  • Original packets are archived as specified by the archive parameter.

In this mode of operation the masterinbox parameter passed to import_dicom has to provide a path to the folder with the incoming packets. The default location is <study>/<sessionsfolder>/inbox/MR, which is used automatically if masterinbox is not specified. Data from each session has to be present in the masterinbox directory either as a separate folder with the raw DICOM files or as a compressed package with that session's data. import_dicom supports the following packages: .zip, .tar, .tar.gz, .tar.bz2, .tarz and .tar.bzip2.

The pattern parameter specifies which files and/or folders are to be identified as potential packets to be processed. Specifically, the pattern parameter is a string that specifies a regular expression against which the files and folders in the masterinbox are matched. In addition, the regular expression has to return a named group, 'packet_name', that is used in further processing.

The default pattern parameter is "(?P<packet_name>.*?)(?:\.zip$|\.tar$|\.tgz$|\.tar\..*$|$)". This pattern identifies the initial part of the packet file or folder name (without any extension that marks a compressed package) as the packet name.

Specifically:

  • OP386

  • OP386.zip

  • OP386.tar.gz

will all be identified as packet names 'OP386'.
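The behaviour described above can be checked with a short Python sketch. It uses the default pattern written with escaped dots; the helper name packet_name and the use of re.match are assumptions made for illustration and are not taken from the import_dicom source.

```python
import re

# Default pattern with the dots escaped so that they match literal periods.
DEFAULT_PATTERN = r"(?P<packet_name>.*?)(?:\.zip$|\.tar$|\.tgz$|\.tar\..*$|$)"


def packet_name(entry, pattern=DEFAULT_PATTERN):
    """Return the 'packet_name' named group for a file or folder name.

    Hypothetical helper: returns None when the pattern does not match.
    """
    match = re.match(pattern, entry)
    return match.group("packet_name") if match else None


for entry in ["OP386", "OP386.zip", "OP386.tar.gz"]:
    # Each of these entries yields the packet name 'OP386'.
    print(entry, "->", packet_name(entry))
```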

Next, the packet name has to be processed to identify the subject id and (optionally) the session name. This can be done in one of two ways. If the necessary information is present in the packet name itself, it can be extracted as specified by the nameformat parameter. If not, it can be looked up in a log file specified by the logfile parameter.

Extracting subject id from packet name:

To extract subject id from a packet name, the nameformat parameter has to specify a regular expression that will extract the subject id and (optionally) the session name from the packet name as named groups, subject_id and session_name, respectively. The default nameformat parameter is "(?P<subject_id>.*)". It assumes that the packet name is equal to the subject id and only a single session was recorded. Here are a few additional examples of how subject id and session names can be extracted using the nameformat parameter:

packet name            nameformat parameter                            subject id  session name  session id
---------------------  ----------------------------------------------  ----------  ------------  -------------
AP346_MR_1             "(?P<subject_id>.*?)_(?P<session_name>.*)"      AP346       MR_1          AP346_MR_1
Siemens_Baseline-S002  ".*?_(?P<session_name>.*?)-(?P<subject_id>.*)"  S002        Baseline      S002_Baseline
Yale-EQ469-Placebo     ".*?-(?P<subject_id>.*?)-(?P<session_name>.*)"  EQ469       Placebo       EQ469_Placebo
Oxford.MR492.T3-Trio   ".*?\.(?P<subject_id>.*?)\..*"                  MR492       (none)        MR492

Shown are the extracted packet name, the nameformat regular expression, the resulting extracted subject id and session name (when present), and the final generated session id.
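The nameformat examples can be reproduced with Python's re module. This is a minimal sketch: parse_packet_name is a hypothetical helper, and matching with re.match is an assumption about how the expression is applied. The last expression is written with escaped dots so that they match literal periods.

```python
import re


def parse_packet_name(name, nameformat=r"(?P<subject_id>.*)"):
    """Return (subject_id, session_name) extracted from a packet name.

    session_name is None when the expression defines no such group.
    Hypothetical helper, not part of the import_dicom API.
    """
    match = re.match(nameformat, name)
    if match is None:
        return None, None
    groups = match.groupdict()
    return groups.get("subject_id"), groups.get("session_name")


examples = [
    ("AP346_MR_1", r"(?P<subject_id>.*?)_(?P<session_name>.*)"),
    ("Siemens_Baseline-S002", r".*?_(?P<session_name>.*?)-(?P<subject_id>.*)"),
    ("Yale-EQ469-Placebo", r".*?-(?P<subject_id>.*?)-(?P<session_name>.*)"),
    ("Oxford.MR492.T3-Trio", r".*?\.(?P<subject_id>.*?)\..*"),
]

for name, nameformat in examples:
    print(name, "->", parse_packet_name(name, nameformat))
```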

Looking up subject id in a log file:

If the subject id and (optionally) the session name are not present or cannot be robustly extracted from the packet name, it is possible to use a file that provides the mapping between packet names, subject ids and session names. A log file has to be either a comma separated value (.csv) file or a tab separated text file in which each row provides information about a single scanning session. An example log file (e.g. scanning_sessions.csv) can be:

scanning code,subject,session,date of scan, ...
AP1789,S001,baseline,2019-03-21, ...
AP1790,S001,incentive,2019-03-21, ...
WID1832,S002,baseline,2019-04-12, ...
WID1913,S002,incentive,2019-04-12, ...

To use a log file, a logfile parameter has to be provided. The content of the logfile has to be a string of the following format:

path:<path to the log file>|packet_name:<the column number with the packet name>|subject_id:<a column number with the subject id>|session_name:<a column number with the session name>

In case of the above information, the logfile parameter would be:

--logfile="path:/studies/myStudy/info/scanning_sessions.csv|packet_name:1|subject_id:2|session_name:3"

And the resulting mapping would be:

packet name  subject id  session name  session id
-----------  ----------  ------------  --------------
AP1789       S001        baseline      S001_baseline
AP1790       S001        incentive     S001_incentive
WID1832      S002        baseline      S002_baseline
WID1913      S002        incentive     S002_incentive

Shown are the extracted packet name, the extracted subject id and session name, and the final generated session id.

Do note that at least packet_name and subject_id have to be provided in the logfile parameter and in the log file itself. If session_name is not provided, it is assumed that only a single session was recorded for each subject and session id equals subject id.
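The lookup can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the column numbers are treated as 1-based (as the example above implies), skipping the header row is an assumption, and the log file content is an in-memory stand-in for scanning_sessions.csv.

```python
import csv
import io


def parse_logfile_parameter(logfile):
    """Split 'key:value|key:value' into a dict; column numbers become ints."""
    spec = {}
    for part in logfile.split("|"):
        key, value = part.split(":", 1)
        spec[key.strip()] = value.strip()
    for key in ("packet_name", "subject_id", "session_name"):
        if key in spec:
            spec[key] = int(spec[key])
    return spec


def build_mapping(rows, spec):
    """Map packet name -> (subject id, session name or None)."""
    mapping = {}
    for row in rows:
        packet = row[spec["packet_name"] - 1].strip()
        subject = row[spec["subject_id"] - 1].strip()
        session = None
        if "session_name" in spec:
            session = row[spec["session_name"] - 1].strip()
        mapping[packet] = (subject, session)
    return mapping


# In-memory stand-in for the example log file.
LOG = """scanning code,subject,session,date of scan
AP1789,S001,baseline,2019-03-21
AP1790,S001,incentive,2019-03-21
WID1832,S002,baseline,2019-04-12
WID1913,S002,incentive,2019-04-12
"""

spec = parse_logfile_parameter(
    "path:/studies/myStudy/info/scanning_sessions.csv"
    "|packet_name:1|subject_id:2|session_name:3"
)
rows = list(csv.reader(io.StringIO(LOG)))[1:]  # assumption: first row is a header
mapping = build_mapping(rows, spec)
print(mapping["WID1832"])
```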

Further processing:

As can be seen from the examples, after the subject id and (optionally) the session name are extracted, the session id is generated using the formula <subject_id>[_<session_name>], where _<session_name> is appended only if a session name was extracted from either the packet name or the log file. The generated session id is then used to name the session folders within the sessions folder, e.g. /studies/myStudy/sessions.
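The session id formula is simple enough to state as a one-function sketch. The name make_session_id is hypothetical and used here only for illustration.

```python
def make_session_id(subject_id, session_name=None):
    """Build <subject_id>[_<session_name>]; the suffix is added only if a name is given."""
    if session_name:
        return f"{subject_id}_{session_name}"
    return subject_id


print(make_session_id("S002", "Baseline"))
print(make_session_id("MR492"))
```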

The progress of processing now depends on the check parameter. If the check parameter is set to any it will proceed if any packets to process were found, and it will report an error otherwise. If check is set to no, no additional check will be performed. If any packets were found to be processed, they will be processed. If none were found, the command will exit without reporting an error.

If packets were found to process and a go ahead was given, import_dicom will then copy, unzip or untar all the files in each packet into an inbox folder created within the session folder. Once all the files are extracted or copied, depending on the archive parameter, the packet is then either moved ('move') or copied ('copy') to the <study>/sessions/archive/MR folder, left as is ('leave'), or deleted ('delete'). If the archive folder does not yet exist, it is created. The default archive setting is 'move'.
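The four archive settings can be illustrated with a small sketch built on Python's shutil module. It is not the import_dicom implementation: archive_packet is a hypothetical helper, and details such as error handling and name collisions in the archive folder are left out.

```python
import shutil
from pathlib import Path


def archive_packet(packet, archive_folder, mode="move"):
    """Apply one of the documented archive settings to a processed packet.

    Returns the new location of the packet, the original path for 'leave',
    or None for 'delete'. Hypothetical helper for illustration.
    """
    packet = Path(packet)
    archive_folder = Path(archive_folder)
    if mode == "leave":
        return packet
    if mode == "delete":
        if packet.is_dir():
            shutil.rmtree(packet)
        else:
            packet.unlink()
        return None
    if mode not in ("move", "copy"):
        raise ValueError(f"unknown archive mode: {mode}")
    # The archive folder is created if it does not yet exist.
    archive_folder.mkdir(parents=True, exist_ok=True)
    target = archive_folder / packet.name
    if mode == "move":
        shutil.move(str(packet), str(target))
    elif packet.is_dir():
        shutil.copytree(packet, target)
    else:
        shutil.copy2(packet, target)
    return target
```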

If a session folder and an inbox folder within it already exist, the related packet will not be processed, so that the existing data is not changed. In this case the user has to either remove or rename the existing folder(s) and rerun the command to process those packet(s) as well.

Filtering sessions:

If not all packets in the masterinbox folder are to be processed, it is possible to explicitly define which packets can be processed by specifying the sessions parameter. The parameter is a comma separated string of packet names that can be processed. Each entry in the list can be a regular expression pattern, in which case all the packet names that match any of the patterns will be processed. Following the last example above, specifying:

--sessions=".*_baseline"

Would only process the baseline sessions and prepare data in these session-specific folders:

  • /studies/myStudy/sessions/S001_baseline

  • /studies/myStudy/sessions/S002_baseline
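The filtering step can be sketched as matching each candidate name against a comma separated list of regular expressions. The helper name filter_names and the use of re.fullmatch (the whole name must match) are assumptions; the actual matching rule used by import_dicom may differ.

```python
import re


def filter_names(names, sessions):
    """Keep the names that fully match at least one pattern in 'sessions'.

    An empty 'sessions' string keeps every name. Hypothetical helper.
    """
    patterns = [p.strip() for p in sessions.split(",") if p.strip()]
    if not patterns:
        return list(names)
    return [n for n in names if any(re.fullmatch(p, n) for p in patterns)]


names = ["S001_baseline", "S001_incentive", "S002_baseline", "S002_incentive"]
print(filter_names(names, ".*_baseline"))
```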

Processing data from a session folder:

If the raw DICOM files or compressed packages with the raw DICOM files are already present in the respective <study>/sessions/<session id>/inbox folders, then the masterinbox parameter has to be explicitly set to 'none', and the session folders to be processed have to be listed in the sessions parameter. In this case the sessions parameter is a comma separated string, where each entry in the list can be a glob pattern matching multiple session folders.

Please note that the sessions parameter is only used to identify possible folders. If a session folder is not present, even though explicitly listed, import_dicom won't report an error.
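Glob-based selection of session folders can be illustrated with Python's fnmatch module. This is a sketch under assumptions: select_session_folders is a hypothetical helper that works on a list of folder names rather than on the file system, and a pattern that matches nothing simply yields no folders, mirroring the note above.

```python
from fnmatch import fnmatchcase


def select_session_folders(folders, sessions):
    """Return the folder names matching any glob pattern in 'sessions'."""
    patterns = [p.strip() for p in sessions.split(",") if p.strip()]
    return [f for f in folders if any(fnmatchcase(f, p) for p in patterns)]


folders = ["S001_baseline", "S001_incentive", "S002_baseline", "S002_incentive"]
print(select_session_folders(folders, "*baseline"))
print(select_session_folders(folders, "S003*"))  # no match, no error
```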

In this mode of operation the session id is taken to be the folder name. However, if subject id is not equal to the session id, the nameformat parameter has to be specified to correctly extract the subject id from the session name. Specifically, nameformat parameter has to specify a regular expression string that returns a 'subject_id' named group. By default, the nameformat parameter is "(?P<subject_id>.*)", which identifies the whole session name as the subject id. Here are a few examples of how to change the nameformat parameter to extract the subject id correctly:

session id        nameformat string         subject id
----------------  ------------------------  -----------
P1102_000_01      "(?P<subject_id>.*?)_.*"  P1102
S5238_Placebo     "(?P<subject_id>.*?)_.*"  S5238
NDAR_INV2CTC8934  ".*?_(?P<subject_id>.*)"  INV2CTC8934

After the sessions are identified and the subject ids extracted, further processing depends on the check parameter: the user is prompted to confirm processing (check="yes"); processing continues, but an error is reported if no sessions are identified (check="any"); or processing continues and no error is reported even if no sessions to be processed are found (check="no").

The folders found are expected to have the data stored in the inbox folder either as individual raw DICOM files (which can be nested in additional subfolders) or as one or more compressed packages. In the latter case, the files will be extracted to the inbox folder, and the package(s) will be handled according to the archive parameter.

If any results (e.g. files in the dicom or nii folders) already exist, the processing of the folder will be skipped.

For similar use cases refer to the Examples section.

Processing steps:

import_dicom will first extract and organize the data as described above. As a next step, it will call the sort_dicom command to organize the raw DICOM files into a separate folder for each image. Next, it will call the dicom2niix command, which will convert the DICOM files to NIfTI format, store them in the nii folder and create a session.txt file with details of the session.

Examples

Data from a dedicated inbox folder:

The first set of examples covers processing packages from the masterinbox folder.

In the first example, we are assuming that the packages we want to process are in the default folder (<path_to_studyfolder>/sessions/inbox/MR), the file or folder names contain only the packet names to be used, and the subject id is equal to the packet name. All packets found are to be processed, after the user gives a go-ahead to an interactive prompt:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions"

If the processing should continue automatically if packages to process were found, then the command should be:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions" \
    --check="any"

If only package names starting with 'AP' or 'HQ' are to be processed then the sessions parameter has to be added:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions" \
    --sessions="AP.*,HQ.*" \
    --check="any"

If the packages are named e.g. 'Yale-AP4983.zip' with the extension optional, then to extract the packet name and map it directly to subject id, the following pattern parameter needs to be added:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions" \
    --pattern=".*?-(?P<packet_name>.*?)($|\..*$)" \
    --sessions="AP.*,HQ.*" \
    --check="any"

If the session name can also be extracted and the files are in the format e.g. 'Yale-AP4876_Baseline.zip', then a nameformat parameter needs to be added:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions" \
    --pattern=".*?-(?P<packet_name>.*?)($|\..*$)" \
    --sessions="AP.*,HQ.*" \
    --nameformat="(?P<subject_id>.*?)_(?P<session_name>.*)" \
    --check="any"

In this case, 'AP4876_Baseline' will be first extracted as a packet name and then parsed into 'AP4876' subject id and 'Baseline' session name.

If the files are named e.g. 'Yale-AP4983.zip' and a log file exists in which the AP* or HQ* are mapped to a corresponding subject id and session names, then the command is changed to:

qunex import_dicom \
    --sessionsfolder="<path_to_studyfolder>/sessions" \
    --pattern=".*?-(?P<packet_name>.*?)($|\..*$)" \
    --sessions="AP.*,HQ.*" \
    --logfile="path:/studies/myStudy/info/scanning_sessions.csv|packet_name:1|subject_id:2|session_name:3" \
    --check="any"

Data already present:

For the examples of processing data already present in the individual session id folder, let's assume that we have the following files present, with no other files in the sessions folders:

  • /studies/myStudy/sessions/S001_baseline/inbox/AYXQ.tar.gz

  • /studies/myStudy/sessions/S001_incentive/inbox/TWGS.tar.gz

  • /studies/myStudy/sessions/S002_baseline/inbox/OHTZ.zip

  • /studies/myStudy/sessions/S002_incentive/inbox/QRTD.zip

Then these are a set of possible commands:

qunex import_dicom \
    --sessionsfolder="/studies/myStudy/sessions" \
    --masterinbox="none" \
    --sessions="S*"

In the above case all the folders will be processed, and the packages will be extracted and (by default) moved to /studies/myStudy/sessions/archive/MR.

qunex import_dicom \
    --sessionsfolder="/studies/myStudy/sessions" \
    --masterinbox="none" \
    --sessions="*baseline" \
    --archive="delete"

In the above case only the S001_baseline and S002_baseline sessions will be processed, and the respective compressed packages will be deleted after successful processing.