Deidentifying DICOM files#

Warning

This functionality is largely untested and might not work as intended. For a list of officially supported commands, consult List of commands or use qunex --allcommands.

DICOM files often hold information that can be used to identify the person who was scanned. To protect participants' privacy, this information must be removed or changed before the data is shared or stored in locations that do not guarantee data security. De-identification is best run before any other QuNex processing step, including data onboarding, but it can also be run at any later point before the DICOM files are shared. QuNex provides two commands that jointly enable de-identification of DICOM files. De-identification using QuNex is performed in three steps:

  • examining personally identifiable information in DICOM files,

  • specification of necessary changes to DICOM files,

  • stripping of DICOM files.

Step 1: Examining personally identifiable information in DICOM files#

The first step in de-identification is to establish what information is stored in the DICOM files. To this end, QuNex provides the get_dicom_fields command, which inspects all DICOM files in the specified folder and lists all the DICOM fields they contain, along with example values.

The get_dicom_fields command takes three parameters:

  • folder — the directory containing DICOMs to be processed. If not specified, it is set to the current working directory by default.

  • targetfile — the name (and path) of the file in which to save the information. If not specified, it is set to the default name dicomFields.csv.

  • limit — the maximum number of unique values found in each field across the processed DICOM files that should be printed in the report. If not specified, the default value is 20.

An example use of the command:

qunex get_dicom_fields \
    --folder=/data/studies/WM/sessions/inbox/MR/original \
    --targetfile=/data/studies/WM/sessions/specs/dicomFields.csv \
    --limit=10

This command will inspect the content of the specified folder and all its subfolders, including zip and tar (or tar.gz and tar.bz2) archives, for DICOM files and gzipped DICOM files. It will inspect every DICOM file found and compile a list of all the DICOM fields present in the files along with their values.

The command will generate and save a report in comma-separated value format. Each line lists the hexadecimal code of the DICOM field, the name of the field, and up to limit unique values for the field encountered across the inspected files. For example:

0x20011023,[Flip Angle Philips],60,90,15,0,8
0x189241,Gradient Echo Train Length,1,0,64,43,165
0x180095,Pixel Bandwidth,0,124,499,2660,134
0x181030,Protocol Name,C-BOLD 3mm 48 2.5s FS-A SENSE,T2w 0.7mm N1 SENSE,BOLD FLANKER 3mm 48 2.5s SENSE,Survey,C-BOLD 3mm 48 2.5s FS-P SENSE
0x81040,Institutional Department Name,Ljubljana
0x100010,Patient's Name,OP267,OP268,OP269,OP270,OP272

The generated report can then be inspected by the user to identify those DICOM fields that potentially hold identifiable information.
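Screening a long report by eye is error-prone, so a first pass can be automated. The following is a minimal sketch, not part of QuNex, that reads the report with Python's standard csv module and flags rows whose field name contains a keyword. The function name flag_candidate_fields and the keyword list are assumptions made for illustration. The result is a starting point for manual review, not a complete list of identifying fields.

```python
import csv
import io

# Hypothetical keyword list; extend it to match your scanner and site.
KEYWORDS = ("patient", "birth", "accession", "institution", "physician", "operator")


def flag_candidate_fields(report_text, keywords=KEYWORDS):
    """Return (code, name, values) for report rows whose field name contains a keyword."""
    flagged = []
    for row in csv.reader(io.StringIO(report_text)):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        code, name, values = row[0], row[1], row[2:]
        if any(keyword in name.lower() for keyword in keywords):
            flagged.append((code, name, values))
    return flagged


# Rows taken from the example report above.
report = """0x180095,Pixel Bandwidth,0,124,499,2660,134
0x81040,Institutional Department Name,Ljubljana
0x100010,Patient's Name,OP267,OP268,OP269,OP270,OP272
"""

for code, name, values in flag_candidate_fields(report):
    print(code, name, values)
```

A keyword match only suggests that a field deserves a closer look. Fields with innocuous names can still carry identifying values, so the full report should be read as well.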

Step 2: Specification of necessary changes to DICOM files#

Once the DICOM fields holding personally identifiable information have been identified, create a parameter file that lists all the changes to be made to the DICOM fields. In the parameter file, three actions can be specified for each DICOM field:

  • archive ... Archive the original value in the archive file.

  • replace ... Replace the original value with a specified value.

  • delete ... Delete the field from the DICOM file.

For each field multiple actions can be specified. They will be executed in the following order: archive, replace, delete. The actions are to be specified in a regular text file with one DICOM field per line in the following format:

<dicom field>  > <action>[:<parameter>], <action>[:<parameter>]
<dicom field>  > <action>[:<parameter>], <action>[:<parameter>]

A parameter is required for the replace action; it specifies the value that will replace the original one. An example of specification file content:

0x80005  > delete
0x100010 > delete
0x80012  > delete, archive
0x180032 > replace:20070101

Do note that DICOM fields need to be specified using their hexadecimal code. If a field is listed with multiple hexadecimal codes in the report generated by get_dicom_fields, e.g.:

0x41220/0x80050,Directory Record Sequence/Accession Number,OP267

only the last code should be specified (0x80050 in the example above).
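Judging by the examples in this document, the codes in the report appear to combine the DICOM group and element numbers into a single hexadecimal value: 0x81090 corresponds to the tag (0008, 1090) in the archive example further below. Under that assumption, the following sketch (the helper name report_code_to_tag is hypothetical) picks the last code from a slash-separated entry and splits it into group and element, which makes it easier to look the field up in the DICOM standard.

```python
def report_code_to_tag(entry):
    """Take the last code of a report entry and return (code, group, element)."""
    code = entry.split("/")[-1].strip()
    value = int(code, 16)  # int() accepts the 0x prefix when the base is 16
    # Assumed layout: upper bits hold the group, lower 16 bits hold the element.
    group, element = value >> 16, value & 0xFFFF
    return code, f"{group:04x}", f"{element:04x}"


print(report_code_to_tag("0x41220/0x80050"))
print(report_code_to_tag("0x100010"))
```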

Make sure that a) all the fields with personally identifiable information are specified to be processed, and b) fields with values important for further processing (e.g. sequence name or image matrix) are not specified to be changed.

Any lines that start with # or that do not contain > will be ignored.
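Checking a parameter file before running it on real data can catch typos early. The sketch below parses the format described above into a dictionary. It is an illustration of the documented rules, not the parser QuNex itself uses, and the function name parse_param_text is an assumption. One known limitation: a replacement value that itself contains a comma is not handled.

```python
ACTION_ORDER = ("archive", "replace", "delete")


def parse_param_text(text):
    """Parse parameter-file text into {field code: [(action, parameter), ...]}."""
    spec = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Lines starting with '#' or lacking '>' are ignored, as documented.
        if stripped.startswith("#") or ">" not in stripped:
            continue
        field, _, actions_part = stripped.partition(">")
        actions = []
        for item in actions_part.split(","):
            action, _, parameter = item.strip().partition(":")
            action, parameter = action.strip(), parameter.strip()
            if action not in ACTION_ORDER:
                raise ValueError(f"unknown action {action!r} in line: {line!r}")
            if action == "replace" and not parameter:
                raise ValueError(f"replace needs a value in line: {line!r}")
            actions.append((action, parameter or None))
        # Sort into the documented execution order: archive, replace, delete.
        actions.sort(key=lambda pair: ACTION_ORDER.index(pair[0]))
        spec[field.strip()] = actions
    return spec


# The example specification from above.
example = """0x80005  > delete
0x100010 > delete
0x80012  > delete, archive
0x180032 > replace:20070101
"""

for code, actions in parse_param_text(example).items():
    print(code, actions)
```

Comparing the keys of the returned dictionary with the list of fields marked in step 1 is a quick way to confirm that no identified field was left out of the parameter file.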

Step 3: Stripping of DICOM files#

The actual processing of DICOM files is performed by the change_dicom_files command. The command takes the following parameters:

  • folder — The base folder from which the search for DICOM files should start. The command will try to locate all valid DICOM files within the specified folder and its subfolders. The default is the current working directory.

  • paramfile — The path to the parameter file that specifies what actions to perform on the DICOM fields. By default the command will look for deidparam.txt file in the current working directory.

  • archivefile — The path to the file in which values to be archived are to be stored. By default the values will be archived (appended) to archive.csv file in the current working directory.

  • outputfolder — The optional path to the folder to which the modified DICOM files are to be saved. If not specified, the DICOM files are changed in place (overwritten).

  • extension — An optional extension to be added to each modified DICOM file name. The extension can be applied only when files are copied to the outputfolder.

  • replacementdate — An optional date to replace all instances of StudyDate in the file. If none is provided the dates will be changed to randomly generated ones.

Example#

qunex change_dicom_files \
    --folder=/data/studies/WM/sessions/inbox/MR/original \
    --paramfile=/data/studies/WM/sessions/specs/deidv1.txt \
    --outputfolder=/data/studies/WM/sessions/MR/deid \
    --archivefile=/data/studies/WM/sessions/archive/dicom_archive.csv \
    --extension=deid_v1

The command will search the specified folder, its subfolders, and any zip and tar archives for DICOM or gzipped DICOM files. It will then process each identified DICOM file according to the specifications provided in the parameter file. In addition to the actions specified in the parameter file, the date the DICOM was recorded will be changed. Specifically, a date found in the StudyDate or SeriesDate field will be replaced either by a randomly generated date or by the date provided in the --replacementdate parameter, if specified. Any occurrence of the acquisition date in any other field of the DICOM file will also be replaced by the same randomly generated or specified date. Please note that any other dates (e.g. the participant's birth date) are not automatically identified and replaced. These need to be deleted or replaced explicitly.
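Because other dates are not replaced automatically, it can help to scan the report from step 1 for fields whose example values look like dates. The sketch below assumes the values use the standard DICOM date format (YYYYMMDD); the function names and the plausible-year range are illustrative assumptions. Dates embedded in free text or stored in other formats will not be found this way.

```python
import csv
import io
import re
from datetime import datetime


def looks_like_dicom_date(value):
    """True if value is an 8-digit YYYYMMDD string that is a valid calendar date."""
    value = value.strip()
    if not re.fullmatch(r"[0-9]{8}", value):
        return False
    try:
        parsed = datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return 1900 <= parsed.year <= 2100  # assumed plausible range


def fields_with_dates(report_text):
    """Return (code, name) for report rows with at least one date-like value."""
    rows = csv.reader(io.StringIO(report_text))
    return [(row[0], row[1]) for row in rows
            if len(row) > 2 and any(looks_like_dicom_date(v) for v in row[2:])]


# Made-up report rows for illustration only.
sample = """0x100030,Patient's Birth Date,19800101
0x180095,Pixel Bandwidth,0,124,499
"""

for code, name in fields_with_dates(sample):
    print(code, name)
```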

If no outputfolder is provided, the DICOM files will be changed in place. That is, the original DICOM files, including the content of zip and tar archives, will be replaced with DICOM files processed according to the specification in the parameter file. If outputfolder is specified, the original DICOM files will be left unchanged and their processed copies will be saved to the specified directory.

When the --extension parameter is specified, the DICOM files will be renamed according to the following pattern:

<subject id>-<sequence id>-<sop>.<specified extension>.<original extension>

subject id will be the PatientID or the StudyID after processing according to parameter file. If neither is present, "NA" will be used. sequence id will be taken from the SeriesNumber field. "NA" will be used if SeriesNumber is not present. sop will be taken from the SOPInstanceUID field. If not present, the sequential number of the file will be used. An example generated filename is:

c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg-1201-1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460.deid.dcm
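To make the naming rule concrete, the following sketch assembles a file name from the pieces described above. It is an illustration only and not QuNex code: the function name and parameters are hypothetical, PatientID is assumed to take precedence over StudyID, and the original extension is passed in by the caller because this document does not say how it is determined.

```python
def deid_filename(extension, original_extension, patient_id=None, study_id=None,
                  series_number=None, sop_instance_uid=None, file_index=0):
    """Build <subject id>-<sequence id>-<sop>.<extension>.<original extension>."""
    # Assumed precedence: PatientID first, then StudyID, then "NA".
    subject = patient_id or study_id or "NA"
    sequence = str(series_number) if series_number is not None else "NA"
    # Fall back to the sequential number of the file when no SOPInstanceUID exists.
    sop = sop_instance_uid or str(file_index)
    return f"{subject}-{sequence}-{sop}.{extension}.{original_extension}"


# Reproduces the example file name shown above.
print(deid_filename(
    "deid", "dcm",
    patient_id="c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg",
    series_number=1201,
    sop_instance_uid="1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460",
))

# With no identifying fields available, the fallbacks are used.
print(deid_filename("deid", "dcm", file_index=7))
```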

The values from fields designated to be archived by the parameter file will be saved to the specified archive file. If the archive file does not yet exist, it will be created. If it does exist, new content will be appended to it. The information in the archive file will be stored in comma-separated value format. Each value to be archived will be stored on a separate line with the following information:

  • original file name,

  • DICOM hexadecimal field code,

  • DICOM field description and value.

Additionally, if the filename was changed during processing, the information on the new file name will also be provided.

Example archive file content#

Dicom/DICOM/IM_0021,0x81090,"(0008, 1090) Manufacturer's Model Name           LO: 'Achieva'"
Dicom/DICOM/IM_0021,0x80050,"(0008, 0050) Accession Number                    SH: 'OP267'"
Dicom/DICOM/IM_0021,0x100010,"(0010, 0010) Patient's Name                      PN: 'OP267'"
Dicom/DICOM/IM_0021,0x400254,"(0040, 0254) Performed Procedure Step Description LO: 'Psy Test'"
Dicom/DICOM/IM_0021,filename,Dicom/DICOM/c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg-1201-1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460.deid.dcm
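An archive in this layout can be read back with Python's standard csv module, for example to map original file names to their new names. The sketch below assumes three columns per row, as in the example above, and treats rows whose second column is filename as rename records. The function name read_archive and the structure of the returned dictionary are illustrative choices.

```python
import csv
import io


def read_archive(archive_text):
    """Group archive rows by original file name."""
    records = {}
    for row in csv.reader(io.StringIO(archive_text)):
        if len(row) < 3:
            continue  # skip empty or incomplete lines
        original, key, value = row[0], row[1], row[2]
        entry = records.setdefault(original, {"fields": {}, "new_name": None})
        if key == "filename":
            entry["new_name"] = value
        else:
            # The archive is appended to, so a later row overwrites an earlier one.
            entry["fields"][key] = value
    return records


# Two rows taken from the example archive above.
archive = """Dicom/DICOM/IM_0021,0x100010,"(0010, 0010) Patient's Name                      PN: 'OP267'"
Dicom/DICOM/IM_0021,filename,Dicom/DICOM/c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg-1201-1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460.deid.dcm
"""

for original, entry in read_archive(archive).items():
    print(original, "->", entry["new_name"])
    for code, text in entry["fields"].items():
        print("  ", code, text)
```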

Please do note that storing information in an archive file presents a possible threat to protection of privacy as the information can be used to identify the study participants. It is the user's responsibility to either securely delete or securely store the archive file in a separate location in accordance with relevant regulations.

Usage notice

Please note the following:

  1. Only the fields explicitly set to be removed or replaced will be changed. It is the responsibility of the user to make sure that no DICOM fields with identifiable information are left.

  2. Only valid DICOM fields can be accessed and changed using QuNex. Any vendor specific metadata that is not stored in regular DICOM fields will not be changed. Please make sure that no such information is present in your DICOM files.

  3. Only metadata stored in DICOM fields can be processed using this tool. If any information is "burnt in" into the image data itself, it cannot be found and changed using QuNex. Please make sure that no such information is present in your DICOM files.