# Deidentifying DICOM files

````{warning}
This functionality is largely untested and might not work as intended. For a list of officially supported commands consult [List of commands](../../api/gmri.rst) or use `qunex --allcommands`.
````

DICOM files often hold information that can be used to identify the person who was scanned. To protect participants' privacy, this information must be removed or changed before the data are shared or stored in locations that do not ensure data security. De-identification is ideally run before any other QuNex processing steps, including data onboarding. It can also be run at any later point before the DICOM files are shared.

QuNex provides two commands that jointly enable de-identification of DICOM files. Specifically, de-identification using QuNex is performed in three steps:

* examining personally identifiable information in DICOM files,
* specifying the necessary changes to DICOM files,
* stripping the DICOM files.

## Step 1: Examining personally identifiable information in DICOM files

The first step in de-identification is to establish what information is stored in the DICOM files. For this purpose, QuNex provides the [`get_dicom_fields`](../../api/gmri/get_dicom_fields.rst) command, which inspects all the specified DICOM files and lists all the DICOM fields they contain, together with example values.

The `get_dicom_fields` command takes three parameters:

* `folder`: the directory containing the DICOM files to be processed. If not specified, the current working directory is used.
* `targetfile`: the name (and path) of the file in which to save the information. If not specified, the default name `dicomFields.csv` is used.
* `limit`: the maximum number of unique values per field, across the processed DICOM files, to be printed in the report. If not specified, the default value is `20`.
An example use of the command:

``` bash
qunex get_dicom_fields \
    --folder=/data/studies/WM/sessions/inbox/MR/original \
    --targetfile=/data/studies/WM/sessions/specs/dicomFields.csv \
    --limit=10
```

This command will inspect the content of the specified folder and all its subfolders, including zip and tar (or tar.gz and tar.bz2) archives, for DICOM files and gzipped DICOM files. It will inspect every DICOM file found and compile a list of all the DICOM fields present along with their values. The command will then save a report in comma-separated value format in which each line lists the hexadecimal code of the DICOM field, the name of the field, and up to `limit` unique values for the field encountered across the inspected files. E.g.:

```
0x20011023,[Flip Angle Philips],60,90,15,0,8
0x189241,Gradient Echo Train Length,1,0,64,43,165
0x180095,Pixel Bandwidth,0,124,499,2660,134
0x181030,Protocol Name,C-BOLD 3mm 48 2.5s FS-A SENSE,T2w 0.7mm N1 SENSE,BOLD FLANKER 3mm 48 2.5s SENSE,Survey,C-BOLD 3mm 48 2.5s FS-P SENSE
0x81040,Institutional Department Name,Ljubljana
0x100010,Patient's Name,OP267,OP268,OP269,OP270,OP272
```

The user can then inspect the generated report to identify the DICOM fields that potentially hold identifiable information.

## Step 2: Specification of necessary changes to DICOM files

Once the DICOM fields with personally identifiable information have been identified, a parameter file that lists all the changes to be made to the DICOM fields needs to be created. In the parameter file, three actions can be specified for each DICOM field:

* `archive` ... Archive the original value in the archive file.
* `replace` ... Replace the original value with a specified value.
* `delete` ... Delete the field from the DICOM file.

Multiple actions can be specified for each field. They will be executed in the following order: `archive`, `replace`, `delete`.
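When deciding which fields to list in the parameter file, it can help to pre-screen the Step 1 report for field names that commonly carry identifying information. The following is a minimal sketch only: the `screen_report` helper and its keyword list are hypothetical and are not part of QuNex, and the keyword list is certainly incomplete. It assumes the report layout shown above (hexadecimal code in the first column, field name in the second). The result is a starting point for manual review, not a replacement for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a QuNex command): print the report lines whose
# field name (second CSV column) matches a keyword that often signals
# identifying information. The keyword list is an illustrative assumption.
screen_report() {
    local report="$1"
    grep -iE '^[^,]+,[^,]*(patient|birth|address|accession|institution|physician|operator)' "$report"
}

# Possible usage with the report generated in Step 1:
#   screen_report /data/studies/WM/sessions/specs/dicomFields.csv
```

Fields that are not flagged still need to be checked by hand, since identifying values can appear under names the keyword list does not anticipate.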
The actions are to be specified in a plain text file with one DICOM field per line in the following format:

```
<hex code> > <action>[:<parameter>], <action>[:<parameter>]
<hex code> > <action>[:<parameter>], <action>[:<parameter>]
```

The additional parameter is required for the `replace` action and specifies the value that replaces the original value. An example content for the specification file is:

```
0x80005 > delete
0x100010 > delete
0x80012 > delete, archive
0x180032 > replace:20070101
```

Note that DICOM fields need to be specified using their hexadecimal code. If a field is listed with multiple hexadecimal codes in the report generated by `get_dicom_fields`, e.g.:

```
0x41220/0x80050,Directory Record Sequence/Accession Number,OP267
```

only the last code is to be specified, `0x80050` in the example above.

Make sure that a) all the fields with personally identifiable information are specified to be processed, and b) fields with values important for further processing (e.g. sequence name or image matrix) are not specified to be changed.

Any lines that start with `#` or that do not contain `>` will be ignored.

## Step 3: Stripping of DICOM files

The actual processing of DICOM files is accomplished using the [`change_dicom_files`](../../api/gmri/change_dicom_files.rst) command. The command takes the following parameters:

* `folder`: The base folder from which the search for DICOM files should start. The command will try to locate all valid DICOM files within the specified folder and its subfolders. The default is the current working directory.
* `paramfile`: The path to the parameter file that specifies what actions to perform on the DICOM fields. By default the command will look for the `deidparam.txt` file in the current working directory.
* `archivefile`: The path to the file in which the values to be archived are stored. By default the values will be archived (appended) to the `archive.csv` file in the current working directory.
* `outputfolder`: The optional path to the folder in which the modified DICOM files are to be saved. If not specified, the DICOM files are changed in place (overwritten).
* `extension`: An optional extension to be added to each modified DICOM file name. The extension can be applied only when files are copied to the `outputfolder`.
* `replacementdate`: An optional date to replace all instances of StudyDate in the file. If none is provided, the dates will be changed to randomly generated ones.

### Example

``` bash
qunex change_dicom_files \
    --folder=/data/studies/WM/sessions/inbox/MR/original \
    --paramfile=/data/studies/WM/sessions/specs/deidv1.txt \
    --outputfolder=/data/studies/WM/sessions/MR/deid \
    --archivefile=/data/studies/WM/sessions/archive/dicom_archive.csv \
    --extension=deid_v1
```

The command will search the content of the specified folder, its subfolders, and any zip and tar packages for DICOM or gzipped DICOM files. It will then process each identified DICOM file according to the specifications provided in the parameter file.

In addition to the actions specified in the parameter file, the date the DICOM was recorded will be changed. Specifically, a date found in the StudyDate or SeriesDate field will be replaced either by a randomly generated date or by the date provided in the `--replacementdate` parameter, if specified. In addition, any occurrence of the acquisition date in any of the other fields in the DICOM file will also be replaced by the same randomly generated or specified date. Please note that any other dates (e.g. participant's birth date) are not automatically identified and replaced. These need to be either deleted or replaced explicitly.

If no `outputfolder` is provided, the DICOM files will be changed in place. That is, the original DICOM files, including the content of zip and tar archives, will be replaced with DICOM files processed according to the specification in the parameter file.
If `outputfolder` is specified, the original DICOM files will be left unchanged and their processed copies will be saved to the specified directory. When the `--extension` parameter is specified, the DICOM files will be renamed according to the following pattern:

```
<subject id>-<sequence id>-<sop>.<extension>.<file extension>
```

`subject id` will be the PatientID or the StudyID after processing according to the parameter file. If neither is present, "NA" will be used. `sequence id` will be taken from the `SeriesNumber` field. "NA" will be used if `SeriesNumber` is not present. `sop` will be taken from the SOPInstanceUID field. If not present, the sequential number of the file will be used. An example generated filename is:

```
c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg-1201-1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460.deid.dcm
```

The values from fields designated to be archived by the parameter file will be saved to the specified archive file. If the archive file does not yet exist, it will be created. If it does exist, new content will be appended to it.

The information in the archive file will be stored in comma-separated value format. Each value to be archived will be stored on a separate line with the following information:

* original file name,
* DICOM hexadecimal field code,
* DICOM field description and value.

Additionally, if the filename was changed during processing, the new file name will also be recorded.
### Example archive file content

```
Dicom/DICOM/IM_0021,0x81090,"(0008, 1090) Manufacturer's Model Name LO: 'Achieva'"
Dicom/DICOM/IM_0021,0x80050,"(0008, 0050) Accession Number SH: 'OP267'"
Dicom/DICOM/IM_0021,0x100010,"(0010, 0010) Patient's Name PN: 'OP267'"
Dicom/DICOM/IM_0021,0x400254,"(0040, 0254) Performed Procedure Step Description LO: 'Psy Test'"
Dicom/DICOM/IM_0021,filename,Dicom/DICOM/c0K2H8RBG6_mOd_ykrpVbjuR6g1uYcvdl24CeDxH9tg-1201-1.3.46.670589.11.38138.5.20.1.1.6288.1979112509200131460.deid.dcm
```

Please note that storing information in an archive file is a potential threat to privacy, as the information can be used to identify the study participants. It is the user's responsibility to either securely delete the archive file or store it securely in a separate location, in accordance with relevant regulations.

````{admonition} Usage notice
Please note the following:

1. Only the fields explicitly set to be removed or replaced will be changed. It is the responsibility of the user to make sure that no DICOM fields with identifiable information are left.
2. Only valid DICOM fields can be accessed and changed using QuNex. Any vendor-specific metadata that is not stored in regular DICOM fields will not be changed. Please make sure that no such information is present in your DICOM files.
3. Only metadata stored in DICOM fields can be processed using this tool. If any information is "burnt in" to the image data itself, it cannot be found and changed using QuNex. Please make sure that no such information is present in your DICOM files.
````
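One way to act on the first point of the usage notice is to re-run `get_dicom_fields` on the processed files and compare the new report against the parameter file. The sketch below is illustrative only: the helpers `codes_marked_delete` and `remaining_fields` are hypothetical and not part of QuNex, and the report name `dicomFields_deid.csv` is an assumed example. It relies on the rules described in Step 2 (lines starting with `#` or lacking `>` are ignored, and only the last code of a combined entry such as `0x41220/0x80050` is compared), and it assumes hexadecimal codes are written identically in both files.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not QuNex commands) for a post-processing check.

# Print the hex codes of all fields whose action list includes `delete`.
# Lines starting with `#` or not containing `>` are skipped.
codes_marked_delete() {
    awk -F'>' '
        /^#/ || !/>/ { next }
        {
            code = $1; gsub(/[ \t]/, "", code)
            n = split($2, actions, ",")
            for (i = 1; i <= n; i++) {
                a = actions[i]; gsub(/[ \t]/, "", a)
                if (a == "delete") { print code; break }
            }
        }' "$1"
}

# Print every code marked `delete` that still occurs in the first column
# of a get_dicom_fields report (only the last code of a combined entry
# is compared). No output means none of those fields were found.
remaining_fields() {
    local paramfile="$1" report="$2"
    codes_marked_delete "$paramfile" | while read -r code; do
        awk -F',' -v code="$code" '
            { n = split($1, parts, "/"); if (parts[n] == code) { print code; exit } }
        ' "$report"
    done
}

# Possible usage (the report file name is an assumption):
#   qunex get_dicom_fields \
#       --folder=/data/studies/WM/sessions/MR/deid \
#       --targetfile=/data/studies/WM/sessions/specs/dicomFields_deid.csv
#   remaining_fields /data/studies/WM/sessions/specs/deidv1.txt \
#       /data/studies/WM/sessions/specs/dicomFields_deid.csv
```

An empty result only shows that the fields marked `delete` are gone. It says nothing about fields that were never listed in the parameter file, vendor-specific metadata, or information burnt in to the images.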