# Mapping processed data in and out of QuNex

## Inter-operable data mapping framework

The goal of QuNex is to maintain a flexible and inter-operable environment that supports inputs and outputs across various pipelines (e.g. HCP Pipelines vs. fMRIPrep). To achieve this goal, QuNex allows for a flexible, robust and extensible mapping of processed data into or out of the QuNex hierarchy. For instance, datasets that were processed using QuNex may need to be exported to a different folder structure expected by another workflow. Alternatively, a user may want to import data processed by another workflow into the QuNex folder hierarchy for additional processing or analytics.

To map data into the QuNex structure, use specific import commands (e.g. [`import_hcp`](../../api/gmri/import_hcp.rst) for HCP style data). To map data out of the QuNex folder structure, use specific export commands (e.g. `export_hcp` for HCP style data).

## The export commands

The export commands are designed to support various data mappings out of the QuNex folder hierarchy. Currently only HCP style data is supported; we intend to implement support for additional data hierarchies in the future.

* The export command first prepares the mapping.
* Next, the command checks that the mapping can be conducted as specified by the given parameters.
* If this validation identifies any issues, no mapping is conducted in order to avoid an incomplete mapping.
* This validation only checks for the presence of relevant source and target files. It does not check if the user has the required file-system permissions to execute the actions.

## The export_hcp command

The `export_hcp` command is designed to support mappings of HCP style data out of the QuNex folder hierarchy. The data preprocessed using the HCP Pipelines is mapped to the provided target location. The mapping assumes that the `hcpls` folder structure was used for the image processing.
The command maps all the folders in the session's `hcp` directory. Files not to be mapped can be explicitly excluded using the regular expressions listed in the `exclude` parameter.

## Arguments

The command accepts the following arguments:

* `--sessionsfolder` Specifies the base study sessions folder within the QuNex folder structure to or from which the data are to be mapped. If not specified explicitly, the current working folder will be taken as the location of the sessionsfolder. [.]
* `--batchfile` A path to a `batch.txt` file with information on the sessions. [\*]
* `--filter` An optional parameter, used in combination with a `batch.txt` file, that filters the sessions to include in the mapping. It is specified as a string in the format `"<key>:<value>|<key>:<value>"`, where the keys and values refer to information provided by the `batch.txt` file referenced in the `batchfile` parameter. Only the sessions for which all the specified keys match the specified values will be mapped.
* `--sessions` An optional parameter explicitly specifying which of the sessions listed in the `batch.txt` file are to be mapped. If not specified, all sessions will be mapped.
* `--mapaction` How to map the data. The following actions are supported: ['link']
  * `copy` - the data is copied from source to target,
  * `link` - if possible, hard links are created for the files,
  * `move` - the data is moved from source to target location.
* `--mapto` The external target location to which the data from the QuNex folder structure is mapped.
* `--overwrite` Whether existing files at the target location should be overwritten. Possible options are:
  * `yes` - any existing files should be replaced,
  * `no` - no existing files should be replaced and the mapping should be aborted if any are found,
  * `skip` - skip files that already exist, process others.
* `--exclude` A comma separated list of regular expression patterns that specify which files should be excluded from mapping.
The regular expression patterns are matched against the full path of the source files.
* `--verbose` Whether to report details while the command is running.

## Examples

### export_hcp

We will assume the following:

* data to be mapped is located in the folder `/data/studies/myStudy/sessions`
* a batch file exists in the location `/data/studies/myStudy/processing/batch.txt`
* we would like to map the data to location `/data/outbox/hcp_formatted/myStudy`

Given the above assumptions, the following examples can be run:

```sh
qunex export_hcp \
    --sessionsfolder="/data/studies/myStudy/sessions" \
    --batchfile="/data/studies/myStudy/processing/batch.txt" \
    --mapto="/data/outbox/hcp_formatted/myStudy" \
    --mapaction="link" \
    --overwrite="skip"
```

Using the above command, the data found in the `hcp` folder of each session under `/data/studies/myStudy/sessions` would be mapped to the `/data/outbox/hcp_formatted/myStudy/` folder for all the sessions listed in the `batch.txt` file. Specifically, folders would be recreated as needed and hard links would be created for all the files to be mapped. If any target files already exist, they would be skipped, but the remaining files would still be processed.

```sh
qunex export_hcp \
    --sessionsfolder="/data/studies/myStudy/sessions" \
    --batchfile="/data/studies/myStudy/processing/batch.txt" \
    --mapto="/data/outbox/hcp_formatted/myStudy" \
    --filter="group:controls|institution:Yale" \
    --mapaction="copy" \
    --overwrite="no"
```

Using the above command, only data from the sessions that are marked in the `batch.txt` file as belonging to the control group and acquired at Yale would be mapped. In this case, the files would be copied, and if any files already existed in the target location, the mapping would be aborted altogether.
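Because `--overwrite="no"` aborts the whole mapping when a target file already exists, it can be useful to look at the target location before running the command. The following is a minimal sketch of such a check, not part of QuNex: the helper name `count_existing_files` is hypothetical, and the check is deliberately conservative in that it counts every regular file below the target folder, whether or not the mapping would actually collide with it.

```sh
# Hypothetical helper (not a QuNex command): count the regular files
# found anywhere below a folder; prints 0 if the folder does not exist.
count_existing_files() {
    target="$1"
    if [ -d "$target" ]; then
        # tr strips the padding some wc implementations add
        find "$target" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Target location taken from the examples in this section.
target="/data/outbox/hcp_formatted/myStudy"
n=$(count_existing_files "$target")
if [ "$n" -gt 0 ]; then
    echo "$n file(s) already present in $target"
else
    echo "no existing files in $target"
fi
```

If the check reports existing files, `--overwrite="skip"` or `--overwrite="yes"` may be the more suitable choice, depending on whether the existing files should be kept.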
```sh
qunex export_hcp \
    --sessionsfolder="/data/studies/myStudy/sessions" \
    --batchfile="/data/studies/myStudy/processing/batch.txt" \
    --mapto="/data/outbox/hcp_formatted/myStudy" \
    --sessions="AP*,HQ*" \
    --mapaction="move" \
    --overwrite="yes"
```

Using the above command, only the sessions that start with either "AP" or "HQ" would be mapped. The files would be moved and any existing files at the target location would be overwritten.

```sh
qunex export_hcp \
    --sessionsfolder="/data/studies/myStudy/sessions" \
    --batchfile="/data/studies/myStudy/processing/batch.txt" \
    --mapto="/data/outbox/hcp_formatted/myStudy" \
    --mapaction="link" \
    --exclude="unprocessed,MotionMatrices,MotionCorrection" \
    --overwrite="skip"
```

Using the above command, all the sessions specified in the `batch.txt` file would be processed, files would be linked, files that already exist would be skipped, and any files whose path includes `unprocessed`, `MotionMatrices` or `MotionCorrection` would be excluded from the mapping.
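To get a feel for what a given `--exclude` list would match, the patterns can be tried against the source paths before running the export. The sketch below is an approximation under stated assumptions, not the QuNex implementation: the helper name `preview_excluded` is hypothetical, and it assumes that each pattern behaves like an extended regular expression searched anywhere in the full path, as `grep -E` does. The regular expression engine used by QuNex may treat some constructs differently.

```sh
# Hypothetical helper (not a QuNex command): list the files below a folder
# whose full path matches any pattern in a comma separated list.
preview_excluded() {
    folder="$1"
    patterns="$2"
    # Turn "a,b,c" into the alternation "a|b|c" understood by grep -E.
    regex=$(printf '%s' "$patterns" | tr ',' '|')
    find "$folder" -type f | grep -E -- "$regex" | sort
}

# Paths and patterns taken from the last example in this section.
sessionsfolder="/data/studies/myStudy/sessions"
if [ -d "$sessionsfolder" ]; then
    preview_excluded "$sessionsfolder" "unprocessed,MotionMatrices,MotionCorrection"
fi
```

Any path printed by the helper contains at least one of the patterns, so under the assumptions above it would be left out of the mapping.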