De-identification profile
A de-identification profile is a text file that defines the rules, one rule per row. Each rule consists of seven tab-separated columns:
- file format code recognized by the MRE Library (e.g. DASTA, DICOM, medicalcxml, xls_glucose__diasend), or a Media Type (formerly MIME type) when the underlying Apache Tika library is used,
- order number (positive integer), starting at 1 for each file format code,
- address or description of the attribute, element or tag; the syntax is specific to each file format,
- de-identification operation type (described below),
- rule value; its meaning depends on the operation, e.g. the new value for the CHANGE operation, or the text to append for the APPEND_BEFORE or APPEND_AFTER operation,
- rule name or comment,
- miscellaneous values (optional, can be empty); used e.g. as a condition for the DICOM SPECIFIC operation.
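To make the column layout concrete, here is a minimal Python sketch of reading profile rows. It is not the MRE Library's parser; the `Rule` class and the functions are hypothetical. Two points are assumptions inferred from the example rules further down: the literal `null` is treated as an empty column, and a row may omit the optional last column. Sorting by order number within each format code is also an assumption about how rules are applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    format_code: str
    order: int
    address: str
    operation: str
    value: Optional[str]
    name: Optional[str]
    misc: Optional[str]


def _none_if_null(field: str) -> Optional[str]:
    # Assumption: an empty column or the literal "null" (as in the example rules) means "no value".
    return None if field in ("", "null") else field


def parse_rule(line: str) -> Rule:
    """Parse one tab-separated profile row into a Rule."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) == 6:
        cols.append("")  # assumption: the optional last column may be missing entirely
    if len(cols) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(cols)}")
    order = int(cols[1])
    if order < 1:
        raise ValueError("order number must be a positive integer")
    return Rule(cols[0], order, cols[2], cols[3],
                _none_if_null(cols[4]), _none_if_null(cols[5]), _none_if_null(cols[6]))


def parse_profile(text: str) -> List[Rule]:
    """Parse a whole profile, skipping blank lines."""
    rules = [parse_rule(line) for line in text.splitlines() if line.strip()]
    # Assumption: rules of one file format are applied in ascending order number.
    return sorted(rules, key=lambda r: (r.format_code, r.order))
```

For example, `parse_rule("dasta\t1\tdasta/garant_dat\tREMOVE\tnull\tRemove MD's Identification\tnull")` yields a `Rule` with `operation == "REMOVE"` and `value` set to `None`.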
Supported operations
Encrypt and decrypt content
Operations ENCRYPT and DECRYPT secure content by encrypting and decrypting it with either symmetric or asymmetric keys.
Change content
Operation EMPTY clears the existing value. The tag/element/attribute still exists, but it is empty.
Operation APPEND_AFTER appends any text after the existing value.
Operation APPEND_BEFORE adds any text before the existing value.
Operation CHANGE replaces the whole text content with a new value predefined in the rule.
Operation SUBSTITUTE replaces only part of the content with other text.
Operation SUBSTRING uses only part (a substring) of the content.
Operations LOWERCASE and UPPERCASE convert letters to lower case or upper case, respectively.
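The simple content-changing operations above can be sketched as plain string transformations on a single value. This is an illustration, not AnonMed's implementation; the function name `apply_text_operation` is hypothetical. SUBSTITUTE and SUBSTRING are deliberately left out because their rule-value syntax is not described on this page.

```python
from typing import Optional


def apply_text_operation(operation: str, value: str, rule_value: Optional[str] = None) -> str:
    """Apply one content-changing operation to a single text value."""
    extra = rule_value or ""
    if operation == "EMPTY":
        return ""  # the tag/element/attribute is kept, only its content is cleared
    if operation == "APPEND_AFTER":
        return value + extra
    if operation == "APPEND_BEFORE":
        return extra + value
    if operation == "CHANGE":
        return extra  # the whole content is replaced by the rule's value
    if operation == "LOWERCASE":
        return value.lower()
    if operation == "UPPERCASE":
        return value.upper()
    # SUBSTITUTE and SUBSTRING are left out: their rule-value syntax is not described here.
    raise ValueError(f"operation not covered by this sketch: {operation}")
```

For example, `apply_text_operation("APPEND_BEFORE", "1234", "CODE_")` returns `"CODE_1234"`, which matches the `CODE_` prefix used in the example rules.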
Identification number
Operation IDENTIFICATION replaces the original value with an anonymous identifier. It uses an identification generator backed by an identifier database and a sequence number file.
A new unique identifier is generated for each original value that has not been seen before, so two identical original (input) values always get the same anonymous identification number.
If the operation is used more than once in a file, the number is generated only on the first use. You can use it e.g. for Patient ID and then for
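A minimal sketch of the identification logic, under stated assumptions: the identifier database and the sequence number file are replaced by an in-memory dictionary and counter, so nothing persists between runs. The "generated only on the first use" behaviour is modelled by a per-file helper that reuses the first number for later IDENTIFICATION rules in the same file; this is our reading of the text, and both class names are hypothetical.

```python
from typing import Dict, Optional


class IdentificationGenerator:
    """In-memory stand-in for the identifier database plus the sequence number file."""

    def __init__(self, start: int = 1) -> None:
        self._ids: Dict[str, int] = {}  # original value -> anonymous number (the "database")
        self._next = start              # next free sequence number (the "sequence number file")

    def identify(self, original: str) -> int:
        # The same original value always maps to the same anonymous number.
        if original not in self._ids:
            self._ids[original] = self._next
            self._next += 1
        return self._ids[original]


class FileIdentification:
    """Per-file helper: the number is generated on the first IDENTIFICATION use only."""

    def __init__(self, generator: IdentificationGenerator) -> None:
        self._generator = generator
        self._number: Optional[int] = None

    def apply(self, original: str) -> str:
        if self._number is None:
            self._number = self._generator.identify(original)
        return str(self._number)  # later uses in the same file reuse the first number
```

With one shared generator, a file processed with Patient ID `PID-001` gets `1`, a later file with `PID-002` gets `2`, and any later file containing `PID-001` again gets `1`.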
Keep the value
Operation KEEP has no effect unless strict mode is enabled. In strict mode, the output file contains only those attributes, elements, tags, etc. that are targeted by an EMPTY, CHANGE, IDENTIFICATION or KEEP operation.
Strict mode works only for supported file formats.
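The strict-mode rule can be pictured as a whitelist filter. The sketch below simplifies a file to a flat mapping from address to value, which is an assumption for illustration only (real files are trees); `strict_filter` is a hypothetical name. A field survives only if at least one EMPTY, CHANGE, IDENTIFICATION or KEEP rule addresses it.

```python
from typing import Dict, Iterable, Tuple

# Operations that mark a field as "to be kept" in strict mode.
STRICT_MODE_OPERATIONS = {"EMPTY", "CHANGE", "IDENTIFICATION", "KEEP"}


def strict_filter(fields: Dict[str, str], rules: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return only the fields addressed by at least one strict-mode operation.

    fields: address -> already processed value; rules: (address, operation) pairs.
    """
    kept = {address for address, operation in rules if operation in STRICT_MODE_OPERATIONS}
    return {address: value for address, value in fields.items() if address in kept}
```

Here an APPEND_BEFORE rule alone does not keep a field; only the four listed operations do.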
Remove data
Operation REMOVE removes the attribute, element, tag, etc., together with any child elements.
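For XML-based formats, removing an element together with its children can be illustrated with Python's standard `xml.etree.ElementTree`. This is a stand-in, not the library AnonMed uses, and `remove_by_address` is a hypothetical helper. It assumes a slash-separated address whose first step is the root element (as in the DASTA rules below), and it does not handle `@attribute` addresses.

```python
import xml.etree.ElementTree as ET


def remove_by_address(root: ET.Element, address: str) -> int:
    """Remove every element matched by an address such as 'dasta/is/ip/a/adr'.

    The first step must name the root element; each removed element takes its
    whole subtree (children, grandchildren, ...) with it. Returns the count removed.
    """
    steps = address.split("/")
    if steps[0] != root.tag or len(steps) < 2:
        raise ValueError("address must start with the root element and name a child")
    parent_path = "/".join(steps[1:-1]) or "."  # "." selects the root itself
    removed = 0
    for parent in root.findall(parent_path):
        for child in parent.findall(steps[-1]):
            parent.remove(child)
            removed += 1
    return removed
```

On a simplified sample such as `<dasta><is><ip><a><adr><line>X</line></adr><as/></a></ip></is></dasta>`, removing `dasta/is/ip/a/adr` also drops the nested `<line>` element while `<as/>` stays.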
File format specific tasks
Operation SPECIFIC performs file-format-specific tasks.
It can even be used to configure the anonymizer of a file format. It is the right choice when you want, e.g., to remove private tags from a DICOM file.
Execute external application
Operation EXTERNAL runs an external application to process the file. This way you can do anything you need with the file.
None, Nope, Null, Nothing
Operation NONE does absolutely nothing. :-) It is similar to the KEEP operation, but NONE has no effect even in strict mode.
You can use it as a comment when you do not want to delete a rule.
Example Rules
Example rules for file formats implemented in AnonMed.
DASTA
The DASTA file format is based on XML, so DastaAnonymizer extends the default XMLAnonymizer. Rule addressing uses XPath expressions.
dasta 1 dasta/garant_dat REMOVE null Remove MD's Identification null
dasta 2 dasta/is/ip/@id_pac IDENTIFICATION null Patient's Unique Identificator null
dasta 3 dasta/is/ip/a/adr REMOVE null Patient address null
dasta 4 dasta/is/ip/a/as REMOVE null Patient contact null
dasta 5 dasta/is/ip/prijmeni IDENTIFICATION null Use anonymous ID null
dasta 6 dasta/is/ip/prijmeni APPEND_BEFORE CODE_ Append string: CODE_ null
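The DASTA rules 2, 5 and 6 rely on two details: a final `@name` step addresses an attribute, and rules run in order number, so rule 6 prefixes `CODE_` to the value that rule 5 has just written. The sketch below illustrates this with `xml.etree.ElementTree` on a simplified, namespace-free sample document; it is hypothetical, not DastaAnonymizer. The anonymous number is passed in as a fixed string instead of coming from the identifier generator.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (order number, address, operation, rule value): a reduced form of a profile row.
RuleRow = Tuple[int, str, str, Optional[str]]


def resolve(root: ET.Element, address: str) -> Tuple[str, Optional[str]]:
    """Split 'dasta/is/ip/@id_pac' into ('is/ip', 'id_pac'), relative to the root element."""
    steps = address.split("/")
    if steps[0] != root.tag:
        raise ValueError(f"address must start with the root element <{root.tag}>")
    attribute = steps[-1][1:] if steps[-1].startswith("@") else None
    element_steps = steps[1:-1] if attribute else steps[1:]
    return "/".join(element_steps) or ".", attribute


def apply_rules(root: ET.Element, rules: List[RuleRow], anonymous_id: str) -> None:
    """Apply IDENTIFICATION and APPEND_BEFORE rules in ascending order number."""
    for _, address, operation, rule_value in sorted(rules, key=lambda rule: rule[0]):
        path, attribute = resolve(root, address)
        for element in root.findall(path):
            old = element.get(attribute, "") if attribute else (element.text or "")
            if operation == "IDENTIFICATION":
                new = anonymous_id  # stand-in for the identifier generator
            elif operation == "APPEND_BEFORE":
                new = (rule_value or "") + old
            else:
                raise ValueError(f"operation not covered by this sketch: {operation}")
            if attribute:
                element.set(attribute, new)
            else:
                element.text = new
```

Applied to `<ip id_pac="12345"><prijmeni>Novak</prijmeni></ip>` with the anonymous number `17`, the attribute becomes `17` and the surname element becomes `CODE_17`, regardless of the order in which the rules appear in the list.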
DICOM
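A DICOM metadata tag consists of a group number and an element number. For example, Patient ID is tag (0010,0020), where the group is 0010 and the element is 0020. In the profile file, the tag is written as the group and element numbers joined together (e.g. 00100020), and leading zeros can be omitted (100020).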
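Because the tag is written as one hexadecimal number with optional leading zeros, the group is the upper 16 bits and the element the lower 16 bits. A minimal sketch of converting between the profile notation and the usual (gggg,eeee) notation; both function names are hypothetical.

```python
from typing import Tuple


def parse_dicom_tag(text: str) -> Tuple[int, int]:
    """Parse a profile tag such as '100020' or '0008103E' into (group, element)."""
    value = int(text, 16)  # tag digits are hexadecimal, e.g. 103E
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit DICOM tag: {text!r}")
    return value >> 16, value & 0xFFFF  # upper 16 bits = group, lower 16 bits = element


def format_dicom_tag(group: int, element: int) -> str:
    """Format (group, element) in the usual (gggg,eeee) notation."""
    return f"({group:04X},{element:04X})"
```

For example, `format_dicom_tag(*parse_dicom_tag("100020"))` gives `(0010,0020)`, the Patient ID tag, and `"280301"` maps to `(0028,0301)`.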
dicom 1 280301 SPECIFIC null BurnedInAnnotation equal YES
dicom 2 80081 EMPTY null ST #26 Address ST
dicom 3 80090 REMOVE null PN #30 Department Name
dicom 4 81010 KEEP null SH #8 [ct60257]
dicom 5 81030 KEEP null Study Description
dicom 6 0008103E KEEP null Series Description
dicom 7 81050 EMPTY null PN #26 MD's Name PN
dicom 8 100020 IDENTIFICATION null Patient ID LO
dicom 9 100010 IDENTIFICATION null Patient Name PN
dicom 10 100010 APPEND_BEFORE CODE_ Patient Name PN
...
Medicalc XML
medicalcxml 1 data/row/jmeno REMOVE null Patient's first name null
medicalcxml 2 data/row/pacient_id IDENTIFICATION null null
medicalcxml 3 data/row/poj_datum_do REMOVE null null
medicalcxml 4 data/row/poj_datum_od REMOVE null null
medicalcxml 5 data/row/pojistovna REMOVE null null
medicalcxml 6 data/row/prijmeni IDENTIFICATION null Patient's Last Name null
medicalcxml 7 data/row/prijmeni APPEND_BEFORE CODE_ Patient's Last Name null
CSV
csv_glucose__blood_only 1 1,2 EMPTY null null
csv_glucose__glucose_and_carbs 1 1,2 EMPTY null null
csv_glucose__glucose_only 1 1,2 EMPTY null null
csv_glucose__pump_and_sensor_carelink 1 1,2 EMPTY null null
csv_glucose__pump_and_sensor_carelink 2 2,2 EMPTY null null
csv_glucose__pump_and_sensor_carelink 3 3,2 IDENTIFICATION null null
csv_glucose__pump_and_sensor_carelink 4 1,2 IDENTIFICATION null null
csv_glucose__pump_and_sensor_carelink 5 1,2 APPEND_BEFORE GL_ null
XLS(X)
xls_glucose__diasend 1 Jméno a glukóza,2,1 IDENTIFICATION null null
xls_glucose__diasend 2 Jméno a glukóza,2,2 IDENTIFICATION null null
xls_glucose__diasend 3 Jméno a glukóza,2,2 APPEND_BEFORE GL_ null
xlsx_glucose__medtronic_diabetes_ipro 1 data_export,2,3 IDENTIFICATION null null
xlsx_glucose__medtronic_diabetes_ipro 2 data_export,2,3 APPEND_BEFORE GL_ null