De-identification profile

A de-identification profile is a text file containing rule definitions, one rule per row. Each rule consists of seven tab-separated columns:

  1. file format code recognized by the MRE Library (e.g. DASTA, DICOM, medicalcxml, xls_glucose__diasend), or a media type (formerly MIME type) when the underlying Apache Tika library is used,
  2. order number (positive integer); numbering starts at 1 for each file format code,
  3. address or description of the attribute, element or tag; the syntax is specific to each file format,
  4. de-identification operation type; the operations are described below,
  5. value of the rule; its meaning depends on the operation, e.g. it is the new value for the CHANGE operation, or the text to add for the APPEND_BEFORE and APPEND_AFTER operations,
  6. rule name or comment,
  7. miscellaneous values (optional, can be empty); used e.g. as the condition of a DICOM rule with the SPECIFIC operation.
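
The column layout above can be illustrated with a small parser. This is only a sketch and not the parser used by the library: the names Rule and parse_rule are hypothetical, and treating a missing seventh column as empty is an assumption based on that column being optional.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One row of a de-identification profile (hypothetical container)."""
    format_code: str  # column 1: file format code or media type
    order: int        # column 2: order number, starting at 1 per format code
    address: str      # column 3: format-specific address of the target
    operation: str    # column 4: operation type, e.g. REMOVE or CHANGE
    value: str        # column 5: rule value, meaning depends on the operation
    name: str         # column 6: rule name or comment
    misc: str         # column 7: optional miscellaneous values, may be empty


def parse_rule(line: str) -> Rule:
    """Split one tab-separated profile row into its seven columns."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) == 6:
        # Assumption: the optional seventh column may be left out entirely.
        columns.append("")
    if len(columns) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(columns)}")
    order = int(columns[1])
    if order < 1:
        raise ValueError("order number must be a positive integer")
    return Rule(columns[0], order, *columns[2:])


rule = parse_rule("dasta\t1\tdasta/garant_dat\tREMOVE\tnull\tRemove MD's Identification\tnull")
print(rule.operation, rule.address)
```

A named tuple keeps the column order visible in one place, which makes it easy to compare a parsed rule with the numbered list above.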

Supported operations

Encrypt and decrypt content

Operations ENCRYPT and DECRYPT secure content by encrypting and decrypting it with symmetric and asymmetric keys.

Change content

Operation EMPTY clears the existing value. The tag/element/attribute will still exist, but it will be empty.

Operation APPEND_AFTER adds the text from the rule value after the existing value.

Operation APPEND_BEFORE adds the text from the rule value before the existing value.

Operation CHANGE replaces the whole text content with the new value predefined in the rule.

Operation SUBSTITUTE replaces only a part of the content with other text.

Operation SUBSTRING keeps only a part of the content.

Operations LOWERCASE and UPPERCASE convert letters to lower case and upper case, respectively.
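
The effect of these operations on a single text value can be sketched as follows. The function name apply_text_operation is hypothetical and the code is an illustration, not the library's implementation. SUBSTITUTE and SUBSTRING are left out on purpose, because the format of their rule value is not described here.

```python
def apply_text_operation(operation: str, current: str, rule_value: str = "") -> str:
    """Return the new text value after applying one content-changing operation."""
    if operation == "EMPTY":
        return ""                    # the target stays, its value is cleared
    if operation == "APPEND_AFTER":
        return current + rule_value  # rule value goes after the existing value
    if operation == "APPEND_BEFORE":
        return rule_value + current  # rule value goes before the existing value
    if operation == "CHANGE":
        return rule_value            # whole content replaced by the rule value
    if operation == "LOWERCASE":
        return current.lower()
    if operation == "UPPERCASE":
        return current.upper()
    raise ValueError(f"operation not covered by this sketch: {operation}")


print(apply_text_operation("APPEND_BEFORE", "42", "CODE_"))
```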

Identification number

Operation IDENTIFICATION replaces the old value with an anonymous identifier. It uses an identification generator that is based on a database of identifiers and a sequence number file.

A new unique identifier is generated for each original identifier that has not been used before. The operation returns an anonymous identifier number that is the same for two identical original (input) values.

The number is generated only the first time when this operation is used twice in one file. You can use it e.g. for Patient ID and then for
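
The mapping from original values to anonymous numbers can be sketched as follows. This is a simplified illustration under stated assumptions: the class name IdentificationGenerator is hypothetical, a dictionary stands in for the identifier database, and a counter stands in for the sequence number file. The reuse of one number within a single file is not modelled.

```python
class IdentificationGenerator:
    """In-memory stand-in for the identifier database and sequence number file."""

    def __init__(self, start: int = 1) -> None:
        self._known = {}           # original value -> anonymous number
        self._next_number = start  # next unused sequence number

    def anonymize(self, original: str) -> int:
        """Return the anonymous number for an original value."""
        if original not in self._known:
            # First occurrence: take a new number from the sequence.
            self._known[original] = self._next_number
            self._next_number += 1
        return self._known[original]


generator = IdentificationGenerator()
first = generator.anonymize("patient-A")
second = generator.anonymize("patient-B")
again = generator.anonymize("patient-A")
print(first, second, again)
```

Identical inputs always map to the same number, so records of one patient stay linked after de-identification.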

Keep the value

Operation KEEP does nothing unless strict mode is used. In strict mode, the output file contains only those attributes, elements, tags and so on that are the target of an EMPTY, CHANGE, IDENTIFICATION or KEEP operation.

It works only for supported file formats, and only when strict mode is enabled.
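
The strict mode filter can be sketched as follows. The function name kept_in_strict_mode and the data layout (a dictionary from address to the list of operations defined for it) are assumptions made for illustration. The sketch only decides which targets stay in the output; it does not apply the operations themselves.

```python
# Operations that keep their target in the output when strict mode is enabled.
STRICT_MODE_RETAINING = {"EMPTY", "CHANGE", "IDENTIFICATION", "KEEP"}


def kept_in_strict_mode(addresses, operations_by_address):
    """Return the addresses that remain in the output file in strict mode."""
    kept = []
    for address in addresses:
        operations = operations_by_address.get(address, [])
        if any(op in STRICT_MODE_RETAINING for op in operations):
            kept.append(address)
    return kept


rules = {
    "81030": ["KEEP"],
    "100020": ["IDENTIFICATION"],
    "100010": ["IDENTIFICATION", "APPEND_BEFORE"],
    "80090": ["REMOVE"],
}
# "200010" has no rule at all, so strict mode drops it as well.
print(kept_in_strict_mode(["81030", "100020", "100010", "80090", "200010"], rules))
```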

Remove data

Operation REMOVE removes the attribute, element, tag, etc. If the target has children, they are removed too.
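
For an XML-based format, the removal of an element together with its children can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration only, not the library's code: the function name remove_elements and the sample document are hypothetical, and the path is written relative to the root element because of how ElementTree resolves paths.

```python
import xml.etree.ElementTree as ET


def remove_elements(root: ET.Element, path: str) -> int:
    """Remove every element matching path (relative to root), children included."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    targets = root.findall(path)
    for target in targets:
        parents[target].remove(target)
    return len(targets)


# Hypothetical sample document; the child element "line" is made up.
doc = ET.fromstring(
    "<dasta><is><ip><a><adr><line>Main Street 1</line></adr><as>123</as></a></ip></is></dasta>"
)
removed = remove_elements(doc, "is/ip/a/adr")
print(removed, ET.tostring(doc, encoding="unicode"))
```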

File format specific tasks

The behavior of operation SPECIFIC depends on the file format.

It can also be used to configure the file format anonymizer. It is the right choice when you want, e.g., to remove private tags from a DICOM file.

Execute external application

Operation EXTERNAL lets an external application process the file. This way you can do anything you need with the file.

None, Nope, Null, Nothing

Operation NONE does absolutely nothing. :-) It is similar to the KEEP operation, but NONE has no effect even in strict mode.

You can use this operation to comment out a rule that you do not want to delete.

Example Rules

Example rules for file formats implemented in AnonMed.

DASTA

The DASTA file format is based on XML, so the DastaAnonymizer extends the default XMLAnonymizer. Rules address their targets with XPath expressions.

dasta   1       dasta/garant_dat        REMOVE  null    Remove MD's Identification      null
dasta   2       dasta/is/ip/@id_pac     IDENTIFICATION  null    Patient's Unique Identificator  null
dasta   3       dasta/is/ip/a/adr       REMOVE  null    Patient address null
dasta   4       dasta/is/ip/a/as        REMOVE  null    Patient contact null
dasta   5       dasta/is/ip/prijmeni    IDENTIFICATION  null    Use anonymous ID        null
dasta   6       dasta/is/ip/prijmeni    APPEND_BEFORE   CODE_   Append string: CODE_    null

DICOM

A DICOM metadata tag consists of a group number and an element number. For example, Patient ID is the tag (0010,0020), where the group is 0010 and the element is 0020. In the profile file you can omit leading zeros.

dicom   1       280301  SPECIFIC        null    BurnedInAnnotation      equal YES
dicom   2       80081   EMPTY   null    ST #26 Address  ST
dicom   3       80090   REMOVE  null    PN #30 Department Name
dicom   4       81010   KEEP    null    SH #8 [ct60257]
dicom   5       81030   KEEP    null    Study Description
dicom   6       0008103E        KEEP    null    Series Description
dicom   7       81050   EMPTY   null    PN #26 MD's Name        PN
dicom   8       100020  IDENTIFICATION  null    Patient ID      LO
dicom   9       100010  IDENTIFICATION  null    Patient Name    PN
dicom   10      100010  APPEND_BEFORE   CODE_   Patient Name    PN
...
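
The shortened tag addresses used in the DICOM rules can be expanded back to the (group,element) form. The sketch below assumes that an address is the eight-digit hexadecimal tag with its leading zeros dropped, which matches the Patient ID example. The function names parse_dicom_tag and format_dicom_tag are hypothetical.

```python
from typing import Tuple


def parse_dicom_tag(address: str) -> Tuple[int, int]:
    """Split a profile address such as 100020 into group and element numbers."""
    if not 1 <= len(address) <= 8:
        raise ValueError(f"expected 1 to 8 hexadecimal digits, got {address!r}")
    padded = address.zfill(8)  # restore the omitted leading zeros
    return int(padded[:4], 16), int(padded[4:], 16)


def format_dicom_tag(address: str) -> str:
    """Render a profile address in the usual (gggg,eeee) notation."""
    group, element = parse_dicom_tag(address)
    return f"({group:04X},{element:04X})"


for address in ("100020", "80081", "0008103E"):
    print(address, format_dicom_tag(address))
```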

Medicalc XML

medicalcxml     1       data/row/jmeno  REMOVE  null    Patient's first name    null
medicalcxml     2       data/row/pacient_id     IDENTIFICATION  null            null
medicalcxml     3       data/row/poj_datum_do   REMOVE  null            null
medicalcxml     4       data/row/poj_datum_od   REMOVE  null            null
medicalcxml     5       data/row/pojistovna     REMOVE  null            null
medicalcxml     6       data/row/prijmeni       IDENTIFICATION  null    Patient's Last Name     null
medicalcxml     7       data/row/prijmeni       APPEND_BEFORE   CODE_   Patient's Last Name     null

CSV

csv_glucose__blood_only 1       1,2     EMPTY   null            null
csv_glucose__glucose_and_carbs  1       1,2     EMPTY   null            null
csv_glucose__glucose_only       1       1,2     EMPTY   null            null
csv_glucose__pump_and_sensor_carelink   1       1,2     EMPTY   null            null
csv_glucose__pump_and_sensor_carelink   2       2,2     EMPTY   null            null
csv_glucose__pump_and_sensor_carelink   3       3,2     IDENTIFICATION  null            null
csv_glucose__pump_and_sensor_carelink   4       1,2     IDENTIFICATION  null            null
csv_glucose__pump_and_sensor_carelink   5       1,2     APPEND_BEFORE   GL_             null

XLS(X)

xls_glucose__diasend    1       Jméno a glukóza,2,1     IDENTIFICATION  null            null
xls_glucose__diasend    2       Jméno a glukóza,2,2     IDENTIFICATION  null            null
xls_glucose__diasend    3       Jméno a glukóza,2,2     APPEND_BEFORE   GL_             null
xlsx_glucose__medtronic_diabetes_ipro   1       data_export,2,3 IDENTIFICATION  null    null
xlsx_glucose__medtronic_diabetes_ipro   2       data_export,2,3 APPEND_BEFORE   GL_     null