GUID Mint¶
Unique, deterministic study IDs, pseudonyms, and pseudo-DOBs for all!
Usage¶
As a Python library:
>>> from diana.utils.guid import GUIDMint
>>> GUIDMint().get_sham_id( name="MERCK^DEREK^L", age=30 )
{
'BirthDate': datetime.date(1988, 11, 20),
'ID': 'VXNQHHN523ZQNJFIY3TXJM4YXABTL6SL',
'Name': ['VANWASSENHOVE', 'XAVIER', 'N'],
'TimeOffset': datetime.timedelta(-47, 82822)
}
From diana-cli:
$ diana-cli guid "MERCK^DEREK^L" --age 30
Generating GUID
------------------------
WARNING:GUIDMint:Creating non-reproducible GUID using current date
{'birth_date': '19881120',
'id': 'VXNQHHN523ZQNJFIY3TXJM4YXABTL6SL',
'name': 'VANWASSENHOVE^XAVIER^N',
'time_offset': '-47 days, 23:00:22'}
Or from the diana-REST API:
$ curl -X GET "http://localhost:8080/v1.0/guid?name=MERCK%5EDEREK%5EL&age=30&sex=U"
{
"birth_date": "19881120",
"id": "VXNQHHN523ZQNJFIY3TXJM4YXABTL6SL",
"name": "VANWASSENHOVE^XAVIER^N",
"time_offset": "-47 days, 23:00:22"
}
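The %5E sequences in the request above are the percent-encoded ^ separators of the DICOM-style name. As a small sketch, the same query string can be built with the Python standard library instead of encoding it by hand. The endpoint and parameter names are copied from the curl example; nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
base_url = "http://localhost:8080/v1.0/guid"

# urlencode percent-encodes reserved characters such as "^".
query = urlencode({"name": "MERCK^DEREK^L", "age": 30, "sex": "U"})
url = f"{base_url}?{query}"

print(url)
```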
Algorithm¶
The GUID mint generates a unique, reproducible tag from any consistent set of object-specific variables:
- name (or any string)
- gender ({m, f, u})
- birth date (or age + reference date)
Global Unique ID¶
Generation Algorithm:
- Given name, gender, and dob parameters. Depending on the available data, name may be a patient name, an MRN, a subject ID, or any unique combination of those elements. If dob is unavailable, an age parameter and a reference_date may be substituted. If no reference date is provided, the algorithm defaults to today and the GUID will be unreproducible.
- A unique key is generated based on the alphabetically sorted elements of name, dob, and gender.
- The sha256 hash of the key is computed and the result is encoded into base32.
- If the first three characters are not alphabetic, the value is rehashed until they are (for pseudonym generation).
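The steps above can be sketched roughly as follows. This is an illustration, not the GUIDMint implementation: the function name mint_guid, the "|" separator used to join the sorted elements, and the choice to rehash the previous digest are all assumptions. The sketch does not try to reproduce the IDs shown under Usage, and its output is longer than the 32-character ID in that example.

```python
import base64
import hashlib
from datetime import date


def mint_guid(name: str, dob: date, gender: str = "U") -> str:
    """Hypothetical sketch: sorted key -> sha256 -> base32 -> rehash if needed."""
    # Assumption: the key is the alphabetically sorted elements joined by "|".
    elements = sorted([name, dob.isoformat(), gender.upper()])
    data = "|".join(elements).encode("utf-8")

    while True:
        data = hashlib.sha256(data).digest()
        guid = base64.b32encode(data).decode("ascii").rstrip("=")
        # Base32 uses A-Z and 2-7, so a digit in the first three
        # characters triggers another round of hashing.
        if guid[:3].isalpha():
            return guid


guid = mint_guid("MERCK^DEREK^L", date(1988, 11, 20), "M")
print(guid)
```

Because every input to the hash is fixed, the same name, dob, and gender always yield the same GUID, which is the reproducibility property described above.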
Pseudonym Generation¶
A GUID can be mapped to a reproducible placeholder name in DICOM patient name format (last^first^middle). This is very useful for alphabetizing subject name lists similarly to their IDs while still allowing anonymized data sets to be referenced by memorable names.
Generation Algorithm:
- Given a guid and gender (M, F, U) (optional, defaults to U)
- Using the guid as a random seed, a gender-appropriate first name and a gender-neutral family name are selected from a uniform distribution taken from the US census
- The result is returned in DICOM patient name format.
The default name map can easily be replaced to suit your fancy
(Shakespearean names, astronauts, children's book authors). With slight
modification, a DICOM patient name with up to 5 elements could be
generated (i.e., in last^first^middle^prefix^suffix format).
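As a rough sketch of the pseudonym steps, the example below seeds a random generator with the GUID and draws from a tiny name map. The name lists are hypothetical stand-ins for the census-derived map, and pseudonym is not a GUIDMint function. In the Usage example the pseudonym's initials (V, X, N) match the first three characters of the ID; the sketch assumes that is by design and filters the name pools by initial, falling back to the whole pool when the toy map has no match.

```python
import random

# Hypothetical stand-ins for the default (US census) name map.
FAMILY_NAMES = ["ADAMS", "BAKER", "CLARK", "DAVIS"]
GIVEN_NAMES = {
    "M": ["ALAN", "BRIAN", "CARL"],
    "F": ["ALICE", "BETH", "CAROL"],
    "U": ["ALEX", "BLAKE", "CASEY"],
}


def pseudonym(guid: str, gender: str = "U") -> str:
    """Hypothetical sketch: GUID-seeded choice of last^first^middle."""
    rng = random.Random(guid)  # a string seed gives the same draws on every run
    given = GIVEN_NAMES[gender.upper()]

    # Assumption: initials follow the first three GUID characters.
    last_pool = [n for n in FAMILY_NAMES if n[0] == guid[0]] or FAMILY_NAMES
    first_pool = [n for n in given if n[0] == guid[1]] or given

    return "^".join([rng.choice(last_pool), rng.choice(first_pool), guid[2]])


print(pseudonym("ABCXYZ", "F"))
```

Swapping in a different name map only means replacing the two lookup tables; the seeding logic stays the same.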
Approximate Date-of-Birth¶
As with pseudonyms, it can be useful to maintain a valid date-of-birth (dob) in de-identified metadata. Using a GUID as a seed, any dob can be mapped to a random nearby date for a nearly-age-preserving anonymization strategy. This is useful for keeping an approximate patient age available in a data browser.
Generation Algorithm:
- Given a guid and a dob parameter
- Using the guid as a random seed, a random integer between -90 and +90 is selected
- The original dob + the random delta in days is returned
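A minimal sketch of these steps, assuming Python's random module as the seeded generator. The function name pseudo_dob and the max_days parameter are hypothetical, and the dates it produces will not match those from the real mint.

```python
import random
from datetime import date, timedelta


def pseudo_dob(guid: str, dob: date, max_days: int = 90) -> date:
    """Hypothetical sketch: shift dob by a GUID-seeded number of days."""
    rng = random.Random(guid)  # same guid, same shift
    delta_days = rng.randint(-max_days, max_days)  # inclusive on both ends
    return dob + timedelta(days=delta_days)


shifted = pseudo_dob("VXNQHHN523ZQNJFIY3TXJM4YXABTL6SL", date(1988, 8, 1))
print(shifted)
```

Since the shift never exceeds 90 days, the approximate age derived from the result stays within about a quarter of a year of the true age.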
Study-Time Offset¶
To keep study date-times in the correct order, a similar algorithm generates a time offset in days and seconds that keeps each study at roughly the same time of day (within an hour) while offsetting the study date by up to +/-90 days.
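The offset can be sketched along the same lines. The function name study_time_offset and the exact ranges (±90 days, ±3600 seconds) are assumptions based on the description above. Applying one offset to all of a subject's studies shifts every date-time by the same amount, so their relative order is preserved.

```python
import random
from datetime import datetime, timedelta


def study_time_offset(guid: str, max_days: int = 90,
                      max_seconds: int = 3600) -> timedelta:
    """Hypothetical sketch: GUID-seeded offset in days and seconds."""
    rng = random.Random(guid)
    days = rng.randint(-max_days, max_days)
    seconds = rng.randint(-max_seconds, max_seconds)  # within an hour
    return timedelta(days=days, seconds=seconds)


offset = study_time_offset("VXNQHHN523ZQNJFIY3TXJM4YXABTL6SL")

# Two studies for the same subject keep their order after shifting.
first = datetime(2018, 3, 1, 9, 30)
second = datetime(2018, 3, 8, 14, 0)
assert first + offset < second + offset
```

Python normalizes a negative timedelta so that only the days field is negative. That is why an offset of 46 days and just under an hour into the past is shown as timedelta(-47, 82822), or '-47 days, 23:00:22', in the Usage output.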
Acknowledgements¶
- Inspired in part by the NDAR and FITBIR GUID schema.
- Placeholder names inspired by the Docker names generator