chop-dbhi / dicom-anon
Python DICOM Anonymizer
License: BSD 2-Clause "Simplified" License
To be configurable for user-supplied list as per:
parser.add_argument('-x', '--exclude_series_descs', type=str, nargs=1, default=['screen save, basic text sr'],
help='Comma separated list of all series to exclude by their series description name (regardless of case). Defaults to \"screen save, basic text sr\"')
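A minimal sketch of how the option value could be consumed, assuming the match is a case-insensitive comparison against the whole series description. The helper names parse_excluded_series and is_excluded are hypothetical and not part of dicom-anon.

```python
import argparse


def parse_excluded_series(raw):
    """Split a comma-separated option value into normalized, lower-cased names."""
    return {name.strip().lower() for name in raw.split(',') if name.strip()}


def is_excluded(series_description, excluded):
    """Case-insensitive match of a series description against the excluded set."""
    return (series_description or '').strip().lower() in excluded


parser = argparse.ArgumentParser()
parser.add_argument('-x', '--exclude_series_descs', type=str, nargs=1,
                    default=['screen save, basic text sr'])

# nargs=1 yields a one-element list, so the string is taken from index 0.
options = parser.parse_args(['-x', 'Screen Save, Dose Report'])
excluded = parse_excluded_series(options.exclude_series_descs[0])
```

Normalizing once at parse time keeps the per-file check to a single set lookup.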
While running a large batch of mammograms today, dicom-anon blew up with the following exception. It appears that the error is in the mammogram file, but I wonder whether dicom-anon should really crash when this occurs.
Would the appropriate action in this case be to quarantine the image? I can try to put together a patch if you think this is the appropriate behavior. (If not, what should we be doing in this case?)
Traceback (most recent call last):
File "dicom_anon.py", line 915, in <module>
white_list_file=white_list_file, log_file=options.log_file, rename=options.rename,profile=options.profile, overlay=options.overlay)
File "dicom_anon.py", line 760, in driver
ds = anonymize(ds, white_list, org_root, profile, overlay)
File "dicom_anon.py", line 678, in anonymize
ds.remove_private_tags()
File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/dataset.py", line 483, in remove_private_tags
self.walk(RemoveCallback)
File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/dataset.py", line 595, in walk
callback(self, data_element) # self = this Dataset
File "/usr/local/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/tagtools.py", line 21, in tag_in_exception
raise type(e)(err)
ValueError: Invalid tag (0018, 1152): invalid literal for int() with base 10: '147.3'
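One way the quarantine idea could look, as a sketch only: the per-file work is wrapped so that a failure moves the file aside instead of ending the batch. The function process_or_quarantine, its arguments, and the logger name are hypothetical; the real script's quarantine handling may differ.

```python
import logging
import os
import shutil

logger = logging.getLogger('dicom_anon_sketch')


def process_or_quarantine(path, process, quarantine_dir):
    """Run process(path); on failure move the file aside and keep going.

    Returns True when the file was processed, False when it was quarantined.
    """
    try:
        process(path)
        return True
    except Exception as exc:  # e.g. the ValueError raised for tag (0018, 1152)
        if not os.path.isdir(quarantine_dir):
            os.makedirs(quarantine_dir)
        shutil.move(path, os.path.join(quarantine_dir, os.path.basename(path)))
        logger.warning('Quarantined %s: %s', path, exc)
        return False
```

Catching Exception broadly is a deliberate trade-off here: a malformed file should not stop a large batch, but the warning keeps the failure visible.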
Currently dicom-anon bails out silently when it cannot find the input. It would be helpful to emit a warning, an error, or at least an informative message.
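A small sketch of such a check, assuming the input is a directory path. The function name require_input_dir and the message wording are hypothetical.

```python
import os
import sys


def require_input_dir(path):
    """Return True if path is an existing directory; otherwise report and return False."""
    if os.path.isdir(path):
        return True
    sys.stderr.write('dicom_anon: input directory not found: %s\n' % path)
    return False
```

A caller could exit with a non-zero status when this returns False, so that scripted batch runs notice the problem.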
This block:
if log_file:
    h.flush()
    h.close()
should probably appear in a finally clause rather than at each return point. Also, is it even needed? It seems pretty paranoid and oddly special-cased for just the log_file case.
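A sketch of the finally-based shape, assuming h is a logging.FileHandler (the report does not say what h is). The function run_with_log and the logger name are hypothetical.

```python
import logging


def run_with_log(log_file, work):
    """Attach a file handler, run work(logger), and always release the handler."""
    logger = logging.getLogger('dicom_anon_sketch_log')
    logger.setLevel(logging.INFO)
    h = logging.FileHandler(log_file) if log_file else None
    if h:
        logger.addHandler(h)
    try:
        return work(logger)
    finally:
        # Runs on every exit path: normal return, early return, or exception.
        if h:
            h.flush()
            h.close()
            logger.removeHandler(h)
```

With this shape the cleanup is written once, however many return points the body has.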
File "//develop/dicom-anon-master/dicom_anon.py", line 603
except ValueError, e:
^
SyntaxError: invalid syntax
Not Supported in Python 3.7.2
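The comma form of except is Python 2 only. The as form is accepted by Python 2.6+ and by Python 3, so switching to it does not require dropping Python 2. A small illustration (parse_int is a hypothetical helper, not code from dicom_anon.py):

```python
def parse_int(text):
    """Return int(text), or None when text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError as e:  # 'except ValueError, e:' is a SyntaxError on Python 3
        print('could not parse %r: %s' % (text, e))
        return None
```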
Initially, this only needs to be user-configurable to a single value, applied to any of the METADATA tags that are assigned the new rule R in the spec file (e.g., in a new dicom_anon_spec_files/ext_keep_study_and_series_descs_replacePatientNameID.dat), using a new option:
parser.add_argument('-f', '--force_replace', type=str, default='REDACTED',
help='Replacement string for any fields with in-house rule R. Defaults to REDACTED')
Eventually, --force_replace could also accept a user-supplied, comma-delimited set of key/value pairs, allowing a different value for each such tag. For now, though, it is sufficient for this option to force both PatientsName and PatientID (e.g., if so specified in the spec file) to the same user-supplied value.
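A sketch of how both forms of the option value might be parsed, assuming a key=value syntax for the per-tag variant (the request does not specify one). The function parse_force_replace and its default_tags parameter are hypothetical.

```python
def parse_force_replace(raw, default_tags=('PatientsName', 'PatientID')):
    """Map tag names to replacement strings.

    'REDACTED'                       -> same value for every tag in default_tags
    'PatientsName=ANON,PatientID=X1' -> one value per named tag
    """
    if '=' not in raw:
        return {tag: raw for tag in default_tags}
    # Split each item on the first '=' only, so values may themselves contain '='.
    pairs = (item.split('=', 1) for item in raw.split(',') if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}
```

In this sketch a mixed value such as 'PatientsName=ANON,oops' raises ValueError, which seems preferable to silently guessing what was meant.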
This:
w = open(white_list_file, 'r')
white_list = w.read()
white_list = json.loads(white_list)
w.close()
white_list = convert_json_white_list(white_list)
should be:
with open(white_list_file, 'r') as w:
    white_list = w.read()
white_list = json.loads(white_list)
white_list = convert_json_white_list(white_list)
Line 814 in 2bdc48e
See: https://docs.python.org/2/library/sqlite3.html#using-the-connection-as-a-context-manager
This will clean up a lot of the db.commit() calls.
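For illustration, a self-contained sketch of the connection used as a context manager. The table and the values are made up for the example and use an in-memory database.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE audit (original, cleaned, UNIQUE(original))')

# Commits automatically on success; no explicit db.commit() needed.
with db:
    db.execute('INSERT INTO audit VALUES (?, ?)', ('Smith^John', 'Patient 1'))

# Rolls back automatically when the block raises.
try:
    with db:
        db.execute('INSERT INTO audit VALUES (?, ?)', ('Doe^Jane', 'Patient 2'))
        db.execute('INSERT INTO audit VALUES (?, ?)', ('Smith^John', 'Patient 3'))
except sqlite3.IntegrityError:
    pass

rows = db.execute('SELECT original, cleaned FROM audit').fetchall()
```

Note that the with block manages the transaction only; it does not close the connection.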
... ideally without removing Python 2 support, which does seem to be possible.
Per https://www.dabsoft.ch/dicom/3/C.7.6.3.1.2/:
PALETTE COLOR = Pixel data describe a color image with a single sample per pixel (single image plane). The pixel value is used as an index into each of the Red, Blue, and Green Palette Color Lookup Tables (0028,1101-1103&1201-1203). This value may be used only when Samples per Pixel (0028,0002) has a value of 1. When the Photometric Interpretation is Palette Color; Red, Blue, and Green Palette Color Lookup Tables shall be present.
These values must be preserved:
prior_cleaned is obtained via:
prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
and is used to retain the same anonymized values (e.g., Accession Number 7, etc.).
However, any newly generated clean values, as from:
cleaned = str('%s %d' % (e.name, self.audit.get_next_pk(e)))
need a way to clear out the DB of old values from previous incarnations of that line, such as those from the current/soon-to-be-previous:
cleaned = ('%s %d' % (e.name, self.audit.get_next_pk(e))).encode('ascii')
There is a vector of string values for each entry in ANNEX_E.
PhotometricInterpretation is stored as a CS, which means that it gets removed along the lines of #3 (comment)
Thanks for providing this useful program! I ran into a small problem while setting it up, and I wanted to bring it to your attention. It seems like the pydicom project has changed the way it is intended to be imported.
git clone https://github.com/chop-dbhi/dicom-anon.git
cd dicom-anon
pip2 install -r requirements.txt
#=> Successfully installed pydicom-1.0.2
Running python2 dicom_anon.py gives this error message:
Traceback (most recent call last):
File "dicom_anon.py", line 23, in <module>
import dicom
File "/usr/local/lib/python2.7/site-packages/dicom.py", line 11, in <module>
raise ImportError(msg)
ImportError:
Pydicom via 'import dicom' has been removed in pydicom version 1.0.
Please install the `dicom` package to restore function of code relying
on pydicom 0.9.9 or earlier. E.g. `pip install dicom`.
Alternatively, most code can easily be converted to pydicom > 1.0 by
changing import lines from 'import dicom' to 'import pydicom'.
See the Transition Guide at
https://pydicom.github.io/pydicom/stable/transition_to_pydicom1.html.
From reading the pydicom documentation, I think this can be fixed by depending on the dicom library instead of the pydicom library.
A patch like this seems to work:
diff --git a/requirements.txt b/requirements.txt
index a681f02..8314cfa 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,4 +4,4 @@
#
# pip install -U -r requirements.txt
-pydicom
+dicom==0.9.9.post1
I'm happy to submit a pull request with this patch. Thanks again!
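The error message itself mentions another route: changing the import line. A hedged sketch of an import that tolerates either package follows. It only resolves the module name; pydicom 1.0 also renamed some functions and attributes (see the Transition Guide linked in the error), so the rest of the script may still need changes.

```python
try:
    import pydicom as dicom  # pydicom >= 1.0
except ImportError:
    try:
        import dicom  # pydicom <= 0.9.9, published on PyPI as 'dicom'
    except ImportError:
        # Neither package is installed; a real script would exit here
        # with an installation hint instead of continuing.
        dicom = None
```

Pinning dicom==0.9.9.post1 as in the patch above is the smaller change; the fallback import is only useful if the code is also updated for the newer API.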
As part of the work towards #27 ("Add support for Python 3"), I noticed that when a tag value contains a single-quote apostrophe, and def() attempts to save the data via self.cursor.execute(GET_LINKED % table_name, (original, study_uid_pk)), sqlite3 complains with sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
For example, with the ReferringPhysiciansName tag (0008,0090) containing a value of Referring Physician's Name 4, the following traceback is shown:
Traceback (most recent call last):
File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
yield
File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
callback(self, data_element) # self = this Dataset
File "dicom_anon.py", line 459, in clean_cb
if self.enforce_profile(ds, e, study_pk):
File "dicom_anon.py", line 486, in enforce_profile
cleaned = self.basic(ds, e, study_pk)
File "dicom_anon.py", line 507, in basic
prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
File "dicom_anon.py", line 194, in get
self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dicom_anon.py", line 812, in <module>
da.run(i_dir, c_dir)
File "dicom_anon.py", line 700, in run
ds, study_pk = self.anonymize(ds)
File "dicom_anon.py", line 637, in anonymize
ds.walk(partial(self.clean_cb, study_pk=study_pk))
File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1779, in walk
dataset.walk(callback)
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 37, in tag_in_exception
raise type(ex)(msg)
sqlite3.InterfaceError: With tag (0008, 0090) got exception: Error binding parameter 0 - probably unsupported type.
Please note that this apostrophe issue does not seem to occur when run within a Python 2 environment and is only revealed when running in a Python 3 environment, for whatever reason. This could be related to the particular import sqlite3 being utilized, or to the psycopg2-binary v2.83 dependency that is used under Python 3 but seemingly not used at all under Python 2.
An initial workaround to remove any potentially offending single-quote apostrophes can be as simple as the following:
safer_cleaned_str = "{0}".format(cleaned).replace("'","")
and:
safer_original_str = "{0}".format(original).replace("'","")
...wherever these are used in the relevant db.execute(...) or self.cursor.execute(...) calls to sqlite3.
Eventually, a full sqlite3-compatible escaping of these single-quote apostrophes would be better. First-pass attempts at doing so still encountered the above error, however, so the apostrophes are simply removed for now.
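One possible explanation, which the report does not confirm, is that the bound value is not a plain str under Python 3. With ? placeholders sqlite3 needs no escaping for apostrophes, but it rejects values of unsupported types. The sketch below demonstrates that general sqlite3 behaviour with a hypothetical NameLike class; it is not a diagnosis of dicom_anon.py.

```python
import sqlite3


class NameLike(object):
    """Hypothetical stand-in for a non-str value object holding a name."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE names (original, cleaned)')

original = NameLike("Referring Physician's Name 4")

# Binding the object itself fails: without a registered adapter, sqlite3
# only binds None, int, float, str and bytes-like values.
try:
    db.execute('INSERT INTO names VALUES (?, ?)', (original, 'x'))
    bound_raw_object = True
except sqlite3.Error:
    bound_raw_object = False

# Binding str(original) succeeds, apostrophe included, with no escaping.
db.execute('INSERT INTO names VALUES (?, ?)', (str(original), 'Name 4'))
stored = db.execute('SELECT original FROM names').fetchone()[0]
```

If this is indeed the cause, the "{0}".format(...) call in the workaround may be what fixes the error, since it converts the value to str, and the .replace("'", "") step may be unnecessary.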
Add option of:
parser.add_argument('-d', '--do_not_clean', action='store_true', default=False, help='Do NOT apply general tag value cleaning; '
'instead only do any specified --force_replace & --exclude_series_descs')
to bypass the normal DICOM tag value cleaning (de-identification) operations for which dicom_anon was primarily written.
To be used when --force_replace and --exclude_series_descs are needed, but the de-identification is being performed elsewhere (e.g., in GCP's Healthcare API) and any other pre-de-identification is, for whatever reason, undesired. This allows the dicom_anon framework to still be fully leveraged for these two new options without creating an entirely new tool.
for example:
CREATE_NON_LINKED_TABLE = 'CREATE TABLE %s (id INTEGER PRIMARY KEY AUTOINCREMENT, original, cleaned, UNIQUE(original))'
CREATE_LINKED_TABLE = 'CREATE TABLE %s (id INTEGER PRIMARY KEY AUTOINCREMENT, original, cleaned, study INTEGER, UNIQUE(original, study), FOREIGN KEY(study) REFERENCES studyinstanceuid(id))'
and then:
def audit_save(tag, cleaned, study_uid_pk=None):
    ...
    try:
        if tag.name.lower() == 'study instance uid':
            db.execute(INSERT_OTHER % table_name(tag), (original, cleaned))
            return cleaned
        else:
            db.execute(INSERT_LINKED % table_name(tag), (original, cleaned, study_uid_pk))
            return cleaned
    except sqlite3.IntegrityError:
        return audit_get(tag=tag, study_uid_pk=study_uid_pk)
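The same pattern in a self-contained, runnable form, using one made-up table and a hypothetical save_or_get helper: the UNIQUE constraint makes the second insert fail, and the handler returns the value that was stored first.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE patientid (id INTEGER PRIMARY KEY AUTOINCREMENT, '
           'original, cleaned, UNIQUE(original))')


def save_or_get(original, cleaned):
    """Insert the pair, or return the cleaned value already stored for original."""
    try:
        with db:  # commit on success, roll back on IntegrityError
            db.execute('INSERT INTO patientid (original, cleaned) VALUES (?, ?)',
                       (original, cleaned))
        return cleaned
    except sqlite3.IntegrityError:
        row = db.execute('SELECT cleaned FROM patientid WHERE original = ?',
                         (original,)).fetchone()
        return row[0]


first = save_or_get('MRN123', 'Patient ID 1')
second = save_or_get('MRN123', 'Patient ID 2')
```

Letting the database enforce uniqueness avoids a separate lookup before every insert.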
Looking here, why does an IOError stop execution, whereas execution continues after an unhandled Exception?
This check can be eliminated:
Line 815 in 2bdc48e
(0008, 0008) Image Type
File "dicom_anon.py", line 552, in anonymize
ds.file_meta[MEDIA_STORAGE_SOP_INSTANCE_UID].value = ds[SOP_INSTANCE_UID].value
File "/home/anaconda3/envs/rad/lib/python3.6/site-packages/pydicom/dataset.py", line 582, in __getitem__
data_elem = dict.__getitem__(self, tag)
KeyError: (0008, 0018)
It looks like SOP_INSTANCE_UID is removed, along with other required tags, during the walk. Is this script still working for others?
Allow plugging in a pixel anonymizer that blacks out the burned-in annotations.
Ideally it would plug in here and look something like: https://github.com/johnperry/CTP/blob/master/source/files/scripts/DicomPixelAnonymizer.script
As just discussed with @jeffmax and @alexfelmeister, UUIDs are currently being created based on date&time of dicom_anon runtime. Instead, consider creating the UUID based on a random number generation.
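A sketch of a random-based identifier using uuid.uuid4(), which is generated from random bits rather than from the clock. The 2.25. prefix is the DICOM convention for UIDs derived from a UUID; dicom-anon may prefer its own organization root, so both the prefix and the function name random_dicom_uid are assumptions for illustration.

```python
import uuid


def random_dicom_uid():
    """Return a UID of the form 2.25.<decimal UUID>, built from a random UUID."""
    # uuid4() is random; uuid1() would embed the host and the current time.
    return '2.25.%d' % uuid.uuid4().int
```

The result is at most 44 characters long, which fits within the 64-character limit for DICOM UIDs.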
After running, set "Patient Identity Removed" (0012,0062) to 'YES' and "De-identification Method" (0012,0063) to something like "dicom-anon".
See: http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.html
The code requires the sqlite db in order to replace values consistently, but an in-memory database could be used when the user does not want to record this data.
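A minimal sketch of that fallback, assuming the database path is optional. The function open_audit_db is hypothetical, and the table definition only mirrors the style of the schema quoted elsewhere on this page.

```python
import sqlite3


def open_audit_db(path=None):
    """Open the audit database; fall back to a throwaway in-memory one."""
    # ':memory:' keeps replacements consistent within one run and is
    # discarded when the connection closes, so nothing is recorded on disk.
    return sqlite3.connect(path if path else ':memory:')


db = open_audit_db()
db.execute('CREATE TABLE studyinstanceuid (id INTEGER PRIMARY KEY AUTOINCREMENT, '
           'original, cleaned, UNIQUE(original))')
db.execute('INSERT INTO studyinstanceuid (original, cleaned) VALUES (?, ?)',
           ('1.2.3', '9.9.9'))
cleaned = db.execute('SELECT cleaned FROM studyinstanceuid WHERE original = ?',
                     ('1.2.3',)).fetchone()[0]
```

The trade-off is that mappings are not consistent across separate runs, because each run starts with an empty database.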