dicom-anon's People

Contributors

blakedewey, jeffmax, sgithens

dicom-anon's Issues

Add --exclude_series_descs option to omit "Screen Save", "Basic Text SR", etc

To be configurable for user-supplied list as per:

    parser.add_argument('-x', '--exclude_series_descs', type=str, nargs=1, default=['screen save, basic text sr'],
                        help='Comma separated list of all series to exclude by their series description name (regardless of case). Defaults to \"screen save, basic text sr\"')
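
A minimal sketch of how such an option might be consumed, assuming the series description is available as a plain string. The helper names `parse_exclusions` and `is_excluded` are hypothetical and not part of dicom-anon:

```python
import argparse


def parse_exclusions(raw):
    """Split a comma-separated string into a set of lower-cased, stripped names."""
    return {name.strip().lower() for name in raw.split(',') if name.strip()}


def is_excluded(series_description, exclusions):
    """Case-insensitive match of a series description against the exclusions."""
    return (series_description or '').strip().lower() in exclusions


parser = argparse.ArgumentParser()
parser.add_argument('-x', '--exclude_series_descs', type=str, nargs=1,
                    default=['screen save, basic text sr'])
options = parser.parse_args([])  # empty argv, so the default applies

# nargs=1 yields a one-element list, hence the [0].
exclusions = parse_exclusions(options.exclude_series_descs[0])
```

Normalizing both sides to lower case once keeps the per-file check a simple set lookup.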

Uncaught Exception for bad tag in dicom file.

While running a large batch of mammograms today, dicom-anon blew up with the following exception. It appears that the error is in the mammogram file, but I'm wondering if we shouldn't be blowing up when it occurs.

Would the appropriate action in this case be to quarantine the image? I can try to put together a patch if you think this is the appropriate behavior. (If not, what should we be doing in this case?)

Traceback (most recent call last):
  File "dicom_anon.py", line 915, in <module>
    white_list_file=white_list_file, log_file=options.log_file, rename=options.rename,profile=options.profile, overlay=options.overlay)
  File "dicom_anon.py", line 760, in driver
    ds = anonymize(ds, white_list, org_root, profile, overlay)
  File "dicom_anon.py", line 678, in anonymize
    ds.remove_private_tags()
  File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/dataset.py", line 483, in remove_private_tags
    self.walk(RemoveCallback)
  File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/dataset.py", line 595, in walk
    callback(self, data_element)  # self = this Dataset
  File "/usr/local/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/sgithens/Envs/opal/lib/python2.7/site-packages/dicom/tagtools.py", line 21, in tag_in_exception
    raise type(e)(err)
ValueError: Invalid tag (0018, 1152): invalid literal for int() with base 10: '147.3'
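
One way the quarantine idea could look, as a sketch only. `anonymize_or_quarantine` and its `anonymize_fn` parameter are hypothetical names; the real driver in dicom_anon.py is structured differently, and whether ValueError is the only exception worth catching is an open question:

```python
import os
import shutil


def anonymize_or_quarantine(path, anonymize_fn, quarantine_dir):
    """Run anonymize_fn(path); on ValueError move the file to quarantine_dir.

    Returns the anonymized result, or None if the file was quarantined.
    """
    try:
        return anonymize_fn(path)
    except ValueError as err:
        if not os.path.isdir(quarantine_dir):
            os.makedirs(quarantine_dir)
        target = os.path.join(quarantine_dir, os.path.basename(path))
        shutil.move(path, target)
        print('Quarantined %s: %s' % (path, err))
        return None
```

With this shape a single malformed file no longer aborts the whole batch, and the offending file is kept for later inspection.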

Log Flushing

This block:

                 if log_file:
                     h.flush()
                     h.close()

should probably appear in a finally clause rather than at each return point. Also is that even needed? Seems pretty paranoid and oddly special cased for just the log_file case.
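
A sketch of the finally-based shape, assuming `h` is a `logging.FileHandler`. The function name `run_with_log` and the logger name are hypothetical:

```python
import logging


def run_with_log(log_file, work):
    """Call work(logger) and always flush and close the file handler afterwards."""
    logger = logging.getLogger('dicom_anon_sketch')
    logger.setLevel(logging.INFO)
    h = logging.FileHandler(log_file) if log_file else None
    if h:
        logger.addHandler(h)
    try:
        return work(logger)
    finally:
        # Runs on every exit path: normal return, early return, or exception.
        if h:
            h.flush()
            h.close()
            logger.removeHandler(h)
```

On the second question: the logging module registers `logging.shutdown` to run at interpreter exit, which flushes and closes handlers, so the explicit calls mainly matter if the function is invoked repeatedly within one process.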

Add --force_replace option and support for a new R action in the spec file

Initially, a single user-configurable value is applied to every METADATA tag whose action is set to the new value R in the spec file (e.g., in a new dicom_anon_spec_files/ext_keep_study_and_series_descs_replacePatientNameID.dat), using a new option:

    parser.add_argument('-f', '--force_replace', type=str, default='REDACTED',
                        help='Replacement string for any fields with in-house rule R. Defaults to REDACTED')

Eventually this could support a user-supplied, comma-delimited set of key/value pairs in --force_replace, allowing a different value for each such tag. For now, it is sufficient for the option to force both PatientsName and PatientID (if so specified in the spec file) to the same user-supplied value.
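
A sketch of both stages, using plain dicts in place of a DICOM dataset and a parsed spec file. The `key=value` syntax for per-tag values is an assumption made for illustration, as are the function names `parse_force_replace` and `apply_replacements`:

```python
def parse_force_replace(raw):
    """Return (default_value, per_tag) from 'REDACTED' or 'PatientID=X,PatientsName=Y'."""
    default = 'REDACTED'
    per_tag = {}
    for part in raw.split(','):
        part = part.strip()
        if not part:
            continue
        if '=' in part:
            # Hypothetical per-tag form: TagName=value
            key, value = part.split('=', 1)
            per_tag[key.strip()] = value.strip()
        else:
            # A bare value replaces the default for all R tags.
            default = part
    return default, per_tag


def apply_replacements(values, actions, raw_option):
    """Return a copy of values with every tag whose action is 'R' replaced."""
    default, per_tag = parse_force_replace(raw_option)
    result = dict(values)
    for tag, action in actions.items():
        if action == 'R' and tag in result:
            result[tag] = per_tag.get(tag, default)
    return result
```

Passing a bare string reproduces the initial single-value behavior, so the per-tag form could be added later without changing existing invocations.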

Use Context Manager for White List File

This:

            w = open(white_list_file, 'r')
            white_list = w.read()
            white_list = json.loads(white_list)
            w.close()
            white_list = convert_json_white_list(white_list)

should be:

            with open(white_list_file, 'r') as w:
                white_list = w.read()
                white_list = json.loads(white_list)
            white_list = convert_json_white_list(white_list)

Add support for Python 3

... ideally without removing Python 2 support, which appears to be possible.

Photometric Interpretation = Palette Color Scrubs the Look up Tables

Per https://www.dabsoft.ch/dicom/3/C.7.6.3.1.2/:

PALETTE COLOR = Pixel data describe a color image with a single sample per pixel (single image plane). The pixel value is used as an index into each of the Red, Blue, and Green Palette Color Lookup Tables (0028,1101-1103&1201-1203). This value may be used only when Samples per Pixel (0028,0002) has a value of 1. When the Photometric Interpretation is Palette Color; Red, Blue, and Green Palette Color Lookup Tables shall be present.

These values must be preserved:

  • (0028,1101-1103&1201-1203)
  • Samples per Pixel (0028,0002)
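
The tag set above can be written out explicitly. This sketch represents tags as (group, element) tuples and does not depend on pydicom; `PRESERVE_FOR_PALETTE_COLOR` and `must_preserve` are hypothetical names:

```python
# Samples per Pixel plus the six lookup-table tags listed above.
PRESERVE_FOR_PALETTE_COLOR = frozenset(
    [(0x0028, 0x0002)] +
    [(0x0028, element) for element in range(0x1101, 0x1104)] +
    [(0x0028, element) for element in range(0x1201, 0x1204)]
)


def must_preserve(tag, photometric_interpretation):
    """True if tag has to survive cleaning for a PALETTE COLOR image."""
    is_palette = photometric_interpretation.strip().upper() == 'PALETTE COLOR'
    return is_palette and tag in PRESERVE_FOR_PALETTE_COLOR
```

A cleaning callback could consult such a check before deleting an element, so the rule only applies to images that actually use a palette.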

Add --DB_delete option to clear out the sqlite3 tables of prior_cleaned

prior_cleaned is obtained via:
prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
and is used to retain the same anonymized values (e.g., Accession Number 7).

However, when the way clean values are generated changes, as with the new:
cleaned = str('%s %d' % (e.name, self.audit.get_next_pk(e)))
there needs to be a way to clear the DB of old values stored by previous incarnations of that line, such as those from the current (soon to be previous):
cleaned = ('%s %d' % (e.name, self.audit.get_next_pk(e))).encode('ascii')
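
A sketch of what such an option could call, assuming the audit trail lives in ordinary sqlite3 tables. `clear_audit_tables` is a hypothetical helper, and the table name in the usage example is illustrative only:

```python
import sqlite3


def clear_audit_tables(conn, keep=()):
    """Delete all rows from every user table except those named in keep.

    Returns the list of table names that were cleared.
    """
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master "
                   "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")
    names = [row[0] for row in cursor.fetchall() if row[0] not in keep]
    for name in names:
        # Table names cannot be bound as parameters, so quote them instead.
        cursor.execute('DELETE FROM "%s"' % name.replace('"', '""'))
    conn.commit()
    return names


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE accessionnumber '
             '(id INTEGER PRIMARY KEY AUTOINCREMENT, original, cleaned)')
conn.execute('INSERT INTO accessionnumber (original, cleaned) VALUES (?, ?)',
             ('A123', 'Accession Number 7'))
cleared = clear_audit_tables(conn)
```

Note that with AUTOINCREMENT columns SQLite tracks counters in its internal sqlite_sequence table, which this sketch leaves alone; whether ids should restart depends on how get_next_pk derives its values.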

Depend on `dicom` instead of `pydicom`

Thanks for providing this useful program! I ran into a small problem while setting it up, and I wanted to bring it to your attention. It seems like the pydicom project has changed the way it is intended to be imported.

git clone https://github.com/chop-dbhi/dicom-anon.git
cd dicom-anon
pip2 install -r requirements.txt
#=> Successfully installed pydicom-1.0.2

Running python2 dicom_anon.py gives this error message:

Traceback (most recent call last):
  File "dicom_anon.py", line 23, in <module>
    import dicom
  File "/usr/local/lib/python2.7/site-packages/dicom.py", line 11, in <module>
    raise ImportError(msg)
ImportError: 
Pydicom via 'import dicom' has been removed in pydicom version 1.0.
Please install the `dicom` package to restore function of code relying
on pydicom 0.9.9 or earlier. E.g. `pip install dicom`.
Alternatively, most code can easily be converted to pydicom > 1.0 by
changing import lines from 'import dicom' to 'import pydicom'.
See the Transition Guide at
https://pydicom.github.io/pydicom/stable/transition_to_pydicom1.html.

From reading the pydicom documentation, I think this can be fixed by depending on the dicom library instead of the pydicom library.

A patch like this seems to work:

diff --git a/requirements.txt b/requirements.txt
index a681f02..8314cfa 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,4 +4,4 @@
 #
 #     pip install -U -r requirements.txt
 
-pydicom
+dicom==0.9.9.post1

I'm happy to submit a pull request with this patch. Thanks again!
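
The error message also mentions changing the import line as an alternative. A sketch of an import that tolerates either package being installed; `load_dicom_module` is a hypothetical helper, and this only addresses the import itself, not any API differences between the two packages:

```python
import importlib


def load_dicom_module():
    """Return pydicom (>= 1.0) if importable, else the legacy dicom package, else None."""
    # pydicom is tried first because, as the traceback above shows, pydicom 1.0
    # installs a dicom.py stub that raises ImportError.
    for name in ('pydicom', 'dicom'):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


dicom = load_dicom_module()
```

Pinning `dicom==0.9.9.post1` as in the patch above is the smaller change; the fallback import would only pay off if the rest of the script were also adapted to the newer package.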

Add Python 3 support for apostrophes in tag values

As part of the work towards #27 ("Add support for Python 3"), I noticed that when a tag value contains a single-quote apostrophe, the audit lookup self.cursor.execute(GET_LINKED % table_name, (original, study_uid_pk)) fails, with sqlite3 complaining: sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

For example, with the ReferringPhysiciansName tag (0008,0090) containing a value of Referring Physician's Name 4, the following traceback is shown:

Traceback (most recent call last):
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
    callback(self, data_element)  # self = this Dataset
  File "dicom_anon.py", line 459, in clean_cb
    if self.enforce_profile(ds, e, study_pk):
  File "dicom_anon.py", line 486, in enforce_profile
    cleaned = self.basic(ds, e, study_pk)
  File "dicom_anon.py", line 507, in basic
    prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
  File "dicom_anon.py", line 194, in get
    self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dicom_anon.py", line 812, in <module>
    da.run(i_dir, c_dir)
  File "dicom_anon.py", line 700, in run
    ds, study_pk = self.anonymize(ds)
  File "dicom_anon.py", line 637, in anonymize
    ds.walk(partial(self.clean_cb, study_pk=study_pk))
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1779, in walk
    dataset.walk(callback)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 37, in tag_in_exception
    raise type(ex)(msg)
sqlite3.InterfaceError: With tag (0008, 0090) got exception: Error binding parameter 0 - probably unsupported type.
Traceback (most recent call last):
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
    callback(self, data_element)  # self = this Dataset
  File "dicom_anon.py", line 459, in clean_cb
    if self.enforce_profile(ds, e, study_pk):
  File "dicom_anon.py", line 486, in enforce_profile
    cleaned = self.basic(ds, e, study_pk)
  File "dicom_anon.py", line 507, in basic
    prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
  File "dicom_anon.py", line 194, in get
    self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

Please note that this apostrophe issue does not seem to occur in a Python 2 environment and is only revealed when running under Python 3, for whatever reason. This could be related to the particular sqlite3 module being imported, or to the psycopg2-binary v2.83 dependency that is used under Python 3 but seemingly not used at all under Python 2.

An initial workaround to remove any potentially offending single-quote apostrophes can be as simple as the following:

        safer_cleaned_str =  "{0}".format(cleaned).replace("'","")

and:

        safer_original_str =  "{0}".format(original).replace("'","")

...wherever these are used in the relevant db.execute(...) or self.cursor.execute(...) calls to sqlite3.

Eventually, a full sqlite3-compatible escaping of these single-quote apostrophes would be better, but first-pass attempts at doing so still hit the above error, so for now they are simply removed.
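
The root cause is not established in this report. One possibility, offered here purely as an assumption, is that the bound value is an object rather than a plain str, in which case it would be the "{0}".format(...) conversion, not the removal of the apostrophe, that makes the workaround succeed. The sketch below uses a hypothetical `NameLike` class as a stand-in for such an object and shows that sqlite3 parameter binding accepts an apostrophe in a plain string:

```python
import sqlite3


class NameLike(object):
    """Hypothetical stand-in for a value object that is not a str."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE audit (original, cleaned)')

original = NameLike("Referring Physician's Name 4")

# Binding the object itself is rejected by sqlite3 (the exact exception
# class varies between Python versions, so catch the common base class).
try:
    conn.execute('INSERT INTO audit VALUES (?, ?)', (original, 'x'))
    bound_directly = True
except sqlite3.Error:
    bound_directly = False

# Converting to str first works, with the apostrophe left in place.
conn.execute('INSERT INTO audit VALUES (?, ?)', (str(original), 'Name 1'))
row = conn.execute('SELECT cleaned FROM audit WHERE original = ?',
                   (str(original),)).fetchone()
```

If this explanation holds for dicom-anon, the .replace("'", "") step could be dropped and the original value preserved in the audit table; that would need to be checked against the actual value types passed to execute.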

Add --do_not_clean option to ONLY do the --force_replace and --exclude_series_descs actions

Add option of:

    parser.add_argument('-d', '--do_not_clean', action='store_true', default=False, help='Do NOT apply general tag value cleaning; '
                                                                                   'instead only do any specified --force_replace & --exclude_series_descs')

to bypass the normal DICOM tag value cleaning de-identification operations for which dicom_anon was primarily written.

To be used when --force_replace and --exclude_series_descs are needed, but the de-identification is being performed elsewhere (e.g., in GCP's Healthcare API) and any other pre-de-identification is for whatever reason undesired. Allows the dicom_anon framework to still be fully leveraged for these two new --force_replace and --exclude_series_descs options without creating an entirely new tool.

Add unique constraints to audit tables for extra safety

CREATE_NON_LINKED_TABLE = 'CREATE TABLE %s (id INTEGER PRIMARY KEY AUTOINCREMENT, original, cleaned, UNIQUE(original))'
CREATE_LINKED_TABLE = 'CREATE TABLE %s (id INTEGER PRIMARY KEY AUTOINCREMENT, original, cleaned, study INTEGER, UNIQUE(original, study), FOREIGN KEY(study) REFERENCES studyinstanceuid(id))'

and then:

def audit_save(tag, cleaned, study_uid_pk=None):
...
    try:
        if tag.name.lower() == 'study instance uid':
            db.execute(INSERT_OTHER % table_name(tag), (original, cleaned))
            return cleaned
        else:
            db.execute(INSERT_LINKED % table_name(tag), (original, cleaned, study_uid_pk))
            return cleaned
    except sqlite3.IntegrityError:
        return audit_get(tag=tag, study_uid_pk=study_uid_pk)
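
A self-contained sketch of the same pattern against an in-memory database, showing how the UNIQUE constraint turns a duplicate insert into a lookup. The table name `accession` and the function `save_or_get` are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE accession (id INTEGER PRIMARY KEY AUTOINCREMENT, '
             'original, cleaned, UNIQUE(original))')


def save_or_get(original, cleaned):
    """Insert the mapping, or return the cleaned value already stored for original."""
    try:
        conn.execute('INSERT INTO accession (original, cleaned) VALUES (?, ?)',
                     (original, cleaned))
        return cleaned
    except sqlite3.IntegrityError:
        # The UNIQUE(original) constraint fired: reuse the existing mapping.
        row = conn.execute('SELECT cleaned FROM accession WHERE original = ?',
                           (original,)).fetchone()
        return row[0]
```

The constraint guarantees that one original value never maps to two different cleaned values, even if a caller skips the lookup before saving.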

Dictionary Key Error, also generated file lacks required valid dicom headers.

File "dicom_anon.py", line 552, in anonymize
  ds.file_meta[MEDIA_STORAGE_SOP_INSTANCE_UID].value = ds[SOP_INSTANCE_UID].value
File "/home/anaconda3/envs/rad/lib/python3.6/site-packages/pydicom/dataset.py", line 582, in __getitem__
  data_elem = dict.__getitem__(self, tag)
KeyError: (0008, 0018)

It looks like SOP_INSTANCE_UID is removed, along with other required tags, during the walk. Is this script still working for others?
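
A defensive check would at least turn the KeyError into a detectable condition. This sketch uses plain dicts keyed by (group, element) tuples in place of pydicom datasets; `sync_media_storage_uid` is a hypothetical helper, and it does not address why the tag was removed in the first place:

```python
SOP_INSTANCE_UID = (0x0008, 0x0018)
MEDIA_STORAGE_SOP_INSTANCE_UID = (0x0002, 0x0003)


def sync_media_storage_uid(dataset, file_meta):
    """Copy SOP Instance UID into file_meta; return False if it is missing."""
    if SOP_INSTANCE_UID not in dataset:
        # The caller can then log, skip, or quarantine the file.
        return False
    file_meta[MEDIA_STORAGE_SOP_INSTANCE_UID] = dataset[SOP_INSTANCE_UID]
    return True
```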
