
dicom-anonymizer's People

Contributors

dmd, finetjul, laurennlam, ludvigolsen, mkzia, mntikor, pchoisel, sanjaymjoshi, sarthakpati, scebbers, sharayujosh, smasuda, smjoshiatglobus, sumedhajoshi, timcogan, ue-sho

dicom-anonymizer's Issues

Support for additional Application Level Confidentiality profiles

Part 16 CID 7050 lists various De-identification Methods:
[Screenshot: Part 16 CID 7050 table]

Apart from the methods that impact pixel data cleaning, the rest of the methods are documented in the Application Level Confidentiality Profile Attributes (Part 15 Table E.1-1):
[Screenshot: Part 15 Table E.1-1]

Feature request:

  1. dicom-anonymizer could allow entering a list of the methods/attributes to use as presets and override the basic profile
  2. The following tags would also be updated accordingly:

Tag replacement fails during anonymization for structured reports

Found while doing unit testing test-SR.dcm from pydicom's test files:

dicomanonymizer\simpledicomanonymizer.py:440: in anonymize_dataset
    action(dataset, tag)
dicomanonymizer\simpledicomanonymizer.py:134: in replace
    replace_element(element)
dicomanonymizer\simpledicomanonymizer.py:122: in replace_element
    replace_element(sub_element)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  

element = RawDataElement(tag=(0040, a010), VR='CS', length=16, value=b'HAS OBS CONTEXT ', value_tell=16, is_implicit_VR=False, is_little_endian=True, is_raw=True)

    def replace_element(element):
        """
        Replace element's value according to it's VR:
        - LO, LT, SH, PN, CS, ST, UT: replace with 'Anonymized'
        - UI: cf replace_element_UID
        - DS and IS: value will be replaced by '0'
        - FD, FL, SS, US, SL, UL: value will be replaced by 0
        - DA: value will be replaced by '00010101'
        - DT: value will be replaced by '00010101010101.000000+0000'
        - TM: value will be replaced by '000000.00'
        - UN: value will be replaced by b'Anonymized' (binary string)
        - SQ: call replace_element for all sub elements

        See https://laurelbridge.com/pdf/Dicom-Anonymization-Conformance-Statement.pdf
        """
        if element.VR in ('LO', 'LT', 'SH', 'PN', 'CS', 'ST', 'UT'):
>           element.value = 'Anonymized'
E           AttributeError: can't set attribute

dicomanonymizer\simpledicomanonymizer.py:108: AttributeError

Anonymization of (6000, 3000) Overlay Data

Dear all,

Here is my DICOM file:
https://sourceforge.net/p/gdcm/gdcmdata/ci/2bddc5695f2482ee3f4d92db7de2348b816fe64c/tree/MR-SIEMENS-DICOM-WithOverlays.dcm

I run:
dicom-anonymizer MR-SIEMENS-DICOM-WithOverlays.dcm output.dcm

I got these error messages:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 763, in get
key = Tag(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 79, in Tag
raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 28, in tag_in_exception
yield
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 2216, in walk
callback(self, data_element) # self = this Dataset
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
action(dataset, tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
element = dataset.get(tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 765, in get
raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/bin/dicom-anonymizer", line 8, in <module>
sys.exit(main())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 295, in anonymize_dicom_file
anonymize_dataset(dataset, extra_anonymization_rules, delete_private_tags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 393, in anonymize_dataset
dataset.walk(range_callback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 2222, in walk
dataset.walk(callback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 32, in tag_in_exception
raise type(exc)(msg) from exc
TypeError: With tag (6000, 3000) got exception: Dataset.get key must be a string or tag

Please help, thank you.

Performance problems with newer MR software version (Siemens)

When used via Python, the anonymizer is significantly slower on files from a newer MR software version (syngo MR XA50), whose DICOM header is modified with respect to older versions (syngo MR B19): about 1.8 it/s versus about 47.6 it/s. I have attached an excerpt from the terminal:

100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 47.61it/s]
2023-08-23 09:00:00 dcm_anon     INFO     File 1.3.12.2.1107.5.2.36.40414.201712181.dcm with software version syngo MR B19.

100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.83it/s]
2023-08-23 09:01:51 dcm_anon     INFO     File 1.3.12.2.1107.5.2.50.176395.20230531.dcm with software version syngo MR XA50.

This is the same imaging sequence, just on a different scanner with a newer software version.

Strategy for Python version compatibility?

I was trying this project on Python 3.12 and discovered that it has removed bundled setuptools and with it pkg_resources. This breaks the recent version information change. What is the right way to fix this? I see these paths:

  • Add setuptools to the list of required packages to keep supporting old versions of Python
  • Switch to importlib.metadata to read the version. It was added in Python 3.8 as a provisional module and has been non-provisional since 3.10
  • Switch to importlib-metadata for backward compatibility with 3.8.

I volunteer to make these changes, but need guidance!

The strategy question: what is your recommendation for maintaining compatibility with older versions of Python?
It seems the new version of pydicom will require at least Python 3.10.
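
The second and third paths listed above can be combined in a single import guard. The following is a minimal sketch, not the project's actual code: get_version is a hypothetical helper, the distribution name dicom-anonymizer is taken from the tracebacks elsewhere in this tracker, and the importlib_metadata backport is assumed to be installed only on interpreters that lack importlib.metadata.

```python
try:
    # Part of the standard library since Python 3.8.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Backport from PyPI; assumed to be declared as a dependency
    # for older interpreters only.
    from importlib_metadata import PackageNotFoundError, version


def get_version(dist_name="dicom-anonymizer"):
    """Return the installed version of dist_name, or "unknown" if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens, for example, when running from a source checkout
        # that was never installed.
        return "unknown"
```

This avoids pkg_resources entirely, so it does not depend on setuptools being present at run time.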

Add conda package

Hi,

It would be great if you could also publish a conda package in addition to the pip one. It would make constructing complex dependencies (especially those relying on C++ libraries) much easier.

I am happy to work on it if you want. If you could publish the sdist of this package on PyPI, I can take care of the rest.

Cheers,
Sarthak

Investigate unit test failures with python 3.6, 3.7, 3.12

  • 3.6: Not available on ubuntu-22.04
  • 3.7: Results in failures in test_cli.py
  • 3.12: pkg_resources not supported
    • pydicom/data/data_manager.py:112: in get_external_sources
      from pkg_resources import iter_entry_points
      E   ModuleNotFoundError: No module named 'pkg_resources'
      
    • Probably due to loading incorrect version of pydicom. The latest version does not use pkg_resources, changed in Commit f5eeee

dataset.get() cannot read tags with 4 elements

In dicomfields, you have three tags with 4 indices:

E.g.
(0x5000, 0x0000, 0xFF00, 0x0000), # Curve Data

but pydicom.dataset.get() has the following interface:
get(key: Union[int, Tuple[int, int], pydicom.tag.BaseTag], default: Optional[object] = 'None')

so it doesn't recognize 4 ints as a tag.

TypeError: Dataset.get key must be a string or tag

Are those three special tags common? :)

Edit:
I see that they are deleted without calling get(). But anonymize_data() calls .get() on them and prints a message when it fails, which happens for every file. Perhaps that part should be skipped for tags with len > 2?

Edit 2:
I refined my idea for a solution in a PR #18
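
For reference, the matching that a 4-element entry implies can be sketched without pydicom. This assumes the layout is (group, element, group mask, element mask), which is an interpretation of the dicomfields entry quoted above and is not confirmed in this thread. Both helpers are hypothetical and not part of dicomanonymizer.

```python
def is_plain_tag(tag):
    """Only 2-element (group, element) tags can be passed to Dataset.get()."""
    return len(tag) == 2


def matches_masked_tag(tag, masked):
    """Return True if a (group, element) tag falls inside a masked tag range.

    masked is assumed to be (group, element, group_mask, element_mask).
    """
    group, element = tag
    base_group, base_element, group_mask, element_mask = masked
    return (group & group_mask) == base_group and (element & element_mask) == base_element


# Entry quoted from dicomfields in this issue.
CURVE_DATA = (0x5000, 0x0000, 0xFF00, 0x0000)
```

Under that assumption, a caller would guard the .get() call with is_plain_tag and route the remaining entries through a range match while walking the dataset.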

Patient data within DirectoryRecordSequence (of DICOMDIR file) not anonymized

Dear KitwareMedical,

I have a DICOMDIR that still contains the PatientName after running it through the default anonymizer!

Using the version from 1690b78 (installed via pip install git+https...) and pydicom version 2.3.1, I encountered a DICOMDIR file that contains a DirectoryRecordSequence with patient data. It looks like this:

Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 180
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: Media Storage Directory Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 2.25.297764926861898021974262533209051862847
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.2.276.0.45.1.1.0.71.20130122
(0002, 0013) Implementation Version Name         SH: 'DicomWeb_71'
-------------------------------------------------
(0004, 1130) File-set ID                         CS: 'VISAGECS_MEDIA'
(0004, 1200) Offset of the First Directory Recor UL: 406
(0004, 1202) Offset of the Last Directory Record UL: 406
(0004, 1212) File-set Consistency Flag           US: 0
(0004, 1220)  Directory Record Sequence  1 item(s) ---- 
   (0004, 1400) Offset of the Next Directory Record UL: 0
   (0004, 1410) Record In-use Flag                  US: 65535
   (0004, 1420) Offset of Referenced Lower-Level Di UL: 0
   (0004, 1430) Directory Record Type               CS: 'PATIENT'
   (0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
   (0010, 0010) Patient's Name                      PN: 'NOT^ANONYM'
   (0010, 0020) Patient ID                          LO: '12345678'
   (0010, 0030) Patient's Birth Date                DA: '19000101'
   ---------

Note, I manually changed the patient data in order not to disclose anything here. The DirectoryRecordSequence had over 700 entries, so for debugging, I removed all but one entry to minimally demo the issue.

I ran dicomanonymizer.anonymize_dicom_file('DICOMDIR', 'DICOMDIR-anon') and expected the patient name etc. to be removed or replaced.

Instead, when I then read the result with pydicom.read_file('/DICOMDIR-anon'), I got:

Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 180
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: Media Storage Directory Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 2.25.172643117625232586517094341815358543841
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.2.276.0.45.1.1.0.71.20130122
(0002, 0013) Implementation Version Name         SH: 'DicomWeb_71'
-------------------------------------------------
(0004, 1130) File-set ID                         CS: 'VISAGECS_MEDIA'
(0004, 1200) Offset of the First Directory Recor UL: 406
(0004, 1202) Offset of the Last Directory Record UL: 406
(0004, 1212) File-set Consistency Flag           US: 0
(0004, 1220)  Directory Record Sequence  1 item(s) ---- 
   (0004, 1400) Offset of the Next Directory Record UL: 0
   (0004, 1410) Record In-use Flag                  US: 65535
   (0004, 1420) Offset of Referenced Lower-Level Di UL: 0
   (0004, 1430) Directory Record Type               CS: 'PATIENT'
   (0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
   (0010, 0010) Patient's Name                      PN: 'NOT^ANONYM'
   (0010, 0020) Patient ID                          LO: '12345678'
   (0010, 0030) Patient's Birth Date                DA: '19000101'
   ---------

The Instance UID changed, but not the Patient data within the sequence (of len 1 here). I am not allowed to provide the full original file but I think this smaller one reproduces the issue.

Please feel free to include this file in your test suite.
Let me know if I can be helpful here.

Any quick fix idea would be welcome.

thanks,
Samuel

X tags of VR == 'DA' not getting deleted

I found this while looking at pytest failures for "color3d_jpeg_baseline.dcm" file from pydicom's test files. The element (0020,0244) was not getting deleted. The tag is listed in X_TAGS.

The code for delete_element calls replace_element_date for VR=='DA', even if the element is supposed to be deleted. I cannot figure out why that is all right. Please help!

In addition to 'X', I see 'K' and 'C' listed in that row of PS 3.15 Table E.1-1. What do these mean?

Help: Passing values to function

I would like to use the following function:

def setupSeriesDescription(dataset, tag, value):
    r'''
    Modify the series description by adding a suffix
    '''
    element = dataset.get(tag)
    if element is not None:
        element.value = element.value + '-' + value

and then use them as follows:

def anonymize_dicom(src_path, dst_path):
    # List the files' names that we want to extract data from
    dicom_files = glob.glob(os.path.join(src_path, "**" ,'*.dcm'), recursive = True)

    # Iterate over each DICOM file in the folder and read it using dcmread()
    for file_path in dicom_files:
        # dictionary which map your functions to a tag
        extraAnonymizationRules = {}

        if True:
            # series description
            extraAnonymizationRules[(0x0008, 0x103E)] = setupSeriesDescription

        # Launch the anonymization and delete all private tags
        dcm = anonymize(file_path, dst_path, extraAnonymizationRules, deletePrivateTags=True)

How can I pass the value argument so that the suffix appended to the series description differs from dataset to dataset?
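
One possible approach, sketched here with the standard library only, is to bind the per-dataset value before registering the rule. The tracebacks in this tracker show rules being invoked as action(dataset, tag), so the bound function keeps that two-argument shape. The stand-in dataset below is a plain dict of hypothetical objects, used only so the sketch runs without pydicom.

```python
from functools import partial
from types import SimpleNamespace


def setupSeriesDescription(dataset, tag, value):
    """Modify the series description by adding a suffix."""
    element = dataset.get(tag)
    if element is not None:
        element.value = element.value + '-' + value


# Stand-in for a pydicom dataset: maps a tag to an object with a .value.
fake_dataset = {(0x0008, 0x103E): SimpleNamespace(value='T1_MPRAGE')}

# Bind the suffix for this particular dataset; the result takes (dataset, tag).
rule = partial(setupSeriesDescription, value='siteA')

extraAnonymizationRules = {(0x0008, 0x103E): rule}

# What the anonymizer is assumed to do for each registered rule.
for tag, action in extraAnonymizationRules.items():
    action(fake_dataset, tag)
```

Inside the loop over files, a new partial (or a lambda capturing the current value) would be built for each file, so every dataset gets its own suffix.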

push 1.0.12 to PyPI

It looks like 1.0.12 was released a few days ago, but it is still not on PyPI.

Regexp does not work on tag Patient's name

element.value = re.sub(options['find'], options['replace'], element.value)

fails when element is "Patient's name (0x0010,0x0010)"

The following works though:

element.value = re.sub(options['find'], options['replace'], str(element.value))
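
A likely explanation, stated here as an assumption, is that the value of a PN element is a person-name object and not a plain str, and re.sub only accepts str or bytes-like input. The sketch below reproduces that behaviour with a hypothetical stand-in class so it runs without pydicom. FakePersonName and regexp_replace are illustrative names only.

```python
import re


class FakePersonName:
    """Hypothetical stand-in for a non-str value that has a str() form."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


def regexp_replace(value, find, replace):
    """Apply re.sub to the string form of value, whatever its type."""
    return re.sub(find, replace, str(value))


name = FakePersonName('DOE^JOHN')
```

Passing name directly to re.sub raises a TypeError, while regexp_replace works for both plain strings and objects like this one.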

Change 1 DICOM tag

How can I set one tag to a specific value (no regexp) from the command line?

e.g.

dicom-anonymizer InputPath  OutputPath -t (0x0010,0x0010) DOE^JOHN

Anonymization of 0x0002 group

Currently, anonymization actions are not applied to tags in the meta information header (0x0002 group). This fixes that by applying the action to the file_meta dataset instead.
#18

executable - dicom-anonymizer

Hi,

I could not find an executable named dicom-anonymizer in the package, nor is one generated after installation.
[Screenshot]

VR IS not yet implemented

I'm seeing this:

$ dicom-anonymizer E29171854 anons
  1%|▌                                                                                               | 75/13460 [00:10<32:50,  6.79it/s]
Traceback (most recent call last):
  File "/cm/shared/anaconda3/bin/dicom-anonymizer", line 33, in <module>
    sys.exit(load_entry_point('dicom-anonymizer==1.0.9', 'console_scripts', 'dicom-anonymizer')())
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/anonymizer.py", line 175, in main
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/anonymizer.py", line 64, in anonymize
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 305, in anonymize_dicom_file
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 413, in anonymize_dataset
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 236, in delete_or_empty_or_replace_UID
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 142, in empty_element
  File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 144, in empty_element
NotImplementedError: Not anonymized. VR IS not yet implemented.
  1%|▌

What can we do about this?

ignore non-dicom files instead of dying

If pointed at a directory which contains any non-dicom files, dicom-anonymizer dies with:

Traceback (most recent call last):
  File "/usr/local/bin/dicom-anonymizer", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
    anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
  File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
    anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
  File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 298, in anonymize_dicom_file
    dataset = pydicom.dcmread(in_file)
  File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 888, in dcmread
    force=force, specific_tags=specific_tags)
  File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 670, in read_partial
    preamble = read_preamble(fileobj, force)
  File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 623, in read_preamble
    raise InvalidDicomError("File is missing DICOM File Meta Information "
pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information header or the 'DICM' prefix is missing from the header. Use force=True to force reading.

I'd like it to either just throw a warning by default, or if that's considered too dangerous, at least have a flag that lets us do it.
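
One way to get the warn-and-skip behaviour, sketched here without pydicom, is to pre-filter the input list on the 'DICM' prefix that the error message refers to (4 bytes following the 128-byte preamble). The function names are hypothetical and this is not how dicom-anonymizer currently works. Note that DICOM files written without a preamble, which pydicom can still read with force=True, would be skipped by this check.

```python
import warnings


def has_dicm_prefix(path):
    """Return True if the file has b'DICM' at byte offset 128."""
    try:
        with open(path, 'rb') as f:
            f.seek(128)
            return f.read(4) == b'DICM'
    except OSError:
        # Unreadable or missing files are treated as non-DICOM.
        return False


def split_by_dicm_prefix(paths):
    """Return (accepted, skipped) lists, warning once per skipped file."""
    accepted, skipped = [], []
    for path in paths:
        if has_dicm_prefix(path):
            accepted.append(path)
        else:
            warnings.warn(f'Skipping non-DICOM file: {path}')
            skipped.append(path)
    return accepted, skipped
```

An alternative would be to catch InvalidDicomError around the read call, which leaves the decision about what counts as DICOM to pydicom.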

Example code encourages incorrect anonymization

The following code in examples/anonymize_extra_rules.py encourages incorrect anonymization by not removing series description. That tag should be removed! We should create the example using a different tag:

    def setup_series_description(dataset, tag):
        element = dataset.get(tag)
        if element is not None:
            element.value = f'{element.value}-{args.suffix}'

UserWarning: Invalid value for VR UI

I'm seeing:

(useful) ddrucker@mic-dicom-router-mercure:~$ dicom-anonymizer E12034344 anon
  0%|                                                  | 0/3612 [00:00<?, ?it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '1.1.16.7.6707.3.3.60.06253.2332294204858050509242727.5.1.9'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '7.6.013.026981.5934635445.0220487836.2'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.1.59.6.4683.0.4.60.82574.0199929790689920395106101.5.8.8'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '3.9.59.4.0298.8.4.72.68889.658132895322857682744131'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
  0%|                                          | 2/3612 [00:00<03:52, 15.55it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '0.2.08.6.9364.4.6.39.26563.3829691537983094193002813.7.1.1'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
  0%|                                          | 5/3612 [00:00<02:50, 21.12it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.6.63.5.3208.2.0.67.68311.0344948525748214621517980'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
  0%|                                          | 8/3612 [00:00<02:32, 23.65it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.3.40.7.9556.8.5.07.50254.3682606057412565359545607'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '2.3.42.7.0184.5.4.49.35639.4066135185099479761379104'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '8.4.08.0.3756.8.9.48.29829.1828695589432128234084115'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)
  0%|▏                                        | 12/3612 [00:00<02:09, 27.87it/s]Traceback (most recent call last):
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 762, in get
    key = Tag(key)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 84, in Tag
    raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 28, in tag_in_exception
    yield
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 2390, in walk
    callback(self, data_element)  # self = this Dataset
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
    action(dataset, tag)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
    element = dataset.get(tag)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 764, in get
    raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ddrucker/useful/bin/dicom-anonymizer", line 8, in <module>
    sys.exit(main())
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
    anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
    anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 295, in anonymize_dicom_file
    anonymize_dataset(dataset, extra_anonymization_rules, delete_private_tags)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 393, in anonymize_dataset
    dataset.walk(range_callback)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 2396, in walk
    dataset.walk(callback)
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 32, in tag_in_exception
    raise type(exc)(msg) from exc
TypeError: With tag (6000, 3000) got exception: Dataset.get key must be a string or tag

  0%|▏                                        | 12/3612 [00:00<02:29, 24.13it/s]
(useful) ddrucker@mic-dicom-router-mercure:~$
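
The UIDs quoted in these warnings all contain a component with a leading zero (for example 06253 or 013), which the UI value representation does not allow. The following is a minimal check of that rule, written as an assumption about what triggers the warning and not as pydicom's own validation code. is_valid_uid is a hypothetical helper.

```python
import re

# Dot-separated numeric components; a component is either "0" or has no
# leading zero. The whole UID may be at most 64 characters long.
_UID_PATTERN = re.compile(r'(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*')


def is_valid_uid(uid):
    """Return True if uid satisfies the component and length rules above."""
    return len(uid) <= 64 and _UID_PATTERN.fullmatch(uid) is not None
```

Running it over the values in the log marks each of them invalid, while a UID such as 1.2.276.0.45.1.1.0.71.20130122 passes.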

Proposal: Introduce ruff for code formatting and linting

Hi!

I have started looking into the code and have sent one PR. I noticed that the current code base might be using some code formatter/linter other than ruff or black, which are common these days.

How about introducing one of them, as pydicom does, and making it this project's default?
Even today some contributors have changed single quotes to double quotes around strings, which is inconsistent.

Change only 1 DICOM tag

How can I change only 1 DICOM tag from the command line?

If I do:
dicom-anonymizer InputPath OutputPath --dictionary pathToMyDict.json

then all the tags are anonymized in addition to the tags I list in my dictionary.

dropping fields I asked it to keep

dicom-anonymizer seems to be deleting some DICOM fields, even if asked to keep them:

$ dcmdump input.dcm | grep 0029,0010
(0029,0010) LO [SIEMENS CSA HEADER]                     #  18, 1 PrivateCreator

$ dicom-anonymizer input.dcm output.dcm -t '(0x0029,0x1010)' keep
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 18.67it/s]

$ dcmdump output.dcm | grep 0029,0010

$

AttributeError: 'int' object has no attribute 'elements'

This is super odd.

When I anonymize a SRe file (SRe.1.3.12.2.1107.5.2.43.66094.30000020021213353802600000201), I get:

Traceback (most recent call last):
  File "/home/ddrucker/venvs/test/bin/dicom-anonymizer", line 10, in <module>
    sys.exit(main())
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 144, in main
    anonymize(InputPath, OutputPath, newAnonymizationActions)
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 41, in anonymize
    anonymizeDICOMFile(inputFilesList[cpt], outputFilesList[cpt], anonymizationActions)
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 241, in anonymizeDICOMFile
    anonymizeDataset(dataset, extraAnonymizationRules)
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 253, in anonymizeDataset
    action(dataset, tag)
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 126, in delete
    deleteElement(dataset, element)  # element.tag is not the same type as tag.
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 110, in deleteElement
    deleteElement(subDataset, subElement)
  File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 109, in deleteElement
    for subElement in subDataset.elements():
AttributeError: 'int' object has no attribute 'elements'

Now, here's the odd part.

Here's the code where it dies:

  def deleteElement(dataset, element):
      if element.VR == 'DA':
          replaceElementDate(element)
      elif element.VR == 'SQ':
          for subDataset in element.value:
              for subElement in subDataset.elements():         ### dying here
                  deleteElement(subDataset, subElement)
      else:
          del dataset[element.tag]

So I added a print statement:

  def deleteElement(dataset, element):
      if element.VR == 'DA':
          replaceElementDate(element)
      elif element.VR == 'SQ':
          for subDataset in element.value:
              print(element.value)             ### add this line
              for subElement in subDataset.elements(): 
                  deleteElement(subDataset, subElement)
      else:
          del dataset[element.tag]

And now it doesn't fail - it works fine.

My best guess is that printing element.value forces an enumeration which actually changes it - like maybe something without a value is forced to have an empty one instead?
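
A note on the error message itself: iterating over a Python `bytes` object yields `int` items, so one way to get `'int' object has no attribute 'elements'` is for `element.value` to still be an undecoded byte string rather than a sequence of datasets. That is only an assumption about this file and has not been confirmed. A minimal defensive sketch, where `iter_sub_datasets` is a hypothetical helper and not part of dicom-anonymizer:

```python
def iter_sub_datasets(value):
    """Yield only the items of `value` that look like datasets.

    Hypothetical helper: a "dataset" here is anything with an
    .elements() method, which is what deleteElement relies on.
    """
    if isinstance(value, (bytes, bytearray)):
        # Undecoded payload: iterating it would yield ints, not datasets.
        return
    for item in value:
        if hasattr(item, "elements"):
            yield item


# Iterating over bytes produces ints, which matches the traceback.
item_types = [type(b) for b in b"\x00\x01"]
```

With such a guard, the inner loop would skip undecoded values instead of crashing, at the cost of leaving them untouched, which may not be acceptable for anonymization.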

PatientName and PatientID not getting properly replaced

I used the following JSON file:
{
"(0x0010, 0x0010)": {
"action": "regexp",
"find": ".*",
"replace": "ID001^ID002"
},
"(0x0010, 0x0020)": {
"action": "regexp",
"find": ".*",
"replace": "ID003"
}
}
In the anonymized DICOM files, the PatientID tag gets set to ID003ID003 (i.e. it is duplicated).
The PatientName tag is similarly duplicated and set to ID001^ID002ID001^ID002
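
This duplication is what Python's `re.sub` produces for a pattern such as `.*` on Python 3.7 and later: the pattern first matches the whole value, then matches the empty string at the end, so the replacement is inserted twice. The sketch below assumes the `regexp` action is applied with `re.sub`, and the sample value `12345` is made up for illustration:

```python
import re

original = "12345"  # made-up PatientID value

# ".*" matches the full string, then an empty string at the end.
duplicated = re.sub(r".*", "ID003", original)

# Anchoring the pattern, or requiring at least one character,
# leaves only a single match.
anchored = re.sub(r"^.*$", "ID003", original)
non_empty = re.sub(r".+", "ID003", original)

print(duplicated, anchored, non_empty)
```

Using `^.*$` or `.+` as the `find` value would be a possible workaround under that assumption.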

Support for latest DICOM standard

Part 15 E.1-1 table differs greatly between the 2013 standard (referenced in the README.md and listed in dicomfields.py) and the current standard:

  • 2013: 249 rows
  • current: 475 rows (+ differences for some of the same rows as above)

Notes:

  • An option to load different versions of the standard could be useful for DICOM conformance or for reproducibility.
  • Those tables could be automatically generated by parsing the dicom standard (similar to what Innolitics' DICOM standard browser does)

anonymize_dataset fails if dataset contains RawDataElement

Hello,

Thank you for this great project.

While using it in our codebase, I have found the following issue.

anonymize_dataset fails if the dataset contains RawDataElement:

Proposed solution

I would be happy to create a PR to fix this issue here too.

I see two possibilities:

  1. Sanitize dataset as part of anonymize_dataset:
    • iterate over the dataset and replace RawDataElements,
    • continue with the current code of anonymize_dataset.
  2. Modify current replace_element code to handle the RawDataElement case.

I'm partial to solution 1:
- it's simple and easy to understand and review.
- it makes it easier for users to add custom rules, since they can assume that all elements are DataElement and use the simpler element.value = new_value syntax.
- however, it also means that input datasets are walked through twice. I feel that this price is worth paying.
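
Solution 1 can be sketched with a toy model. The classes below are stand-ins and not pydicom's actual types. They only mimic the relevant difference: to my understanding, pydicom's RawDataElement is an immutable named tuple, whereas a DataElement accepts value assignment. All names here (`RawElement`, `Element`, `sanitize`, `anonymize`) are hypothetical.

```python
from typing import Dict, NamedTuple, Union


class RawElement(NamedTuple):
    """Stand-in for an undecoded element: immutable, value still in bytes."""
    tag: int
    vr: str
    value: bytes


class Element:
    """Stand-in for a decoded element whose value can be reassigned."""

    def __init__(self, tag: int, vr: str, value: str) -> None:
        self.tag = tag
        self.vr = vr
        self.value = value


ToyDataset = Dict[int, Union[RawElement, Element]]


def sanitize(dataset: ToyDataset) -> None:
    """First pass: replace every raw element with a decoded one, in place."""
    for tag, elem in list(dataset.items()):
        if isinstance(elem, RawElement):
            decoded = elem.value.decode("ascii").strip()
            dataset[tag] = Element(elem.tag, elem.vr, decoded)


def anonymize(dataset: ToyDataset, rules: Dict[int, str]) -> None:
    """Second pass: rules can rely on plain `element.value = ...`."""
    sanitize(dataset)
    for tag, new_value in rules.items():
        if tag in dataset:
            dataset[tag].value = new_value


toy = {
    0x00100010: RawElement(0x00100010, "PN", b"Doe^John "),
    0x00100020: Element(0x00100020, "LO", "12345"),
}
anonymize(toy, {0x00100010: "ANON", 0x00100020: "ID003"})
```

The extra pass touches every element once before any rule runs, which is the double walk mentioned above.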

Best
Guillaume

date modification functions

Hi, I'm just starting to investigate the use of this tool. We have been relying on https://mircwiki.rsna.org/index.php?title=MIRC_CTP_Articles for years now, but it has some shortcomings that have led me to explore alternatives. In particular, it is no longer very actively developed, and it has been extended to support a huge number of use cases over the years. This makes it far more powerful than what most of our data submitters need, and also much more complex to use. It would be great if I could set up a meeting with someone from the Kitware team to understand your short- and long-term plans for maintaining this repository and to discuss potential collaboration opportunities.

I love that you've approached this by mirroring the different de-id profiles and options defined in the DICOM standard. However, it doesn't appear that you currently support the "Retain Longitudinal With Modified Dates Option", since you only support keeping or deleting dates. Let me know if I've missed something, but this is pretty critical to most de-identification use cases. Dates are PHI (so you can't keep them), but deleting them entirely produces useless DICOM, since you lose all understanding of the various timepoints for your patients.

CTP has a variety of approaches to this which you may want to emulate. The DateInterval and the IncrementDate functions are the ones we use most so I would advocate them as the best candidates to implement in dicom-anonymizer.
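
As a rough illustration of the date-increment idea (not CTP's implementation, and not an existing dicom-anonymizer function), the sketch below shifts a DICOM DA value (`YYYYMMDD`) by a fixed number of days. The name `increment_date` and the 30-day offset are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

DA_FORMAT = "%Y%m%d"  # DICOM DA values are formatted as YYYYMMDD


def increment_date(da_value: str, days: int) -> str:
    """Shift a DA string by `days`; empty values are returned unchanged."""
    if not da_value:
        return da_value
    shifted = datetime.strptime(da_value, DA_FORMAT) + timedelta(days=days)
    return shifted.strftime(DA_FORMAT)


def days_between(first: str, second: str) -> int:
    """Number of days from `first` to `second`, both DA strings."""
    delta = datetime.strptime(second, DA_FORMAT) - datetime.strptime(first, DA_FORMAT)
    return delta.days


baseline, follow_up = "20200101", "20200301"
shifted_baseline = increment_date(baseline, -30)
shifted_follow_up = increment_date(follow_up, -30)
```

Because every date for a patient is moved by the same offset, the interval between timepoints is unchanged, which is the property the longitudinal option needs.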

In any case, hope we can discuss further sometime soon.

Best,
Justin
