imagingdatacommons / highdicom Goto Github PK

View Code? Open in Web Editor NEW

167.0 9.0 35.0 5.08 MB

High-level DICOM abstractions for the Python programming language

Home Page: https://highdicom.readthedocs.io

License: MIT License

Python 100.00%

dicom numpy pydicom python3 typehints

highdicom's Introduction

High DICOM

A library that provides high-level DICOM abstractions for the Python programming language to facilitate the creation and handling of DICOM objects for image-derived information, including image annotations, and image analysis results. It currently provides tools for creating and decoding the following DICOM information object definitions (IODs):

Annotations
Parametric Map images
Segmentation images
Structured Report documents
Secondary Capture images
Key Object Selection documents
Legacy Converted Enhanced CT/PET/MR images (e.g., for single frame to multi-frame conversion)
Softcopy Presentation State instances (including Grayscale, Color, and Pseudo-Color)

Documentation

Please refer to the online documentation at highdicom.readthedocs.io, which includes installation instructions, a user guide with examples, a developer guide, and complete documentation of the application programming interface of the highdicom package.

Citation

For more information about the motivation of the library and the design of highdicom's API, please see the following article:

Highdicom: A Python library for standardized encoding of image annotations and machine learning model outputs in pathology and radiology C.P. Bridge, C. Gorman, S. Pieper, S.W. Doyle, J.K. Lennerz, J. Kalpathy-Cramer, D.A. Clunie, A.Y. Fedorov, and M.D. Herrmann

If you use highdicom in your research, please cite the above article.

Support

The developers gratefully acknowledge their support:

highdicom's People

Contributors

Stargazers

Watchers

highdicom's Issues

Documentation of None type for optional parameters

Currently, the None type is omitted from type annotations in the docstrings.

As discussed in #75 (comment), we should consistently use Union[..., None], optional to clearly indicate that None is a valid option for these parameters.

from typing import Optional


def foo(bar: int, optional_arg: Optional[str] = None) -> None:
    """

    Parameters
    -----------
    bar: int
        A required parameter
    optional_arg: Union[str, None], optional
        An optional parameter
    """

Incorrectly named IOD_MODULE_MAP key for positron emission tomography

https://github.com/MGHComputationalPathology/highdicom/blob/a9f3916a9b98e09ff5f359fe77f508746e56b730/src/highdicom/_iods.py#L7825

Which SOPInstanceUID is the key pet-image supposed to link to? I don't see a pet image storage on the official list.

Is it referring to Positron Emission Tomography Image Storage SOP 1.2.840.10008.5.1.4.1.1.128?

If so the key should be changed to positron-emission-tomography-image

If this isn't the issue what key in IOD_MODULE_MAP does SOP 1.2.840.10008.5.1.4.1.1.128 map to?

Sorting of image planes for legacy converted objects

As implemented, it appears that legacy converter does not sort frames geometrically. Arguably, it should be the responsibility of the reader to sort them, but I wanted to confirm this was a deliberate decision not to sort them. @hackermd can you comment?

As aside, Slicer does not sort frames while reading resulting multiframe (tested with the 2020-01-12 nightly):

Malformed values of Content Label attribute

The Content Label attribute has Value Representation CS, which is restricted to [A-Z_0-9] characters. This is currently implemented incorrectly in highdicom. In addition, pydicom doesn't complain if the value is not valid. Therefore, we will need to perform additional checks to ensure the values are valid.

Fix AttributeError exception

I think this line should be catching AtrributeError, not KeyError exception: https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/legacy/sop.py#L361

SC Images are not included in the list of _iods.py

This issue is because the generic Secondary Captures UID is no longer in the DICOM spec. However, they are still wildly used and thus should be supported.

Creating segmentation is slow when using a large number of segments

Spoke with @hackermd about this earlier. I am trying to save a feature vector for each frame of a SM image as a segmentation which ends up creating 1024 segments. I pinned the slowness down to these lines.

Provide always the available metadata

I recently came across this SO question and as a solution, I answered suggesting the usage of this package. However, I noticed that since the attributes ICCProfile and OpticalPathSequence are not present, it's not possible to inspect the metadata of the file provided by the OP at all. Furthermore, since the code works correctly with the correct_color option set to False in read_frame, why not print a warning when reading metadata instead of raising the AttributeError and raise it subsequently in the read_frame function if correct_color is True? I think that it's better to have the possibility to inspect, view and print all the available metadata rather than stop the code execution if one is missing. As a disclaimer, I'm not a DICOM expert so maybe there's a reason to do the things in this way that I can't actually understand.

Update import of storage class UIDs

In the next pydicom release, storage class UIDs will become importable from pydicom.uid (see https://github.com/pydicom/pydicom/blob/master/pydicom/_storage_sopclass_uids.py) and the private _storage_class_uid.py module will be deprecated. We should consider updating highdicom once pydicom 2.3 is out.

Not able to create ContentSequence

Hi,
Im using highdicom 0.9.0 to create a SpecimenDescription. When I try to create SpecimenCollection, SpecimenStaining, or SpecimenSampling ContentSequences for a SpecimenPreparationStep:

from highdicom.content import SpecimenCollection, SpecimenStaining, SpecimenSampling
from highdicom.sr.coding import CodedConcept
import highdicom
SpecimenCollection(CodedConcept('P1-03000', 'SRT', 'Excision'))
SpecimenStaining(
        [
            CodedConcept('C-22968', 'SRT', 'hematoxylin stain'),
            CodedConcept('C-22919', 'SRT', 'water soluble eosin stain')
        ]
    )
SpecimenSampling(
        method = CodedConcept('111727', 'SRT', 'Dissection with representative sections submission'),
        parent_specimen_id = 'specimen id',
        parent_specimen_type = CodedConcept('G-8300', 'SRT', 'Tissue specimen')
    )

I get the following error (similar for SpecimenStaining and SpecimenCollection):

AttributeError: Items to be appended to a SpecimenSampling must have an established relationship type.

The same error appears in 0.8.0 but not in 0.7.0.

Fix typo preventing output of PixelMeasuresSequence

This line should say Pixel, not Mixel: https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/legacy/sop.py#L363

Without correcting it, the resulting object does not have PixelMeasureSequence initialized, and is not valid. How were you able to load the output legacy enhanced objects into OsiriX? Or perhaps OsiriX does not care if PixelSpacing is absent?

@afshinmessiah FYI

@hackermd would it be possible to set my permissions so that I can fork the repository and submit pull requests? I would prefer to contribute directly. Right now I have fork button disabled, which makes it impossible to submit pull requests:

Wrong Segmentation Slices Position

I'm writing a script to attach AI made lesion segmentations to existing DICOM files (It's my first time that I dive so deep into DICOM, so I might have missed something).

I have found your repository and used hd.seg.Segmentation, but unfortunately the order of the segmentation slices was incorrect.

First I tried to use just a dummy segmentation (simple 3D box over an MRI), and after encountering the issue I decided to try another approach:
I took an existing DICOM file (of a CT scan) that contains both a scan and a segmentation, converted the segmentation to np.array, and used is as the mask input, while the scan folder is the source_images.

From the original segmentation file (9 slices between -224.5mm to -192.5mm, 4mm thickness):
(0020,0032) DS [-189.136\-320.136\-224.5] # 24, 3 ImagePositionPatient
(0020,0032) DS [-189.136\-320.136\-220.5] # 24, 3 ImagePositionPatient
:
(0020,0032) DS [-189.136\-320.136\-192.5] # 24, 3 ImagePositionPatient

Versions:
Python 3.8.10
numpy 1.21.2
pydicom 2.2.1
highdicom 0.10.0

Code snippets:

:

# Collection the scan images
image_datasets = [dcmread(str(f)) for f in image_files]

# Creating a segmentation mask from the existing segmentation. shape = (108, 512, 512)
mask = np.zeros(
    shape=(
        len(image_datasets),
        image_datasets[0].Rows,
        image_datasets[0].Columns
    ),
    dtype=bool
)
mask[86:95, :, :] = np.array(hd.seg.segread(
    '/home/ben/.../original_seg.dcm'
    ).pixel_array, dtype=bool)

# Printing to check if the order is correct
sl_num = -568.5
for slice in mask:
    print(sl_num, np.unique(slice))
    sl_num += 4

Output:

-568.5 [False]
-564.5 [False]
-560.5 [False]
:
-236.5 [False]
-232.5 [False]
-228.5 [False]
-224.5 [False  True]
-220.5 [False  True]
-216.5 [False  True]
-212.5 [False  True]
-208.5 [False  True]
-204.5 [False  True]
-200.5 [False  True]
-196.5 [False  True]
-192.5 [False  True]
-188.5 [False]
-184.5 [False]
-180.5 [False]
:
-148.5 [False]
-144.5 [False]
-140.5 [False]

Creating the segmentation:

# Get meta-data information from the existing series
series_instance_uid = image_datasets[0].SeriesInstanceUID
series_number = image_datasets[0].SeriesNumber
sop_instance_uid = image_datasets[0].SOPInstanceUID
instance_number = image_datasets[0].InstanceNumber
MANUFACTURER = 'Ben' # image_datasets[0].Manufacturer
MANUFACTURER_MODEL_NAME = "Prost" # image_datasets[0].ManufacturersModelName
SOFTWARE_VERSIONS = 'v1.0'
DEVICE_SERIAL_NUMBER = '0' # image_datasets[0].DeviceSerialNumber

# Describe the algorithm that created the segmentation family:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7162.html
algorithm_identification = hd.AlgorithmIdentificationSequence(
    name=MANUFACTURER_MODEL_NAME,
    version=SOFTWARE_VERSIONS,
    family=codes.cid7162.ArtificialIntelligence
)

# Describe the segment:
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.SegmentDescription
# segmented_property_category:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7150.html
# segmented_property_type:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7160.html
description_segment = hd.seg.SegmentDescription(
    segment_number=1,
    segment_label='Lesions',
    segmented_property_category=codes.cid7150.AnatomicalStructure,
    segmented_property_type=codes.cid7160.Prostate,
    algorithm_type=hd.seg.SegmentAlgorithmTypeValues.AUTOMATIC,
    algorithm_identification=algorithm_identification,
    tracking_uid=hd.UID(),
    tracking_id='Lesion Segmentation of a Prostate MR Image',
    # anatomic_regions=Code("41216001", "SCT", "Prostate"), # BA - error, seems like the others are enough
)

# Create the Segmentation instance
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.Segmentation
seg_dataset = hd.seg.Segmentation(
    source_images=image_datasets,
    pixel_array=mask,
    segmentation_type=hd.seg.SegmentationTypeValues.BINARY, # FRACTIONAL,
    segment_descriptions=[description_segment],
    series_instance_uid=series_instance_uid, # hd.UID(),
    series_number=series_number,
    sop_instance_uid=sop_instance_uid, # hd.UID(),
    instance_number=instance_number,
    manufacturer=MANUFACTURER,
    manufacturer_model_name=MANUFACTURER_MODEL_NAME,
    software_versions=SOFTWARE_VERSIONS,
    device_serial_number=DEVICE_SERIAL_NUMBER,
    omit_empty_frames=True,
    # content_creator_name=manufacturer,
)


# Compare generated and original segmentations
print('seg:')
for i, slice in enumerate(seg_dataset.PerFrameFunctionalGroupsSequence):
    print(float(str(slice['PlanePositionSequence']._value[0])[-7:-1]), np.unique(seg_dataset.pixel_array[i]))

ref_path = '/home/ben/.../original_seg.dcm'
ref_seg = hd.seg.segread(ref_path)
print('\noriginal:')
for i, slice in enumerate(ref_seg.PerFrameFunctionalGroupsSequence):
    print(float(str(slice['PlanePositionSequence']._value[0])[-7:-1]), np.unique(ref_seg.pixel_array[i]))

And the outputs:

seg:
-564.5 [0 1]
-552.5 [0 1]
-520.5 [0 1]
-488.5 [0 1]
-404.5 [0 1]
-328.5 [0 1]
-324.5 [0 1]
-244.5 [0 1]
-200.5 [0 1]

original:
-224.5 [0 1]
-220.5 [0 1]
-216.5 [0 1]
-212.5 [0 1]
-208.5 [0 1]
-204.5 [0 1]
-200.5 [0 1]
-196.5 [0 1]
-192.5 [0 1]

As seen, the newly generated segmentation got wrong PlanePositionSequence. The same happens when I use omit_empty_frames=False.

I'm still reading through the library source code, found another bug and told @hackermd about it but it was not related.

Had a thought about adding a reorder by original position function, to make sure the segmentation is being mapped correctly in case the original scan files are not in order.

Any other thoughts or comments?

Many thanks to all of you anyway, it's a really impressive library and I'm glad you've released it exactly when I started working on this project :)

It is possible to create qualitative evaluations with no relationship types

I created an SR which contained qualitative evaluations, which are passed in as a sequence of CodeContentItems to the qualitative_evaluations parameter of the various SRs. I constructed these CodeContentItems without the relationship_type parameter, and highdicom did not issue any warnings or complaints. However the dciodvfy tool raises an error saying that the RelationshipType is required and therefore the SR objects are invalid.

I am not sure exactly what the right level to solve this issue is.

The simplest way is simply to check the elements of qualitative_evaluations when constructing MeasurementsAndQualitativeEvaluations objects to ensure they have RelationshipTypes. However after thinking about this more I think that this specific issue may be a symptom of a broader problem. Currently the relationship_type is an optional parameter for a ContentItem and all its subclasses (e.g. CodeContentItem, TimeContentItem,NumContentItem, ...). I'm not sure that this is correct. I think it should probably be required.

My reading of table C.17-6 is that RelationshipType is a type 1 (required) attribute within a ContentSequence. I tried making relationship_type a required parameter of ContentItem and all subclasses, and the only tests that failed were tests where these objects were directly constructed without the relationship type parameter by test code. I.e. there is no place in the main codebase where an item is constructed without a relationship type (or at least not one with test coverage I suppose...). I haven't been able to think of a situation where a relationship type wouldn't be needed. The one place I'm unsure about is the very root of the content tree, which doesn't have a relationship to its parent. I'm not sure how to understand the recursive standard docs on this point. But even there it seems like highdicom is currently placing a RelationshipType attribute at the root of the tree anyway

Another option is that maybe the append method of ContentSequence should have a check to ensure that the ContentItem being appended has a RelationshipType.

@hackermd can you help please me understand at what level this issue exists?

Support of image-level qualitative evaluation in TID 1501

TID 1501 allows to capture qualitative evaluations assigned per image (without specifying any annotation) - see rows 10b/11b in https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_A.html#sect_TID_1501.

This does not appear to be possible using highdicom - I don't see how I could specify image in the constructor of TID1501: https://github.com/herrmannlab/highdicom/blob/master/src/highdicom/sr/templates.py#L2420. Am I correct, or I am missing something?

Prevent inclusion of empty Content Sequence attribute in SR content items

Upon construction of some content items, we first add an empty ContentSequence to the object and then later append individual items to it. If, however, for some reason no items get subsequently added, the ContentSequence attribute remains empty, which is not allowed by the standard.

Checks that values satisfy requirements of the VR

This has been discussed before (e.g. here) but we should have an issue to track it.

Pydicom is very loose in what it allows you to set as an attribute's value, even when you have the global configuration option pydicom.config.enforce_valid_values set to True. We have previously encountered and resolved this narrowly for decimal strings (DS) #57 #65, but the issue is broader. Checks for the other VRs are largely absent from pydicom, but many VRs have limits on length of the string, list of allowable characters, capitalisation, etc (see standard). The result is many one-off checks being including to check user-supplied values in highdicom, as well as probably many missed checks that could allow files with invalid values to be produced. We should tackle this in a more unified way to reduce redundancy and probability of invalid values slipping through the net.

My feeling is that as far as possible we should add this functionality to pydicom and then integrate into highdicom.

JPEG2000Lossless Compression not working for Segmentations

I'm getting

OSError: encoder error -2 when writing image file

when trying to create fractional segmentation images with transfer_syntax_uid=JPEG2000Lossless

Mapping of private into standard attributes as part of legacy enhanced MF generation

Would this task be considered "in scope" for highdicom?

Should inclusion of referenced images in PertinentOtherEvidenceSequence be enforced?

It is my understanding that the PertinentOtherEvidenceSequence should contain all instances referenced anywhere in the content tree, and failing to do so would technically be non-compliant. See sect_C.17.2.3

[evidence in PertinentOtherEvidenceSequence] shall include, but is not limited to, all current evidence referenced in the content tree

When constructing an SR object, highdicom has all the available information to ensure that this has been done correctly (i.e. no referenced instance in the content tree has been omitted from the evidence parameter), but it currently does not do so.

I would suggest that one improvement could be to add code to the SR constructor that walks the content tree and raises an exception if any evidence is missing. Unfortunately I think it will not be possible to automatically add the evidence as some information (such as series instance uid) may be missing from the references in the content tree.

Thoughts? @hackermd

Supporting unknown SOPClassUID

We are investigating the use of highdicom for converting our NIfTI segmentation outputs to DICOM SEG format. The corresponding DICOM series have some of the fields anonymised. For instance SOPClassUID.

It seems highdicom is expecting a predefined set of UIDs. Is there possibility of supporting also unknown fields as well?

Here is the error message I am getting;

traceback (most recent call last):
  File "convert.py", line 75, in <module>
    seg_dataset = Segmentation(
  File "/../lib/python3.8/site-packages/highdicom/seg/sop.py", line 422, in __init__
    self.copy_specimen_information(src_img)
  File "/../lib/python3.8/site-packages/highdicom/base.py", line 285, in copy_specimen_information
    self._copy_root_attributes_of_module(dataset, 'Image', 'Specimen')
  File "/../lib/python3.8/site-packages/highdicom/base.py", line 240, in _copy_root_attributes_of_module
    iod_key = SOP_CLASS_UID_IOD_KEY_MAP[dataset.SOPClassUID]
KeyError: '05703841015643452503209109170649121759226008273168'

Is it possible to create a dicomseg for a DX?

I am trying to save the segmentation of a CXR as a dicomseg.

seg_dataset = Segmentation(
            source_images=[source_image],
            pixel_array=np.uint16(seg_img),
            segmentation_type=SegmentationTypeValues.FRACTIONAL,
            segment_descriptions=description_segments,
            series_instance_uid=generate_uid(),
            sop_instance_uid=generate_uid(),
            instance_number=instance_number,
            manufacturer="deepc",
            manufacturer_model_name=config["NAME"],
            software_versions="v"+config["VERSION"],
            device_serial_number="Device XYZ",
            series_number=2,
            fractional_type="OCCUPANCY",
        )

Even though I am using a proper DX dcm. I am getting errors where it says some fields like StudyID, SliceThickness and PixelSpacing are missing in the source_image. Is it possible to skip the missing fields? or is there a better way to handle such scenarios?

Attributes 'DICOMPrefix' 'FilePreamble' when initiating SOPClass

Hi,

Highdicom sets the attribute 'DICOMPrefix' and 'FilePreamble' which is not in the pydicom dicom directory. This results in a warning by default or an exception if pydicom config.INVALID_KEYWORD_BEHAVIOR = "RAISE". I dont think you need to set those attributes as they are written by pydicoms dcmwrite?

Simple TimePointContext constructor fails

Trying to use the constructor for the TimePointContext object seems to result in an exception regardless of the input parameters.

from highdicom.sr.templates import TimePointContext

TimePointContext('whatever')

Results in:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    TimePointContext('whatever')
  File "/Users/christopher.bridge/Developer/highdicom/src/highdicom/sr/templates.py", line 214, in __init__
    self.append(time_point_item)
  File "/Users/christopher.bridge/Developer/highdicom/src/highdicom/sr/value_types.py", line 139, in append
    super(ContentSequence, self).append(item)
  File "/Users/christopher.bridge/.pyenv/versions/highdicom/lib/python3.8/site-packages/pydicom/sequence.py", line 62, in append
    super().append(val)
  File "/Users/christopher.bridge/.pyenv/versions/highdicom/lib/python3.8/site-packages/pydicom/multival.py", line 66, in append
    self._list.append(self.type_constructor(val))
AttributeError: 'TimePointContext' object has no attribute '_list'

I will keep thinking about it but for now this has me stumped. Any ideas appreciated.

I'm using pydicom version 2.1.2, python 3.8.7, highdicom master branch

Change `ParametricMap` to use new `VOILUTTransformation` class

Unify the API of the ParametricMap class with the presentation state SOPClasses by having the user provide a VOILUTTransformation object to the constructor. This is a backward incompatible change

The new Composite Instance shall contain the Contributing Equipment Sequence (0018,A001).

I believe some part of the standard have not yet been implemented in the classic to enhanced conversion step:

http://dicom.nema.org/medical/dicom/current/output/chtml/part04/sect_C.3.5.html

The new Composite Instance shall contain the Contributing Equipment Sequence (0018,A001). If the source Composite Instances already contain the Contributing Equipment Sequence with a consistent set of Item values (excluding Contribution DateTime (0018,A002)), then a new Item shall be appended to the copy of the sequence in the new Composite Instance; if the source Composite Instance does not contain the Contributing Equipment Sequence or the Item values (excluding Contribution DateTime (0018,A002)) differ between source instances, then Contributing Equipment Sequence shall be created, containing one new Item. In either case, the new Item shall describe the equipment that is creating the new Composite Instance, and the Purpose of Reference Code Sequence (0040,A170) within the Item shall be (109106, DCM, "Enhanced Multi-frame Conversion Equipment") and the Contribution Description (0018,A003) shall be "Legacy Enhanced Image created from Classic Images", "Classic Image created from Enhanced Image", or "Updated UID references during Legacy Enhanced Classic conversion" as appropriate.

Allow for Real World Value Mapping as a Shared Functional Group for Parametric Map IOD

I think if the user passes in either a single RWVM instance in the parametric map constructor, or passes a sequence with a single RWVM, we should add it to the Shared Functional Groups Sequence rather than forcing them into the Per-frame sequence.

@hackermd I can take care of this if you would like, let me know.

Errors in input data and legacy conversion

While evaluating legacy sop classes, @afshinmessiah and I ran into the not unexpected issues related to invalid input data. At least in some cases, those errors are rather trivial, such as mismatch of VR between SH and CS.

This raised the discussion with @dclunie below. Even after this discussion, I personally think it would make more sense and would be more practical to try to fix issues as they come up in the process of legacy conversion:

this will be easier to understand and use for users trying to use legacy conversion functionality
it may be easier to correct only errors that are important for legacy conversion specifically, rather than develop a tool that tries to fix all errors (e.g., not all of the attributes will be used for legacy conversion)
if we first patch the input datasets, and then do the conversion, we would need to reference the ephemeral patched datasets, if we want to do things right

@hackermd did you think about this?

From: David Clunie
Date: Thu, Feb 20, 2020 at 10:33 AM
Subject: Re: MF conversion and source dataset errors
To: Andrey Fedorov
Cc: Akbarzadeh, Afshin, Steve Pieper

Hi Andrey

I also copied Steve.

In short, probably option 2 (patch the source dataset, and then
do the conversion).

It depends a lot on exactly what the "errors" are, and what
you would do with any intermediate files.

E.g., if there is an error that a value is invalid for the VR,
(e.g., a bad character or too long), and the data element is
being copied (either into the top level data set of the new
object or into a functional group, e.g., per-frame unassigned),
then the choice is to "fix" it (remove bad character, truncate
too long string) before copying.

Alternatively, if it is an optional attribute, one could just
drop it (not copy it); but that risks losing something useful.

I don't always bother fixing these when converting in bulk, and
just propagate the errors, since trying to find and fix each special
case may be more work than I can justify.

But if one can make a fix, it would be nice to.

There is also an update to the standard that allows saving the
original bad values; see CP 1766 and Nonconforming Modified
Attributes Sequence:

ftp://medical.nema.org/medical/dicom/final/cp1766_ft_ExtendOriginalAttributesSequence.pdf

http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.9.2

In terms of "when" to do the fix, if you are going to fix things,
I have done it both ways (and sometimes another way, which is
to propagate the errors into the multi-frame, and then fix
them up in a yet another separate final cleanup step).

I assume that when you say "patch the source dataset", you mean
fix a temporary copy, not "return a fixed copy to TCIA to replace
their bad stuff".

In which case, either approach (or a combination of both) seems
fine to me, since any intermediate file don't need to be persisted.

In the past, when creating "true" enhanced MF samples for CT and
MR (for the NEMA sponsored project), I actually used my "antodc"
tool in dicom3tools to "fix" and "enhance" the single frame objects,
by converting stuff from private attributes to standard attributes
(even if they weren't in the single frame IOD), and then handled
the "merging" into multi-frame (and factoring out of shared stuff)
in dcmulti.

This worked well because I separated most of the modality-specific
knowledge from the generic single to multi-frame conversion, as well
as providing a useful tool for making single frame images "better",
when I didn't need to make multi-frame ones.

This was long before I added the MultiframeImageFactory to the
PixelMed toolkit, and I have propagated very little if any of the
modality-specific stuff to that tool so far.

When/if I revisit the question of trying to create modality-
specific legacy converted or true enhanced multi-frame images
in PixelMed, I will very likely use the two step process of
first fixing the single frame headers, and then merging them
into a multi-frame, since I find that division of labor more
elegant.

It would also allow me to provide other input sources (e.g.,
NIfTI files with a BIDS metadata file) to feed the same
DICOM enhanced multi-frame pipeline,. Though I have to admit
that I usually do that sort of thing with separate classes
with methods applied successively, rather than separate distinct
command line utilities.

BTW. For referential integrity updates (e.g., fixing all SRs
or SEGs that reference single frame images to reference the
new MF objects), I would might make that yet another separate
step in the pipeline, especially if I could find other uses
for it.

David

PS. I have attached a copy of antodc.cc from dicom3tools ... I
haven't used this tool for more than a decade, but it may give
you some insight into more complicated fixes I sometimes used
to perform, and how I extracted standard information from private
data elements.

PPS. In case they are informative, I have attached archives of the
Makefiles that I used for the CT and MR NEMA project ... these will
not execute without my source images, various tools and out of band
information, but they may give some (outdated) insight into the
process of handcrafting examples versus producing an operational
converter.

On 2/19/20 11:18 PM, Andrey Fedorov wrote:

Hi David,

As Afshin is working on the MF conversion task, we wanted to ask you a

"fundamental" question.

As you know, TCIA datasets may often have errors. What should be the
strategy for addressing those? Should we:

carry forward those errors into MF representation, and just ignore
those while validating MF?

patch the source dataset, and then do the conversion?

patch the errors in the MF representation in the process of
conversion, and keep the originals intact?

I would probably prefer option 3.

AF

`utils.compute_plane_position_tiled_full()` raises `ValueError: got an unexpected keyword argument 'coordinate'`

It looks like the above method calls spatial.map_pixel_into_coordinate_system() with its outdated function signature.

https://github.com/MGHComputationalPathology/highdicom/blob/50467bb6b75a001e43d6963bab5d16c8ac6f0a01/src/highdicom/utils.py#L136-L141

It appears that there are a number of failed tests with the same issue.

DICOM SEG for CMR cine data (2D + t)

Trying to generate a dicom seg file based on a multiple unsigned np.array, which is of shape (#_frames, #rows, #cols). The way I've approached the task was by generating dicom seg with the same series uid across all frames of the segmentation array. However, when displaying the output the 3 different delineated structures do not appear on a single frame (spatially they do not overlap).

Bellow is an example of the code that I've used to generate the dicom outputs.

Is it possible to show all the 3 structures in a single frame?

# img_dsets (list): list of pydicom.Datasets with lenght equal to the number of short-axis cine data
# seg_vol (np.array): Segmentation volume array with shape (#_frames, #rows, #cols)
import pydicom as pyd
import highdicom as hd

manufacturer_name = "dummy_manufacturer_name"
manufacturer_model_name = "dummy_model_name"
software_version = "0.1"


seg_labels = {
    "myocardium": {"class_number": 1, "sct_code": pyd.sr.codedict.codes.SCT.LeftVentricleMyocardium},
    "left_ventricle": {"class_number": 3, "sct_code": pyd.sr.codedict.codes.SCT.LeftVentricle},
    "right_ventricle": {"class_number": 2, "sct_code": pyd.sr.codedict.codes.SCT.RightVentricle},
}


algo_details = hd.AlgorithmIdentificationSequence(
    name=manufacturer_model_name,
    version=software_version,
    family=pyd.sr.codedict.codes.cid7162.ArtificialIntelligence,
    source=manufacturer_name,
)

segment_descriptions = []
for label_name, label_info in seg_labels.items():

    seg_details = hd.seg.SegmentDescription(
        segment_number=label_info["class_number"],
        segment_label=label_name,
        segmented_property_category=pyd.sr.codedict.codes.cid7150.AnatomicalStructure,
        segmented_property_type=label_info["sct_code"],
        algorithm_type=hd.seg.SegmentAlgorithmTypeValues.AUTOMATIC,
        algorithm_identification=algo_details,
        tracking_uid=hd.UID(),
        tracking_id=f"Cardiac structure #{class_number}",
    )

    segment_descriptions.append(seg_details)


series_uid = hd.UID()

for frame_idx in range(seg_vol.shape[0]):

    seg_dset = hd.seg.Segmentation(
        source_images=[img_dsets[frame_idx]],
        pixel_array=seg_vol[frame_idx],
        segmentation_type=hd.seg.enum.SegmentationTypeValues.BINARY,
        segment_descriptions=segment_descriptions,
        series_description="Segmentation-Test",
        series_number=5,
        series_instance_uid=series_uid,
        sop_instance_uid=hd.UID(),
        instance_number=int(frame_idx) + 1,
        manufacturer=manufacturer,
        manufacturer_model_name=manufacturer_model_name,
        software_versions=software_version,
        device_serial_number=str(img_dsets[frame_idx].DeviceSerialNumber),
    )

    seg_dsets.append(seg_dset)

Also, I've tried to generate a single dataset file for all frames at once, and with that approach the output data only shows a total number of frames equal to the # of classes, not iterating over all the frames.

Content item for Person Observer Name in template Person Observer Identifying Attributes has wrong value type

I just realized that Person Observer Name is currently incorrectly encoded using TextContentItem instead of PnameContentItem (see highdicom.sr.templates.PersonObserverIdentifyingAttributes).

SR templates do not support Image Library

Highdicom doesn't have support for TID 1604 (Image Library). This would be useful for encoding imaging attributes (imaging orientation, slice thickness, pixel spacing) for clients interpreting Comprehensive 2D SRs before the source image SOPInstances were retrieved.

High DICOM and Supplement 222 Microscopy Bulk Simple Annotations Storage SOP Class

Dear Sir,

As we know, the High DICOM support

Creating Segmentation (SEG) images
Parsing Segmentation (SEG) images
Creating Structured Report (SR) documents
Parsing Structured Report (SR) documents

But we are not sure whether High DICOM can support Supplement 222 Microscopy Bulk Simple Annotations Storage SOP Class.

This Supplement to the DICOM Standard specifies a new DICOM Information Object and Storage SOP Class for storing Microscopy Bulk Simple Annotations (points, open polylines, closed polygons and simple geometric shapes without relationships), which is referred to as the Microscopy Bulk Simple Annotations IOD.

Microscopy Bulk Simple Annotations are usually created by machine algorithms from high resolution images of entire tissue sections, e.g., encoded as DICOM Whole Slide Microscopy images.

If High DICOM can support, it will be easily used after feeding data into AI.

Segmentation objects lack attributes for image orientation and position

Segmentation objects created by the library currently lack the following attributes:

Image Orientation Slide
Total Pixel Matrix Origin Sequence

which are needed to uniquely localize a segmentation image of a slide microscopy image within the slide-based coordinate system and are required by the standard (see Segmentation Image Module).

Sorting of attributes into Shared/PerFrame FGs

@hackermd one issue we identified with the current legacy converter is that it does not detect attributes that are repeated and identical in the PerFrame functional groups to factor them out into Shared functional groups (which I think is non-compliant with the standard, and is a bug).

We want to fix this, and @afshinmessiah was going to work on this. But before he starts working on it, we wanted to confirm this contribution would be welcomed. If it is, then we would also appreciate if you tell us if you have specific approach in mind how and where you would want it to be implemented, or if you would want us to come up with a proposal.

It's impossible to add multiple fractional segments

When a fractional segment is added with a float type (i.e. truly fractional segments, not binary segments encoded as fractional), the segment number is always set to 1. This means that if a second segment is added, both will have segment number 1, which I assume is not allowed...I have added a check to prevent this in my feature/more_seg_tests branch, but this makes adding multiple fractional segments will be impossible.

I suggest adding an option to allow the user to explicitly specifiy the segment number to get around this issue.

As a side effect, this would also allow adding binary masks as different segments, where currently it would be necessary to make sure that the true pixels' values match the segment number.

Thoughts @hackermd ?

I'm happy to have a shot at making alterations, but wanted your thoughts first

"overlay-plane" module is not listed in "_module.py" file

In the list of iods ( _iod.py), "overlay-plane" is listed as a module for pet-image. In _module.py file though, there is no such module as "overlay-plane" to get the list of its attributes.

Highdicom allows incorrectly formatted "person name" (PN) attributes

There are a few places where the highdicom API requests str parameters and directly encodes them as attributes with value representation PN (person name). This includes but (but may not be limited to) the content_creator_name parameter of the segmentation SOP constructor, and the verifying_observer_name parameter of the EnhancedSR, ComprehensiveSR, and Comprehensive3DSR constructors.

The format of a PN attribute is quite specific - you can't just enter free text here. See the PN entry in this table. Briefly, for human names there are five fields (family name, given name, middle name, name prefix, name suffix) that should be separated by caret characters (^). See also the examples in the standard.

Unfortunately, no attempt is made at the pydicom level to enforce or check correct formatting. This propagates to highdicom. Therefore there is no checking or enforcement on these in highdicom, nor any documentation that there is even a format that should be followed. I suspect that the result is that the vast majority of users will pass "John Doe" instead of "Doe^John" and end up with incorrectly formatted attributes.

I consider these formatting details to be far lower level than users of highdicom should have to understand in order to create objects with correctly formatted PN attributes.

I am happy to work on a solution. Here are a few options that come to mind:

(My preferred solution): Create a PersonName object (either in the highdicom.content module or a new highdicom.vrmodule perhaps) with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, and has a method that returns the correctly formatted string. Then change the API of the various parts of the code expecting person names as string to instead expect PersonName objects.
Continue to expect strings but check that the format is correct (difficult because in theory each component can contain a space and components can be missing)
Try to fix this at the pydicom level.

@hackermd thoughts?

Segmentation not working for 2D X-ray mammography DICOMs

Generation of Segmentation (SEG) images doesn't work for mammography images due to required tags that are optional and usually not present. In particular: FrameOfReferenceUID, SliceThicknes, ImageOrientationPatient and ImagePositionPatient. I'm attaching an exemplary mammography DICOM file.

Thanks for considering it!

test_dicom.zip

Pydicom 2.3.0 breaks highdicom

Latest release of pydicom is causing several tests to fail. At first glance all errors appear to be related to a new behaviour for handling multivalues. It may be difficult to support versions both sides of the change. We may want to disallow 2.3.0 as an interim measure

Integrate highdicom w/ DICOM web viewer

Dear Sir,

The highdicom provides powerful DICOM SR and annotations API. But it seems to require integration with DICOM viewer.

As far as we know, Slim viewer doesn't include highdicom python package, correct? But it can generate corresponding annotations and SRs. Would you mind if I ask why Slim is not developed in python and then uses the highdicom package?

If we use the highdicom kit to develop the digital pathology web viewer and generate annotations and SRs at the same time, could you give some suggestions?

Thanks.

Should PixelOriginInterpetation be required?

PixelOriginInterpetation is only required for Whole slide microscopy, see https://dicom.innolitics.com/ciods/extensible-sr/sr-document-content/00480301, but highdicom essentially makes it Type 1.

The problem with this is when defining region of interest for non-WSI images. It is not clear what is the meaning of VOLUME PixelInterpretationOrigin for CT, for example, since Total Pixel Matrix Origin seems to be WSI-specific. At the same time, it is impossible to define PixelInterpretationOrigin to be FRAME, since there is no frame number for non-enhanced images.

Most logical to me would be to allow passing None as pixel_origin_interpretation parameter value to the ImageRegion constructor, but instead of assigning VOLUME when None is passed (see https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/sr/content.py#L481) not including this attribute in the dataset.

At the moment, I can only initialize that attribute to VOLUME.

Syntax highlighting on readthedocs.io not working as expected

There is no syntax highlighting for the code examples on read the docs https://highdicom.readthedocs.io/en/latest/usage.html

However, when html files are built locally with the makefile, syntax highlighting is working fine

Iter segments does not function properly when there is a single frame in a segmentation image

This is because pydicom.Dataset.pixel_array returns a 2D rather than 3D array in this case. As a result, iter_segments ends up iterating through the rows of the single frame

Add better support for segmented LUTs

#139 added support for segmented LUTs in presentation states, however the user is left to themselves to construct valid lut data, which is not straightforward.

We should provide utility methods or alternative constructor to allow users to construct LUTs in a more intuitive way

Unable to import highdicom in Mac

Hi,

I'm unable to import highdicom after the installation. Here are some logs.


python3 -m pip install highdicom --ignore-installed
Defaulting to user installation because normal site-packages is not writeable
Collecting highdicom
  Using cached highdicom-0.14.0-py3-none-any.whl (697 kB)
Collecting pylibjpeg-libjpeg>=1.2
  Using cached pylibjpeg_libjpeg-1.3.0-cp38-cp38-macosx_11_0_arm64.whl (1.6 MB)
Collecting pydicom>=2.2.2
  Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
Collecting pylibjpeg>=1.3
  Using cached pylibjpeg-1.4.0-py3-none-any.whl (28 kB)
Collecting pillow-jpls>=1.0
  Using cached pillow_jpls-1.1.0-cp38-cp38-macosx_11_0_arm64.whl (71 kB)
Collecting pylibjpeg-openjpeg>=1.1
  Using cached pylibjpeg_openjpeg-1.2.0-cp38-cp38-macosx_11_0_arm64.whl (581 kB)
Collecting numpy>=1.19
  Using cached numpy-1.22.1-cp38-cp38-macosx_11_0_arm64.whl (12.7 MB)
Collecting pillow>=8.3
  Using cached Pillow-9.0.0-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB)
Installing collected packages: pillow, numpy, pylibjpeg-openjpeg, pylibjpeg-libjpeg, pylibjpeg, pydicom, pillow-jpls, highdicom
Successfully installed highdicom-0.14.0 numpy-1.22.1 pillow-9.0.0 pillow-jpls-1.1.0 pydicom-2.2.2 pylibjpeg-1.4.0 pylibjpeg-libjpeg-1.3.0 pylibjpeg-openjpeg-1.2.0

python3
Python 3.8.9 (default, Oct 26 2021, 07:25:53) 
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import highdicom as hd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/__init__.py", line 4, in <module>
    from highdicom import legacy
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/legacy/__init__.py", line 4, in <module>
    from highdicom.legacy.sop import (
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/legacy/sop.py", line 18, in <module>
    from highdicom.frame import encode_frame
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/frame.py", line 6, in <module>
    import pillow_jpls  # noqa
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/__init__.py", line 2, in <module>
    from .jpls_image_file import JplsImageFile, accept
  File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/jpls_image_file.py", line 3, in <module>
    from . import _pycharls
ImportError: dlopen(/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so, 2): Symbol not found: __ZN3fmt2v87vformatENS0_17basic_string_viewIcEENS0_17basic_format_argsINS0_20basic_format_contextINS0_8appenderEcEEEE
  Referenced from: /Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so
  Expected in: flat namespace
 in /Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so
>>>

Note: I tried completely uninstalling and installing the package but the problem persists.

How do I enter free text in a TID 1501?

In row 12 of TID 1501 it appears that text entries can be added to the object; the Content Item Description for this line states Allows encoding a flat list of name-value pairs that are coded questions with coded or text answers, for example, to record categorical observations related to the subject of the measurement group. A single level of coded modifiers may be present.. What is the recommended way to add these values?

If I create a MeasurementsAndQualitativeEvaluations object containing QualitativeEvaluations of the sort

    text_item = QualitativeEvaluation(
        name=CodedConcept(
            value='121071',
            meaning='Finding',
            scheme_designator='DCM',
        ),
        value=CodedConcept(
            value=text,
            meaning='Finding',
            scheme_designator='DCM',
        )
    )

this will work (and pass SR validation) until the size of the text exceeds 16 characters (after that there is an exception in coding.py). This may also be considered an abuse of CodedConcept? Or perhaps there is a way to include aTextContentItem but I don't see the way to do this.

The use case is to permit free text additions to an SR by a radiologist for findings not present in the original SR generated by the model output. These text findings are at coded anatomic finding sites.

Set up doctests to catch errors in the examples

Various mistakes have been found in the user guide examples. Examples that don't work are very offputting for new users. We should set up automatic testing of the documentation examples to ensure they run correctly.

Provide meaningful error message if all Segmentation frames are empty

When using the static method of _omit_empty_frames, if there are no empty frames present (frames without positive pixels), then numpy thows an exception with regard to the empty non_empty_frames list passed into np.stack.

https://github.com/herrmannlab/highdicom/blob/95f39dd722ae6d357af3f942e2130d0ff4d68bfc/src/highdicom/seg/sop.py#L1037-L1078

May I suggest just a check on the array before proceeding?

        if len(non_empty_frames) == 0:
            return (pixel_array, plane_positions, [])

Segmentation TotalPixelMatrixRows/Columns are being saved as `float`

When constructing a Segmentation object I hit this block which results in a Segmentation object with non-integer TotalPixelMatrixRows and TotalPixelMatrixColumns.

plane_position_values[last_frame_index] is array([[ 1. , 1. , 18.719027 , 53.52480485, 0. ]]) and the row and column indices are 2 and 3 respectively. The resulting instance has TotalPixelMatrixRows: 152.719027 and TotalPixelMatrixColumns: 375.0.

LegacyConvertedEnhancedMRImage does not copy Private Tag Creator

Thanks for providing a converted for 'CLASSIC' MR Image Storage instances.

I am confused with the documentation for using this class, here is what I did (debian/buster):

$ pip3 install highdicom
$ mkdir /tmp/mr
$ cp gdcmData/*FileSeq* /tmp/mr
$ python3 conv.py
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'DICOMPrefix' used which is not in the element keyword data dictionary
  warnings.warn(msg)
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'FilePreamble' used which is not in the element keyword data dictionary
  warnings.warn(msg)
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'FrameVolumeBasedCalculationTechnique' used which is not in the element keyword data dictionary
  warnings.warn(msg)

Which resulted in (truncated):

    (0020,9171) SQ (Sequence with explicit length #=1)      # 568, 1 UnassignedPerFrameConvertedAttributesSequence
      (fffe,e000) na (Item with explicit length #=9)          # 560, 1 Item
        (0009,1015) ?? 30\33\33\53\36\39\4d\52\30\31\32\30\30\32\30\36\31\39\31\38\32\33... #  26, 1 Unknown Tag & Data
        (0019,1212) ?? 30\30\31\2e\31\38\32\36\39\36\45\2d\34\32 #  14, 1 Unknown Tag & Data
        (0020,0030) DS [-01.190625E+02\-1.190625E+02\-3.553830E+01] #  42, 3 RETIRED_ImagePosition
        (0020,0050) DS [003.553830E+01]                         #  14, 1 RETIRED_Location
        (0020,1041) DS [003.553830E+01]                         #  14, 1 SliceLocation
        (0021,1160) ?? 30\30\30\2e\30\30\30\30\30\30\45\2b\30\30\5c\30\30\2e\30\30\30\30... #  42, 1 Unknown Tag & Data
        (0021,1163) ?? 30\30\33\2e\35\35\33\38\33\30\45\2b\30\31 #  14, 1 Unknown Tag & Data
        (0021,1342) ?? 20\20\20\20\31\35                        #   6, 1 Unknown Tag & Data
        (0051,1010) ?? 30\30\33\37\32\37\36\5c\46\20\37\39\59\5c\48\2d\53\50\2d\43\52\5c... # 316, 1 Unknown Tag & Data
      (fffe,e00d) na (ItemDelimitationItem for re-encoding)   #   0, 0 ItemDelimitationItem

The output file is missing Private Creator for nested DataSet.

If using ExplicitVRLittleEndian, here is what I get:

$ dcmdump enh.dcm
W: DcmItem: Non-standard VR '  ' (0a\00) encountered while parsing element (0008,0005), assuming 2 byte length field
W: DcmItem: Non-standard VR 'IR' (49\52) encountered while parsing element (5349,5f4f), assuming 4 byte length field
E: DcmElement: Unknown Tag & Data (5349,5f4f) larger (536624) than remaining bytes in file
E: dcmdump: I/O suspension or premature end of stream: reading file: enh.dcm

Could you please add a section in the documentation on how to use LegacyConvertedEnhancedMRImage class ? Thanks much.

For reference:

$ cat conv.py
from pathlib import Path
from pydicom.filereader import dcmread
from pydicom import uid
from highdicom.legacy.sop import LegacyConvertedEnhancedMRImage

series_dir = Path('/tmp/mr')
image_files = series_dir.glob('*.dcm')

image_datasets = [dcmread(str(f)) for f in image_files]

enh = LegacyConvertedEnhancedMRImage(image_datasets, "1.2.3", "1", "4.5.6", "2")
# enh.file_meta.TransferSyntaxUID = uid.ExplicitVRLittleEndian
enh.save_as("enh.dcm")