imagingdatacommons / highdicom
High-level DICOM abstractions for the Python programming language
Home Page: https://highdicom.readthedocs.io
License: MIT License
This issue arises because the generic Secondary Capture UID is no longer in the DICOM spec. However, such objects are still widely used and should therefore be supported.
I am trying to generate a DICOM SEG file from a multi-label unsigned np.array of shape (#_frames, #rows, #cols). My approach was to generate one DICOM SEG per frame of the segmentation array, all sharing the same series UID. However, when displaying the output, the three delineated structures do not appear on a single frame (spatially they do not overlap).
Below is an example of the code that I've used to generate the DICOM outputs.
Is it possible to show all three structures in a single frame?
# img_dsets (list): list of pydicom.Datasets with length equal to the number of short-axis cine frames
# seg_vol (np.array): Segmentation volume array with shape (#_frames, #rows, #cols)
import pydicom as pyd
import highdicom as hd

manufacturer_name = "dummy_manufacturer_name"
manufacturer_model_name = "dummy_model_name"
software_version = "0.1"
seg_labels = {
    "myocardium": {"class_number": 1, "sct_code": pyd.sr.codedict.codes.SCT.LeftVentricleMyocardium},
    "left_ventricle": {"class_number": 3, "sct_code": pyd.sr.codedict.codes.SCT.LeftVentricle},
    "right_ventricle": {"class_number": 2, "sct_code": pyd.sr.codedict.codes.SCT.RightVentricle},
}
algo_details = hd.AlgorithmIdentificationSequence(
    name=manufacturer_model_name,
    version=software_version,
    family=pyd.sr.codedict.codes.cid7162.ArtificialIntelligence,
    source=manufacturer_name,
)
segment_descriptions = []
for label_name, label_info in seg_labels.items():
    seg_details = hd.seg.SegmentDescription(
        segment_number=label_info["class_number"],
        segment_label=label_name,
        segmented_property_category=pyd.sr.codedict.codes.cid7150.AnatomicalStructure,
        segmented_property_type=label_info["sct_code"],
        algorithm_type=hd.seg.SegmentAlgorithmTypeValues.AUTOMATIC,
        algorithm_identification=algo_details,
        tracking_uid=hd.UID(),
        tracking_id=f"Cardiac structure #{label_info['class_number']}",  # was an undefined name: class_number
    )
    segment_descriptions.append(seg_details)

seg_dsets = []  # was used below without being initialized
series_uid = hd.UID()
for frame_idx in range(seg_vol.shape[0]):
    seg_dset = hd.seg.Segmentation(
        source_images=[img_dsets[frame_idx]],
        pixel_array=seg_vol[frame_idx],
        segmentation_type=hd.seg.enum.SegmentationTypeValues.BINARY,
        segment_descriptions=segment_descriptions,
        series_description="Segmentation-Test",
        series_number=5,
        series_instance_uid=series_uid,
        sop_instance_uid=hd.UID(),
        instance_number=int(frame_idx) + 1,
        manufacturer=manufacturer_name,  # was an undefined name: manufacturer
        manufacturer_model_name=manufacturer_model_name,
        software_versions=software_version,
        device_serial_number=str(img_dsets[frame_idx].DeviceSerialNumber),
    )
    seg_dsets.append(seg_dset)
Also, I've tried generating a single dataset file for all frames at once, but with that approach the output only contains a number of frames equal to the number of classes, rather than covering all the frames.
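For illustration, assuming the three structures do not overlap, the per-class binary masks could be combined into a single labelmap frame (the toy arrays below are hypothetical), which is the kind of array that lets one frame carry all three segments at once:

```python
import numpy as np

# Hypothetical per-class binary masks for one frame (rows x cols).
rows, cols = 4, 4
myocardium = np.zeros((rows, cols), dtype=np.uint8)
myocardium[0, :] = 1
left_ventricle = np.zeros((rows, cols), dtype=np.uint8)
left_ventricle[1, :] = 1
right_ventricle = np.zeros((rows, cols), dtype=np.uint8)
right_ventricle[2, :] = 1

# Combine into a single labelmap: pixel value = segment number.
labelmap = np.zeros((rows, cols), dtype=np.uint8)
labelmap[myocardium == 1] = 1       # segment number 1
labelmap[right_ventricle == 1] = 2  # segment number 2
labelmap[left_ventricle == 1] = 3   # segment number 3
```

This only works when the classes are mutually exclusive; overlapping structures would need a separate binary plane per segment.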
I think this line should be catching an AttributeError rather than a KeyError exception: https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/legacy/sop.py#L361
I believe some parts of the standard have not yet been implemented in the classic-to-enhanced conversion step:
The new Composite Instance shall contain the Contributing Equipment Sequence (0018,A001). If the source Composite Instances already contain the Contributing Equipment Sequence with a consistent set of Item values (excluding Contribution DateTime (0018,A002)), then a new Item shall be appended to the copy of the sequence in the new Composite Instance; if the source Composite Instance does not contain the Contributing Equipment Sequence or the Item values (excluding Contribution DateTime (0018,A002)) differ between source instances, then Contributing Equipment Sequence shall be created, containing one new Item. In either case, the new Item shall describe the equipment that is creating the new Composite Instance, and the Purpose of Reference Code Sequence (0040,A170) within the Item shall be (109106, DCM, "Enhanced Multi-frame Conversion Equipment") and the Contribution Description (0018,A003) shall be "Legacy Enhanced Image created from Classic Images", "Classic Image created from Enhanced Image", or "Updated UID references during Legacy Enhanced Classic conversion" as appropriate.
There are a few places where the highdicom API requests str parameters and directly encodes them as attributes with value representation PN (person name). This includes (but may not be limited to) the content_creator_name parameter of the segmentation SOP constructor, and the verifying_observer_name parameter of the EnhancedSR, ComprehensiveSR, and Comprehensive3DSR constructors.
The format of a PN attribute is quite specific - you can't just enter free text here. See the PN entry in this table. Briefly, for human names there are five components (family name, given name, middle name, name prefix, name suffix) that should be separated by caret characters (^). See also the examples in the standard.
Unfortunately, no attempt is made at the pydicom level to enforce or check correct formatting. This propagates to highdicom. Therefore there is no checking or enforcement on these in highdicom, nor any documentation that there is even a format that should be followed. I suspect that the result is that the vast majority of users will pass "John Doe" instead of "Doe^John" and end up with incorrectly formatted attributes.
I consider these formatting details to be far lower level than users of highdicom should have to understand in order to create objects with correctly formatted PN attributes.
I am happy to work on a solution. One option that comes to mind: add a PersonName class (in the highdicom.content module, or perhaps a new highdicom.vr module) with a constructor that takes the five parts of the name (family name, given name, middle name, name prefix, name suffix), any of which can be None, and a method that returns the correctly formatted string. Then change the API of the various parts of the code expecting person names as strings to instead expect PersonName objects.
@hackermd thoughts?
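A minimal sketch of the formatting rule itself (plain Python, not a proposed highdicom API; the function name is an assumption):

```python
# Sketch: format the five PN name components with caret separators,
# trimming trailing empty components as the standard's examples do.
def format_person_name(family_name='', given_name='', middle_name='',
                       name_prefix='', name_suffix=''):
    parts = [family_name, given_name, middle_name, name_prefix, name_suffix]
    return '^'.join(parts).rstrip('^')

print(format_person_name('Doe', 'John'))  # Doe^John
```

A class-based version would simply wrap this logic in a constructor and a formatting method, which is what the proposal above describes.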
The Content Label attribute has Value Representation CS, which is restricted to uppercase letters, digits, spaces, and underscores ([A-Z0-9 _]). This is currently implemented incorrectly in highdicom. In addition, pydicom doesn't complain if the value is not valid. Therefore, we will need to perform additional checks to ensure the values are valid.
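A minimal validity check along these lines might look as follows (a sketch; the character set and 16-character limit follow PS3.5's definition of CS):

```python
import re

# CS (Code String) per PS3.5: uppercase letters, digits, space and
# underscore, at most 16 characters.
_CS_RE = re.compile(r'[A-Z0-9 _]*\Z')

def is_valid_cs(value: str) -> bool:
    return len(value) <= 16 and _CS_RE.match(value) is not None
```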
While evaluating legacy SOP classes, @afshinmessiah and I ran into the not unexpected issues related to invalid input data. At least in some cases, those errors are rather trivial, such as a mismatch of VR between SH and CS.
This prompted the discussion with @dclunie below. Even after this discussion, I personally think it would make more sense, and be more practical, to fix issues as they come up in the process of legacy conversion:
@hackermd did you think about this?
From: David Clunie
Date: Thu, Feb 20, 2020 at 10:33 AM
Subject: Re: MF conversion and source dataset errors
To: Andrey Fedorov
Cc: Akbarzadeh, Afshin, Steve Pieper
Hi Andrey
I also copied Steve.
In short, probably option 2 (patch the source dataset, and then
do the conversion).
It depends a lot on exactly what the "errors" are, and what
you would do with any intermediate files.
E.g., if there is an error that a value is invalid for the VR,
(e.g., a bad character or too long), and the data element is
being copied (either into the top level data set of the new
object or into a functional group, e.g., per-frame unassigned),
then the choice is to "fix" it (remove bad character, truncate
too long string) before copying.
Alternatively, if it is an optional attribute, one could just
drop it (not copy it); but that risks losing something useful.
I don't always bother fixing these when converting in bulk, and
just propagate the errors, since trying to find and fix each special
case may be more work than I can justify.
But if one can make a fix, it would be nice to.
There is also an update to the standard that allows saving the
original bad values; see CP 1766 and Nonconforming Modified
Attributes Sequence:
ftp://medical.nema.org/medical/dicom/final/cp1766_ft_ExtendOriginalAttributesSequence.pdf
http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.9.2
In terms of "when" to do the fix, if you are going to fix things,
I have done it both ways (and sometimes another way, which is
to propagate the errors into the multi-frame, and then fix
them up in a yet another separate final cleanup step).
I assume that when you say "patch the source dataset", you mean
fix a temporary copy, not "return a fixed copy to TCIA to replace
their bad stuff".
In which case, either approach (or a combination of both) seems
fine to me, since any intermediate files don't need to be persisted.
In the past, when creating "true" enhanced MF samples for CT and
MR (for the NEMA sponsored project), I actually used my "antodc"
tool in dicom3tools to "fix" and "enhance" the single frame objects,
by converting stuff from private attributes to standard attributes
(even if they weren't in the single frame IOD), and then handled
the "merging" into multi-frame (and factoring out of shared stuff)
in dcmulti.
This worked well because I separated most of the modality-specific
knowledge from the generic single to multi-frame conversion, as well
as providing a useful tool for making single frame images "better",
when I didn't need to make multi-frame ones.
This was long before I added the MultiframeImageFactory to the
PixelMed toolkit, and I have propagated very little if any of the
modality-specific stuff to that tool so far.
When/if I revisit the question of trying to create modality-
specific legacy converted or true enhanced multi-frame images
in PixelMed, I will very likely use the two step process of
first fixing the single frame headers, and then merging them
into a multi-frame, since I find that division of labor more
elegant.
It would also allow me to provide other input sources (e.g.,
NIfTI files with a BIDS metadata file) to feed the same
DICOM enhanced multi-frame pipeline. Though I have to admit
that I usually do that sort of thing with separate classes
with methods applied successively, rather than separate distinct
command line utilities.
BTW. For referential integrity updates (e.g., fixing all SRs
or SEGs that reference single frame images to reference the
new MF objects), I might make that yet another separate
step in the pipeline, especially if I could find other uses
for it.
David
PS. I have attached a copy of antodc.cc from dicom3tools ... I
haven't used this tool for more than a decade, but it may give
you some insight into more complicated fixes I sometimes used
to perform, and how I extracted standard information from private
data elements.
PPS. In case they are informative, I have attached archives of the
Makefiles that I used for the CT and MR NEMA project ... these will
not execute without my source images, various tools and out of band
information, but they may give some (outdated) insight into the
process of handcrafting examples versus producing an operational
converter.
On 2/19/20 11:18 PM, Andrey Fedorov wrote:
Hi David,
As Afshin is working on the MF conversion task, we wanted to ask you a
"fundamental" question.
As you know, TCIA datasets may often have errors. What should be the
strategy for addressing those? Should we:
- carry forward those errors into the MF representation, and just ignore those while validating the MF?
- patch the source dataset, and then do the conversion?
- patch the errors in the MF representation in the process of conversion, and keep the originals intact?
I would probably prefer option 3.
AF
In row 12 of TID 1501 it appears that text entries can be added to the object; the Content Item Description for this row states "Allows encoding a flat list of name-value pairs that are coded questions with coded or text answers, for example, to record categorical observations related to the subject of the measurement group. A single level of coded modifiers may be present." What is the recommended way to add these values?
If I create a MeasurementsAndQualitativeEvaluations object containing QualitativeEvaluations of the sort
text_item = QualitativeEvaluation(
    name=CodedConcept(
        value='121071',
        meaning='Finding',
        scheme_designator='DCM',
    ),
    value=CodedConcept(
        value=text,
        meaning='Finding',
        scheme_designator='DCM',
    )
)
this will work (and pass SR validation) until the size of the text exceeds 16 characters (after that there is an exception in coding.py). This may also be considered an abuse of CodedConcept? Perhaps there is a way to include a TextContentItem instead, but I don't see how to do this.
The use case is to permit free text additions to an SR by a radiologist for findings not present in the original SR generated by the model output. These text findings are at coded anatomic finding sites.
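One possible direction, sketched below with plain dictionaries rather than real highdicom API calls (the helper name and dict structure are assumptions), is a TEXT content item: ValueType TEXT places the free text in Text Value, which is not subject to the 16-character limit that short code strings have.

```python
# Plain-dict sketch (not highdicom API) of a TEXT content item: the
# concept name is still coded, but the answer is carried as free text.
def make_text_item(name_value, name_scheme, name_meaning, text):
    return {
        'RelationshipType': 'CONTAINS',
        'ValueType': 'TEXT',
        'ConceptNameCodeSequence': [{
            'CodeValue': name_value,
            'CodingSchemeDesignator': name_scheme,
            'CodeMeaning': name_meaning,
        }],
        'TextValue': text,
    }

item = make_text_item('121071', 'DCM', 'Finding',
                      'Free text well beyond sixteen characters is fine here')
```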
Hi,
I'm unable to import highdicom after the installation. Here are some logs.
python3 -m pip install highdicom --ignore-installed
Defaulting to user installation because normal site-packages is not writeable
Collecting highdicom
Using cached highdicom-0.14.0-py3-none-any.whl (697 kB)
Collecting pylibjpeg-libjpeg>=1.2
Using cached pylibjpeg_libjpeg-1.3.0-cp38-cp38-macosx_11_0_arm64.whl (1.6 MB)
Collecting pydicom>=2.2.2
Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
Collecting pylibjpeg>=1.3
Using cached pylibjpeg-1.4.0-py3-none-any.whl (28 kB)
Collecting pillow-jpls>=1.0
Using cached pillow_jpls-1.1.0-cp38-cp38-macosx_11_0_arm64.whl (71 kB)
Collecting pylibjpeg-openjpeg>=1.1
Using cached pylibjpeg_openjpeg-1.2.0-cp38-cp38-macosx_11_0_arm64.whl (581 kB)
Collecting numpy>=1.19
Using cached numpy-1.22.1-cp38-cp38-macosx_11_0_arm64.whl (12.7 MB)
Collecting pillow>=8.3
Using cached Pillow-9.0.0-cp38-cp38-macosx_11_0_arm64.whl (2.7 MB)
Installing collected packages: pillow, numpy, pylibjpeg-openjpeg, pylibjpeg-libjpeg, pylibjpeg, pydicom, pillow-jpls, highdicom
Successfully installed highdicom-0.14.0 numpy-1.22.1 pillow-9.0.0 pillow-jpls-1.1.0 pydicom-2.2.2 pylibjpeg-1.4.0 pylibjpeg-libjpeg-1.3.0 pylibjpeg-openjpeg-1.2.0
python3
Python 3.8.9 (default, Oct 26 2021, 07:25:53)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import highdicom as hd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/__init__.py", line 4, in <module>
from highdicom import legacy
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/legacy/__init__.py", line 4, in <module>
from highdicom.legacy.sop import (
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/legacy/sop.py", line 18, in <module>
from highdicom.frame import encode_frame
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/highdicom/frame.py", line 6, in <module>
import pillow_jpls # noqa
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/__init__.py", line 2, in <module>
from .jpls_image_file import JplsImageFile, accept
File "/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/jpls_image_file.py", line 3, in <module>
from . import _pycharls
ImportError: dlopen(/Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so, 2): Symbol not found: __ZN3fmt2v87vformatENS0_17basic_string_viewIcEENS0_17basic_format_argsINS0_20basic_format_contextINS0_8appenderEcEEEE
Referenced from: /Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so
Expected in: flat namespace
in /Users/elamathis/Library/Python/3.8/lib/python/site-packages/pillow_jpls/_pycharls.cpython-38-darwin.so
>>>
Note: I tried completely uninstalling and installing the package but the problem persists.
Would this task be considered "in scope" for highdicom?
PixelOriginInterpretation is only required for whole slide microscopy (see https://dicom.innolitics.com/ciods/extensible-sr/sr-document-content/00480301), but highdicom essentially makes it Type 1.
The problem arises when defining a region of interest for non-WSI images. It is not clear what the meaning of a VOLUME PixelOriginInterpretation would be for CT, for example, since Total Pixel Matrix Origin seems to be WSI-specific. At the same time, it is impossible to set PixelOriginInterpretation to FRAME, since there is no frame number for non-enhanced images.
Most logical to me would be to allow passing None as the pixel_origin_interpretation parameter value to the ImageRegion constructor, but instead of assigning VOLUME when None is passed (see https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/sr/content.py#L481), not including this attribute in the dataset at all.
At the moment, I can only initialize that attribute to VOLUME.
I'm getting
OSError: encoder error -2 when writing image file
when trying to create fractional segmentation images with transfer_syntax_uid=JPEG2000Lossless
I created an SR which contained qualitative evaluations, which are passed in as a sequence of CodeContentItems to the qualitative_evaluations parameter of the various SRs. I constructed these CodeContentItems without the relationship_type parameter, and highdicom did not issue any warnings or complaints. However, the dciodvfy tool raises an error saying that the RelationshipType is required and therefore the SR objects are invalid.
I am not sure exactly what the right level to solve this issue is.
The simplest way is simply to check the elements of qualitative_evaluations when constructing MeasurementsAndQualitativeEvaluations objects to ensure they have RelationshipTypes. However, after thinking about this more, I think this specific issue may be a symptom of a broader problem. Currently relationship_type is an optional parameter for ContentItem and all its subclasses (e.g. CodeContentItem, TimeContentItem, NumContentItem, ...). I'm not sure that this is correct. I think it should probably be required.
My reading of table C.17-6 is that RelationshipType is a type 1 (required) attribute within a ContentSequence. I tried making relationship_type a required parameter of ContentItem and all subclasses, and the only tests that failed were tests where these objects were directly constructed without the relationship type parameter by test code. I.e. there is no place in the main codebase where an item is constructed without a relationship type (or at least not one with test coverage, I suppose...). I haven't been able to think of a situation where a relationship type wouldn't be needed. The one place I'm unsure about is the very root of the content tree, which doesn't have a relationship to its parent. I'm not sure how to interpret the recursive standard docs on this point. But even there, it seems like highdicom is currently placing a RelationshipType attribute at the root of the tree anyway.
Another option is that maybe the append method of ContentSequence should have a check to ensure that the ContentItem being appended has a RelationshipType.
@hackermd can you please help me understand at what level this issue exists?
This is because pydicom.Dataset.pixel_array returns a 2D rather than 3D array in this case. As a result, iter_segments ends up iterating through the rows of the single frame.
Dear Sir,
highdicom provides a powerful DICOM SR and annotations API, but it seems to require integration with a DICOM viewer.
As far as we know, the Slim viewer doesn't include the highdicom Python package, correct? But it can generate corresponding annotations and SRs. Would you mind if I ask why Slim is not developed in Python so that it could use the highdicom package?
If we use highdicom to develop a digital pathology web viewer that generates annotations and SRs at the same time, could you give some suggestions?
Thanks.
@hackermd one issue we identified with the current legacy converter is that it does not detect attributes that are repeated and identical across the per-frame functional groups in order to factor them out into the shared functional groups (which I think is non-compliant with the standard, and is a bug).
We want to fix this, and @afshinmessiah was going to work on it. But before he starts, we wanted to confirm this contribution would be welcomed. If it is, we would also appreciate it if you could tell us whether you have a specific approach in mind for how and where you would want it implemented, or whether you would want us to come up with a proposal.
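As a rough illustration of the factoring step (plain dictionaries stand in for functional group items here; this is not the highdicom implementation):

```python
# Sketch: if an attribute has an identical value in every per-frame
# functional group item, move it to the shared functional groups.
def factor_out_shared(per_frame_items):
    shared = {}
    keys = set().union(*(item.keys() for item in per_frame_items))
    for key in keys:
        values = [item.get(key) for item in per_frame_items]
        if all(v == values[0] for v in values):
            shared[key] = values[0]
            for item in per_frame_items:
                item.pop(key, None)
    return shared

frames = [{'PixelSpacing': [1.0, 1.0], 'Position': i} for i in range(3)]
shared = factor_out_shared(frames)
# PixelSpacing is identical everywhere and gets factored out;
# Position differs per frame and stays in the per-frame items.
```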
Hi,
highdicom sets the attributes 'DICOMPrefix' and 'FilePreamble', which are not in the pydicom DICOM dictionary. This results in a warning by default, or an exception if pydicom's config.INVALID_KEYWORD_BEHAVIOR = "RAISE". I don't think you need to set those attributes, as they are written by pydicom's dcmwrite.
#139 added support for segmented LUTs in presentation states; however, the user is left to construct valid LUT data themselves, which is not straightforward.
We should provide utility methods or an alternative constructor to allow users to construct LUTs in a more intuitive way.
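As a sketch of what such a utility might look like (the function name and signature are assumptions, not an existing highdicom method), LUT data could be built by linear interpolation between user-supplied breakpoints:

```python
import numpy as np

# Sketch: build LUT data by linearly interpolating between breakpoints
# over the range of mapped input values.
def lut_from_breakpoints(input_points, output_points,
                         first_mapped_value=0, last_mapped_value=255):
    x = np.arange(first_mapped_value, last_mapped_value + 1)
    return np.interp(x, input_points, output_points).astype(np.uint16)

lut = lut_from_breakpoints([0, 128, 255], [0, 64, 255])
```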
Trying to use the constructor for the TimePointContext object seems to result in an exception regardless of the input parameters.
from highdicom.sr.templates import TimePointContext
TimePointContext('whatever')
Results in:
Traceback (most recent call last):
File "test.py", line 3, in <module>
TimePointContext('whatever')
File "/Users/christopher.bridge/Developer/highdicom/src/highdicom/sr/templates.py", line 214, in __init__
self.append(time_point_item)
File "/Users/christopher.bridge/Developer/highdicom/src/highdicom/sr/value_types.py", line 139, in append
super(ContentSequence, self).append(item)
File "/Users/christopher.bridge/.pyenv/versions/highdicom/lib/python3.8/site-packages/pydicom/sequence.py", line 62, in append
super().append(val)
File "/Users/christopher.bridge/.pyenv/versions/highdicom/lib/python3.8/site-packages/pydicom/multival.py", line 66, in append
self._list.append(self.type_constructor(val))
AttributeError: 'TimePointContext' object has no attribute '_list'
I will keep thinking about it but for now this has me stumped. Any ideas appreciated.
I'm using pydicom version 2.1.2, python 3.8.7, highdicom master branch
I recently came across this SO question and, as a solution, I answered suggesting the usage of this package. However, I noticed that since the attributes ICCProfile and OpticalPathSequence are not present, it's not possible to inspect the metadata of the file provided by the OP at all. Furthermore, since the code works correctly with the correct_color option set to False in read_frame, why not print a warning when reading metadata instead of raising the AttributeError, and raise it subsequently in the read_frame function only if correct_color is True? I think it's better to be able to inspect, view and print all the available metadata rather than stop the code execution if one attribute is missing. As a disclaimer, I'm not a DICOM expert, so maybe there's a reason for doing things this way that I don't understand.
Unify the API of the ParametricMap class with the presentation state SOP classes by having the user provide a VOILUTTransformation object to the constructor. This is a backward-incompatible change.
Segmentation objects created by the library currently lack the following attributes:
which are needed to uniquely localize a segmentation image of a slide microscopy image within the slide-based coordinate system and are required by the standard (see Segmentation Image Module).
This line should say Pixel, not Mixel: https://github.com/MGHComputationalPathology/highdicom/blob/master/src/highdicom/legacy/sop.py#L363
Without correcting it, the resulting object does not have PixelMeasuresSequence initialized, and is not valid. How were you able to load the output legacy enhanced objects into OsiriX? Or perhaps OsiriX does not care if PixelSpacing is absent?
@afshinmessiah FYI
@hackermd would it be possible to set my permissions so that I can fork the repository and submit pull requests? I would prefer to contribute directly. Right now I have fork button disabled, which makes it impossible to submit pull requests:
Highdicom doesn't have support for TID 1604 (Image Library). This would be useful for encoding imaging attributes (imaging orientation, slice thickness, pixel spacing) for clients interpreting Comprehensive 2D SRs before the source image SOPInstances were retrieved.
I think if the user passes in either a single RWVM instance in the parametric map constructor, or passes a sequence with a single RWVM, we should add it to the Shared Functional Groups Sequence rather than forcing them into the Per-frame sequence.
@hackermd I can take care of this if you would like, let me know.
Which SOPClassUID is the key pet-image supposed to link to? I don't see a PET image storage on the official list.
Is it referring to Positron Emission Tomography Image Storage, SOP Class UID 1.2.840.10008.5.1.4.1.1.128?
If so, the key should be changed to positron-emission-tomography-image.
If this isn't the issue, what key in IOD_MODULE_MAP does SOP Class 1.2.840.10008.5.1.4.1.1.128 map to?
Dear Sir,
As we know, highdicom supports
But we are not sure whether highdicom can support the Supplement 222 Microscopy Bulk Simple Annotations Storage SOP Class.
This Supplement to the DICOM Standard specifies a new DICOM Information Object and Storage SOP Class for storing Microscopy Bulk Simple Annotations (points, open polylines, closed polygons and simple geometric shapes without relationships), referred to as the Microscopy Bulk Simple Annotations IOD.
Microscopy Bulk Simple Annotations are usually created by machine algorithms from high-resolution images of entire tissue sections, e.g., encoded as DICOM Whole Slide Microscopy images.
If highdicom can support this, it will be easy to use after feeding data into AI.
We are investigating the use of highdicom for converting our NIfTI segmentation outputs to DICOM SEG format. The corresponding DICOM series have some fields anonymised, for instance SOPClassUID.
It seems highdicom expects a predefined set of UIDs. Is there a possibility of supporting unknown values as well?
Here is the error message I am getting;
Traceback (most recent call last):
File "convert.py", line 75, in <module>
seg_dataset = Segmentation(
File "/../lib/python3.8/site-packages/highdicom/seg/sop.py", line 422, in __init__
self.copy_specimen_information(src_img)
File "/../lib/python3.8/site-packages/highdicom/base.py", line 285, in copy_specimen_information
self._copy_root_attributes_of_module(dataset, 'Image', 'Specimen')
File "/../lib/python3.8/site-packages/highdicom/base.py", line 240, in _copy_root_attributes_of_module
iod_key = SOP_CLASS_UID_IOD_KEY_MAP[dataset.SOPClassUID]
KeyError: '05703841015643452503209109170649121759226008273168'
In the static method _omit_empty_frames, if there are no non-empty frames present (i.e., no frames with positive pixels), then numpy throws an exception because the empty non_empty_frames list is passed into np.stack.
May I suggest a check on the array before proceeding?
if len(non_empty_frames) == 0:
return (pixel_array, plane_positions, [])
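A minimal reproduction of the failure mode, assuming only numpy (variable names mirror the issue's description):

```python
import numpy as np

# An all-zero pixel array yields an empty list of non-empty frames,
# and np.stack on an empty list raises ValueError.
pixel_array = np.zeros((3, 4, 4), dtype=np.uint8)
non_empty_frames = [pixel_array[i] for i in range(pixel_array.shape[0])
                    if pixel_array[i].any()]
try:
    np.stack(non_empty_frames)
    stack_failed = False
except ValueError:
    stack_failed = True
```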
As implemented, it appears that the legacy converter does not sort frames geometrically. Arguably, it should be the responsibility of the reader to sort them, but I wanted to confirm this was a deliberate decision not to sort them. @hackermd can you comment?
As an aside, Slicer does not sort frames while reading the resulting multiframe (tested with the 2020-01-12 nightly):
Spoke with @hackermd about this earlier. I am trying to save a feature vector for each frame of a SM image as a segmentation which ends up creating 1024 segments. I pinned the slowness down to these lines.
I am trying to save the segmentation of a CXR as a dicomseg.
seg_dataset = Segmentation(
    source_images=[source_image],
    pixel_array=np.uint16(seg_img),
    segmentation_type=SegmentationTypeValues.FRACTIONAL,
    segment_descriptions=description_segments,
    series_instance_uid=generate_uid(),
    sop_instance_uid=generate_uid(),
    instance_number=instance_number,
    manufacturer="deepc",
    manufacturer_model_name=config["NAME"],
    software_versions="v" + config["VERSION"],
    device_serial_number="Device XYZ",
    series_number=2,
    fractional_type="OCCUPANCY",
)
Even though I am using a proper DX dcm, I am getting errors saying that fields like StudyID, SliceThickness and PixelSpacing are missing in the source_image. Is it possible to skip the missing fields? Or is there a better way to handle such scenarios?
It looks like the above method calls spatial.map_pixel_into_coordinate_system() with its outdated function signature.
It appears that there are a number of failed tests with the same issue.
When a fractional segment is added with a float type (i.e. truly fractional segments, not binary segments encoded as fractional), the segment number is always set to 1. This means that if a second segment is added, both will have segment number 1, which I assume is not allowed... I have added a check to prevent this in my feature/more_seg_tests branch, but this makes adding multiple fractional segments impossible.
I suggest adding an option to allow the user to explicitly specify the segment number to get around this issue.
As a side effect, this would also allow adding binary masks as different segments, where currently it is necessary to make sure that the true pixels' values match the segment number.
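To illustrate the current workaround mentioned above (the mask and label values below are assumptions, not data from the issue):

```python
import numpy as np

# Under the current behaviour, to store a binary mask as segment
# number n, the mask's true pixels must carry the value n.
mask = np.array([[0, 1],
                 [1, 0]], dtype=np.uint8)
segment_number = 2
labelled = mask * np.uint8(segment_number)
```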
Thoughts @hackermd ?
I'm happy to have a shot at making alterations, but wanted your thoughts first
When constructing a Segmentation object I hit this block which results in a Segmentation object with non-integer TotalPixelMatrixRows and TotalPixelMatrixColumns.
plane_position_values[last_frame_index] is array([[ 1. , 1. , 18.719027 , 53.52480485, 0. ]]) and the row and column indices are 2 and 3 respectively. The resulting instance has TotalPixelMatrixRows: 152.719027 and TotalPixelMatrixColumns: 375.0.
In the next pydicom release, storage class UIDs will become importable from pydicom.uid (see https://github.com/pydicom/pydicom/blob/master/pydicom/_storage_sopclass_uids.py) and the private _storage_sopclass_uids.py module will be deprecated. We should consider updating highdicom once pydicom 2.3 is out.
In the list of IODs (_iod.py), "overlay-plane" is listed as a module for pet-image. In the _module.py file, though, there is no such module as "overlay-plane" from which to get the list of its attributes.
Currently, the None type is omitted from type annotations in the docstrings.
As discussed in #75 (comment), we should consistently use Union[..., None], optional to clearly indicate that None is a valid option for these parameters.
from typing import Optional
def foo(bar: int, optional_arg: Optional[str] = None) -> None:
"""
Parameters
-----------
bar: int
A required parameter
optional_arg: Union[str, None], optional
An optional parameter
"""
I just realized that Person Observer Name is currently incorrectly encoded using TextContentItem instead of PnameContentItem (see highdicom.sr.templates.PersonObserverIdentifyingAttributes).
It is my understanding that the PertinentOtherEvidenceSequence should contain all instances referenced anywhere in the content tree, and failing to do so would technically be non-compliant. See sect_C.17.2.3
[evidence in PertinentOtherEvidenceSequence] shall include, but is not limited to, all current evidence referenced in the content tree
When constructing an SR object, highdicom has all the information available to ensure that this has been done correctly (i.e. that no referenced instance in the content tree has been omitted from the evidence parameter), but it currently does not do so.
I would suggest that one improvement could be to add code to the SR constructor that walks the content tree and raises an exception if any evidence is missing. Unfortunately I think it will not be possible to automatically add the evidence as some information (such as series instance uid) may be missing from the references in the content tree.
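Such a check would not need anything highdicom-specific; a rough stdlib-only sketch of the idea, with content items modeled as plain dicts for illustration (the real implementation would walk pydicom datasets instead):

```python
def collect_referenced_uids(item, found=None):
    # Recursively walk a content tree (modeled here as nested dicts) and
    # collect every ReferencedSOPInstanceUID encountered.
    if found is None:
        found = set()
    ref = item.get('ReferencedSOPInstanceUID')
    if ref is not None:
        found.add(ref)
    for child in item.get('ContentSequence', []):
        collect_referenced_uids(child, found)
    return found

def check_evidence(root, evidence_uids):
    # Raise if any instance referenced in the content tree is missing
    # from the supplied evidence.
    missing = collect_referenced_uids(root) - set(evidence_uids)
    if missing:
        raise ValueError(
            f'Instances referenced but not in evidence: {sorted(missing)}'
        )
```

The constructor could call such a check after assembling the content tree and before serializing the evidence sequence.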
Thoughts? @hackermd
Upon construction of some content items, we first add an empty ContentSequence to the object and then later append individual items to it. If, however, no items get subsequently added, the ContentSequence attribute remains empty, which is not allowed by the standard.
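One straightforward fix is to attach the sequence lazily, only once there is at least one item to put in it. A schematic sketch, using a plain dict as a stand-in for the dataset:

```python
def attach_content_sequence(parent, items):
    # An empty ContentSequence is not permitted by the standard, so only
    # create the attribute when there is actually something to append.
    items = list(items)
    if items:
        parent['ContentSequence'] = items
    return parent
```

With this pattern, a content item constructed with no children simply lacks the attribute, which is valid, instead of carrying an empty sequence, which is not.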
Generation of Segmentation (SEG) images doesn't work for mammography images due to required tags that are optional and usually not present. In particular: FrameOfReferenceUID, SliceThickness, ImageOrientationPatient and ImagePositionPatient. I'm attaching an exemplary mammography DICOM file.
Thanks for considering it!
Various mistakes have been found in the user guide examples. Examples that don't work are very off-putting for new users. We should set up automatic testing of the documentation examples to ensure they run correctly.
The latest release of pydicom is causing several tests to fail. At first glance, all errors appear to be related to new behaviour for handling multi-values. It may be difficult to support versions on both sides of the change, so we may want to disallow 2.3.0 as an interim measure.
Hi,
I'm using highdicom 0.9.0 to create a SpecimenDescription. When I try to create SpecimenCollection, SpecimenStaining, or SpecimenSampling content sequences for a SpecimenPreparationStep:
from highdicom.content import SpecimenCollection, SpecimenStaining, SpecimenSampling
from highdicom.sr.coding import CodedConcept

SpecimenCollection(CodedConcept('P1-03000', 'SRT', 'Excision'))

SpecimenStaining(
    [
        CodedConcept('C-22968', 'SRT', 'hematoxylin stain'),
        CodedConcept('C-22919', 'SRT', 'water soluble eosin stain')
    ]
)

SpecimenSampling(
    method=CodedConcept('111727', 'SRT', 'Dissection with representative sections submission'),
    parent_specimen_id='specimen id',
    parent_specimen_type=CodedConcept('G-8300', 'SRT', 'Tissue specimen')
)
I get the following error (similar for SpecimenStaining and SpecimenCollection):
AttributeError: Items to be appended to a SpecimenSampling must have an established relationship type.
The same error appears in 0.8.0 but not in 0.7.0.
I'm writing a script to attach AI-made lesion segmentations to existing DICOM files (it's my first time diving this deep into DICOM, so I might have missed something). I found your repository and used hd.seg.Segmentation, but unfortunately the order of the segmentation slices was incorrect. First I tried just a dummy segmentation (a simple 3D box over an MRI), and after encountering the issue I decided to try another approach: I took an existing DICOM file (of a CT scan) that contains both a scan and a segmentation, converted the segmentation to an np.array, and used it as the mask input, while the scan folder serves as the source_images.
From the original segmentation file (9 slices between -224.5mm to -192.5mm, 4mm thickness):
(0020,0032) DS [-189.136\-320.136\-224.5] # 24, 3 ImagePositionPatient
(0020,0032) DS [-189.136\-320.136\-220.5] # 24, 3 ImagePositionPatient
:
(0020,0032) DS [-189.136\-320.136\-192.5] # 24, 3 ImagePositionPatient
Versions:
Python 3.8.10
numpy 1.21.2
pydicom 2.2.1
highdicom 0.10.0
Code snippets:
:
# Collect the scan images
image_datasets = [dcmread(str(f)) for f in image_files]
# Creating a segmentation mask from the existing segmentation. shape = (108, 512, 512)
mask = np.zeros(
shape=(
len(image_datasets),
image_datasets[0].Rows,
image_datasets[0].Columns
),
dtype=bool
)
mask[86:95, :, :] = np.array(hd.seg.segread(
'/home/ben/.../original_seg.dcm'
).pixel_array, dtype=bool)
# Printing to check if the order is correct
sl_num = -568.5
for slice in mask:
print(sl_num, np.unique(slice))
sl_num += 4
Output:
-568.5 [False]
-564.5 [False]
-560.5 [False]
:
-236.5 [False]
-232.5 [False]
-228.5 [False]
-224.5 [False True]
-220.5 [False True]
-216.5 [False True]
-212.5 [False True]
-208.5 [False True]
-204.5 [False True]
-200.5 [False True]
-196.5 [False True]
-192.5 [False True]
-188.5 [False]
-184.5 [False]
-180.5 [False]
:
-148.5 [False]
-144.5 [False]
-140.5 [False]
Creating the segmentation:
# Get metadata from the existing series
series_instance_uid = image_datasets[0].SeriesInstanceUID
series_number = image_datasets[0].SeriesNumber
sop_instance_uid = image_datasets[0].SOPInstanceUID
instance_number = image_datasets[0].InstanceNumber
MANUFACTURER = 'Ben' # image_datasets[0].Manufacturer
MANUFACTURER_MODEL_NAME = "Prost" # image_datasets[0].ManufacturerModelName
SOFTWARE_VERSIONS = 'v1.0'
DEVICE_SERIAL_NUMBER = '0' # image_datasets[0].DeviceSerialNumber
# Describe the algorithm that created the segmentation; for family codes see:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7162.html
algorithm_identification = hd.AlgorithmIdentificationSequence(
name=MANUFACTURER_MODEL_NAME,
version=SOFTWARE_VERSIONS,
family=codes.cid7162.ArtificialIntelligence
)
# Describe the segment:
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.SegmentDescription
# segmented_property_category:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7150.html
# segmented_property_type:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7160.html
description_segment = hd.seg.SegmentDescription(
segment_number=1,
segment_label='Lesions',
segmented_property_category=codes.cid7150.AnatomicalStructure,
segmented_property_type=codes.cid7160.Prostate,
algorithm_type=hd.seg.SegmentAlgorithmTypeValues.AUTOMATIC,
algorithm_identification=algorithm_identification,
tracking_uid=hd.UID(),
tracking_id='Lesion Segmentation of a Prostate MR Image',
# anatomic_regions=Code("41216001", "SCT", "Prostate"), # BA - error, seems like the others are enough
)
# Create the Segmentation instance
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.Segmentation
seg_dataset = hd.seg.Segmentation(
source_images=image_datasets,
pixel_array=mask,
segmentation_type=hd.seg.SegmentationTypeValues.BINARY, # FRACTIONAL,
segment_descriptions=[description_segment],
series_instance_uid=series_instance_uid, # hd.UID(),
series_number=series_number,
sop_instance_uid=sop_instance_uid, # hd.UID(),
instance_number=instance_number,
manufacturer=MANUFACTURER,
manufacturer_model_name=MANUFACTURER_MODEL_NAME,
software_versions=SOFTWARE_VERSIONS,
device_serial_number=DEVICE_SERIAL_NUMBER,
omit_empty_frames=True,
# content_creator_name=manufacturer,
)
# Compare generated and original segmentations
print('seg:')
for i, slice in enumerate(seg_dataset.PerFrameFunctionalGroupsSequence):
print(float(str(slice['PlanePositionSequence']._value[0])[-7:-1]), np.unique(seg_dataset.pixel_array[i]))
ref_path = '/home/ben/.../original_seg.dcm'
ref_seg = hd.seg.segread(ref_path)
print('\noriginal:')
for i, slice in enumerate(ref_seg.PerFrameFunctionalGroupsSequence):
print(float(str(slice['PlanePositionSequence']._value[0])[-7:-1]), np.unique(ref_seg.pixel_array[i]))
And the outputs:
seg:
-564.5 [0 1]
-552.5 [0 1]
-520.5 [0 1]
-488.5 [0 1]
-404.5 [0 1]
-328.5 [0 1]
-324.5 [0 1]
-244.5 [0 1]
-200.5 [0 1]
original:
-224.5 [0 1]
-220.5 [0 1]
-216.5 [0 1]
-212.5 [0 1]
-208.5 [0 1]
-204.5 [0 1]
-200.5 [0 1]
-196.5 [0 1]
-192.5 [0 1]
As seen, the newly generated segmentation got the wrong PlanePositionSequence. The same happens when I use omit_empty_frames=False.
I'm still reading through the library source code; I found another bug and told @hackermd about it, but it was not related. I had a thought about adding a reorder-by-original-position function, to make sure the segmentation is mapped correctly in case the original scan files are not in order.
Any other thoughts or comments?
Many thanks to all of you anyway, it's a really impressive library and I'm glad you've released it exactly when I started working on this project :)
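Regarding the reorder idea: the sort itself is simple if every source instance carries ImagePositionPatient. A sketch assuming axial single-frame instances, so the z component orders the slices (a general solution would project the position onto the slice normal derived from ImageOrientationPatient):

```python
from types import SimpleNamespace

def sort_by_slice_position(datasets):
    # Order single-frame datasets along the slice axis using the z component
    # of ImagePositionPatient (sufficient for axial acquisitions).
    return sorted(datasets, key=lambda ds: float(ds.ImagePositionPatient[2]))

# Demonstration with stand-in objects (hypothetical positions):
dsets = [
    SimpleNamespace(ImagePositionPatient=[-189.136, -320.136, z])
    for z in (-192.5, -224.5, -220.5)
]
ordered = sort_by_slice_position(dsets)
print([ds.ImagePositionPatient[2] for ds in ordered])  # ascending z
```

Sorting image_datasets this way before building the mask would at least guarantee that mask frame i corresponds to the i-th slice along the patient axis.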
TID 1501 allows capturing qualitative evaluations assigned per image (without specifying any annotation); see rows 10b/11b in https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_A.html#sect_TID_1501. This does not appear to be possible using highdicom; I don't see how I could specify an image in the constructor of TID 1501: https://github.com/herrmannlab/highdicom/blob/master/src/highdicom/sr/templates.py#L2420. Am I correct, or am I missing something?
There is no syntax highlighting for the code examples on Read the Docs: https://highdicom.readthedocs.io/en/latest/usage.html. However, when the HTML files are built locally with the makefile, syntax highlighting works fine.
This has been discussed before (e.g. here) but we should have an issue to track it.
Pydicom is very loose in what it allows you to set as an attribute's value, even when you have the global configuration option pydicom.config.enforce_valid_values set to True. We have previously encountered and resolved this narrowly for decimal strings (DS) #57 #65, but the issue is broader. Checks for the other VRs are largely absent from pydicom, yet many VRs have limits on string length, the set of allowable characters, capitalisation, etc. (see the standard). The result is many one-off checks being included to validate user-supplied values in highdicom, as well as probably many missing checks that could allow files with invalid values to be produced. We should tackle this in a more unified way to reduce redundancy and the probability of invalid values slipping through the net.
My feeling is that as far as possible we should add this functionality to pydicom and then integrate into highdicom.
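To make the direction concrete, a toy sketch of what a unified check might look like. The limits shown are only for a handful of string VRs (per DICOM PS3.5); the table is deliberately incomplete and the function name is hypothetical:

```python
# Maximum lengths for a few string VRs (DICOM PS3.5, Table 6.2-1).
VR_MAX_LENGTH = {'AE': 16, 'CS': 16, 'LO': 64, 'SH': 16}

def check_string_value(vr, value):
    # Enforce the length limit for this VR, if one is registered.
    limit = VR_MAX_LENGTH.get(vr)
    if limit is not None and len(value) > limit:
        raise ValueError(
            f'Value of length {len(value)} exceeds maximum {limit} for VR {vr}'
        )
    if vr == 'CS':
        # Code Strings allow only uppercase letters, digits, space, underscore.
        allowed = set('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 _')
        if not set(value) <= allowed:
            raise ValueError(f'Invalid characters for VR CS: {value!r}')
    return value
```

A single table-driven checker like this, ideally living in pydicom itself, would replace the scattered one-off validations.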
Thanks for providing a converter for 'CLASSIC' MR Image Storage instances.
I am confused by the documentation for using this class; here is what I did (debian/buster):
$ pip3 install highdicom
$ mkdir /tmp/mr
$ cp gdcmData/*FileSeq* /tmp/mr
$ python3 conv.py
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'DICOMPrefix' used which is not in the element keyword data dictionary
warnings.warn(msg)
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'FilePreamble' used which is not in the element keyword data dictionary
warnings.warn(msg)
/home/mathieu/.local/lib/python3.7/site-packages/pydicom/dataset.py:1981: UserWarning: Camel case attribute 'FrameVolumeBasedCalculationTechnique' used which is not in the element keyword data dictionary
warnings.warn(msg)
Which resulted in (truncated):
(0020,9171) SQ (Sequence with explicit length #=1) # 568, 1 UnassignedPerFrameConvertedAttributesSequence
(fffe,e000) na (Item with explicit length #=9) # 560, 1 Item
(0009,1015) ?? 30\33\33\53\36\39\4d\52\30\31\32\30\30\32\30\36\31\39\31\38\32\33... # 26, 1 Unknown Tag & Data
(0019,1212) ?? 30\30\31\2e\31\38\32\36\39\36\45\2d\34\32 # 14, 1 Unknown Tag & Data
(0020,0030) DS [-01.190625E+02\-1.190625E+02\-3.553830E+01] # 42, 3 RETIRED_ImagePosition
(0020,0050) DS [003.553830E+01] # 14, 1 RETIRED_Location
(0020,1041) DS [003.553830E+01] # 14, 1 SliceLocation
(0021,1160) ?? 30\30\30\2e\30\30\30\30\30\30\45\2b\30\30\5c\30\30\2e\30\30\30\30... # 42, 1 Unknown Tag & Data
(0021,1163) ?? 30\30\33\2e\35\35\33\38\33\30\45\2b\30\31 # 14, 1 Unknown Tag & Data
(0021,1342) ?? 20\20\20\20\31\35 # 6, 1 Unknown Tag & Data
(0051,1010) ?? 30\30\33\37\32\37\36\5c\46\20\37\39\59\5c\48\2d\53\50\2d\43\52\5c... # 316, 1 Unknown Tag & Data
(fffe,e00d) na (ItemDelimitationItem for re-encoding) # 0, 0 ItemDelimitationItem
The output file is missing the Private Creator for the nested dataset.
If using ExplicitVRLittleEndian, here is what I get:
$ dcmdump enh.dcm
W: DcmItem: Non-standard VR ' ' (0a\00) encountered while parsing element (0008,0005), assuming 2 byte length field
W: DcmItem: Non-standard VR 'IR' (49\52) encountered while parsing element (5349,5f4f), assuming 4 byte length field
E: DcmElement: Unknown Tag & Data (5349,5f4f) larger (536624) than remaining bytes in file
E: dcmdump: I/O suspension or premature end of stream: reading file: enh.dcm
Could you please add a section in the documentation on how to use the LegacyConvertedEnhancedMRImage class? Thanks much.
For reference:
$ cat conv.py
from pathlib import Path
from pydicom.filereader import dcmread
from pydicom import uid
from highdicom.legacy.sop import LegacyConvertedEnhancedMRImage
series_dir = Path('/tmp/mr')
image_files = series_dir.glob('*.dcm')
image_datasets = [dcmread(str(f)) for f in image_files]
enh = LegacyConvertedEnhancedMRImage(image_datasets, "1.2.3", "1", "4.5.6", "2")
# enh.file_meta.TransferSyntaxUID = uid.ExplicitVRLittleEndian
enh.save_as("enh.dcm")