imi-bigpicture / wsidicom
Python package for reading DICOM WSI file sets.
License: Apache License 2.0
We have some dicomized testdata (used the wsidicomizer in early 2023 to dicomize CMU-1.svs) that can't be opened anymore using wsidicom>=0.18.0, showing the following error:
AttributeError: 'Dataset' object has no attribute 'XOffsetInSlideCoordinateSystem'
It seems that the old dicomized testdata has some missing data. Are there some mandatory dicom tags that need to be added to the data?
DICOM files generated by, for example, https://github.com/GoogleCloudPlatform/wsi-to-dicom-converter are not accepted by wsidicom due to missing tags (e.g. OpticalPath-related), but can be opened by Orthanc. Maybe adding a flag --strict would be better? If you can read tags and the actual image data I think that should be enough to be acceptable.
Add support and/or document how to best validate that files follow the official BigPicture guidelines, supporting the scenario where you want to harmonize your dicom data:
# Proposed usage: ready_for_viewing() and strict_mode are suggestions, not existing API.
if not WsiDicom.ready_for_viewing(folder_path):
    try:
        WsiDicom.open(folder_path).save(new_folder_path)
    except Exception as error:
        raise RuntimeError(
            "DICOM files were not ready for viewing, and could not be converted"
        ) from error
...
image = WsiDicom.open(folder_path, strict_mode=True)
Basic slide properties, such as size (width and height) of base layer and tile size, should be available.
Currently the save() method writes DICOM files directly into the given output folder. This can cause unexpected problems if a user saves multiple WSIs to the same folder, as it then becomes difficult to figure out which files belong to which WSI.
It would therefore be better if the files were saved in a subfolder of the given output folder.
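One way the subfolder idea could look is sketched below. The helper name and the choice of the series UID as folder name are assumptions made for illustration; they are not part of wsidicom.

```python
from pathlib import Path


def wsi_output_folder(output_root: str, series_uid: str) -> Path:
    """Return (and create) a subfolder of output_root for one WSI.

    Hypothetical helper: naming the subfolder after the series UID is an
    assumption, any identifier unique to the WSI would work.
    """
    folder = Path(output_root) / series_uid
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Saving two WSIs to the same root would then give two separate folders, so the files of each WSI stay together.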
The NCI's Imaging Data Commons is a big repository (>38k studies) for cancer research.
Their studies are available via DICOMweb. See here for an example of viewing one of them.
I tried to access that same example, but I get a couple of HTTP errors. Try out the following code:
from wsidicom import WsiDicom, WsiDicomWebClient
# For this one: https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.227261840503961430496812955999336758586/series/1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0
url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.227261840503961430496812955999336758586'
series_uid = '1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'
client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
It produces the following exception:
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID
I played around with that URL using curl and found a couple of errors. Accessing the full URL via this command:
curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID'
Produces:
[{
"error": {
"code": 400,
"message": "invalid QIDO-RS query: unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID",
"status": "INVALID_ARGUMENT"
}
}
]
So it is raising an exception because we asked for the AvailableTransferSyntaxUID. However, if I remove that part of the URL:
curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'
It raises another error:
[{
"error": {
"code": 400,
"message": "generic::invalid_argument: SOPClassUID is not a supported instance or series level attribute",
"status": "INVALID_ARGUMENT"
}
}
]
So it is also complaining that we are asking for a SOPClassUID of WSI_SOP_CLASS_UID in the search filters.
If I remove that part of the url also:
curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'
It works fine.
It's kind of annoying that an error is being raised for AvailableTransferSyntaxUID. It would be nice if the server just returned an empty field when it was not available.
However, it would be really nice if we could support interacting with this DICOMweb server.
Let me know what your thoughts are, @erikogabrielsson.
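The curl experiments above suggest a fallback strategy: try the most specific query first and drop the parameters the server rejects. A minimal sketch of that idea follows. `BadRequest`, `instance_query_urls` and `search_instances` are hypothetical names, and `fetch` is a caller-supplied function (for example a thin wrapper around an HTTP client that raises `BadRequest` on a 400 response); none of this is existing wsidicom API.

```python
from typing import Any, Callable, List
from urllib.parse import urlencode

WSI_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.77.1.6"


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response (hypothetical, for illustration)."""


def instance_query_urls(base_url: str, study_uid: str, series_uid: str) -> List[str]:
    """Build instance search URLs, from most to least specific."""
    series_filter = {"0020000E": series_uid}
    variants = [
        # Full query, as in the failing request.
        {
            "00080016": WSI_SOP_CLASS_UID,
            **series_filter,
            "includefield": "AvailableTransferSyntaxUID",
        },
        # Without the include field.
        {"00080016": WSI_SOP_CLASS_UID, **series_filter},
        # Without the SOP class filter as well.
        series_filter,
    ]
    return [
        f"{base_url}/studies/{study_uid}/instances?{urlencode(params)}"
        for params in variants
    ]


def search_instances(urls: List[str], fetch: Callable[[str], Any]) -> Any:
    """Return the result of the first URL that the server accepts."""
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except BadRequest as error:
            last_error = error
    if last_error is None:
        raise ValueError("No URLs to try")
    raise last_error
```

With the less specific queries, filtering on SOP class would have to be done on the client side after the instances are returned.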
Now that Supplement 222 is approved, the implementation should be updated to use the correct UIDs and tags.
read_region_mm() currently maps physical distance to pixels with same origin and orientation as the image pixels. Graphical annotations are often done with the frame of reference as origin, and to get the corresponding region it would be helpful if read_region_mm could (optionally) map to the frame of reference (and orientation).
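Mapping from the frame of reference to pixels amounts to inverting the affine relation between pixel indices and slide coordinates. The sketch below assumes isotropic pixel spacing, an origin given as the slide coordinates (in mm) of the top-left pixel, and an orientation given as six direction cosines in the layout of the ImageOrientationSlide attribute (first three for the row direction, last three for the column direction). The function name is hypothetical and not part of wsidicom.

```python
from typing import Sequence, Tuple


def slide_mm_to_pixel(
    point_mm: Tuple[float, float],
    origin_mm: Tuple[float, float],
    orientation: Sequence[float],
    pixel_spacing_mm: float,
) -> Tuple[float, float]:
    """Map a slide coordinate (x, y) in mm to a (column, row) pixel position."""
    x, y = point_mm
    x0, y0 = origin_mm
    # Direction in slide (x, y) of increasing column index (along a row)
    # and of increasing row index (along a column).
    col_dx, col_dy = orientation[0], orientation[1]
    row_dx, row_dy = orientation[3], orientation[4]
    determinant = col_dx * row_dy - col_dy * row_dx
    if determinant == 0:
        raise ValueError("Orientation vectors must not be parallel")
    # Offset from the origin, expressed in pixel units.
    dx = (x - x0) / pixel_spacing_mm
    dy = (y - y0) / pixel_spacing_mm
    # Invert the 2x2 orientation matrix.
    column = (dx * row_dy - dy * row_dx) / determinant
    row = (col_dx * dy - col_dy * dx) / determinant
    return column, row
```

The resulting pixel position could then be passed on to a pixel-based region read at the corresponding level.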
I've got a couple of large DICOM WSI images where calling WsiDicom.open is VERY slow. I've tracked it down to read_dataset in the init, which then calls filereader.py:41(data_element_iterator) and then filereader.py:461(read_sequence). This file has nearly 180,000 frames, which could be why this is so slow. However, there must surely be a faster way to initialize.
I am on wsidicom 0.4.0, pydicom 2.3.1 and python 3.11.
For sparse files, a PlanePositionSlideSequence in the PerFrameFunctionalGroupsSequence is expected to give the positions of the frames, but for some files with only one frame it is instead located in the SharedFunctionalGroupsSequence.
For files that have a non-supported transfer syntax (not supported by the Pillow handler, e.g. non-JPEG), pixel data is parsed in the wrong way and errors are raised. Pixel data should only be parsed for supported transfer syntaxes.
This is recommended, per https://peps.python.org/pep-0396/, but not currently available in wsidicom.
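A version string can be looked up from the installed distribution metadata with the standard library. The sketch below is one possible approach, not how wsidicom does it; the helper name and the "unknown" fallback are assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or a default."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The package is not installed (e.g. running from a source checkout).
        return default
```

A package could then expose it in its `__init__.py` as `__version__ = package_version("wsidicom")`.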
It should be possible to open a DICOM WSI based on the information in a dicom dir file.
The DICOM file writer can write either a basic or an extended offset table (or none). A basic offset table (BOT) can only index image files up to about 4 GB (due to the 32-bit unsigned integers used for indexing). Currently, using a BOT will fail after writing the file if that size is exceeded. An extended offset table (EOT) should be used for larger files.
BOT and EOT are both placed before the image data, but can only be written after all the image data has been written. Currently, space is thus reserved for either BOT or EOT before writing the image data (and then filled in after writing the image data).
It would be nice if the writer automatically selected the best offset table to use based on the image data to write. When writing from DICOM this should be relatively easy, as there is no re-encoding. When the writer is used in wsidicomizer to convert non-DICOM files, the size to write can be harder to estimate, as for example JPEG headers are added or the image data is re-encoded.
A simpler alternative could be to, if the file becomes too large for a BOT, copy the unfinished file but use an EOT. As the image data would already be correctly prepared, this could be a simple and quick operation.
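When the encoded frame lengths are known up front (as when writing from DICOM without re-encoding), the choice can be made before writing. The sketch below assumes one fragment per frame, an 8-byte item header per fragment, and padding of odd-length frames to even length; the function name and the "bot"/"eot" return values are hypothetical.

```python
from typing import Iterable

MAX_BOT_OFFSET = 2**32 - 1  # Largest value a 32-bit unsigned offset can hold.
ITEM_HEADER_LENGTH = 8  # Item tag (4 bytes) plus item length (4 bytes).


def select_offset_table(frame_lengths: Iterable[int]) -> str:
    """Return "bot" if every frame offset fits in 32 bits, else "eot"."""
    offset = 0  # Offset of the current frame from the first frame item.
    for length in frame_lengths:
        if offset > MAX_BOT_OFFSET:
            return "eot"
        padded_length = length + (length % 2)
        offset += ITEM_HEADER_LENGTH + padded_length
    return "bot"
```

Note that only the start offset of each frame has to fit, so the last frame may extend past the 32-bit limit.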
I used the add_missing_levels option to recreate the pyramid structure of DICOM file but it did not recreate the pyramid structure.
If I look into the code, in this line:
https://github.com/imi-bigpicture/wsidicom/blob/90671211ddb87096d19952d5e80a7199dda51b67/wsidicom/file/wsidicom_file_target.py#LL79C1-L79C44
it seems that new_levels is set to "None" and then it never enters the "if" check later on.
Version 0.9.0 will not install via conda/mamba (conda-forge channel) with Python 3.11 because the constraint on dicomweb-client is >=3.9, <3.10.a0. Previous versions work fine with Python 3.11.
There are a couple of options for supporting writing and reading annotations in DICOM that we could use. The main options are:
Which option that will work best will likely depend on what we need to annotate.
Using what is supported in (the current) 222-annotations, we could make annotations like this:
# Load a slide
slide = WsiDicom.open('path_to_dicom_dir')
# Create a point annotation at x=10.0, y=20.0 mm
# Geometries are limited to: Point, open and closed Line, Ellipse, Rectangle
point_annotation = Annotation(Point(10.0, 20.0))
# Create a point annotation with a measurement
measurement = Measurement('Area', 25.0, 'Pixels')
point_annotation_with_measurement = Annotation(Point(10.0, 20.0), [measurement])
# Create a group of the annotations
# The annotations in a group are required to be of the same geometry type.
annotations = [point_annotation, point_annotation_with_measurement]
category_code = CategoryCode('Tissue')
type_code = TypeCode('Nucleus')
group = PointAnnotationGroup(annotations, 'group label', category_code, type_code, 'description')
# Create a collection of annotation groups
collection = AnnotationCollection([group], slide.frame_of_reference)
# Save the collection to file
collection.save('path_to_dicom_dir/annotation.dcm')
# Load the slide again
slide = WsiDicom.open('path_to_dicom_dir')
# Access the annotations
group = slide.annotations[0]
There are some additional attributes that need to, or could, be set, such as algorithm identification.
If using 222-annotations is too restrictive, using structured reports is an option.
One major hurdle is how to store annotations with openings/holes, as there is no standard way of doing that in DICOM at the moment.
Support for segmentation/heat maps is likely needed. Segmentation, using either binary (0 or 1) or fractional values, is supported in the segmentation IOD.
What will the segmentation map look like? Will most of the slide (the part with specimen) be segmented, or only certain regions/tiles?
Should the segmentation be based on physical dimensions (mm in slide) or pixels? Using mm, it is more straightforward to use the same segmentation map at different pyramidal levels of the slide, but pixel dimensions can of course be re-calculated to other levels knowing the pixel spacing of the segmentation. Is pixel-perfect segmentation needed?
Do we need to use pyramidal levels also for the segmentation maps?
What will the support for segmentation look like in Cytomine? @geektortoise @rmaree @urubens
For the following example, it looks to me that the tile width and height are reversed:
from wsidicom import WsiDicom, WsiDicomWebClient
url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.16.756.5.41.446.20190905094928'
series_uid = '2.16.756.5.41.446.20190905094928.20190905102252'
client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
size = (1000, 900)
print(f'{size=}')
tile = slide.read_region((0, 0), 0, size)
print(f'{tile.width=} {tile.height=}')
It prints out the following:
size=(1000, 900)
tile.width=900 tile.height=1000
If I look into the API for read_region, it looks like the size parameter is specified as (width, height). However, the returned tile appears to have the width and height reversed.
We are currently performing some tests with a DICOMweb server that has some examples in Explicit VR Little Endian and JPEG-LS format. It would be nice if we could support as many formats as possible in wsidicom.
If we have a transfer syntax that Pillow doesn't support, what do you think about relying on pydicom's pixel_data_handlers to first convert the bytes to a numpy array, and then convert it to a Pillow image via Image.fromarray()?
Those pixel_data_handlers currently do not support decoding individual frames. However, there is a PR up for such support, so it will hopefully be supported in the future. In the meantime, highdicom includes a decode_frame() function that takes as arguments some of the attributes on the DICOM dataset. It then creates a "fake" DICOM dataset and utilizes pydicom's pixel_data_handlers to convert the frame to a numpy array. highdicom is planning to deprecate and remove this function when pydicom starts supporting frame decoding. We could potentially add highdicom as an optional dependency for this functionality until pydicom supports it.
What do you think?
There is a need to be able to save the wsi object to file for various reasons:
For conversion/saving, it is likely best to be able to do this on a tile-by-tile basis whenever possible. We could then use another library that can open and provide tiles (without transcoding) from the WSI, and use this to build the DICOM-compatible pyramid. As we don't need to do any decompression/compression, the conversion will be fast and lossless.
For formats that are not natively tiled, we also need to tile the image data. We might be able to do this losslessly as well, but it will be more complicated.
Converting image data and adding the associated technical metadata is likely the easier part of converting to DICOM. We also need to add required DICOM data such as:
Although we will likely support opening DICOM WSI files of different types (e.g. sparse or full tiled, concatenated or single file) per level, it is probably best to limit the write support to what we think is "best practice" (full tiled, single file per level?).
For one of my examples, I am encountering the following error:
File "/opt/wsidicom/wsidicom/wsidicom.py", line 153, in open_web
source = WsiDicomWebSource(
File "/opt/wsidicom/wsidicom/web/wsidicom_web_source.py", line 97, in __init__
annotation_instance = AnnotationInstance.open_dataset(instance)
File "/opt/wsidicom/wsidicom/graphical_annotations.py", line 1688, in open_dataset
if dataset.file_meta.MediaStorageSOPClassUID != ANN_SOP_CLASS_UID:
File "/opt/venv/lib/python3.9/site-packages/pydicom/dataset.py", line 907, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'file_meta'
Is the file_meta attribute unavailable for instances obtained via DICOMweb?
Here is a reproducer of the issue:
from dicomweb_client import DICOMwebClient
from pydicom.datadict import tag_for_keyword
from pydicom.tag import Tag
from wsidicom import WsiDicom, WsiDicomWebClient
url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.25.18199272949575141157802058345697568861'
series_uid_tag = Tag(tag_for_keyword('SeriesInstanceUID')).json_key
client = DICOMwebClient(url)
series = client.search_for_series(study_uid)
series_uids = [x[series_uid_tag]['Value'][0] for x in series]
wsi_client = WsiDicomWebClient(client)
slide = WsiDicom.open_web(
wsi_client,
study_uid,
series_uids,
)
Is there a reason Pillow is pinned to version 9.x? Can it be unpinned to allow Pillow 10? I think this is just changing https://github.com/imi-bigpicture/wsidicom/blob/main/pyproject.toml#L23
Thank you.
Do you know why this error is occurring? Here is a reproducer:
from wsidicom import WsiDicom, WsiDicomWebClient
url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '1.3.6.1.4.1.5962.99.1.2447135355.1068687300.1625944806011.3.0'
series_uid = '1.3.6.1.4.1.5962.99.1.2447135355.1068687300.1625944806011.2.0'
client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
slide.read_region((1920, 1920), 4, (1067, 1017))
I get this error:
Traceback (most recent call last):
File "/home/patrick/test.py", line 11, in <module>
slide.read_region((1920, 1920), 4, (1067, 1017))
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/wsidicom.py", line 400, in read_region
image = wsi_level.get_region(scaled_region, z, path, threads)
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/group/group.py", line 318, in get_region
image = instance.image_data.stitch_tiles(region, path, z, threads)
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 525, in stitch_tiles
self._paste_tiles(
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 576, in _paste_tiles
pool.map(thread_paste, tile_region.chunked_iterate_all(threads))
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/thread.py", line 41, in map
return (item for item in list(map(fn, *iterables)))
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 568, in thread_paste
for tile_point, tile in zip(
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/web/wsidicom_web_image_data.py", line 102, in get_decoded_tiles
yield Image.open(io.BytesIO(frame))
File "/home/patrick/virtualenvs/histomics/lib/python3.9/site-packages/PIL/Image.py", line 3298, in open
raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f4aadc35680>
Note that this example contains uncompressed data.
For the DICOMweb example that we have been using, each series only supports one transfer syntax: the syntax that the pixel data is encoded in. We can obtain this information by requesting the field AvailableTransferSyntaxUID in a query to the DICOMweb server.
Perhaps in wsidicom.open_web(), we should allow None to be passed as the requested_transfer_syntax. In that case, we perform a query to the DICOMweb server to see which transfer syntaxes that series supports. If multiple transfer syntaxes are supported, we would prefer the syntaxes that we can more easily work with (such as syntaxes that Pillow can read).
What do you think?
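The selection step could be as simple as walking a preference list. In the sketch below, the function name and the preference order are assumptions chosen for illustration (the UIDs themselves are standard DICOM transfer syntax UIDs); it is not existing wsidicom behaviour.

```python
from typing import Iterable, Optional, Sequence

# Assumed preference order: formats that are easy to decode come first.
PREFERRED_TRANSFER_SYNTAXES = (
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (Process 1)
    "1.2.840.10008.1.2.4.91",  # JPEG 2000
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 (Lossless Only)
    "1.2.840.10008.1.2.1",  # Explicit VR Little Endian
)


def choose_transfer_syntax(
    available: Iterable[str],
    preferred: Sequence[str] = PREFERRED_TRANSFER_SYNTAXES,
) -> Optional[str]:
    """Return the most preferred transfer syntax the server offers, if any."""
    available_set = set(available)
    for uid in preferred:
        if uid in available_set:
            return uid
    return None
```

A return value of None would mean the series offers nothing in the preference list, which the caller would have to handle (for example by raising a clear error).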
If I run the following example:
from wsidicom import WsiDicom, WsiDicomWebClient
url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.16.756.5.41.446.20190905094928'
series_uid = '2.16.756.5.41.446.20190905094928.20190905102252'
client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
I get this error:
ERROR:root:Failed to parse LUT
Traceback (most recent call last):
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 106, in from_ds
return cls(ds.PaletteColorLookupTableSequence)
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 62, in __init__
self.table = self._parse_lut(self._lut_item)
File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 156, in _parse_lut
parsed_tables[color] = self._parse_color(table)
ValueError: could not broadcast input array from shape (65536,) into shape (0,)
That example is located here.
Missing a comma in list.
I have a DICOM file (one level of a pyramid) that worked prior to 0.18, but doesn't work with the recent changes. Specifically, it fails here: https://github.com/imi-bigpicture/wsidicom/blob/main/wsidicom/metadata/schema/dicom/image.py#L217 because "extended_depth_of_field_bool" doesn't exist.
This was introduced by this change:
https://github.com/imi-bigpicture/wsidicom/pull/142/files#diff-c72fba25df969c0984b417d76eefc963a1a20014e847cf68397e2f9a85ba3e83R217-R219
I think it should be guarded, so it would change to:
extended_depth_of_field_bool = data.pop("extended_depth_of_field_bool", None)
extended_depth_of_field = data.get("extended_depth_of_field", None)
if (extended_depth_of_field_bool is not None) != (extended_depth_of_field is not None):
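The comparison in the suggested guard is true exactly when one of the two values is present and the other is missing. A small stand-alone sketch of that check follows; the function name is hypothetical, and what the schema should do when the check is true is left open here.

```python
from typing import Any, Dict


def edof_fields_mismatch(data: Dict[str, Any]) -> bool:
    """Return True if exactly one of the two related fields is present."""
    # pop() with a default does not raise if the key is missing.
    flag = data.pop("extended_depth_of_field_bool", None)
    value = data.get("extended_depth_of_field", None)
    return (flag is not None) != (value is not None)
```

Because both lookups use a default of None, a file that lacks both fields passes through without raising.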
Hi,
First of all, thank you very much for providing this great library.
I noticed a difference in the handling of the same DICOM file provided by WsiDicomFileSource and WsiDicomWebSource through a DICOMWeb interface.
The WsiDicomFileSource filters the DICOM levels for matching tile_size and UIDs:
This filtering is not done in WsiDicomWebSource:
This discrepancy can lead to DICOMs that can be opened via file but not via the web interface. I wanted to ask if this was intended.
Regarding the same topic of opening images via file or DICOMweb:
wsidicom/wsidicom/web/wsidicom_web_source.py
Line 102 in b0d097c
If the DICOMWeb server does not support JPEG2000, for example, as a transfer syntax and throws an exception (requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: Google DICOM Archive), the code only catches 406 exceptions.
wsidicom/wsidicom/web/wsidicom_web_client.py
Line 218 in b0d097c
With kind regards,
Christian Marzahl
Hello!
I wanted to use your library to read some WSI Dicom files. In this case, I took the image "melanoma_pilot_003" that can be seen in file "region_wsidicom_level7".
I wanted to use the read_region function to extract a particular window from the image:
from wsidicom import WsiDicom
slide = WsiDicom.open("./melanoma_pilot_003")
region = slide.read_region((0, 0), 4, (3968, 4672))
region.save("./region_wsidicom_level4.png", format="png")
When I increase the zoom level, the image area that is supposed to be blank is not. As I increase the zoom level, the image is repeated more and more. This can be seen in images "region_wsidicom_level5" and "region_wsidicom_level4", where the bottom of the image that is supposed to be blank isn't. This can be compared with the image obtained at zoom level 7, where there is nothing in the blank areas.
Have you already observed this phenomenon?
Have a nice day!
Binary operations should return NotImplemented and not raise NotImplementedError. https://docs.python.org/3/library/constants.html
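A minimal illustration of the difference: returning NotImplemented lets Python try the reflected operation on the other operand and, if that also fails, raise a TypeError. The `Size` class below is a stand-in written for this example, not the class from wsidicom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Stand-in size type (hypothetical, for illustration only)."""

    width: int
    height: int

    def __add__(self, other: object) -> "Size":
        if isinstance(other, Size):
            return Size(self.width + other.width, self.height + other.height)
        if isinstance(other, int):
            return Size(self.width + other, self.height + other)
        # Signal "unsupported operand" so Python can try other.__radd__().
        return NotImplemented

    def __radd__(self, other: object) -> "Size":
        if isinstance(other, int):
            return self + other
        return NotImplemented
```

Raising NotImplementedError inside `__add__` would instead abort immediately and never give the other operand a chance to handle the operation.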
Similar to how the WsiDicomFileSource supports being provided multiple files, does it make sense to also allow multiple study_uids/series_uids for the WsiDicomWebSource?
For this example, for instance, there are many optical paths, and most of the optical paths are located in separate series. It would be nice if we could instantiate a WsiDicomWebSource that contains all of these optical paths.
I would be happy to put up a PR for this if you think it sounds like a good idea.
Hello. I need to read those dicom files from the gcp cloud storage bucket. Is there any way I can pass the gs link to the WsiDicom.open() method?
Annotations created with the graphical annotations module lack study, series, and instance UIDs.
In interface.py we have:
@property
def mpp(self) -> SizeMm:
"""Return pixel spacing in mm/pixel"""
return self._mpp
Do we refer to mpp as microns or mm?
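One way to remove the ambiguity is to keep the unit in the name and derive the other unit explicitly. The sketch below uses a hypothetical `PixelSpacing` class, not the wsidicom `SizeMm` type, and takes "mpp" to mean micrometers per pixel, which is an assumption about the intended meaning.

```python
from dataclasses import dataclass
from typing import Tuple

MICROMETERS_PER_MILLIMETER = 1000.0


@dataclass(frozen=True)
class PixelSpacing:
    """Pixel spacing stored in mm/pixel (hypothetical, for illustration)."""

    width_mm: float
    height_mm: float

    @property
    def mpp(self) -> Tuple[float, float]:
        """Return pixel spacing in micrometers/pixel as (width, height)."""
        return (
            self.width_mm * MICROMETERS_PER_MILLIMETER,
            self.height_mm * MICROMETERS_PER_MILLIMETER,
        )
```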
Sometimes it is useful to run save() with a path to null as the output path. This fails, as a folder can't be created in null.
At the moment there is no convenient way to get the tile_size, width, height and number of levels. What is the best way of exposing this, making the trade-off between following the standard and making it easy to use?
Is there anything preventing us from just using:
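A possible way to handle this is to special-case the null device before creating any folder. The helper below is a hypothetical sketch, not part of wsidicom; it compares the output path against `os.devnull`.

```python
import os
from pathlib import Path
from typing import BinaryIO


def open_for_writing(output_folder: str, filename: str) -> BinaryIO:
    """Open a file for writing, discarding data if the folder is the null device."""
    if Path(output_folder) == Path(os.devnull):
        # Nothing can be created inside the null device, so write to it directly.
        return open(os.devnull, "wb")
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    return open(folder / filename, "wb")
```

With this, every file written during a save to the null device is simply discarded, which is what a dry run or benchmark would want.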
width, height = slide.size.width, slide.size.height
tile_size = slide.tile_size
n_levels = len(slide.levels)