wsidicom's Introduction

wsidicom

wsidicom is a Python package for reading DICOM WSI. The aims of the project are:

  • Easy to use interface for reading and writing WSI DICOM images and annotations either from file or through DICOMWeb.
  • Support the latest and upcoming DICOM standards.
  • Platform independent installation via PyPI.

Installing wsidicom

wsidicom is available on PyPI:

pip install wsidicom

And through conda:

conda install -c conda-forge wsidicom

Important note

Please note that this is an early release and the API is not frozen yet. Function names and functionality are prone to change.

Requirements

wsidicom uses pydicom, numpy, Pillow, marshmallow, fsspec, universal-pathlib, and dicomweb-client. Imagecodecs, pylibjpeg-rle, pyjpegls, and pylibjpeg-openjpeg can be installed as optionals to support additional transfer syntaxes.

Limitations

  • Levels are required to have a scale factor that is (close to) a power of 2 relative to the base level, and the same tile size.

  • Only 8 bits per sample is supported for color images, and 8 and 16 bits for grayscale images.

  • Without optional dependencies, the following transfer syntaxes are supported:

    • JPEGBaseline8Bit
    • JPEG2000
    • JPEG2000Lossless
    • HTJPEG2000
    • HTJPEG2000Lossless
    • HTJPEG2000RPCLLossless
    • ImplicitVRLittleEndian
    • ExplicitVRLittleEndian
    • ExplicitVRBigEndian
  • With imagecodecs, the following transfer syntaxes are additionally supported:

    • JPEGExtended12Bit
    • JPEGLosslessP14
    • JPEGLosslessSV1
    • JPEGLSLossless
    • JPEGLSNearLossless
    • RLELossless
  • With pylibjpeg-rle, RLELossless is additionally supported.

  • With pyjpegls, JPEGLSLossless and JPEGLSNearLossless are additionally supported.

  • Optical path identifiers need to be unique across instances.

  • Only one pyramid (i.e. offset from slide corner) per frame of reference is supported.
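
The scale requirement in the first limitation can be illustrated with a small check. This is a minimal sketch and not wsidicom's own validation: the function name pyramid_index and the 5 % tolerance are assumptions made for the example.

```python
import math


def pyramid_index(base_width: int, level_width: int, tolerance: float = 0.05) -> int:
    """Return the power-of-two pyramid index of a level.

    Raises ValueError if the scale relative to the base level is not
    close to a power of two (tolerance is a hypothetical relative limit).
    """
    scale = base_width / level_width
    index = round(math.log2(scale))
    if index < 0 or abs(scale - 2**index) / 2**index > tolerance:
        raise ValueError(f"Scale {scale:.3f} is not close to a power of two")
    return index


print(pyramid_index(40000, 40000))  # base level
print(pyramid_index(40000, 10000))  # a quarter of the base width
print(pyramid_index(40000, 4999))   # slightly off, still accepted
```

A level that is, for example, a third of the base width would be rejected by this check.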

Basic usage

Load a WSI dataset from files in folder.

from wsidicom import WsiDicom
slide = WsiDicom.open("path_to_folder")

The files argument accepts either a path to a folder with DICOM WSI-files or a sequence of paths to DICOM WSI-files.

Load a WSI dataset from remote url using fsspec.

from wsidicom import WsiDicom
slide = WsiDicom.open("s3://bucket/key", file_options={"s3": {"anon": True}})

Or load a WSI dataset from opened streams.

from wsidicom import WsiDicom

slide = WsiDicom.open_streams([file_stream_1, file_stream_2, ... ])

Or load a WSI dataset from a DICOMDIR.

from wsidicom import WsiDicom

slide = WsiDicom.open_dicomdir("path_to_dicom_dir")

Or load a WSI dataset from DICOMWeb.

from wsidicom import WsiDicom, WsiDicomWebClient
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('username', 'password')
client = WsiDicomWebClient.create_client(
    'dicom_web_hostname',
    '/qido',
    '/wado',
    auth
)
slide = WsiDicom.open_web(
    client,
    "study uid to open",
    # Either a single series uid or a list of series uids.
    ["series uid 1 to open", "series uid 2 to open"]
)

Alternatively, if you have already created an instance of dicomweb_client.DICOMwebClient, that may be used to create the WsiDicomWebClient like so:

from dicomweb_client import DICOMwebClient

dicomweb_client = DICOMwebClient("url")
client = WsiDicomWebClient(dicomweb_client)

Then proceed to call WsiDicom.open_web() with this as in the first example.

Use as a context manager.

from wsidicom import WsiDicom
with WsiDicom.open("path_to_folder") as slide:
    ...

Read a 200x200 px region starting from px 1000, 1000 at level 6.

region = slide.read_region((1000, 1000), 6, (200, 200))

Read a 2000x2000 px region starting from px 1000, 1000 at level 4 using 4 threads.

region = slide.read_region((1000, 1000), 4, (2000, 2000), threads=4)

Read 3x3 mm region starting at 0, 0 mm at level 6.

region_mm = slide.read_region_mm((0, 0), 6, (3, 3))

Read 3x3 mm region starting at 0, 0 mm with pixel spacing 0.01 mm/px.

region_mpp = slide.read_region_mpp((0, 0), 0.01, (3, 3))
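
To get a feel for how large such a region is in pixels, the arithmetic can be sketched as below. This only shows the unit conversion, using the pixel spacing stated in the example above; it is not how wsidicom selects a level or rescales the result, and mm_to_pixels is a hypothetical helper.

```python
def mm_to_pixels(size_mm, pixel_spacing_mm):
    """Convert a (width, height) size in mm to whole pixels.

    pixel_spacing_mm is the distance between pixel centers in mm/px.
    """
    width_mm, height_mm = size_mm
    return (
        round(width_mm / pixel_spacing_mm),
        round(height_mm / pixel_spacing_mm),
    )


# A 3x3 mm region at 0.01 mm/px.
print(mm_to_pixels((3, 3), 0.01))  # (300, 300)
```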

Read a thumbnail of the whole slide with maximum dimensions 200x200 px.

thumbnail = slide.read_thumbnail((200, 200))

Read an overview image (if available).

overview = slide.read_overview()

Read a label image (if available).

label = slide.read_label()

Read (decoded) tile from position 1, 1 in level 6.

tile = slide.read_tile(6, (1, 1))

Read (encoded) tile from position 1, 1 in level 6.

tile_bytes = slide.read_encoded_tile(6, (1, 1))

Close files

slide.close()

API differences between WsiDicom and OpenSlide

The WsiDicom API is similar to OpenSlide, but with some important differences:

  • In WsiDicom, the open-method (i.e. WsiDicom.open()) is used to open a folder with DICOM WSI files, while in OpenSlide a file is opened with the __init__-method (e.g. OpenSlide()).

  • In WsiDicom the location parameter in read_region is relative to the specified level, while in OpenSlide it is relative to the base level.

  • In WsiDicom the level parameter in read_region is the pyramid index, i.e. level 2 is always the level with a quarter of the size of the base level. In OpenSlide it is the index in the list of available levels, and if pyramid levels are missing these will not correspond to pyramid indices.

Conversion from OpenSlide location and level parameters to WsiDicom can be performed as follows:

with WsiDicom.open("path_to_folder") as wsi:
    level = wsi.levels[openslide_level_index]
    x = openslide_x // 2**(level.level)
    y = openslide_y // 2**(level.level)
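
The same arithmetic can be tried without opening a slide. The sketch below assumes a hypothetical list of available pyramid indices (here with levels 1 and 3 missing) to show why the OpenSlide level index and the pyramid index can differ; openslide_to_wsidicom is an illustrative helper, not part of wsidicom.

```python
def openslide_to_wsidicom(location, openslide_level_index, available_pyramid_indices):
    """Translate OpenSlide read_region parameters to WsiDicom parameters.

    location is relative to the base level (OpenSlide convention); the
    returned location is relative to the returned pyramid level.
    """
    pyramid_index = available_pyramid_indices[openslide_level_index]
    x, y = location
    return (x // 2**pyramid_index, y // 2**pyramid_index), pyramid_index


# Hypothetical slide where only pyramid levels 0, 2 and 4 are present.
available = [0, 2, 4]
print(openslide_to_wsidicom((4000, 8000), 1, available))  # ((1000, 2000), 2)
```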

Metadata

WsiDicom parses the DICOM metadata in the opened image into easy-to-use dataclasses, see wsidicom\metadata.

with WsiDicom.open("path_to_folder") as wsi:
    metadata = wsi.metadata

The obtained WsiMetadata has child dataclass properties that resemble the DICOM WSI modules (compare with the VL Whole Slide Microscopy Image CIOD):

  • study: The study the slide is part of (study identifiers, study date and time, etc.).
  • series: The series the slide is part of.
  • patient: Patient information (name, identifier, etc.).
  • equipment: Scanner information.
  • optical_paths: List of optical path descriptions used for imaging the slide.
  • slide: Slide information, including slide identifier, stainings done on the slide, and samples placed on the slide, see details in Slide information.
  • label: Slide label information, such as label text.
  • image: Image information, including acquisition datetime, pixel spacing, focus method, etc.
  • frame_of_reference_uid: The unique identifier for the frame of reference for the image.
  • dimension_organization_uids: List of dimension organization uids.

Note that not all DICOM attributes are represented in the defined metadata model. Instead the full pydicom Datasets can be accessed per level, for example:

with WsiDicom.open("path_to_folder") as wsi:
    wsi.levels.base_level.datasets[0]

If you find that some important and/or useful attribute is missing from the model, please open an issue (see Contributing).

Slide information

The Slide information model models the Specimen module and has the following properties:

  • identifier: Identifier for the slide.
  • stainings: List of stainings done on the slide. Note that the model assumes that the same stainings have been done on all the samples on the slide.
  • samples: List of samples placed on the slide.

Note that while the parsing of slide information is designed to be as flexible and permissive as possible, some datasets contain non-standard-compliant Specimen modules that are (at least currently) not possible to parse. In such cases the stainings and samples properties will be set to None. If you have a dataset with a Specimen module that you think should be parsable, please open an issue (see Contributing).

SlideSample

Each sample is modeled with the SlideSample dataclass, which represents an item in the DICOM Specimen Description Sequence:

  • identifier: Identifier of the sample.
  • anatomical_sites: List of codes describing the primary anatomic structures of interest in the sample.
  • sampled_from: The sampling (of another specimen) that was done to produce the sample (if known). If the sampled specimen also was produced through sampling, this property will give access to the full hierarchy of (known) specimens.
  • uid: Unique identifier for the sample.
  • localization: Description of the placement of the sample on the slide. Should be present if more than one sample is placed on the slide.
  • steps: List of preparation steps performed on the sample.
  • short_description: Short description of the sample (should not exceed 64 characters).
  • detailed_description: Unlimited description of the sample.

Samplings

The optional sampled_from property can either be a Sampling or an UnknownSampling. Both of these specify a sampled specimen, with the difference that the UnknownSampling is used when the sampling conditions are not fully known. A Sampling is more detailed, and specifies the sampling method and optional properties such as sampling date_time, description and location.

Specimens

The specimen property of a Sampling or an UnknownSampling links to either a Specimen or a Sample. A Specimen has no known parents (e.g. could be the specimen extracted from a patient), while a Sample is always produced from one or more samplings of other Specimens or Samples. The samplings used to produce a Sample are given by its sampled_from property. Both Specimen and Sample contain additional properties describing the specimen:

  • identifier: Identifier of the specimen.
  • type: Optional anatomic pathology specimen type code (e.g. "tissue specimen"). Should be a specimen type defined in CID 8103.
  • steps: List of processing steps performed on the specimen.
  • container: Optional container type code the specimen is placed in. Should be a container type defined in CID 8101.

Processing and staining steps

The processing steps that can be performed on a sample are:

  • Sampling: Sampling of the specimen in order to produce new specimen(s). The sampling method should be a method defined in CID 8110.
  • Collection: Collection of a specimen from a body. This can only be done on a Specimen, i.e. not on a specimen produced by sampling. The collection method should be a method defined in CID 8109.
  • Processing: Processing performed on the specimen. The processing method should be a method defined in CID 8113.
  • Embedding: Embedding done on the specimen. The embedding medium should be a medium defined in CID 8115.
  • Fixation: Fixation of the specimen. The fixative should be a fixative defined in CID 8114.
  • Receiving: Receiving of the specimen.
  • Storage: Storage of the specimen.

The Staining(s) for a Slide contain a list of substances used for staining. The substances used should be defined in CID 8112.

Every processing step (including staining) also has the optional properties date_time for when the processing was done and description for a textual description of the processing.

These steps are parsed from the SpecimenPreparationSequence following TID 8004 for each specimen identifier in the item sequence.

Exporting to json

The metadata can be exported to json:

from wsidicom.metadata.schema.json import WsiMetadataJsonSchema

with WsiDicom.open("path_to_folder") as wsi:
    metadata = wsi.metadata

schema = WsiMetadataJsonSchema()
metadata_json = schema.dump(metadata)

Settings

The strictness of parsing of DICOM WSI metadata can be configured using the following settings (see Settings):

  • strict_specimen_identifier_check: Controls how to handle matching between specimen identifiers if one of the identifiers has an issuer of identifier set and the other does not. If True the identifiers are considered equal (provided that the identifier value is the same), if False the issuer of identifier must always also match. This setting is useful if, for example, an issuer of identifier is specified in the Specimen Description Sequence but steps in the Specimen Preparation Sequence lack the issuer of identifier. The default value is True.
  • ignore_specimen_preparation_step_on_validation_error: Controls how to handle a step in the Specimen Preparation Sequence that fails to validate. If True, only steps that fail will be ignored. If False, all steps will be ignored. The default value is True.

Saving files

An opened WsiDicom instance can be saved to a new path using the save()-method. The produced files will be:

  • Fully tiled. Any sparse tiles will be replaced with a blank tile with color depending on the photometric interpretation.
  • Have a basic offset table (or optionally an extended offset table or no offset table).
  • Not be concatenated.

By default frames are copied as-is, i.e. without re-compression.

with WsiDicom.open("path_to_folder") as slide:
    slide.save("path_to_output")

The output folder must already exist. Be careful to specify a unique folder to avoid mixing files from different images.

Optionally frames can be transcoded, either by an encoder setting or an encoder:

from wsidicom.codec import JpegSettings

with WsiDicom.open("path_to_folder") as slide:
    slide.save("path_to_output", transcoding=JpegSettings())

Settings

wsidicom can be configured with the settings variable. For example, set the parsing of files to strict:

from wsidicom import settings
settings.strict_uid_check = True
settings.strict_attribute_check = True

Annotation usage

Annotations are structured in a hierarchy:

  • AnnotationInstance Represents a collection of AnnotationGroups. All the groups have the same frame of reference, i.e. annotations are from the same wsi stack.
  • AnnotationGroup Represents a group of annotations. All annotations in the group are of the same type (e.g. PointAnnotation), have the same label, description and category and type. The category and type are codes that are used to define the annotated feature. A good resource for working with codes is available here.
  • Annotation Represents an annotation. An Annotation has a geometry (currently Point, Polyline, Polygon) and an optional list of Measurements.
  • Measurement Represents a measurement for an Annotation. A Measurement consists of a type-code (e.g. "Area"), a value and a unit-code ("mm").

Codes that are defined in the 222-draft can be created using the create(source, type) function of the ConceptCode-class.

Load a WSI dataset from files in folder.

from wsidicom import WsiDicom
slide = WsiDicom.open("path_to_folder")

Create a point annotation at x=10.0, y=20.0 mm.

from wsidicom import Annotation, Point
point_annotation = Annotation(Point(10.0, 20.0))

Create a point annotation with a measurement.

from wsidicom import ConceptCode, Measurement
# A measurement is defined by a type code ('Area'), a value (25.0) and a unit code ('Pixels').
area = ConceptCode.measurement('Area')
pixels = ConceptCode.unit('Pixels')
measurement = Measurement(area, 25.0, pixels)
point_annotation_with_measurment = Annotation(Point(10.0, 20.0), [measurement])

Create a group of the annotations.

from wsidicom import PointAnnotationGroup
# The 222 supplement requires groups to have a label, a category and a type
group = PointAnnotationGroup(
    annotations=[point_annotation, point_annotation_with_measurment],
    label='group label',
    categorycode=ConceptCode.category('Tissue'),
    typecode=ConceptCode.type('Nucleus'),
    description='description'
)

Create a collection of annotation groups.

from wsidicom import AnnotationInstance
annotations = AnnotationInstance([group], 'volume', slide.uids)

Save the collection to file.

annotations.save('path_to_dicom_dir/annotation.dcm')

Reopen the slide and access the annotation instance.

slide = WsiDicom.open("path_to_folder")
annotations = slide.annotations

Setup environment for development

Requires poetry installed in the virtual environment.

git clone https://github.com/imi-bigpicture/wsidicom.git
cd wsidicom
poetry install

To watch unit tests use:

poetry run pytest-watch -- -m unittest

The integration tests use test images from nema.org that need to be downloaded. The location of the test images can be changed from the default tests\testdata\slides using the environment variable WSIDICOM_TESTDIR. Download the images using the supplied script:

python .\tests\download_test_images.py

If the files are already downloaded the script will validate the checksums.
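
How a test helper could resolve the image folder from the environment variable is sketched below. This is an illustration under assumptions, not the code used by the test suite: resolve_testdata_dir is a hypothetical name, and the default is written with forward slashes so that it works on any platform.

```python
import os
from pathlib import Path


def resolve_testdata_dir(environ=os.environ) -> Path:
    """Return the folder holding the test slides.

    Uses WSIDICOM_TESTDIR if set, otherwise the default location.
    """
    return Path(environ.get("WSIDICOM_TESTDIR", "tests/testdata/slides"))


print(resolve_testdata_dir({}))
print(resolve_testdata_dir({"WSIDICOM_TESTDIR": "/data/slides"}))
```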

To run integration tests:

poetry run pytest -m integration

Data structure

In wsidicom, a WSI DICOM pyramid is represented by a hierarchy of objects of different classes, starting from the bottom:

  • WsiDicomReader, represents a WSI DICOM file reader, used for accessing WsiDicomFileImageData and WsiDataset.
  • WsiDicomFileImageData, represents the image data in one or several (in case of concatenation) WSI DICOM files.
  • WsiDataset, represents the image metadata in one or several (in case of concatenation) WSI DICOM files.
  • WsiInstance, represents image data and image metadata.
  • Level, represents a group of instances with the same image size, i.e. of the same level.
  • Pyramid, represents a group of levels, i.e. the pyramidal structure.
  • Pyramids, represents a collection of pyramids, each with different image coordinate system or extended depth of field.
  • WsiDicom, represents a collection of pyramids, labels and overviews.

Labels and overviews are structured similarly to levels, but with somewhat different properties and restrictions. For DICOMWeb the WsiDicomFile* classes are replaced with WsiDicomWeb* classes.
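
The grouping of instances into levels can be pictured with a small stand-alone model. The sketch below is not the wsidicom implementation: InstanceInfo is a hypothetical stand-in for WsiInstance that only carries a name and an image size, and the level key is derived from the width relative to the largest instance.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical stand-in for a WsiInstance (name and pixel size only)."""
    name: str
    width: int
    height: int


def group_into_levels(instances):
    """Group instances of the same size, keyed by pyramid index."""
    base_width = max(instance.width for instance in instances)
    levels = defaultdict(list)
    for instance in instances:
        index = round(math.log2(base_width / instance.width))
        levels[index].append(instance.name)
    return dict(sorted(levels.items()))


instances = [
    InstanceInfo("a", 4000, 3000),
    InstanceInfo("b", 4000, 3000),  # same size as "a", e.g. another optical path
    InstanceInfo("c", 2000, 1500),
    InstanceInfo("d", 500, 375),
]
print(group_into_levels(instances))  # {0: ['a', 'b'], 1: ['c'], 3: ['d']}
```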

A Source is used to create WsiInstances, either from files (WsiDicomFileSource) or DICOMWeb (WsiDicomWebSource), and can be used to initiate a WsiDicom object. A source is most easily created with the open() and open_web() helper functions, e.g.:

slide = WsiDicom.open("path_to_folder")

Code structure

  • codec - Encoders and decoders for image pixel data.
  • file - Implementation for reading and writing DICOM WSI files.
  • group - Group implementations, e.g. Level.
  • instance - Instance implementations WsiInstance and WsiDataset, the metaclass ImageData and ImageData implementations WsiDicomImageData and PillowImageData.
  • metadata - Metadata models and schema for serializing and deserializing to DICOM and json.
  • series - Series implementations Levels, Labels, and Overview.
  • web - Implementation for reading DICOM WSI from DICOMWeb.
  • conceptcode.py - Handling of DICOM concept codes.
  • config.py - Handles configuration settings.
  • errors.py - Custom errors.
  • geometry.py - Classes for geometry handling.
  • graphical_annotations - Handling graphical annotations.
  • source.py - Metaclass Source for serving WsiInstances to WsiDicom.
  • stringprinting.py - For nicer string printing of objects.
  • tags.py - Definition of commonly used DICOM tags.
  • threads.py - Implementation of ThreadPoolExecutor that does not use a pool when only a single worker is used.
  • uid.py - Handles DICOM uids.
  • wsidicom.py - Main class with methods to open DICOM WSI objects.

Adding support for other file formats

Support for other formats (or methods to access DICOM data) can be implemented by creating a new Source implementation that creates WsiInstances for the implemented formats. A format-specific implementation of ImageData is likely needed to access the WSI image data. Additionally a WsiDataset needs to be created that returns matching metadata for the WSI.

The implemented Source can then create an instance from the implemented ImageData (and a method returning a WsiDataset):

image_data = MyImageData('path_to_image_file')
dataset = create_dataset_from_image_data(image_data)
instance = WsiInstance(dataset, image_data)

The source should arrange the created instances and return them at the level_instances, label_instances, and overview_instances properties. WsiDicom can then open the source object and arrange the instances into levels etc as described in 'Data structure'.
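
The arrangement of instances by a source can be sketched without wsidicom installed. The classes below are hypothetical stand-ins (SketchInstance for WsiInstance, SketchSource for a Source implementation) and sort instances on an assumed image type string; only the three property names are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SketchInstance:
    """Hypothetical stand-in for WsiInstance."""
    image_type: str  # assumed values: "VOLUME", "LABEL" or "OVERVIEW"
    name: str


@dataclass
class SketchSource:
    """Hypothetical source exposing instances sorted by image type."""
    instances: List[SketchInstance] = field(default_factory=list)

    def _of_type(self, image_type: str) -> List[SketchInstance]:
        return [i for i in self.instances if i.image_type == image_type]

    @property
    def level_instances(self) -> List[SketchInstance]:
        return self._of_type("VOLUME")

    @property
    def label_instances(self) -> List[SketchInstance]:
        return self._of_type("LABEL")

    @property
    def overview_instances(self) -> List[SketchInstance]:
        return self._of_type("OVERVIEW")


source = SketchSource([
    SketchInstance("VOLUME", "level_0"),
    SketchInstance("LABEL", "label"),
    SketchInstance("VOLUME", "level_1"),
])
print([instance.name for instance in source.level_instances])
```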

Other DICOM python tools

Contributing

We welcome any contributions to help improve this tool for the WSI DICOM community!

We recommend first creating an issue before creating potential contributions to check that the contribution is in line with the goals of the project. To submit your contribution, please issue a pull request on the imi-bigpicture/wsidicom repository with your changes for review.

Our aim is to provide constructive and positive code reviews for all submissions. The project relies on gradual typing and roughly follows PEP8. However, we are not dogmatic. Most important is that the code is easy to read and understand.

Acknowledgement

wsidicom: Copyright 2021 Sectra AB, licensed under Apache 2.0.

This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: <www.imi.europa.eu>

wsidicom's People

Contributors

erikogabrielsson, harmvz, psavery, sarthakpati

wsidicom's Issues

Adding support for more transfer syntaxes

We are currently performing some tests with a DICOMweb server that has some examples in Explicit VR Little Endian and JPEG-LS format. It would be nice if we could support as many formats as possible in wsidicom.

If we have a transfer syntax that Pillow doesn't support, what do you think about relying on pydicom's pixel_data_handlers to first convert the bytes to a numpy array, and then convert it to Pillow via Image.fromarray()?

Those pixel_data_handlers currently do not support decoding individual frames. However, there is a PR up for such support, so it will hopefully be supported in the future. In the meantime, highdicom includes a decode_frame() function that takes as arguments some of the attributes on the DICOM dataset. It then creates a "fake" DICOM dataset and utilizes pydicom's pixel_data_handlers to convert the frame to a numpy array. highdicom is planning to deprecate and remove this function when pydicom starts supporting frame decoding. We could potentially add highdicom as an optional dependency for this functionality until pydicom supports it.

What do you think?

read_region_mm() should have option for frame of reference space

read_region_mm() currently maps physical distance to pixels with same origin and orientation as the image pixels. Graphical annotations are often done with the frame of reference as origin, and to get the corresponding region it would be helpful if read_region_mm could (optionally) map to the frame of reference (and orientation).
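
The requested mapping could look roughly like the sketch below, which converts a point given in the slide (frame of reference) coordinate system to a pixel position. It is an illustration under assumptions, not an existing wsidicom function: ImageFrame and slide_to_pixel are hypothetical names, the values would come from the image origin and orientation attributes of the dataset, and the pixels are assumed to be square with orthogonal unit direction vectors.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageFrame:
    """Hypothetical holder for the placement of an image on the slide."""
    origin_mm: Tuple[float, float]         # slide position of the top-left pixel
    row_direction: Tuple[float, float]     # slide direction of increasing column
    column_direction: Tuple[float, float]  # slide direction of increasing row
    pixel_spacing_mm: float                # assumes square pixels


def slide_to_pixel(frame: ImageFrame, point_mm: Tuple[float, float]):
    """Map a slide coordinate in mm to a (column, row) pixel position."""
    dx = point_mm[0] - frame.origin_mm[0]
    dy = point_mm[1] - frame.origin_mm[1]
    # Project the offset onto the image axes (valid for orthonormal directions).
    column = dx * frame.row_direction[0] + dy * frame.row_direction[1]
    row = dx * frame.column_direction[0] + dy * frame.column_direction[1]
    return (column / frame.pixel_spacing_mm, row / frame.pixel_spacing_mm)


# Example image whose rows run along negative slide y and columns along negative slide x.
frame = ImageFrame((20.0, 40.0), (0.0, -1.0), (-1.0, 0.0), 0.001)
print(slide_to_pixel(frame, (18.0, 39.0)))
```

With this orientation, a point 2 mm below the origin in slide x and 1 mm below in slide y lands about 1000 pixels to the right and 2000 pixels down in the image.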

Handle files missing extended_depth_of_field_bool

I have a dicom file (one level of a pyramid) that worked prior to 0.18, but doesn't work with the recent changes. Specifically, it fails here: https://github.com/imi-bigpicture/wsidicom/blob/main/wsidicom/metadata/schema/dicom/image.py#L217 because "extended_depth_of_field_bool" doesn't exist.

This was introduced by this change:
https://github.com/imi-bigpicture/wsidicom/pull/142/files#diff-c72fba25df969c0984b417d76eefc963a1a20014e847cf68397e2f9a85ba3e83R217-R219

I think it should be guarded, so it would change to:

        extended_depth_of_field_bool = data.pop("extended_depth_of_field_bool", None)
        extended_depth_of_field = data.get("extended_depth_of_field", None)
        if (extended_depth_of_field_bool is not None) != (extended_depth_of_field is not None):

Annotation support

There are a couple of options for supporting writing and reading annotations in DICOM that we could use. The main options are:

  • Structured reports, maybe using templates such as TID 1500
  • The new Microscopy Bulk Simple Annotations Storage SOP Class specified in supplement 222 (not finished)

Which option that will work best will likely depend on what we need to annotate.

Using what is supported in (the current) 222-annotations, we could make annotations like this:

# Load a slide
slide = WsiDicom.open('path_to_dicom_dir')

# Create a point annotation at x=10.0, y=20.0 mm
# Geometries are limited to: Point, open and closed Line, Ellipse, Rectangle
point_annotation = Annotation(Point(10.0, 20.0))

# Create a point annotation with a measurement
measurement = Measurement('Area', 25.0, 'Pixels')
point_annotation_with_measurment = Annotation(Point(10.0, 20.0), [measurement])

# Create a group of the annotations
# The annotations in a group are required to be of same geometry type.
annotations = [point_annotation, point_annotation_with_measurment]
category_code = CategoryCode('Tissue')
type_code = TypeCode('Nucleus')
group = PointAnnotationGroup(annotations, 'group label', category_code, type_code, 'description')

# Create a collection of annotation groups
collection = AnnotationCollection([group], slide.frame_of_reference)

# Save the collection to file
collection.save('path_to_dicom_dir/annotation.dcm')

# Load the slide again
slide = WsiDicom.open('path_to_dicom_dir')

# Access the annotations
group = slide.annotations[0]

There are some additional attributes that need to/could be set, such as algorithm identification etc.

If using 222-annotations is too restrictive, using structured reports is an option.

One major hurdle is how to store annotations with openings/holes, as there is no standard way of doing that in DICOM at the moment.

read_region() : blank spaces not blank as zoom level increases

Hello!

I wanted to use your library to read some WSI Dicom files. In this case, I took the image "melanoma_pilot_003" that can be seen in file "region_wsidicom_level7".

I wanted to use the read_region function to extract a particular window from the image:

from wsidicom import WsiDicom

slide = WsiDicom.open("./melanoma_pilot_003")
region = slide.read_region((0, 0),4, (3968, 4672))
region.save("./region_wsidicom_level4.png", format="png")

When I increase the zoom level, the image area that is supposed to be blank is not. As I increase the zoom level, the image is repeated more and more. This can be seen in images "region_wsidicom_level5" and "region_wsidicom_level4", where the bottom of the images that is supposed to be blank isn't. This can be compared with the image obtained at zoom level 7, where there is nothing in the blank areas.

Have you already observed this phenomenon?

Have a nice day!

region_wsidicom_level4
region_wsidicom_level5
region_wsidicom_level7

Validation of BigPicture dicom files

Add support and/or document how to best validate that files follow the official BigPicture guidelines, supporting the scenario where you want to harmonize your dicom data:

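# WsiDicom.ready_for_viewing() is the proposed (not yet existing) check.
if not WsiDicom.ready_for_viewing(folder_path):
    try:
        WsiDicom.open(folder_path).save(new_folder_path)
    except Exception as error:
        raise RuntimeError(
            "DICOM files were not ready for viewing, and could not be converted"
        ) from error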

...

image = WsiDicom.open(folder_path, strict_mode=True)

Add recipe for conda

  • It would be great if this was available on conda in addition to pip for the wider community.

Sup 222 approved

As sup 222 is approved the implementation should be updated to include correct uid and tags.

Slow Initialization

I've got a couple of large DICOM WSI images where calling WsiDicom.open is VERY slow. I've tracked it down to read_dataset in the init which then calls filereader.py:41(data_element_iterator) then filereader.py:461(read_sequence). This file has nearly 180,000 frames, which could be why this is so slow. However, there must surely be a faster way to initialize.

I am on wsidicom 0.4.0, pydicom 2.3.1 and python 3.11.

Segmentation support

Support for segmentation/heat maps is likely needed. Segmentation, using either binary (0 or 1) or fractional values, is supported in the segmentation IOD.

What will the segmentation map look like? Will most of the slide (the part with specimen) be segmented or only certain regions/tiles?

Should the segmentation be based on physical dimensions (mm in slide) or pixels? Using mm, it is more straightforward to use the same segmentation map at different pyramidal levels of the slide, but pixel dimensions can of course be re-calculated to other levels knowing the pixel spacing of the segmentation. Is pixel-perfect segmentation needed?

Do we need to use pyramidal levels also for the segmentation maps?

What will the support for segmentation look like in Cytomine? @geektortoise @rmaree @urubens

Failed to parse LUT

If I run the following example:

from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.16.756.5.41.446.20190905094928'
series_uid = '2.16.756.5.41.446.20190905094928.20190905102252'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

I get this error:

ERROR:root:Failed to parse LUT
Traceback (most recent call last):
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 106, in from_ds
    return cls(ds.PaletteColorLookupTableSequence)
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 62, in __init__
    self.table = self._parse_lut(self._lut_item)
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/optical.py", line 156, in _parse_lut
    parsed_tables[color] = self._parse_color(table)
ValueError: could not broadcast input array from shape (65536,) into shape (0,)

That example is located here.

Consider making a new output folder when saving

Currently the save()-method will save DICOM files directly into the given output folder. This can give unexpected problems if a user saves multiple WSIs to the same folder, as it will then be difficult to figure out which files belong to which WSI.

It could therefore be better if the files were saved in a subfolder to the given output folder.
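
Until something like this is built in, a caller could create the subfolder before saving. The helper below is a sketch using only the standard library; make_output_subfolder and its naming scheme (appending a counter when the name is taken) are assumptions for illustration, not wsidicom behavior.

```python
import tempfile
from pathlib import Path


def make_output_subfolder(output_root, name: str) -> Path:
    """Create and return a new, unused subfolder of output_root.

    If <output_root>/<name> already exists, a numeric suffix is appended.
    """
    root = Path(output_root)
    candidate = root / name
    suffix = 1
    while candidate.exists():
        candidate = root / f"{name}_{suffix}"
        suffix += 1
    candidate.mkdir(parents=True)
    return candidate


with tempfile.TemporaryDirectory() as root:
    first = make_output_subfolder(root, "slide")
    second = make_output_subfolder(root, "slide")
    print(first.name, second.name)  # slide slide_1
```

The returned folder could then be given to save() as the output path, so that each saved WSI ends up in its own folder.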

Save support

There is a need to be able to save the wsi object to file for various reasons:

  • When adding annotations to a slide, saving the annotation instance to file.
  • When converting from another format, saving the full wsi object to a new set of DICOM files.
  • Other needs?

For conversion/saving, it is likely best to be able to do this on a tile-by-tile basis whenever possible. We could then use another library that can open and provide tiles (without transcoding) from the wsi, and use this to build the DICOM compatible pyramid. As we don't need to do any decompression/compression, the conversion will be fast and lossless.

For formats that are not natively tiled, we also need to tile the image data. We might be able to also do this losslessly, but it will be more complicated.

Converting image data and adding the associated technical metadata is likely the easier part of converting to DICOM. We also need to add required DICOM data such as:

Although we will likely support opening of DICOM WSI files of different types (e.g. sparse or full tiled, concatenated or single file) per level, it is probably best to limit the write support to what we think is "best practice" (full tiled, single file per level?).

read_region() tile width and height appear to be reversed

For the following example, it looks to me that the tile width and height are reversed:

from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.16.756.5.41.446.20190905094928'
series_uid = '2.16.756.5.41.446.20190905094928.20190905102252'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

size = (1000, 900)
print(f'{size=}')
tile = slide.read_region((0, 0), 0, size)
print(f'{tile.width=} {tile.height=}')

It prints out the following:

size=(1000, 900)
tile.width=900 tile.height=1000

If I look into the API for read_region, it looks like the size parameter is specified via (width, height). However, the returned tile appears to have the width and height reversed.

Allow null path for save()

Sometimes it is useful to run save() with the null device as output path. This fails because a folder can't be created in null.

Unable to open dicomized data with wsidicom>=0.18.0

We have some dicomized testdata (used the wsidicomizer in early 2023 to dicomize CMU-1.svs) that can't be opened anymore using wsidicom>=0.18.0, showing the following error:

AttributeError: 'Dataset' object has no attribute 'XOffsetInSlideCoordinateSystem'

It seems that the old dicomized testdata has some missing data. Are there some mandatory dicom tags that need to be added to the data?

Define interface for width, height, tile_size and number of levels

At the moment there is no convenient way to get the tile_size, width, height and number of levels. What is the best way of exposing this, making the trade-off between following the standard and making it easy to use?

Is there anything preventing us from just using:

width, height = slide.size.width, slide.size.height
tile_size = slide.tile_size
n_levels = len(slide.levels)

Add support for deciding DICOMweb transfer syntax

For the DICOMweb example that we have been using, each series only supports one transfer syntax: the syntax that the pixel data is encoded in. We can obtain this information by requesting the field AvailableTransferSyntaxUID in a query to the DICOMweb server.

Perhaps in wsidicom.open_web(), we should allow None to be passed as the requested_transfer_syntax. In that case, we perform a query to the DICOMweb server to see which transfer syntaxes that series supports. If multiple transfer syntaxes are supported, we would prefer the syntaxes that we can more easily work with (such as syntaxes that Pillow can read).
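The selection step could be sketched roughly as below. This is only an illustration: the function name `choose_transfer_syntax` and the preference order are assumptions, not part of wsidicom, and the list of available UIDs is assumed to come from the AvailableTransferSyntaxUID query.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical preference order, most preferred first (not taken from wsidicom).
PREFERRED_TRANSFER_SYNTAXES = (
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (Process 1)
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 (Lossless Only)
    "1.2.840.10008.1.2.4.91",  # JPEG 2000
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian (uncompressed)
)


def choose_transfer_syntax(
    available: Iterable[str],
    preferred: Sequence[str] = PREFERRED_TRANSFER_SYNTAXES,
) -> Optional[str]:
    """Return the most preferred transfer syntax the server offers, or None."""
    available_set = set(available)
    for uid in preferred:
        if uid in available_set:
            return uid
    return None
```

If the function returns None, the caller would still have to decide whether to fail or to request the data as-is.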

What do you think?

Support combining multiple series for DICOMweb

Similar to how the WsiDicomFileSource supports being provided multiple files, does it make sense to also allow multiple study_uids/series_uids for the WsiDicomWebSource?

For this example, for instance, there are many optical paths, and most of the optical paths are located in separate series. It would be nice if we could instantiate a WsiDicomWebSource that contains all of these optical paths.

I would be happy to put up a PR for this if you think it sounds like a good idea.

`file_meta` unavailable for instance from DICOMweb

For one of my examples, I am encountering the following error:

  File "/opt/wsidicom/wsidicom/wsidicom.py", line 153, in open_web
    source = WsiDicomWebSource(
  File "/opt/wsidicom/wsidicom/web/wsidicom_web_source.py", line 97, in __init__
    annotation_instance = AnnotationInstance.open_dataset(instance)
  File "/opt/wsidicom/wsidicom/graphical_annotations.py", line 1688, in open_dataset
    if dataset.file_meta.MediaStorageSOPClassUID != ANN_SOP_CLASS_UID:
  File "/opt/venv/lib/python3.9/site-packages/pydicom/dataset.py", line 907, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'file_meta'

Is the file_meta attribute unavailable for instances obtained via DICOMweb?

Here is a reproducer of the issue:

from dicomweb_client import DICOMwebClient
from pydicom.datadict import tag_for_keyword
from pydicom.tag import Tag
from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.25.18199272949575141157802058345697568861'
series_uid_tag = Tag(tag_for_keyword('SeriesInstanceUID')).json_key

client = DICOMwebClient(url)
series = client.search_for_series(study_uid)
series_uids = [x[series_uid_tag]['Value'][0] for x in series]

wsi_client = WsiDicomWebClient(client)
slide = WsiDicom.open_web(
    wsi_client,
    study_uid,
    series_uids,
)
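Datasets built from DICOMweb metadata carry no file meta header, but the SOP Class UID is also an attribute of the dataset itself. A possible workaround for the check in the traceback could look like the sketch below. The helper name `get_sop_class_uid` is hypothetical, and `SimpleNamespace` objects stand in for pydicom datasets.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_sop_class_uid(dataset: Any) -> Optional[str]:
    """Return the SOP class UID, preferring the file meta header if present."""
    file_meta = getattr(dataset, "file_meta", None)
    uid = getattr(file_meta, "MediaStorageSOPClassUID", None)
    if uid is None:
        # No file meta (e.g. instance from DICOMweb): use the dataset attribute.
        uid = getattr(dataset, "SOPClassUID", None)
    return None if uid is None else str(uid)


# Stand-ins for a dataset read from file and one received via DICOMweb.
# "1.2.3" is a placeholder UID, not a real SOP class.
from_file = SimpleNamespace(
    file_meta=SimpleNamespace(MediaStorageSOPClassUID="1.2.3"),
    SOPClassUID="1.2.3",
)
from_web = SimpleNamespace(SOPClassUID="1.2.3")
print(get_sop_class_uid(from_file), get_sop_class_uid(from_web))
```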

HTTP Errors for Imaging Data Commons

The NCI's Imaging Data Commons is a big repository (>38k studies) for cancer research.

Their studies are available via DICOMweb. See here for an example of viewing one of them.

I tried to access that same example, but I get a couple of HTTP errors. Try out the following code:

from wsidicom import WsiDicom, WsiDicomWebClient

# For this one: https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.227261840503961430496812955999336758586/series/1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0
url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.227261840503961430496812955999336758586'
series_uid = '1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

It produces the following exception:

requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID

I played around with that url with curl and found a couple of errors. Accessing that full URL via this command:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID'

Produces:

[{
  "error": {
    "code": 400,
    "message": "invalid QIDO-RS query: unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID",
    "status": "INVALID_ARGUMENT"
  }
}
]

So it is raising an exception because we asked for the AvailableTransferSyntaxUID. However, if I remove that part of the url:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

It raises another error:

[{
  "error": {
    "code": 400,
    "message": "generic::invalid_argument: SOPClassUID is not a supported instance or series level attribute",
    "status": "INVALID_ARGUMENT"
  }
}
]

So it is also complaining that we are asking for a SOPClassUID of WSI_SOP_CLASS_UID in the search filters.

If I remove that part of the url also:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

It works fine.

It's kind of annoying that an error is being raised for AvailableTransferSyntaxUID. It would be nice if it just returned an empty field if it was not available.

However, it would be really nice if we could support interacting with this DICOMweb server.
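One way to tolerate such servers might be to retry the query with fewer parameters whenever the server answers with a 400. The sketch below assumes a hypothetical `search` callable that raises `BadRequest` on HTTP 400; neither name exists in wsidicom or dicomweb-client.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 response from the server."""


def search_with_fallback(
    search: Callable[[Dict[str, str]], Any],
    params: Dict[str, str],
    droppable: Sequence[str],
) -> Tuple[Any, Dict[str, str]]:
    """Run `search`, removing droppable parameters one at a time on BadRequest.

    Returns the search result and the parameters that were finally accepted.
    """
    params = dict(params)
    to_drop = list(droppable)
    while True:
        try:
            return search(params), params
        except BadRequest:
            # Remove the next droppable parameter that is still in the query.
            while to_drop:
                key = to_drop.pop(0)
                if key in params:
                    del params[key]
                    break
            else:
                # Nothing left to remove: give up and re-raise the error.
                raise
```

If the SOPClassUID filter has to be dropped, the returned instances would need to be filtered by SOP class on the client side instead.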

Let me know what your thoughts are, @erikogabrielsson.

PIL.UnidentifiedImageError: cannot identify image file

Do you know why this error is occurring? Here is a reproducer:

from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '1.3.6.1.4.1.5962.99.1.2447135355.1068687300.1625944806011.3.0'
series_uid = '1.3.6.1.4.1.5962.99.1.2447135355.1068687300.1625944806011.2.0'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

slide.read_region((1920, 1920), 4, (1067, 1017))

I get this error:

Traceback (most recent call last):
  File "/home/patrick/test.py", line 11, in <module>
    slide.read_region((1920, 1920), 4, (1067, 1017))
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/wsidicom.py", line 400, in read_region
    image = wsi_level.get_region(scaled_region, z, path, threads)
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/group/group.py", line 318, in get_region
    image = instance.image_data.stitch_tiles(region, path, z, threads)
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 525, in stitch_tiles
    self._paste_tiles(
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 576, in _paste_tiles
    pool.map(thread_paste, tile_region.chunked_iterate_all(threads))
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/thread.py", line 41, in map
    return (item for item in list(map(fn, *iterables)))
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/instance/image_data.py", line 568, in thread_paste
    for tile_point, tile in zip(
  File "/home/patrick/virtualenvs/histomics/src/wsidicom/wsidicom/web/wsidicom_web_image_data.py", line 102, in get_decoded_tiles
    yield Image.open(io.BytesIO(frame))
  File "/home/patrick/virtualenvs/histomics/lib/python3.9/site-packages/PIL/Image.py", line 3298, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f4aadc35680>

Note that this example contains uncompressed data.
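This would be consistent with the traceback: Image.open() identifies an encoded image by its header, and a frame of uncompressed pixel data has none. A rough sketch of how such frames could be recognised before decoding is given below. The helper names are hypothetical, and the set only lists the three uncompressed transfer syntaxes named in the README.

```python
# Transfer syntaxes whose pixel data is stored uncompressed.
UNCOMPRESSED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2",    # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",  # Explicit VR Little Endian
    "1.2.840.10008.1.2.2",  # Explicit VR Big Endian
}


def is_uncompressed(transfer_syntax_uid: str) -> bool:
    """Return True if frames in this transfer syntax are raw pixel bytes."""
    return transfer_syntax_uid in UNCOMPRESSED_TRANSFER_SYNTAXES


def expected_frame_length(
    rows: int, columns: int, samples_per_pixel: int, bits_allocated: int
) -> int:
    """Return the byte length of one uncompressed frame."""
    return rows * columns * samples_per_pixel * (bits_allocated // 8)
```

For uncompressed frames, the raw bytes could then be handed to something like Pillow's Image.frombytes() with the mode and size taken from the dataset, instead of Image.open().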

Need to read from Google Cloud bucket

Hello. I need to read those dicom files from the gcp cloud storage bucket. Is there any way I can pass the gs link to the WsiDicom.open() method?

Basic slide properties

Basic slide properties, such as size (width and height) of base layer and tile size, should be available.

Different behaviour between WsiDicomFileSource and WsiDicomWebSource

Hi,

First of all, thank you very much for providing this great library.

I noticed a difference in the handling of the same DICOM file provided by WsiDicomFileSource and WsiDicomWebSource through a DICOMWeb interface.

The WsiDicomFileSource filters the DICOM levels for matching tile_size and UIDs:

filtered_files = cls._filter_files(files, series_uids, series_tile_size)

This filtering is not done in WsiDicomWebSource:

class WsiDicomWebSource(Source):

This discrepancy can lead to DICOMs that can be opened via file but not via the web interface. I wanted to ask if this was intended.

Regarding the same topic of opening images via file or DICOMWeb:

transfer_syntax = self._determine_transfer_syntax(

If the DICOMWeb server does not support a transfer syntax, for example JPEG2000, and responds with a 400 error (requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: Google DICOM Archive), the exception is not handled, because the code only catches 406 errors:

except HTTPError as exception:

With kind regards,
Christian Marzahl

Parsing fails on sparse files with only one frame

For sparse files, a PlanePositionSlideSequence in PerFrameFunctionalGroupsSequence is expected to give the positions of the frames, but for some files with only one frame it is instead located in the SharedFunctionalGroupsSequence.
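A lookup that tolerates both layouts could be sketched as follows. The helper name `find_plane_position` is hypothetical, and `SimpleNamespace` objects stand in for pydicom datasets and sequence items.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_plane_position(dataset: Any, frame_index: int = 0) -> Optional[Any]:
    """Return the plane position item for a frame, or None if not found.

    Looks in the per-frame functional groups first and falls back to the
    shared functional groups.
    """
    per_frame = getattr(dataset, "PerFrameFunctionalGroupsSequence", None)
    if per_frame is not None and frame_index < len(per_frame):
        sequence = getattr(per_frame[frame_index], "PlanePositionSlideSequence", None)
        if sequence:
            return sequence[0]
    shared = getattr(dataset, "SharedFunctionalGroupsSequence", None)
    if shared:
        sequence = getattr(shared[0], "PlanePositionSlideSequence", None)
        if sequence:
            return sequence[0]
    return None


# Stand-in for a single-frame file with the position in the shared groups.
position = SimpleNamespace(ColumnPositionInTotalImagePixelMatrix=1)
single_frame = SimpleNamespace(
    SharedFunctionalGroupsSequence=[
        SimpleNamespace(PlanePositionSlideSequence=[position])
    ]
)
print(find_plane_position(single_frame) is position)
```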

Automatic selection of offset table type

The DICOM file writer can write either a basic or an extended offset table (or none). A basic offset table (BOT) can only index pixel data up to about 4 GB (because the offsets are stored as 32-bit unsigned integers). Currently, using a BOT fails after the file has been written if that size is exceeded. An extended offset table (EOT) should be used for larger files.

BOT and EOT are both placed before the image data, but can only be written after all the image data has been written. Currently, space is thus reserved for either the BOT or the EOT before writing the image data (and then filled in after writing the image data).

It would be nice if the writer automatically selected the best offset table to use based on the image data to write. When writing from DICOM this should be relatively easy, as there is no re-encoding. When the writer is used in wsidicomizer to convert non-DICOM files, the size to write can be harder to estimate, as for example JPEG headers are added or the image data is re-encoded.
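When the frame sizes are known up front, the choice could be sketched as below. The function names are hypothetical and the sketch assumes that each frame is written as a single fragment, with an 8-byte item header and padding to even length.

```python
from typing import Iterable

MAX_BOT_OFFSET = 2**32 - 1  # largest value a 32-bit unsigned offset can hold
ITEM_HEADER_LENGTH = 8      # item tag (4 bytes) + item length (4 bytes)


def last_frame_offset(frame_lengths: Iterable[int]) -> int:
    """Return the offset of the last frame's item, relative to the first item."""
    offset = 0
    last = 0
    for length in frame_lengths:
        last = offset
        # Each item is its header plus the frame data padded to even length.
        offset += ITEM_HEADER_LENGTH + length + (length % 2)
    return last


def choose_offset_table(frame_lengths: Iterable[int]) -> str:
    """Return "BOT" if all frame offsets fit in 32 bits, otherwise "EOT"."""
    if last_frame_offset(frame_lengths) <= MAX_BOT_OFFSET:
        return "BOT"
    return "EOT"
```

When converting non-DICOM files, the frame lengths would only be estimates, so a safety margin (or simply defaulting to EOT) might be needed.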

A simpler alternative could be to, if the file becomes too large for a BOT, copy the unfinished file but use an EOT. As the image data would already be correctly prepared, this could be a simple and quick operation.
