
tiffslide's Introduction

tiffslide: a drop-in replacement for openslide-python


Welcome to tiffslide 👋, a tifffile-based drop-in replacement for openslide-python.

tiffslide's goal is to provide an easy way to migrate existing code from an openslide dependency to the excellently maintained tifffile module.

We strive to make your lives as easy as possible: if using tiffslide is unintuitive or slow, or if its drop-in behavior differs from what you expect, it's a bug in tiffslide. Feel free to report any issues or feature requests in the issue tracker!

Development happens on GitHub :octocat:

Notes

TiffSlide aims to be compatible with all formats that openslide supports and more, but not all are implemented yet. Aperio SVS is currently the most tested format. Contributions to expand to a larger variety of file formats that tifffile supports are very welcome ❤️
If you have any questions, open an issue and we'll do our best to help!

Compatibility

Here's a list of the currently supported formats.

File Format    | can be opened | full support | references
-------------- | ------------- | ------------ | ----------
Aperio SVS     | ✅            | ✅           |
Generic TIFF   | ✅            | ✅           |
Hamamatsu NDPI | ✅            | ⚠️           | #35
Leica SCN      | ✅            | ✅           |
Ventana        | ⚠️            | ⚠️           | #37
Hamamatsu VMS  | 🚫            | 🚫           |
DICOM          | 🚫            | 🚫           | #32
Mirax          | 🚫            | 🚫           | #33
Zeiss ZVI      | 🚫            | 🚫           |

Documentation

Installation

tiffslide's stable releases can be installed via pip:

pip install tiffslide

Or via conda:

conda install -c conda-forge tiffslide

Usage

tiffslide's behavior aims to be identical to openslide-python where it makes sense. If you rely heavily on the internals of openslide, this is not the package you are looking for. In case we add more features, we will add documentation here.

as a drop-in replacement

# directly
from tiffslide import TiffSlide
slide = TiffSlide('path/to/my/file.svs')

# or via its drop-in behavior
import tiffslide as openslide
slide = openslide.OpenSlide('path/to/my/file.svs')

access files in the cloud

A nice side effect of using tiffslide is that your code will also work with filesystem_spec, which enables you to access your whole slide images from various supported filesystems:

import fsspec
from tiffslide import TiffSlide

# read from any io buffer
with fsspec.open("s3://my-bucket/file.svs") as f:
    slide = TiffSlide(f)
    thumb = slide.get_thumbnail((200, 200))

# read from fsspec urlpaths directly, using your AWS_PROFILE 'aws'
slide = TiffSlide("s3://my-bucket/file.svs", storage_options={'profile': 'aws'})
thumb = slide.get_thumbnail((200, 200))

# read via fsspec from google cloud and use fsspec's caching mechanism to cache locally
slide = TiffSlide("simplecache::gcs://my-bucket/file.svs", storage_options={'project': 'my-project'})
region = slide.read_region((300, 400), 0, (512, 512))

read numpy arrays instead of PIL images

Very often you'd actually want your region returned as a numpy array instead of getting a PIL Image and then having to convert it to numpy:

import numpy as np
from tiffslide import TiffSlide

slide = TiffSlide("myfile.svs")
arr = slide.read_region((100, 200), 0, (256, 256), as_array=True)
assert isinstance(arr, np.ndarray)

Development Installation

If you want to help improve tiffslide, you can set up your development environment in two different ways:

With conda:

  1. Clone tiffslide: git clone https://github.com/bayer-science-for-a-better-life/tiffslide.git
  2. cd tiffslide
  3. conda env create -f environment.devenv.yml
  4. Activate the environment: conda activate tiffslide

Without conda:

  1. Clone tiffslide: git clone https://github.com/bayer-science-for-a-better-life/tiffslide.git
  2. cd tiffslide
  3. python -m venv venv && source venv/bin/activate && python -m pip install -U pip
  4. pip install -e .[dev]

Note that in these environments tiffslide is already installed in development mode, so go ahead and hack.

Benchmarks

Here are some benchmarks comparing tiffslide to openslide for different supported file types and access patterns. Please note that you should always measure access times yourself, on your target machine and for your specific use case.

In case you would like a specific use case to be added, please feel free to open an issue or make a pull request.

The plots below were generated on a ThinkPad E495 with the files stored on the internal SSD. Note that, in general, on my test machine tiffslide outperforms openslide when reading data as numpy arrays. Ventana tile reading is not "correct", since as of now (1.5.0) tiffslide lacks compositing for the overlapping tiles.

See the docs/README.md to run the benchmarks on your own machine.

reading PIL images

[Figure: access times reading PIL images]

reading Numpy arrays

[Figure: access times reading numpy arrays]

Contributing Guidelines

  • Please follow PEP 8 conventions, but:
    • We allow lines up to 120 characters long (still, try to keep them short).
  • Please use numpy docstrings.
  • When contributing code, please try to use pull requests.
  • Tests go hand in hand with modules: they live in tests packages at the same level. We use pytest.

You can set up your IDE to help you adhere to these guidelines.
(Santi is happy to help you set up PyCharm in 5 minutes.)

Acknowledgements

Built with love by Andreas Poehlmann and Santi Villalba from the Machine Learning Research group at Bayer.

tiffslide: copyright 2020 Bayer AG, licensed under BSD

tiffslide's People

Contributors

ap--, erikogabrielsson, kaczmarj, one-sixth, sarthakpati, sdvillal


tiffslide's Issues

Two series in an SCN TIFF image, but only the first series is used.

My SCN TIFF image has two series: the first is low resolution and the second is high resolution.
Only the first series is seen, not the second one.

Maybe we can add a parameter like this " slide = TiffSlide('abc.scn', use_series='auto') ".
When in " use_series='auto' " the series with the largest resolution is automatically selected.

Below is the shape of each page.

(4668, 1616, 3)
(4668, 1616, 3)
(1167, 404, 3)
(291, 101, 3)
(67552, 43392, 3)
(16888, 10848, 3)
(4222, 2712, 3)
(1055, 678, 3)
(263, 170, 3)

This is my tifffile scn tiff scn_metadata output.

<?xml version="1.0"?>
<scn xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uuid="urn:uuid:d9fcefd6-4a8c-4807-86ba-085e664f1f76" xmlns="http://www.leica-microsystems.com/scn/2010/10/01">
  <collection name="ImageCollection_0000001862" uuid="urn:uuid:04538c70-e2e7-40d1-9ae8-fd9f20285542" sizeX="26564529" sizeY="76734666">
    <image name="image_0000004202" uuid="urn:uuid:a56cc2a7-61fe-448c-9591-442461e81e3c">
      <creationDate>2020-08-19T12:06:35.19Z</creationDate>
      <device model="Leica SCN400;Leica SCN" version="1.5.1.10804 2012/05/10 13:29:07;1.5.1.10864" />
      <pixels sizeX="1616" sizeY="4668">
        <dimension sizeX="1616" sizeY="4668" r="0" ifd="0" />
        <dimension sizeX="404" sizeY="1167" r="1" ifd="1" />
        <dimension sizeX="101" sizeY="291" r="2" ifd="2" />
      </pixels>
      <view sizeX="26564529" sizeY="76734666" offsetX="0" offsetY="0" spacingZ="0" />
      <scanSettings>
        <objectiveSettings>
          <objective>0.60833</objective>
        </objectiveSettings>
        <illuminationSettings>
          <numericalAperture>0.7</numericalAperture>
          <illuminationSource>brightfield</illuminationSource>
        </illuminationSettings>
      </scanSettings>
    </image>
    <image name="image_0000004206" uuid="urn:uuid:a6cdd838-572f-400b-8337-fab8740856aa">
      <creationDate>2020-08-19T12:13:04.14Z</creationDate>
      <device model="Leica SCN400;Leica SCN" version="1.5.1.10804 2012/05/10 13:29:07;1.5.1.10864" />
      <pixels sizeX="43392" sizeY="67552">
        <dimension sizeX="43392" sizeY="67552" r="0" ifd="3" />
        <dimension sizeX="10848" sizeY="16888" r="1" ifd="4" />
        <dimension sizeX="2712" sizeY="4222" r="2" ifd="5" />
        <dimension sizeX="678" sizeY="1055" r="3" ifd="6" />
        <dimension sizeX="170" sizeY="263" r="4" ifd="7" />
      </pixels>
      <view sizeX="21696000" sizeY="33776000" offsetX="1019179" offsetY="16624923" spacingZ="400" />
      <scanSettings>
        <objectiveSettings>
          <objective>20</objective>
        </objectiveSettings>
        <illuminationSettings>
          <numericalAperture>0.4</numericalAperture>
          <illuminationSource>brightfield</illuminationSource>
        </illuminationSettings>
      </scanSettings>
    </image>
  </collection>
</scn>
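The SCN metadata above already contains what a `use_series='auto'` option would need: each `<image>` element has a `<pixels>` element with `sizeX`/`sizeY`. A minimal sketch using the standard library's ElementTree, assuming the default namespace shown in the XML; `largest_scn_image` is a hypothetical helper, not part of tiffslide or tifffile:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Default namespace used by the Leica SCN metadata shown above.
SCN_NS = {"scn": "http://www.leica-microsystems.com/scn/2010/10/01"}


def largest_scn_image(scn_xml: str) -> Optional[Tuple[int, str, int, int]]:
    """Return (index, name, sizeX, sizeY) of the <image> with the largest pixel area.

    Hypothetical helper; returns None if no <image> elements are found.
    """
    root = ET.fromstring(scn_xml)
    best = None
    for index, image in enumerate(root.iterfind("scn:collection/scn:image", SCN_NS)):
        pixels = image.find("scn:pixels", SCN_NS)
        if pixels is None:
            continue
        width, height = int(pixels.get("sizeX")), int(pixels.get("sizeY"))
        if best is None or width * height > best[2] * best[3]:
            best = (index, image.get("name"), width, height)
    return best
```

Applied to the metadata above, this should pick `image_0000004206` (43392 x 67552, the 20x scan) over the low-resolution `image_0000004202`. The XML string itself is the `scn_metadata` output obtained from tifffile in the report.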

Allowing tile to extend beyond image region by padding.

Hello,

I have a use case where I use point annotated tumour regions to extract patches to train a segmentation deep learning system.

Each patch has a constant size, but depending on where the point annotation lies on the slide, a patch occasionally extends beyond the image by some margin when the point is close to the image edge.

In short, will there be an option to pad the tile?

When using openslide, any region outside the image is given a black RGB value by default.

For now, I am catching the error, but the flexibility to pad might be important for some. Perhaps it could even return a mask showing where the padding was applied?

Thanks,
Wilson
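Until such an option exists, padding can be done on the caller's side. A minimal sketch, assuming level-0 coordinates and a `read_fn(location, size)` callable that returns an `(h, w, c)` NumPy array for an in-bounds region (for example a wrapper around `read_region(..., as_array=True)`); `read_region_padded` is a hypothetical helper, and the fully-outside case assumes 3-channel `uint8` output:

```python
from typing import Callable, Tuple

import numpy as np


def read_region_padded(
    read_fn: Callable[[Tuple[int, int], Tuple[int, int]], np.ndarray],
    dimensions: Tuple[int, int],
    location: Tuple[int, int],
    size: Tuple[int, int],
    fill: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Read a level-0 region, filling out-of-bounds pixels with `fill`.

    Returns (patch, mask); mask is True where pixels came from the slide.
    """
    x, y = location
    w, h = size
    slide_w, slide_h = dimensions
    # Clip the requested box to the slide bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, slide_w), min(y + h, slide_h)
    mask = np.zeros((h, w), dtype=bool)
    if x1 <= x0 or y1 <= y0:
        # Entirely outside the slide (assumes 3-channel uint8 output).
        return np.full((h, w, 3), fill, dtype=np.uint8), mask
    valid = read_fn((x0, y0), (x1 - x0, y1 - y0))
    patch = np.full((h, w) + valid.shape[2:], fill, dtype=valid.dtype)
    # Paste the in-bounds part at its offset inside the requested box.
    patch[y0 - y : y1 - y, x0 - x : x1 - x] = valid
    mask[y0 - y : y1 - y, x0 - x : x1 - x] = True
    return patch, mask
```

With a `TiffSlide`, this could be called as `read_region_padded(lambda loc, sz: slide.read_region(loc, 0, sz, as_array=True), slide.dimensions, (x, y), (512, 512))`. Filling with 0 gives the black padding described for openslide, and the returned mask marks which pixels are real.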

Support for DICOM

Hey,

It would be great to add support for WSI DICOM. There is a very nice library called wsidicom that handles it well.

Cheers,
Sarthak

Tiffslide much slower than openslide reading patches from SVS with JPEG2000 compression

Hello, thanks for developing this fantastic package! I am working on porting one of my projects from openslide to tiffslide (very easy thanks to the mirrored API 😄). However, I found that tiffslide is much slower than openslide when reading patches from an SVS file from The Cancer Genome Atlas (TCGA).

I created a Jupyter notebook to benchmark this: https://gist.github.com/kaczmarj/41c351be6f52aa6a553cc12ba98a9103. The notebook runs a simple benchmarking function on a TCGA BRCA slide and on a TIFF and an SVS file from the openslide test data.

Using the slide TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs (from https://portal.gdc.cancer.gov/files/d46167af-6c29-49c7-95cf-3a801181aca4), I got the following results: tiffslide takes >10x longer to read patches than openslide.

I did not see the same behavior with CMU-1.tiff and CMU-1.svs from the openslide test data, so I don't suspect disk caching to be the culprit.

Openslide -- get thumbnail
711 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tiffslide -- get thumbnail
2.27 s ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Openslide -- read region at level 0
1.89 ms ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Tiffslide -- read region at level 0
77.5 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Openslide -- read region at level 2
6.93 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Tiffslide -- read region at level 2
73.5 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
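Numbers in this style can also be collected outside Jupyter with the standard library's `timeit`. A minimal sketch of a `%timeit`-like harness; `time_per_call` is a hypothetical helper, and the commented usage assumes a `TiffSlide` (`ts`) and an `OpenSlide` (`os_`) opened on the same file:

```python
import statistics
import timeit
from typing import Callable, Tuple


def time_per_call(fn: Callable[[], object], repeat: int = 7, number: int = 10) -> Tuple[float, float]:
    """Return (mean, stdev) seconds per call over `repeat` runs of `number` calls each."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    stdev = statistics.stdev(per_call) if len(per_call) > 1 else 0.0
    return statistics.mean(per_call), stdev


# Hypothetical usage, with ts and os_ opened on the same slide:
# ts_mean, _ = time_per_call(lambda: ts.read_region((1000, 1000), 0, (256, 256)))
# os_mean, _ = time_per_call(lambda: os_.read_region((1000, 1000), 0, (256, 256)))
```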

add comparison tests

Need to add tests to compare openslide and tiffslide output and define how much of a drop-in replacement tiffslide will be.
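Such comparison tests could start with the slide-level properties both libraries expose (`level_count`, `level_dimensions`, `level_downsamples`). A minimal sketch; `slide_props` and `compare_props` are hypothetical helpers, and the relative tolerance on downsamples is an assumption meant to absorb tiny floating-point differences between the two libraries:

```python
import math
from typing import Any, Dict, List


def slide_props(slide: Any) -> Dict[str, Any]:
    """Collect the properties to compare from a TiffSlide or OpenSlide object."""
    return {
        "level_count": slide.level_count,
        "level_dimensions": tuple(tuple(d) for d in slide.level_dimensions),
        "level_downsamples": tuple(slide.level_downsamples),
    }


def compare_props(ts: Dict[str, Any], os_: Dict[str, Any], rel_tol: float = 1e-6) -> List[str]:
    """Return a list of human-readable mismatches (empty if compatible)."""
    problems = []
    for key in ("level_count", "level_dimensions"):
        if ts[key] != os_[key]:
            problems.append(f"{key}: {ts[key]!r} != {os_[key]!r}")
    ds_ts, ds_os = ts["level_downsamples"], os_["level_downsamples"]
    if len(ds_ts) != len(ds_os) or not all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(ds_ts, ds_os)
    ):
        problems.append(f"level_downsamples: {ds_ts!r} != {ds_os!r}")
    return problems
```

In a pytest module this would become `assert compare_props(slide_props(ts_slide), slide_props(os_slide)) == []`, parametrized over test files.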

Too big for AWS Lambda layer

Is there any way to trim down the dependencies? The total uncompressed size of a deployment package in AWS Lambda can't be more than 250MB. The total size of tiffslide plus dependencies is 195MB, which means I'm over the limit when I add other things (e.g. s3fs).

Breaks when running close() after accessing the zarr group

Minimal example (with one of our prediction images):

with TiffSlide(original_image_path) as original_image:
    _ = original_image.ts_zarr_grp

Results in:

Traceback (most recent call last):
  File "/Users/santi/Proyectos/--pathology/pathological-suite/palo/palo/recipes/mil/vk_aignostics_test_sets.py", line 95, in <module>
    color_areas(EXAMPLE)
  File "/Users/santi/Proyectos/--pathology/pathological-suite/palo/palo/recipes/mil/vk_aignostics_test_sets.py", line 52, in color_areas
    with TiffSlide(original_image_path) as original_image:
  File "/Users/santi/Proyectos/--pathology/pathological-suite/tiffslide/tiffslide/tiffslide.py", line 98, in __exit__
    self.close()
  File "/Users/santi/Proyectos/--pathology/pathological-suite/tiffslide/tiffslide/tiffslide.py", line 101, in close
    if self._zarr_grp:
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/zarr/hierarchy.py", line 238, in __len__
    return sum(1 for _ in self)
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/zarr/hierarchy.py", line 238, in <genexpr>
    return sum(1 for _ in self)
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/zarr/hierarchy.py", line 232, in __iter__
    if (contains_array(self._store, path) or
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/zarr/storage.py", line 96, in contains_array
    return key in store
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/_collections_abc.py", line 822, in __contains__
    self[key]
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/zarr/storage.py", line 545, in __getitem__
    return self._mutable_mapping[key]
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/tifffile/tifffile.py", line 9086, in __getitem__
    return self._getitem(key)
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/tifffile/tifffile.py", line 9562, in _getitem
    keyframe, page, chunkindex, offset, bytecount = self._parse_key(key)
  File "/Users/santi/Utils/mambaforge/envs/pasu/lib/python3.10/site-packages/tifffile/tifffile.py", line 9613, in _parse_key
    series = self._data[int(level)]
ValueError: invalid literal for int() with base 10: '.zattrs'
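The last tiffslide frame suggests the failure comes from `close()` testing the group with `if self._zarr_grp:`. Truth-testing an object that defines no `__bool__` falls back to `__len__`, and the zarr group's `__len__` iterates over the store, which is where the tifffile store raises. A self-contained sketch of the pitfall and of an identity check that avoids it; `FailingGroup` is a hypothetical stand-in for the zarr group, and this is not necessarily the fix that was applied in tiffslide:

```python
class FailingGroup:
    """Stand-in for a zarr group whose __len__ fails, as in the traceback above."""

    def __len__(self) -> int:
        raise ValueError("invalid literal for int() with base 10: '.zattrs'")


def close_with_truthiness(grp) -> str:
    if grp:  # bool(grp) falls back to grp.__len__(), which raises here
        return "closed"
    return "nothing to close"


def close_with_identity(grp) -> str:
    if grp is not None:  # identity check: never calls __len__
        return "closed"
    return "nothing to close"
```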

Tiffslide fails in multi-threaded mode

Hi and thanks for this promising library.

My group has started using it as an openslide replacement and we just encountered one serious issue. Extracting patches from large tiff images is the most efficient when using multi-threading. Unfortunately, tiffslide fails in this mode. Tested using two independent Linux environments and the most recent versions (as of Dec 13) of both tiffslide and tifffile, along with python 3.8 and 3.9. Attached is a minimalistic example using python's multi-threading support. Before running just edit the file and point it to some tiff file (tiff_file variable).

Details/notes:

  • the attached script will work fine when num_threads = 1
  • with num_threads = 2 it will work most of the time and crash occasionally (reporting image problems/corruption).
  • with a higher thread count (8 and up) it will crash constantly. This kind of behaviour is typical for programs which suffer from threading issues.
  • when switching to openslide (import openslide), the attached script works fine (and fast) even for over 30 threads.
  • it is not certain which exact library is to blame here (tifffile or tiffslide), so maybe you will be able to pinpoint that.

Thanks for looking into this. If you need more details I'll be glad to provide them.

Attachment:
tiffslide-bug.zip
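A pattern often suggested for this kind of failure (not confirmed here as the root cause or the fix) is to stop sharing one slide object across threads and open one handle per thread instead. A minimal sketch using `threading.local`; `PerThreadSlide` is a hypothetical helper, and `opener` would be something like `lambda: TiffSlide(tiff_file)`:

```python
import threading
from typing import Any, Callable


class PerThreadSlide:
    """Lazily create one slide handle per thread via `opener` (hypothetical helper)."""

    def __init__(self, opener: Callable[[], Any]) -> None:
        self._opener = opener
        self._local = threading.local()

    def get(self) -> Any:
        slide = getattr(self._local, "slide", None)
        if slide is None:
            # First access from this thread: open a private handle.
            slide = self._opener()
            self._local.slide = slide
        return slide


# Hypothetical usage with a thread pool:
# slides = PerThreadSlide(lambda: TiffSlide(tiff_file))
# def read_patch(loc):
#     return slides.get().read_region(loc, 0, (256, 256), as_array=True)
# with concurrent.futures.ThreadPoolExecutor(8) as pool:
#     patches = list(pool.map(read_patch, locations))
```

The trade-off is one open file handle (and its caches) per worker thread.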

Stacklevel on import warning

/home/user/miniconda3/envs/someenv/lib/python3.8/site-packages/tiffslide/__init__.py:41: UserWarning: compatibility: aliasing tiffslide.TiffSlide to 'OpenSlide'
  warn(f"compatibility: aliasing tiffslide.TiffSlide to {name!r}")

This should ideally warn in user code and not in tiffslide.

Compatibility issue with read_region

First of all, thanks for creating tiffslide!

When using it as a drop-in replacement, we noticed that read_region returns slightly different regions for levels other than the base level. This does not happen in every case, though: for example, it works for all levels and regions if the region starts at location (0, 0).

When we looked into the code we noticed two differences to openslide, which could be the reason:

Downsample factors

The downsample factors are calculated in different ways, compare
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide.c#L271

l->downsample =
    (((double) blh / (double) l->h) +
     ((double) blw / (double) l->w)) / 2.0;

and
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L228

math.sqrt((w0 * h0) / (w * h))

This results in slightly different values, e.g. for CMU-1.svs:
(1.0, 4.000121534371467, 16.00048613748587) (tiffslide) instead of
(1.0, 4.000121536217793, 16.00048614487117) (openslide).

But the difference is really small and probably has no major impact.
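The two formulas can be compared directly in Python using the CMU-1.svs level dimensions (46000, 32914), (11500, 8228), (2875, 2057). A minimal sketch; the function names are hypothetical, and they transcribe the two quoted expressions rather than call either library:

```python
import math
from typing import List, Tuple

Dims = List[Tuple[int, int]]


def downsamples_tiffslide(dims: Dims) -> Tuple[float, ...]:
    """Square root of the area ratio, as in the quoted tiffslide expression."""
    w0, h0 = dims[0]
    return tuple(math.sqrt((w0 * h0) / (w * h)) for w, h in dims)


def downsamples_openslide(dims: Dims) -> Tuple[float, ...]:
    """Mean of the height and width ratios, as in the quoted openslide C code."""
    w0, h0 = dims[0]
    return tuple(((h0 / h) + (w0 / w)) / 2.0 for w, h in dims)


cmu1 = [(46000, 32914), (11500, 8228), (2875, 2057)]
print(downsamples_tiffslide(cmu1))
print(downsamples_openslide(cmu1))
```

The widths divide exactly (ratio 4 and 16), while the heights do not (32914 / 8228 is slightly above 4), so averaging the two ratios and taking the geometric mean give slightly different results.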

Coordinate conversion in read_region

In read_region, openslide uses the downsample factors mentioned above to convert the location on the base level to the requested level; in particular, the same downsample factor is used for both the x and y direction. Tiffslide calculates the location for x and y independently, using the ratio of the width or height at the requested level to that of the base level, compare

https://github.com/bayer-science-for-a-better-life/tiffslide/blob/v1.10.0/tiffslide/tiffslide.py#L354

rx0 = (base_x * level_w) // base_w
ry0 = (base_y * level_h) // base_h

and openslide
https://github.com/openslide/openslide/blob/v3.4.1/src/openslide-vendor-aperio.c#L259

x / l->base.downsample
y / l->base.downsample

This can result in a different region. For example, a region starting at base-level location (100, 100) starts at different locations on level 1, again for CMU-1.svs (level dimensions (46000, 32914), (11500, 8228), (2875, 2057) and the downsample factors above):

  • tiffslide: (25, 24)
  • openslide: (24, 24)

Do you think adjusting read_region makes sense? I can understand if you are cautious about the change, because tiffslide would then lose compatibility with its own earlier behavior; on the other hand, consistency with openslide would be desirable.

I don't think there is a right solution here unfortunately, because we mostly don't know how the downsampling has been done exactly, but I would assume that in many cases the same factor has been used for x and y direction and that the pixels should remain square and the image has rather been cut off or expanded somewhere.

Dependencies: restrict combinations of versions of dependencies

This is an interesting issue. We depend on tifffile, and specifically its zarr interface. Since we don't want to depend on tifffile[all] because we don't need matplotlib and other optional dependencies of tifffile, we opted for:

tiffslide/setup.cfg

Lines 36 to 42 in 8bea5a4

install_requires =
    imagecodecs
    fsspec!=2022.11.0,!=2023.1.0
    pillow
    tifffile>=2021.6.14
    zarr>=2.11.0
    typing_extensions>=4.0

Basically manually adding the dependencies we need.
But this can lead to installations in which pip resolves an environment that contains incompatible versions of imagecodecs and tifffile. Here are the compatible versions:

tifffile    | imagecodecs
----------- | -----------
>=2023.8.12 | >=2023.8.12
>=2023.1.23 | >=2023.1.23
>=2022.7.28 | >=2022.2.22
>=2022.2.22 | >=2021.11.20
>=2021.7.30 | >=2021.7.30
>=2021.6.6  | >=2021.4.28

I think two incompatible versions of imagecodecs and tifffile might be the cause of imi-bigpicture/wsidicomizer#87 and zarr might catch whatever error and just return black tiles. But I first need to investigate if my guess is actually correct...
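One way to catch such mismatches is to encode the compatibility table as data and check the installed pair. A minimal sketch; the names are hypothetical, and version parsing assumes plain dotted integers such as `2023.8.12` (no suffixes):

```python
from typing import Optional, Tuple

# (minimum tifffile, minimum imagecodecs) pairs from the table above, newest first.
COMPATIBLE = [
    ("2023.8.12", "2023.8.12"),
    ("2023.1.23", "2023.1.23"),
    ("2022.7.28", "2022.2.22"),
    ("2022.2.22", "2021.11.20"),
    ("2021.7.30", "2021.7.30"),
    ("2021.6.6", "2021.4.28"),
]


def _parse(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted date version like '2023.8.12'."""
    return tuple(int(part) for part in version.split("."))


def required_imagecodecs(tifffile_version: str) -> Optional[str]:
    """Minimum imagecodecs for a tifffile version, or None if older than the table."""
    tv = _parse(tifffile_version)
    for min_tifffile, min_imagecodecs in COMPATIBLE:
        if tv >= _parse(min_tifffile):
            return min_imagecodecs
    return None


def is_compatible(tifffile_version: str, imagecodecs_version: str) -> bool:
    required = required_imagecodecs(tifffile_version)
    return required is None or _parse(imagecodecs_version) >= _parse(required)
```

The installed versions could be read at runtime with `importlib.metadata.version("tifffile")` and `importlib.metadata.version("imagecodecs")`, for example to emit a warning at import time.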

Ventana BIF overlap not considered

ts_slide = TiffSlide('OS-2.bif'), os_slide = OpenSlide('OS-2.bif')

    def test_level_dimensions(ts_slide, os_slide):
>       assert ts_slide.level_dimensions == os_slide.level_dimensions
E       AssertionError: assert ((128000, 829...0, 2600), ...) == ((114943, 763...2, 2386), ...)
E         At index 0 diff: (128000, 82960) != (114943, 76349)
E         Full diff:
E           (
E         -  (114943, 76349),
E         -  (57472, 38175),
E         -  (28736, 19088),
E         -  (14368, 9544),...
E         
E         ...Full output truncated (20 lines hidden), use '-vv' to show


Fixing this requires two steps:

  • (1) Generating absolute offset coordinates for each tile from the regular grid in the Ventana file
  • (2) A zarrstore that translates irregular positioned chunks with overlap into a regular grid

Notes:

(1) we get a list of offsets between pairs of tiles with a confidence estimate from the metadata. These would have to be fed into a solver that returns optimal absolute coordinates for each tile.

(2) the same translating store could be used to implement Mirax support. #33
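Step (1) can be framed as a weighted least-squares problem: each pairwise offset gives an equation pos[j] - pos[i] ≈ (dx, dy) weighted by its confidence, and one tile is pinned to the origin to remove the global translation. A minimal sketch with NumPy; the `(i, j, dx, dy, weight)` tuple format is an assumption and not how the Ventana metadata is actually laid out:

```python
from typing import Iterable, Tuple

import numpy as np

Pair = Tuple[int, int, float, float, float]  # (i, j, dx, dy, weight)


def solve_tile_positions(n_tiles: int, pairs: Iterable[Pair], anchor: int = 0) -> np.ndarray:
    """Weighted least-squares absolute (x, y) positions from pairwise tile offsets.

    Each pair means pos[j] - pos[i] ~= (dx, dy); tile `anchor` is pinned to (0, 0).
    """
    pairs = list(pairs)
    rows = len(pairs) + 1
    A = np.zeros((rows, n_tiles))
    bx = np.zeros(rows)
    by = np.zeros(rows)
    for k, (i, j, dx, dy, weight) in enumerate(pairs):
        s = np.sqrt(weight)  # scaling a row by sqrt(w) weights its squared residual by w
        A[k, j], A[k, i] = s, -s
        bx[k], by[k] = s * dx, s * dy
    A[-1, anchor] = 1.0  # pin the anchor tile; bx[-1] = by[-1] = 0
    x, *_ = np.linalg.lstsq(A, bx, rcond=None)
    y, *_ = np.linalg.lstsq(A, by, rcond=None)
    return np.column_stack([x, y])
```

If the tile graph is not connected, the system is underdetermined and `lstsq` returns a minimum-norm solution, so a real implementation would need to check connectivity first.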

ValueError when loading region from Aperio .svs

Hello,

In my workflow, I am trying out tiffslide as a drop-in replacement for openslide.

I am having issues loading a region from a loaded Aperio SVS image. I am passing the same types as I would openslide. Any advice? Possible version incompatibility?

Some of the WSI I am working with are relatively old (12+ years).

Here is the trace for an example call to .read_region()

self.loaded_svs.read_region((9106, 1352), 0, (512, 1028))
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2021.1\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\tiffslide\tiffslide.py", line 389, in read_region
    if isinstance(self.ts_zarr_grp, zarr.core.Array):
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\functools.py", line 967, in __get__
    val = self.func(instance)
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\tiffslide\tiffslide.py", line 317, in ts_zarr_grp
    return zarr.open(store, mode="r")
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\zarr\convenience.py", line 100, in open
    if contains_array(_store, path):
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\zarr\storage.py", line 96, in contains_array
    return key in store
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\_collections_abc.py", line 666, in __contains__
    self[key]
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\zarr\storage.py", line 545, in __getitem__
    return self._mutable_mapping[key]
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\tifffile\tifffile.py", line 8485, in __getitem__
    return self._getitem(key)
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\tifffile\tifffile.py", line 8873, in _getitem
    keyframe, page, chunkindex, offset, bytecount = self._parse_key(key)
  File "C:\Users\benja\miniconda3\envs\SVS_Loader\lib\site-packages\tifffile\tifffile.py", line 8914, in _parse_key
    level, key = key.split('/')
ValueError: not enough values to unpack (expected 2, got 1)

Tiffslide errors when used in pytorch dataloader with `num_workers>1`

Unfortunately, tiffslide fails again in parallel mode, this time using pytorch dataloaders. This is a very common technique used in WSI processing with pytorch, the only difference is that it uses process based parallelisation (rather than threads, as in the original bug report).

The symptoms are exactly the same:

  • using tiffslide and one dataloader process (num_workers=1) everything works fine
  • using tiffslide and more dataloader processes (e.g. num_workers=4) the processing fails
  • using openslide everything works fine regardless of the num_workers value

Tested using tiffslide version 1.0.0 and tifffile version 2022.2.9. Please see the attached minimalist example.

tiffslide-bug2.zip

Originally posted by @lukasii in #14 (comment)
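A commonly recommended pattern for process-based loaders (again, not confirmed as the fix for this report) is to avoid sending an open slide handle to the workers and instead open the slide lazily inside each worker process. A minimal sketch of a map-style dataset doing that; `LazySlideDataset` is hypothetical, it does not subclass `torch.utils.data.Dataset` to stay dependency-free, and `opener` should be picklable, e.g. `functools.partial(TiffSlide, path)`:

```python
import os
from typing import Any, Callable, Sequence, Tuple


class LazySlideDataset:
    """Map-style dataset that opens its slide inside whichever process reads from it."""

    def __init__(self, opener: Callable[[], Any], coords: Sequence[Tuple[int, int]],
                 size: Tuple[int, int] = (256, 256)) -> None:
        self._opener = opener
        self.coords = list(coords)
        self.size = size
        self._slide = None
        self._pid = None

    def _get_slide(self) -> Any:
        pid = os.getpid()
        if self._slide is None or self._pid != pid:
            # First use in this process (or after a fork): open a fresh handle.
            self._slide = self._opener()
            self._pid = pid
        return self._slide

    def __len__(self) -> int:
        return len(self.coords)

    def __getitem__(self, idx: int):
        return self._get_slide().read_region(self.coords[idx], 0, self.size, as_array=True)

    def __getstate__(self):
        # Never pickle an open handle into worker processes.
        state = self.__dict__.copy()
        state["_slide"] = None
        state["_pid"] = None
        return state
```

The pid check covers the fork start method (where the dataset is copied rather than pickled), and `__getstate__` covers spawn.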

round downsamples ?!?

I believe the way openslide calculates downsamples is not correct. I assume that there is a reason why the downsample factor in openslide is calculated as seen below. But my guess is that this is incorrect and should actually be rounded to whole numbers.

downsamples:
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/ff2b18c0c067cd9a29f38fe6abe6f9dddf59cd5d/tiffslide/tiffslide.py#L231-L234

lower level offsets:
https://github.com/bayer-science-for-a-better-life/tiffslide/blob/ff2b18c0c067cd9a29f38fe6abe6f9dddf59cd5d/tiffslide/tiffslide.py#L434-L438

Other References:

TODO:

  • try to find examples of whole slide images that DO NOT have (almost) integer scaling.
  • try to figure out why openslide calculates downsampling as it does
  • implement a switch in tiffslide that rounds downsamples and switches to (more correct?) integer scaling between levels
  • create a test script using openslide for finding examples that would contradict my intuition and ask image.sc users to run this on as many images as possible.
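A switch like the one in the third TODO item could compute the openslide-style averaged ratio per level and snap it to the nearest integer when it lies within a relative tolerance; otherwise it keeps the raw value and reports that the slide does not have near-integer scaling, which would also help with the first TODO item. A minimal sketch; `round_downsamples` and its 1% default tolerance are assumptions:

```python
from typing import Sequence, Tuple


def round_downsamples(
    level_dimensions: Sequence[Tuple[int, int]], rel_tol: float = 0.01
) -> Tuple[Tuple[float, ...], bool]:
    """Return (downsamples, all_near_integer).

    Each level's openslide-style downsample (mean of width and height ratios) is
    snapped to the nearest integer if within `rel_tol`; otherwise it is kept as-is.
    """
    w0, h0 = level_dimensions[0]
    downsamples = []
    all_near_integer = True
    for w, h in level_dimensions:
        raw = ((w0 / w) + (h0 / h)) / 2.0
        nearest = round(raw)
        if nearest >= 1 and abs(raw - nearest) <= rel_tol * nearest:
            downsamples.append(float(nearest))
        else:
            downsamples.append(raw)
            all_near_integer = False
    return tuple(downsamples), all_near_integer
```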

Standalone documentation with sphinx

not so urgent, but when we diverge more (as soon as we take more advantage of tifffile and zarr) we would want our own docs.

TODO: should at least claim the name on rtd for now...

Offer direct-to-gpu decode

Everything is in place to provide nvjpeg decode directly to gpu memory. I just need to find some spare time to implement the required glue...

  • provide a compatible API to choose the GPU decoder
  • start with svs support for simplicity
  • loudly fail in untested cases
  • provide good benchmarks for publicly available files


tiffslide with CLAM

Hello,

I have some Whole Slide Images and I want to use CLAM for a Multiple Instance Learning training.
In the scripts, I have replaced all 'import openslide' with 'import tiffslide as openslide'.
However I have a problem with the dataloader which I think arises from how tiffslide reads the image regions.
Specifically, in line 37 the num_workers are set to 4.
I noticed that if the num_workers are 0 or 1, it works without errors but it is slow.
When num_workers>1 then I get the below error for the dataloader.
The weird thing is that for the same image, the error doesn't always happen.

Any feedback appreciated.

File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/extract_features_fp.py", line 124, in <module>
    output_file_path = compute_w_loader(h5_file_path, output_path, wsi,
  File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/extract_features_fp.py", line 48, in compute_w_loader
    for count, (batch, coords) in enumerate(loader):
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
imagecodecs._jpeg8.Jpeg8Error: Caught Jpeg8Error in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/gpfs/workdir/papadomama/GR_scripts/explore_CLAM/create_patches/CLAM/datasets/dataset_h5.py", line 158, in __getitem__
    img = self.wsi.read_region(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tiffslide/tiffslide.py", line 386, in read_region
    arr: npt.NDArray[np.int_] = get_zarr_selection(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tiffslide/_zarr.py", line 193, in get_zarr_selection
    return grp[str(level)][selection]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 807, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 933, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out,
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 976, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 1267, in _get_selection
    self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/core.py", line 1966, in _chunk_getitem
    cdata = self.chunk_store[ckey]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/zarr/storage.py", line 724, in __getitem__
    return self._mutable_mapping[key]
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 11308, in __getitem__
    return self._getitem(key)
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 11973, in _getitem
    chunk = keyframe.decode(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/tifffile/tifffile.py", line 7736, in decode_jpeg
    data_array: numpy.ndarray = imagecodecs.jpeg_decode(
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/imagecodecs/imagecodecs.py", line 995, in jpeg_decode
    raise exc
  File "/gpfs/users/papadomama/.conda/envs/mariapap/lib/python3.10/site-packages/imagecodecs/imagecodecs.py", line 966, in jpeg_decode
    return imagecodecs.jpeg8_decode(
  File "imagecodecs/_jpeg8.pyx", line 332, in imagecodecs._jpeg8.jpeg8_decode
imagecodecs._jpeg8.Jpeg8Error: Unsupported marker type 0x81

Add conda recipe

Describe the new feature:

It would be great to have a conda recipe so that it can be included with projects that have more complicated build processes (for example, using libraries that need C/C++ compilers).

What is the current outcome?

Have a recipe for tiffslide on the conda-forge channel.

Is it backward-compatible?

Yes, and it would be forward-compatible as well, because the conda bots can automatically fetch new sdists pushed to pypi.

static type checking

The codebase is small, so this might be a very good issue to make oneself familiar with static typing and mypy

overflow encountered in long_scalars ry0 = (base_y * level_h) // base_h

Overflow can occur when the user passes parameters of type np.int32; as a result, some images cannot be displayed.
This is tricky to spot because only an obscure warning is emitted, so we should cast the parameters to int explicitly.
Openslide does not have this problem.

The problem arises in these lines:

    rx0 = (base_x * level_w) // base_w
    ry0 = (base_y * level_h) // base_h

which produce:

d:\github_repo\tiffslide\tiffslide\tiffslide.py:414: RuntimeWarning: overflow encountered in long_scalars
  ry0 = (base_y * level_h) // base_h

This was tested with my copy of tiffslide that includes the padding fix.
Thumbnails are generated from level 0, whose size is (39840, 50329).

Result with location and size as np.int32:
[image: o2]

Result with location and size as int:
[image: o3]
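A minimal sketch of the overflow and of the explicit-int fix the issue proposes. The helper names scale_coord and scale_coord_safe are hypothetical, and base_y = 45000 is an illustrative value (not from the issue) chosen so that base_y * level_h exceeds the int32 maximum. Both operands are np.int32 here so the wraparound happens regardless of platform or NumPy version.

```python
import warnings

import numpy as np


def scale_coord(base, level_size, base_size):
    # Same expression as in the issue. With np.int32 operands the product
    # is computed in int32 and silently wraps around on overflow.
    return (base * level_size) // base_size


def scale_coord_safe(base, level_size, base_size):
    # Converting to Python int first gives arbitrary-precision arithmetic,
    # so the intermediate product cannot overflow whatever dtype came in.
    return (int(base) * int(level_size)) // int(base_size)


# Level-0 height from the issue; base_y is a hypothetical coordinate.
base_h = level_h = np.int32(50329)
base_y = np.int32(45000)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # hide the overflow warning
    wrapped = scale_coord(base_y, level_h, base_h)

safe = scale_coord_safe(base_y, level_h, base_h)
print(wrapped, safe)  # the int32 result is negative; the safe one is 45000
```

Casting at the point where the public API receives location and size keeps every downstream calculation in Python ints, which is presumably why openslide (which converts arguments for its C API) does not show the problem.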

Issue with `get_best_level_for_downsample` for NDPI

Hi,

I think I found a little issue with the get_best_level_for_downsample function, you can reproduce it with the following snippet:

import tiffslide, openslide
os_slide, ts_slide = openslide.OpenSlide('CMU-1.ndpi'), tiffslide.TiffSlide('CMU-1.ndpi')

os_slide.level_downsamples
# (1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0)
os_slide.get_best_level_for_downsample(4.0)
# 2

ts_slide.level_downsamples
# (1.0, 4.0, 16.0, 64.0)
ts_slide.get_best_level_for_downsample(4.0)
# 0 (expected 1)

My guess is that the return statement is the cause: I would expect it to be return lvl, or alternatively the >= in the preceding if statement should just be >:

https://github.com/bayer-science-for-a-better-life/tiffslide/blob/f23f7c9f0c3803cdb4e41378fb7f09bb0ee0047f/tiffslide/tiffslide.py#L247-L248

I hope this helps. Thanks for all the effort!
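The selection rule the reporter expects (pick the deepest level whose downsample does not exceed the request, so an exact match on 4.0 selects that level) can be sketched with the standard library's bisect module. This is a hypothetical standalone helper, not tiffslide's code; it assumes level_downsamples is sorted ascending and does not model any floating-point tolerance the real implementations might apply.

```python
from bisect import bisect_right
from typing import Sequence


def best_level_for_downsample(level_downsamples: Sequence[float], downsample: float) -> int:
    """Return the deepest level whose downsample does not exceed `downsample`.

    Assumes `level_downsamples` is sorted in ascending order.
    Requests below the level-0 downsample fall back to level 0.
    """
    # bisect_right returns the insertion point *after* any equal entries,
    # so an exact match (e.g. 4.0 in (1.0, 4.0, ...)) selects that level.
    idx = bisect_right(level_downsamples, downsample) - 1
    return max(idx, 0)
```

With the tiffslide pyramid from the issue, best_level_for_downsample((1.0, 4.0, 16.0, 64.0), 4.0) gives 1, and with openslide's nine-level pyramid it gives 2, matching both expected values above.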

Support for Mirax

openslide supports mirax.

Adding this so that it's mentioned in the issue tracker.

Unable to read PNG files

Hi,

My workflow involves extracting patches from WSIs and storing them as PNGs. But when I try to read a PNG file, I get a tifffile error:

>>> from tiffslide.tiffslide import TiffSlide
>>> path=r"C:\Projects\GaNDLF\testing\histo_patches\histo_patches_output\1\image\image_patch_3792-13696.png"
>>> TiffSlide(path)
Traceback (most recent call last):
  File "C:\Projects\GaNDLF\venv\lib\site-packages\tifffile\tifffile.py", line 3142, in __init__
    byteorder = {b'II': '<', b'MM': '>', b'EP': '<'}[header[:2]]
KeyError: b'\x89P'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Projects\GaNDLF\venv\lib\site-packages\tiffslide\tiffslide.py", line 107, in __init__
    filename, storage_options=storage_options, tifffile_options=tifffile_options
  File "C:\Projects\GaNDLF\venv\lib\site-packages\tiffslide\tiffslide.py", line 527, in _prepare_tifffile
    return TiffFile(path, **tf_kw)
  File "C:\Projects\GaNDLF\venv\lib\site-packages\tifffile\tifffile.py", line 3144, in __init__
    raise TiffFileError(f'not a TIFF file {header!r}')
tifffile.tifffile.TiffFileError: not a TIFF file b'\x89PNG'

Attaching a PNG for reference:
[image: image_patch_3792-13696]

Thanks!
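The traceback shows that tifffile checks the first two header bytes against b'II', b'MM' and b'EP' and raises on anything else, so a PNG can never be opened by TiffSlide. A minimal sketch of routing files by their signature before choosing a reader; sniff_format and sniff_file are hypothetical helper names, and the PNG check uses the standard 8-byte PNG signature.

```python
# First two bytes tifffile accepts as a byte-order mark (from the traceback above).
TIFF_BYTE_ORDERS = (b"II", b"MM", b"EP")
# The fixed 8-byte signature every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_format(header: bytes) -> str:
    """Classify a file from its first bytes: 'tiff', 'png' or 'unknown'."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header[:2] in TIFF_BYTE_ORDERS:
        return "tiff"
    return "unknown"


def sniff_file(path) -> str:
    """Read just enough bytes from `path` to classify it."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```

A caller could then open "tiff" files with TiffSlide(path) and hand PNG patches to Pillow's Image.open(path) instead.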

benchmark tiffslide

We need benchmarks against openslide, as well as benchmarks using remote and in-memory filesystems.
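A minimal timing harness one could start from, assuming only that the slide object exposes openslide-python's read_region(location, level, size) signature, which TiffSlide mirrors as a drop-in replacement. The function name, default tile size and repeat count are illustrative choices, not part of either library.

```python
import statistics
import time
from typing import Iterable, Tuple


def benchmark_read_region(slide, locations: Iterable[Tuple[int, int]],
                          size: Tuple[int, int] = (256, 256), level: int = 0,
                          repeat: int = 3) -> float:
    """Median wall-clock seconds to read every location once.

    `slide` can be any object with openslide's read_region(location, level, size)
    signature, e.g. an openslide.OpenSlide or a tiffslide.TiffSlide.
    """
    locations = list(locations)  # materialize so each round reads the same tiles
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        for loc in locations:
            slide.read_region(loc, level, size)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Running it with identical locations against OpenSlide(path) and TiffSlide(path) gives a like-for-like comparison; the same function would also time a TiffSlide opened from a remote or in-memory location, since it only depends on read_region. Note that repeated rounds may hit OS or library caches, so the first round and the median can differ substantially.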

Add supported formats table to README

User comment:

I can't seem to find a list of what formats are supported?
is that somewhere and I'm missing it?
might be good to include in the readme.md

TODO:
implement #1 to be able to give some guarantees for supported formats

OpenSeadragon very slow with svs

tiffslide is working very, very well with openslide. But OpenSeadragon is slow, and very slow with SVS files. Do you have any idea why?

tiffslide not reading MPP of CAMELYON16 but openslide does

Hello, I am using tiffslide with the CAMELYON16 dataset. I realized that tiffslide cannot read the MPP of these images, whereas openslide-python can. I have included info below on how to reproduce this behavior:

Create a Python environment and download a CAMELYON16 file:

# Create python env
python3.10 -m venv --upgrade-deps venv/
source ./venv/bin/activate
python -m pip install awscli openslide-python tiffslide

# Copy sample file from CAMELYON16
aws s3 cp --no-sign-request s3://camelyon-dataset/CAMELYON16/images/normal_001.tif .

Read the MPP with openslide and tiffslide:

>>> import openslide, tiffslide
>>> oslide = openslide.open_slide("normal_001.tif")
>>> oslide.properties[openslide.PROPERTY_NAME_MPP_X]
'0.24309399999999998'
>>> tslide = tiffslide.open_slide("normal_001.tif")
>>> tslide.properties[tiffslide.PROPERTY_NAME_MPP_X]
>>> tslide.properties[tiffslide.PROPERTY_NAME_MPP_X] is None
True

Here are the openslide-python and tiffslide versions:

>>> import openslide, tiffslide
>>> openslide.__version__
'1.3.0'
>>> tiffslide.__version__
'2.2.0'
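The issue does not say where openslide gets this value for CAMELYON16. Assuming the file carries the standard TIFF XResolution/YResolution and ResolutionUnit tags (not verified here), a fallback could convert them to micrometres per pixel: one centimetre is 10000 µm and one inch is 25400 µm. mpp_from_resolution is a hypothetical helper; the unit codes come from the TIFF 6.0 specification.

```python
from typing import Optional, Tuple, Union

# ResolutionUnit codes defined by the TIFF 6.0 specification.
RESUNIT_NONE = 1
RESUNIT_INCH = 2
RESUNIT_CENTIMETER = 3

# Micrometres contained in one resolution unit.
_UM_PER_UNIT = {RESUNIT_INCH: 25400.0, RESUNIT_CENTIMETER: 10000.0}


def mpp_from_resolution(resolution: Union[float, Tuple[int, int]],
                        unit: int) -> Optional[float]:
    """Convert a TIFF X/YResolution (pixels per unit) to micrometres per pixel."""
    if isinstance(resolution, tuple):  # TIFF stores resolution as a rational
        numerator, denominator = resolution
        if denominator == 0:
            return None
        resolution = numerator / denominator
    um_per_unit = _UM_PER_UNIT.get(int(unit))
    if um_per_unit is None or resolution <= 0:
        return None  # no physical unit (NONE) or an unusable value
    return um_per_unit / resolution
```

With tifffile, the inputs would come from page.tags["XResolution"].value and page.tags["ResolutionUnit"].value for page = tifffile.TiffFile(path).pages[0]. Under this conversion, the 0.243094 µm/pixel reported by openslide corresponds to roughly 41136 pixels per centimetre.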

Leica SCN multiple series not composited into one level

ts_slide = TiffSlide('Leica-2.scn'), os_slide = OpenSlide('Leica-2.scn')

    def test_level_dimensions(ts_slide, os_slide):
>       assert ts_slide.level_dimensions == os_slide.level_dimensions
E       AssertionError: assert ((39168, 2604...01), (38, 25)) == ((106259, 306...), (104, 298))
E         At index 0 diff: (39168, 26048) != (106259, 306939)
E         Full diff:
E           (
E         -  (106259,
E         -   306939),
E         -  (26565,
E         -   76735),...
E         
E         ...Full output truncated (29 lines hidden), use '-vv' to show

Support for axes=='YX' and uint16 svs files.

Hello. I have some svs files whose axes are 'YX' and dtype is uint16.
Openslide doesn't support opening them, but tiffslide can open them with a little modification.

I opened a PR.
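The issue does not show the "little modification" from the PR. As one possible approach (an assumption, not the PR's change), a single-channel uint16 tile could be windowed down to 8 bits and replicated into three channels so it behaves like the RGB tiles openslide-style callers expect. uint16_to_rgb8 and its vmin/vmax window are hypothetical.

```python
import numpy as np


def uint16_to_rgb8(arr: np.ndarray, vmin: int = 0, vmax: int = 65535) -> np.ndarray:
    """Map a 2-D ('YX') uint16 array to an (H, W, 3) uint8 array.

    Values are clipped to [vmin, vmax] and scaled linearly to [0, 255];
    the single channel is then repeated three times.
    """
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clipped = np.clip(arr.astype(np.float64), vmin, vmax)
    scaled = np.round((clipped - vmin) * (255.0 / (vmax - vmin))).astype(np.uint8)
    return np.repeat(scaled[:, :, np.newaxis], 3, axis=2)
```

A fixed window is used deliberately: scaling each tile by its own min and max would make brightness jump between neighbouring tiles.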

Some scn test slide errors: Cannot handle data type

I found some smaller examples of scn files in openslide-testdata.
https://openslide.cs.cmu.edu/download/openslide-testdata/Leica/

I tested the Leica-1.scn file and it looks fine.

Then I tested Leica-Fluorescence-1.scn and hit a new problem in PIL, which may deserve its own issue. Note that openslide can't open this file at all.

a=np.asarray(im2.get_thumbnail([768,768]))


Traceback (most recent call last):
  File "D:\Python\Python39\lib\site-packages\PIL\Image.py", line 2813, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 74), '|u1')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "D:\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-5d4126449fe6>", line 1, in <module>
    a=np.asarray(im2.get_thumbnail([768,768]))
  File "d:\github_repo\tiffslide\tiffslide\tiffslide.py", line 432, in get_thumbnail
    img = self.read_region((0, 0), level, _level_dimensions)
  File "d:\github_repo\tiffslide\tiffslide\tiffslide.py", line 392, in read_region
    return Image.fromarray(arr)
  File "D:\Python\Python39\lib\site-packages\PIL\Image.py", line 2815, in fromarray
    raise TypeError("Cannot handle this data type: %s, %s" % typekey) from e
TypeError: Cannot handle this data type: (1, 1, 74), |u1
im3=OpenSlide(r"F:\Leica-Fluorescence-1.scn")

Traceback (most recent call last):
  File "D:\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-6061d42f300d>", line 1, in <module>
    im3=OpenSlide(r"F:\Leica-Fluorescence-1.scn")
  File "D:\Python\Python39\lib\site-packages\openslide\__init__.py", line 160, in __init__
    self._osr = lowlevel.open(filename)
  File "D:\Python\Python39\lib\site-packages\openslide\lowlevel.py", line 136, in _check_open
    raise OpenSlideError(err)
openslide.lowlevel.OpenSlideError: Can't find main image

Originally posted by @One-sixth in #22 (comment)
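The KeyError key ((1, 1, 74), '|u1') shows that read_region passed Image.fromarray a uint8 array with 74 entries along its last axis, for which Pillow has no image mode. As one possible workaround (an assumption, not tiffslide's fix, and without knowing what the 74 entries represent), such arrays could be collapsed with a maximum-intensity projection into a 2-D uint8 array, which Pillow accepts as mode "L". to_pil_compatible is a hypothetical helper.

```python
import numpy as np


def to_pil_compatible(arr: np.ndarray) -> np.ndarray:
    """Collapse (H, W, C) arrays whose channel count Pillow cannot map.

    Pillow's Image.fromarray accepts 2-D uint8 arrays (mode "L") and
    (H, W, 3) / (H, W, 4) uint8 arrays (RGB / RGBA), but not e.g. 74 channels.
    Any other channel count is reduced here by a maximum-intensity projection.
    """
    if arr.ndim == 3 and arr.shape[2] not in (3, 4):
        return arr.max(axis=2)
    return arr
```

A projection keeps the brightest signal at each pixel, which is a common quick-look choice for fluorescence data, but it discards which channel the signal came from.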

Hamamatsu NDPI missing some levels

ts_slide = TiffSlide('OS-3.ndpi'), os_slide = OpenSlide('OS-3.ndpi')

    def test_level_count(ts_slide, os_slide):
>       assert ts_slide.level_count == os_slide.level_count
E       assert 10 == 13
E         +10
E         -13

Allow manually overriding which series to use in TiffSlide

This came up as a request (initial solution) from #21
While this is usually not required if TiffSlide's default behavior returns the same series as openslide, it could be an interesting feature in case a user wants to change the default.

I'll have to think a bit more about whether it's worth supporting this for any format.

tiffslide fails to detect svs format and MPP if ImageDescription contains non ascii characters

I have a TIFF file whose MPP openslide reads correctly, but tiffslide reports 'tiffslide.mpp-x': None and emits the following warning:
<tifffile.TiffTag 270 @829688804> coercing invalid ASCII to bytes
<tifffile.TiffTag 270 @829688804> coercing invalid ASCII to bytes

Is this because tiffslide still has some decoding deficiencies that prevent it from reading the MPP properly?

Here is the link to my file:
https://drive.google.com/file/d/1ctw5-oxXxtXYdGXnSO-AFFcW5OuMm8eV/view?usp=share_link
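One plausible reading of the "coercing invalid ASCII to bytes" warning is that tifffile returns the ImageDescription as bytes instead of str, so text-based Aperio detection no longer matches; the issue does not confirm this. Assuming that, a tolerant decode step before parsing could look like the sketch below. The helper names are hypothetical, and the assumed layout (a header starting with "Aperio", followed by |-separated key = value fields such as MPP = 0.4990) is the usual SVS convention.

```python
from typing import Dict, Optional, Union


def decode_description(raw: Union[bytes, str]) -> str:
    """Decode an ImageDescription value that may contain non-ASCII bytes."""
    if isinstance(raw, str):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


def parse_aperio_description(raw: Union[bytes, str]) -> Optional[Dict[str, str]]:
    """Return the 'key = value' fields of an Aperio description, or None."""
    desc = decode_description(raw)
    if not desc.startswith("Aperio"):
        return None
    fields: Dict[str, str] = {}
    # The first '|'-separated segment is the free-text header; the rest are fields.
    for part in desc.split("|")[1:]:
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Trying UTF-8 first keeps correctly encoded descriptions intact, while the latin-1 fallback guarantees that a stray non-ASCII byte in an unrelated field (a title or user name, say) cannot hide the MPP value.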

How to cite?

Hi,

How do we cite this repo in publications? 😄

Ideally, it would be great if there was a journal publication (e.g., JOSS), but even a Zenodo-generated DOI would suffice.

Cheers,
Sarthak
