pd60193 / py-wsi

This project forked from ysbecca/py-wsi

Python package for dealing with whole slide images (.svs) for machine learning, particularly for fast prototyping. Includes patch sampling and storing using OpenSlide. Patches may be stored in LMDB, HDF5 files, or to disk. It is highly recommended to fork and download this repository so that personal customisations can be made for your work.

Home Page: https://ysbecca.github.io/programming/2018/05/22/py-wsi.html

License: GNU General Public License v3.0

Jupyter Notebook 97.48% Python 2.52%

py-wsi Introduction

It is strongly recommended to use py-wsi version >= 1.0 and to fork and download this repository. This repo contains all the most recent features and fixes. The current version has not been packaged in PyPI yet.

Please feel free to submit issues to the GitHub repository and I will help as I am able. Suggestions for additional functionality will not be considered immediately, but pull requests are welcome.

If you use py-wsi in any published work, please credit Rebecca Stone.

1.0 Current version (py-wsi 2.0): HDF5 and saving patches to disk

This update to py_wsi adds two new features: storing patches in HDF5 files, and saving patches to disk as PNG files.

For those using older versions of py_wsi with LMDB storage: this updated version is backwards compatible and your old code will not be affected. Keeping the function signatures unchanged turned out to be tricky. I should have thought out the code structure more carefully for the initial LMDB version, so apologies to those who dive into the code. There are plenty of comments for those who want to tweak things.

See the Jupyter Notebook on GitHub for example usage: Example usage of py-wsi

2.0 Overview

See the blog post "py_wsi for computer analysis on whole slide .svs images using OpenSlide" for help understanding the relationship between patch and tile sampling. The test patch sampling functionality in this version also helps users see exactly what they are sampling.
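
As a rough illustration of how tile size and overlap relate to the patch that is actually sampled, here is a minimal sketch. It assumes tiles follow the Deep Zoom convention used by OpenSlide Python's DeepZoomGenerator; the function names are hypothetical and are not part of py-wsi.

```python
import math


def deepzoom_level_count(width, height):
    """Number of Deep Zoom levels for a slide of the given pixel size.

    Each level halves the previous one (rounding up) until it reaches 1x1.
    """
    levels = 1
    while width > 1 or height > 1:
        width = max(1, math.ceil(width / 2))
        height = max(1, math.ceil(height / 2))
        levels += 1
    return levels


def tile_grid(width, height, tile_size):
    """Number of tiles (columns, rows) needed to cover one level."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def tile_pixel_size(index, n_tiles, level_dim, tile_size, overlap):
    """Pixel extent of one tile along one axis, including overlap.

    Overlap pixels are added on every side that borders another tile,
    so interior tiles measure tile_size + 2 * overlap.
    """
    core = min(tile_size, level_dim - tile_size * index)
    before = overlap if index != 0 else 0
    after = overlap if index != n_tiles - 1 else 0
    return core + before + after


# Illustrative values only: a 1000 x 600 level, 256-pixel tiles, 1-pixel overlap.
cols, rows = tile_grid(1000, 600, 256)
widths = [tile_pixel_size(i, cols, 1000, 256, 1) for i in range(cols)]
print(cols, rows, widths)
```

Under these assumptions, the patch size seen by a model is the tile size plus the overlap on each interior side, which is why it helps to test-sample a few patches before building a full dataset.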

2.1 Introduction to py_wsi

py-wsi provides a series of Python classes and functions for working with databases of whole slide images (WSI), in particular Aperio .svs files, for machine learning, using OpenSlide Python. It provides functions for patch sampling from .svs files, metadata generation, and several storage options for the sampled patches:

  • Lightning Memory-Mapped Database (LMDB)
  • Hierarchical Data Format (HDF5) files
  • to disk as PNG files

Lim et al. in "An analysis of image storage systems for scalable training of deep neural networks" perform a thorough evaluation of image storage systems, taking into consideration memory usage and access speed. LMDB, a B+tree-based key-value store, is not the most memory efficient, but provides optimal read time. In my personal research I find that HDF5 performs just as well, and is better for certain use cases. Storing to and loading from disk is significantly slower than both LMDB and HDF5, but the option is included for those who may need it.
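
To make the key-value idea concrete, here is a minimal sketch of how a patch and its metadata might be serialised into a key and a value. A plain dict stands in for the database so that the example runs without LMDB installed; the key format, the record fields, and the function names are all assumptions for illustration, not py-wsi's actual storage layout.

```python
import pickle


def make_key(slide_name, x, y):
    """Build a sortable byte key for a patch from its slide and tile position."""
    return "{}_{:04d}_{:04d}".format(slide_name, x, y).encode("ascii")


def pack_patch(pixels, shape, label):
    """Serialise raw pixel bytes plus the metadata needed to restore them."""
    record = {"pixels": pixels, "shape": shape, "label": label}
    return pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)


def unpack_patch(value):
    """Inverse of pack_patch."""
    return pickle.loads(value)


# A plain dict stands in for the key-value database (assumption).
store = {}
patch = bytes(4 * 4 * 3)  # a blank 4x4 RGB patch, 48 bytes
store[make_key("slide01", 3, 12)] = pack_patch(patch, (4, 4, 3), label=0)

restored = unpack_patch(store[b"slide01_0003_0012"])
print(restored["shape"], restored["label"], len(restored["pixels"]))
```

With the lmdb package, the same key and value bytes would be written with `txn.put(key, value)` inside a write transaction. Zero-padding the coordinates keeps keys in a predictable order, which suits a B+tree store.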

You can read about the various supported formats and their Python libraries here:

py-wsi uses OpenSlide Python. According to the Python OpenSlide website, "OpenSlide is a C library that provides a simple interface for reading whole-slide images, also known as virtual slides, which are high-resolution images used in digital pathology. These images can occupy tens of gigabytes when uncompressed, and so cannot be easily read using standard tools or libraries, which are designed for images that can be comfortably uncompressed into RAM. Whole-slide images are typically multi-resolution; OpenSlide allows reading a small amount of image data at the resolution closest to a desired zoom level."

2.2 Requirements

This library was built using the following, but may be compatible with previous versions:

python==3.6.1
numpy==1.15.2
lmdb==0.93
openslide-python==1.1.1
Shapely==1.6.4
h5py==2.7.0
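
A quick way to see which of these packages are available in the current environment is sketched below. It uses only the standard library; the module list mirrors the requirements above, and the helper name is hypothetical.

```python
import importlib.util

# Import names for the packages in the requirements list above.
# openslide-python is imported as "openslide" and Shapely as "shapely".
REQUIRED_MODULES = ["numpy", "lmdb", "openslide", "shapely", "h5py"]


def find_missing(modules):
    """Return the module names that cannot be found on the current path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the Python packages can be located. It does not import them, so it will not detect a missing OpenSlide C library, which openslide-python needs at import time.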

  1. Check the dependencies listed above and in setup.py; notably openslide, openslide-python, lmdb, and h5py. The Python geometry package Shapely is used for inferring labels from XML annotations. With Homebrew, OpenSlide can be installed using:
     brew install openslide
  2. Fork and download this repository, then import it into your working directory (highly recommended, since you will most likely want to customise and add extra features!) OR install py_wsi using pip (not recommended; the version will always be behind):
     pip install py_wsi
  3. Check out the Jupyter Notebook "Using py-wsi" to see what py-wsi can do and get started!
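
The first step mentions that Shapely is used for inferring labels from XML annotations. The underlying idea can be sketched as a point-in-polygon test. The example below uses a pure-Python ray-casting check as a stand-in for Shapely so that it runs without extra dependencies; the region format, the rule of labelling a patch by its centre, and the function names are assumptions for illustration, not py-wsi's actual logic.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_for_patch(x, y, size, regions, default=0):
    """Label a patch by the first annotated region containing its centre."""
    cx, cy = x + size / 2, y + size / 2
    for label, polygon in regions:
        if point_in_polygon(cx, cy, polygon):
            return label
    return default


# One hypothetical annotated region with label 1: a 100 x 100 square.
regions = [(1, [(0, 0), (100, 0), (100, 100), (0, 100)])]
print(label_for_patch(10, 10, 20, regions))    # centre (20, 20) is inside
print(label_for_patch(200, 200, 20, regions))  # centre (210, 210) is outside
```

With Shapely installed, the same check can be written as `Polygon(coords).contains(Point(cx, cy))`.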

Feel free to contact me with any issues and feedback.

Contributors: ysbecca, mingrui