btcrabb / slideseg

A Python module that produces image patches and annotation masks from whole slide images for deep learning in digital pathology.

License: MIT License

Python 40.94% Jupyter Notebook 59.06%

digital-pathology semantic-segmentation whole-slide-imaging convolutional-neural-networks

slideseg's Introduction

SlideSeg

Author: Brendan Crabb
Created August 1, 2017

Welcome to SlideSeg, a Python module that segments whole slide images into usable image chips for deep learning. Image masks for each chip are generated from the associated markup and annotation files.

If you use this code for research purposes, please cite the following in your paper:

Brendan Crabb, Niels Olson, "SlideSeg: a Python module for the creation of annotated image repositories from whole slide images", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 105811C (6 March 2018); doi: 10.1117/12.2300262; https://doi.org/10.1117/12.2300262

Updated Python 3 Version

For a version of SlideSeg compatible with Python 3, please see https://github.com/abcsFrederick/SlideSeg3. That version also supports multiprocessing, which drastically decreases the required processing time.

User Guide

1. Dependencies
2. Anaconda Environment
2.1 Creating environment from .yml file
2.2 Installing C Libraries (Windows)
2.3 Installing C Libraries (Mac OS X)
2.4 Launching Jupyter Notebook
2.5 Change Jupyter Notebook startup folder (Windows)
2.6 Change Jupyter Notebook startup folder (OS X)
2.7 Jupyter Kernel Selection
3. Setup
3.1 Supported Formats
3.2 Parameters
3.3 Annotation Key
5. Output
5.1 Image_chips
5.2 Image_masks
5.3 Text Files
6. Run

User Guide

1. Dependencies

SlideSeg runs on Python 2.7 and depends on the following libraries:

openslide 1.1.1
tqdm 4.15.0
cv2 3.2.0
numpy
pexif 0.15

The libraries can be installed using:

pip install slideseg

If pip isn't installed, you may have to enter the following before installing slideseg (OS X):

sudo easy_install pip

If you are using the preconfigured SlideSeg Anaconda environment, these dependencies will already be installed. SlideSeg also depends on several C libraries; see section 2.2 (Windows) and section 2.3 (Mac OS X) for installation instructions.
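A quick way to confirm that the Python dependencies can be imported is sketched below. The helper name `check_dependencies` is hypothetical and not part of SlideSeg, and the import names (`openslide`, `tqdm`, `cv2`, `numpy`, `pexif`) are assumed to match the libraries listed above.

```python
import importlib


def check_dependencies(modules=("openslide", "tqdm", "cv2", "numpy", "pexif")):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except (ImportError, OSError):
            # OSError can occur when a Python wrapper cannot load its C library
            status[name] = False
    return status
```

Calling `check_dependencies()` inside the activated environment and looking for `False` entries should point to any package, or underlying C library, that still needs to be installed.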

2. Anaconda Environment

Make sure Anaconda is installed. The SlideSeg environment has an IPython kernel with all of the necessary packages already installed; however, conda support for Jupyter notebooks is needed to switch kernels. This support is available through conda itself and can be enabled by issuing the following command:

conda install nb_conda

2.1 Creating environment from .yml file

Copy the environment_slideseg.yml file to the anaconda directory, .../anaconda/scripts/. In the same directory, issue the following command to create the anaconda environment from the file:

conda env create -f environment_slideseg.yml

Creating the environment might take a few minutes. Once finished, issue the following command to activate the environment:

Windows: activate SlideSeg
macOS and Linux: source activate SlideSeg

If the environment was activated successfully, you should see (SlideSeg) at the beginning of the command prompt. This will set the SlideSeg kernel as your default kernel when running Jupyter.

2.2 Installing C Libraries (Windows)

OpenSlide and OpenCV are C libraries; as a result, they have to be installed separately from the conda environment, which contains all of the python dependencies.

The Windows Binaries for OpenSlide can be found at 'openslide.org/download/'. Download the appropriate binaries for your system (either 32-bit or 64-bit) and unzip the file.

Copy the .dll files in ../bin/ to .../Anaconda/envs/SlideSeg/Library/bin/.

Copy the .h files to .../Anaconda/envs/SlideSeg/include/.

Finally, copy the .lib file to .../Anaconda/envs/SlideSeg/libs/.

OpenSlide has now been installed.

Use the following tutorial to download OpenCV, either from prebuilt binaries or from source:

http://docs.opencv.org/3.2.0/d5/de5/tutorial_py_setup_in_windows.html

2.3 Installing C Libraries (Mac OS X)

OpenSlide and OpenCV are C libraries; as a result, they have to be installed separately from the conda environment, which contains all of the python dependencies.

If you are using Homebrew, enter the following in the terminal:

brew install opencv

brew install openslide

OpenSlide and OpenCV should now be installed in your anaconda environment.

2.4 Launching Jupyter Notebook

The Jupyter Notebook App can be launched by clicking on the Jupyter Notebook icon installed by Anaconda in the start menu (Windows) or by typing in the terminal (cmd on Windows):

jupyter notebook

This will launch a new browser window showing the Notebook Dashboard. When started, the Jupyter Notebook app can only access files within its start-up folder. If you stored the SlideSeg notebook documents in a subfolder of your user folder, no configuration is necessary. Otherwise, you need to change your Jupyter Notebook App start-up folder.

2.5 Change Jupyter Notebook startup folder (Windows)

Copy the Jupyter Notebook launcher from the menu to the desktop.
Right-click on the new launcher and select Properties. In the Target field, change %USERPROFILE% to the full path of the folder that will contain all the notebooks.
Double-click on the Jupyter Notebook desktop launcher (icon shows [IPy]) to start the Jupyter Notebook App, which will open in a new browser window (or tab). Note that a secondary terminal window (used only for error logging and for shutdown) will also be opened. If only the terminal starts, try opening this address with your browser: http://localhost:8888/.

2.6 Change Jupyter Notebook startup folder (OS X)

To launch Jupyter Notebook App:

Click on spotlight, type terminal to open a terminal window.
Enter the startup folder by typing cd /some_folder_name.
Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).

2.7 Jupyter Kernel Selection

After launching the Jupyter Notebook App, navigate to the SlideSeg notebook and click on its name to open it in a new browser tab. In the upper right corner, you should see Python [conda env:SlideSeg]. If not, click on Kernel > Change Kernel and change your current kernel to Python [conda env:SlideSeg].

3. Setup

Create a folder called 'images/' in the main directory and copy all of the slide images into this folder. Copy the markup and annotation files (in .xml format) into the xml folder in the main project directory. It is important that the annotation files have the same file name as the slide they are associated with.
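Because a missing or misnamed annotation file is easy to overlook, the pairing can be checked before a run. The sketch below is not part of SlideSeg; `pair_slides_with_xml` is a hypothetical helper, and it assumes that "same file name" means the slide's base name followed by `.xml`.

```python
import os


def pair_slides_with_xml(slide_dir, xml_dir):
    """Map each slide file name to its annotation file path, or None if absent."""
    pairs = {}
    for slide_name in sorted(os.listdir(slide_dir)):
        if not os.path.isfile(os.path.join(slide_dir, slide_name)):
            continue  # skip sub-directories
        # Assumption: the annotation file is <slide base name>.xml
        stem = os.path.splitext(slide_name)[0]
        xml_file = os.path.join(xml_dir, stem + ".xml")
        pairs[slide_name] = xml_file if os.path.isfile(xml_file) else None
    return pairs
```

Slides that map to `None` have no matching annotation file under this naming assumption.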

3.1 Supported Formats

SlideSeg can read virtual slides in the following formats:

Aperio (.svs, .tif)
Hamamatsu (.ndpi, .vms, .vmu)
Leica (.scn)
MIRAX (.mrxs)
Philips (.tiff)
Sakura (.svslide)
Trestle (.tif)
Ventana (.bif, .tif)
Generic tile TIFF (.tif)

SlideSeg can read annotations in the following formats:

XML (.xml)

3.2 Parameters

SlideSeg depends on the following parameters:

slide_path: Path to the folder of slide images

xml_path: Path to the folder of xml files

output_dir: Path to the output folder where image_chips, image_masks, and text_files will be saved

format: Output format of the image_chips and image_masks (png or jpg only)

quality: Output quality: JPEG compression quality if the output format is 'jpg' (100 recommended; JPEG compression artifacts will distort image segmentation)

size: Size of image_chips and image_masks in pixels

overlap: Pixel overlap between image chips

key: The text file containing annotation keys and color codes

save_all: True saves every image_chip, False only saves chips containing an annotated pixel

save_ratio: Ratio of image_chips containing annotations to image_chips not containing annotations (use 'inf' if only annotated chips are desired; only applicable if save_all == False)
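To illustrate how size and overlap interact, the sketch below computes chip start positions along one image axis. It is an assumption-laden illustration, not SlideSeg's code: `chip_origins` is a hypothetical helper, the stride is assumed to be size minus overlap, and partial chips at the image edge are simply dropped.

```python
def chip_origins(length, size, overlap):
    """Return the start offsets of chips along one axis of `length` pixels."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    # Assumption: neighbouring chips share exactly `overlap` pixels
    stride = size - overlap
    # Only full-size chips are kept; a trailing partial chip is dropped
    return list(range(0, length - size + 1, stride))
```

For example, `chip_origins(512, 128, 0)` gives four non-overlapping chips starting at 0, 128, 256, and 384.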

3.3 Annotation Key

The main directory should already contain an Annotation_Key.txt file. If no Annotation_Key file is present, one will be generated automatically from the annotation files in the xml folder.

The Annotation_Key file contains every annotation key with its associated color code. In all image masks, annotations with that key will have the specified pixel value. If an unknown key is encountered, it will be given a pixel value and added to the Annotation_Key automatically.
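The automatic handling of unknown keys can be pictured with the following sketch. It does not reproduce SlideSeg's implementation: `assign_color_code` is a hypothetical helper, and it assumes 8-bit masks in which 0 is reserved for unannotated background.

```python
def assign_color_code(annotations, key):
    """Give `key` the largest unused 8-bit pixel value and return that value."""
    if key in annotations:
        return annotations[key]
    used = set(annotations.values())
    # Count down from 255 so that 0 stays reserved for background (assumption)
    for value in range(255, 0, -1):
        if value not in used:
            annotations[key] = value
            return value
    raise ValueError("no free pixel values left for key: {0}".format(key))
```

Keys that are already present keep their existing value, so masks generated in earlier runs remain consistent with the key file.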

The following functions are defined within the slideseg module and used to generate, edit, and read the annotation key:

def loadkeys(annotation_key):
    """
    Opens annotation_key file and loads keys and color codes
    :param annotation_key: the filename of the annotation key
    :return: color codes
    """
    
def addkeys(annotation_key, key):
    """
    Adds new key and color_code to annotation key
    :param annotation_key: the filename of the annotation key
    :param key: The annotation to be added
    :return: updated annotation key file
    """
    
def writeannotations(annotation_key, annotations):
    """
    Writes annotation keys and color codes to annotation key text file
    :param annotation_key: filename of annotation key
    :param annotations: Dictionary of annotation keys and color codes
    :return: .txt file with annotation keys
    """
    
def generatekey(annotation_key, path):
    """
    Generates annotation_key from folder of xml files
    :param annotation_key: the name of the annotation key file
    :param path: Directory containing xml files
    :return: annotation_key file
    """

5. Output

5.1 Image_chips

Every generated image chip will be saved in the output/image_chips folder. Chips are named using the convention <slide filename>_<level number>_<row>_<column>.<format>. If the chip contains an annotated area and tags are enabled, it will have an associated tag (under the Subject category) with the annotation key. If the image chip does not contain annotations, the 'NONE' tag will be added. To view these tags, switch to details view and choose to display 'Subject' in the file explorer. The files can then be sorted according to their tags. Unfortunately, these tags are only available if the output format is .jpg.
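The naming convention can be sketched as follows. `chip_filename` is a hypothetical helper, not a SlideSeg function; whether the slide's own extension is stripped and which separator is used are assumptions made for illustration.

```python
import os


def chip_filename(slide_name, level, row, col, suffix):
    """Build '<slide>_<level>_<row>_<col>.<suffix>' from the slide's base name."""
    # Assumption: the slide's extension is dropped before building the name
    stem = os.path.splitext(os.path.basename(slide_name))[0]
    return "{0}_{1}_{2}_{3}.{4}".format(stem, level, row, col, suffix)
```

Since image masks share the name of their chip, the same helper could be reused for both output folders.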

The following functions are defined in the slideseg module and are used to save both the image chips and image masks, as well as attaching exif metadata to the images:

def ensuredirectory(dest): 
    """ 
    Ensures the existence of a directory 
    :param dest: Directory to ensure.
    :return: new directory if it did not previously exist. 
    """ 

def attachtags(path, keys):
    """
    Attaches image tags to metadata of chips and masks
    :param path: file to attach tags to.
    :param keys: keys to attach as tags
    :return: JPG with metadata tags 
    """

def savechip(chip, path, quality, keys):
    """
    Saves the image chip
    :param chip: the slide image chip to save
    :param path: the full path to the chip
    :param quality: the output quality
    :param keys: keys associated with the chip
    :return:
    """

def savemask(mask, path, keys):
    """
    Saves the image masks
    :param mask: the image mask to save
    :param path: the complete path for the mask
    :param keys: keys associated with the chip
    :return:
    """

def checksave(save_all, pix_list, save_ratio, save_count_annotated, save_count_blank):
    """
    Checks whether or not an image chip should be saved
    :param save_all: (bool) saves all chips if true
    :param pix_list: list of pixel values in image mask
    :param save_ratio: ratio of annotated chips to unannotated chips
    :param save_count_annotated: total annotated chips saved
    :param save_count_blank: total blank chips saved
    :return: bool
    """

def formatcheck(format):
    """
    Ensures the format parameter was defined correctly
    :param format: the output format parameter
    :return: format
    :return: suffix
    """

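The decision made by checksave can be approximated by the sketch below. It is an illustrative guess at the logic, not the module's code: `should_save` is a hypothetical name, pixel value 0 is assumed to mean "unannotated", and the rule for the first blank chip is an assumption.

```python
def should_save(save_all, pix_list, save_ratio, saved_annotated, saved_blank):
    """Decide whether a chip is written, given the pixel values in its mask."""
    if save_all:
        return True
    # Assumption: pixel value 0 marks unannotated background in the mask
    if any(value != 0 for value in pix_list):
        return True
    if saved_blank == 0:
        # No blank chips yet: keep one unless only annotated chips are wanted
        return save_ratio != float("inf") and saved_annotated > 0
    # Keep a blank chip only while annotated:blank exceeds the requested ratio
    return saved_annotated / float(saved_blank) > save_ratio
```

Note that `save_ratio` must already be a number here; passing the string 'inf' would make the final comparison fail on Python 3.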
5.2 Image_masks

An image mask for each image chip is saved in the output/image_masks folder. The mask has the same name as the image chip it is associated with. Furthermore, these masks will have the same tags, allowing you to sort by annotation type.

The following function handles the generation of an annotation mask from xml files:

def makemask(annotation_key, size, xml_path):
    """
    Reads xml file and makes annotation mask for entire slide image
    :param annotation_key: name of the annotation key file
    :param size: size of the whole slide image
    :param xml_path: path to the xml file
    :return: annotation mask
    :return: dictionary of annotation keys and color codes
    """

5.3 Text Files

A text file with details about annotations and image chips will also be saved to output/textfiles. For each slide image, this text file will contain a list of all annotation keys present in the image. For each annotation key, a list of every image chip/mask containing that specific key is also recorded in this file.

The following functions generate these .txt files:

def writekeys(filename, annotations):
    """
    Writes each annotation key to the output text file
    :param filename: filename of image chip
    :param annotations: dictionary of annotation keys
    :return: updated text file
    """

def writeimagelist(filename, image_dictionary):
    """
    Writes list of images containing each annotation key
    :param filename: the name of the slide image
    :param image_dictionary: dictionary of images with each key
    :return: text
    """

6. Run

To execute SlideSeg, open the Jupyter notebook and run the cells. Alternatively, you can run the Python script 'main.py'. Make sure that the parameters are defined first; if the Python script is used, they are specified in the Parameters.txt file.

slideseg's Issues

TypeError: '>' not supported between instances of 'float' and 'str'

I seem to get the below error, even when wanting all chips from a WSI.

loading test
test loaded successfully
loading annotation data from xml//test.xml
annotations loaded successfully
Scanning slide level 1 of 8
0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 30, in <module>
    main()
  File "main.py", line 27, in main
    slideseg.run(params, filename)
  File "/home/tr00300/Desktop/SlideSeg-master/slideseg.py", line 521, in run
    _mask, _annotations, filename, _suffix, _save_all, _save_ratio)
  File "/home/tr00300/Desktop/SlideSeg-master/slideseg.py", line 452, in getchips
    save = checksave(save_all, pix_list, save_ratio, _save_count_annotated, _save_count_blank)
  File "/home/tr00300/Desktop/SlideSeg-master/slideseg.py", line 339, in checksave
    elif save_count_annotated / float(save_count_blank) > save_ratio:
TypeError: '>' not supported between instances of 'float' and 'str'

The only way I can remove this error is by removing these elif statements from the slideseg.py file. Is there an alternative way to remove this error?

Dimension Issue

While running the code on my dataset, I encountered the following error: all the input arrays must have same number of dimensions.

The stack trace of the error is given below:
  File "main.py", line 32, in <module>
    main()
  File "main.py", line 28, in main
    slideseg.run(params, filename)
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\SlideSeg-master\SlideSeg-master\slideseg.py", line 522, in run
    _mask, _annotations, filename, _suffix, _save_all, _save_ratio)
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\SlideSeg-master\SlideSeg-master\slideseg.py", line 450, in getchips
    pix_list = np.unique(img_mask)
  File "C:\Users\hits\Anaconda3\envs\py34\lib\site-packages\numpy\lib\arraysetops.py", line 200, in unique
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
ValueError: all the input arrays must have same number of dimensions

Please have a look at the error and let me know how it can be resolved @btcrabb

annotation file is blank

Hi,

I have an annotation file corresponding to a ndpi whole slide image. However this module does not recognise any of the annotations despite processing the slides.

I have attached the annotated xml file (originally an ndpi file).

Is there a reason why this happens?

141969 -2 - grade12018-05-22 03.06.15.ndpi.txt

NDPI/NDPA annotations and key generation

First of all, what a wonderful tool! I hope it works!

I've free-hand annotations on an ndpi file (the output by default is xml/ndpa). I'm struggling in the "translation" part to generate the key.

Runs fine at first:
C:\Tools\anaconda3\envs\SlideSeg\python.exe X:/functional/SlideSeg/main.py
running __main__ with parameters: {'save_ratio': 'inf', 'format': 'jpg', 'tags': '1', 'slide_path': 'images/', 'overlap': '1', 'save_all': 'True', 'xml_path': 'xml/', 'output_dir': 'output/', 'key': '', 'print': '1', 'quality': '100', 'size': '128'}
loading Patient1.ndpi
Patient1.ndpi loaded successfully
loading annotation data from xml//Patient1.ndpi.xml
Could not find , generating new file...

But then I see this:

Traceback (most recent call last):
  File "X:/functional/SlideSeg/main.py", line 30, in <module>
    main()
  File "X:/functional/SlideSeg/main.py", line 27, in main
    slideseg.run(params, filename)
  File "X:\functional\SlideSeg\slideseg.py", line 510, in run
    _mask, _annotations = makemask(_key, _size, '{0}{1}'.format(_xml_path, xml_file))
  File "X:\functional\SlideSeg\slideseg.py", line 81, in makemask
    generatekey('{0}'.format(annotation_key), os.path.split(xml_path)[0])
  File "X:\functional\SlideSeg\slideseg.py", line 233, in generatekey
    writeannotations(annotation_key, annotations)
  File "X:\functional\SlideSeg\slideseg.py", line 196, in writeannotations
    file = open(annotation_key, "w+")
IOError: [Errno 22] invalid mode ('w+') or filename: ''

I must mention that the ndpi annotation file looks quite different to the provided example. How can we fix this? Just an example:

They're all just freehand regions, I see "titles" not really "attributes" or IDs...

Merging patches into one image

Hi, I have found this software really useful for my task. However, I want to merge the patches back into one image to visualize the final detections. Is there a way to do so? Can you help?

Create png images of the annotated region from WSI

How can I create png images of the annotated region for each WSIs? I have svs and xml file

Issues processing .scn (Leica) Images with SlideSeg

The image patches or chips are in dark black in colour with no image clarity.

I have been trying to extract patches from a small number of .scn images (Whole Slide Images). However the image patches or chips received turn out to be blank and black in colour. The images have annotations present in the xml file.

I have tried to modify the parameters.txt file to improve the output of the chips, but to no avail. The output cannot be used for any further machine learning processing. A few image chips contain some part of the original .scn image, but they look more like a partial segment than a chip; most of the chips are dark black and blank.

I have tried on different systems with different configurations but all of them give similar result. I have followed all the steps mentioned in the SlideSeg readme file and all the instructions given by you. I am attaching few image chips here. Please have a look at them and hope the issue gets resolved. @btcrabb

Unsupported or missing image file

I tried to use SlideSeg to segment whole slide images. My Image files are in tiff format. I encountered Unsupported or Missing Image file error while trying to do so. I have cross verified my filename, pathname etc. Everything seems to be correct. I have installed all library dependencies as well. Here is the stack trace of my error.

running __main__ with parameters: {'save_ratio': 'inf', 'format': 'jpg', 'tags': '1', 'slide_path': 'images/', 'overlap': '1', 'save_all': 'False', 'xml_path':'xml/', 'output_dir': 'output/', 'key': 'Annotation_Key.txt', 'print': '0', 'quality': '95', 'size': '128'}
SplitImagesoutput4000_18900.tif loading SplitImagesoutput4000_18900.tif
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    main()
  File "main.py", line 28, in main
    slideseg.run(params, filename)
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\SlideSeg-master\SlideSeg-master\slideseg.py", line 503, in run
    _osr, _levels, _dims = openwholeslide('{0}{1}'.format(_slide_path, filename))
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\SlideSeg-master\SlideSeg-master\slideseg.py", line 383, in openwholeslide
    osr = OpenSlide(path)
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\lib\site-packages\openslide\__init__.py", line 154, in __init__
    self._osr = lowlevel.open(filename)
  File "C:\Users\hits\Anaconda3\envs\SlideSeg\lib\site-packages\openslide\lowlevel.py", line 174, in _check_open
    "Unsupported or missing image file")
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

Please look into it . Thank you

"annotations" as defaultdict has no iteritems

Python-3.7.0. It appears that changing iteritems to items will resolve the issue.

The given parameters have save_ratio set to 'inf', which is a string that is never cast to float and will eventually crash checksave.
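One possible workaround for the string-typed parameters is to convert them once, before they reach the comparison in checksave. The sketch below is a suggestion rather than the project's fix: `coerce_params` is a hypothetical helper, and the key names are taken from the parameter dumps shown in the issues above.

```python
def coerce_params(params):
    """Return a copy of `params` with numeric and boolean strings converted."""
    typed = dict(params)
    # float("inf") is valid Python, so the string 'inf' converts cleanly
    typed["save_ratio"] = float(params["save_ratio"])
    typed["save_all"] = str(params["save_all"]).strip().lower() == "true"
    for name in ("size", "overlap", "quality"):
        typed[name] = int(params[name])
    return typed
```

With `save_ratio` stored as a float, a comparison such as `save_count_annotated / float(save_count_blank) > save_ratio` no longer mixes float and str.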
