aics-segmentation's Introduction

NOTE: This repository has a new home. New development and releases will be available at https://github.com/AllenCell/aics-segmentation

Overview

The Allen Cell Structure Segmenter is a Python-based open-source toolkit developed at the Allen Institute for Cell Science for 3D segmentation of intracellular structures in fluorescence microscope images. The toolkit consists of two complementary elements: a classic image segmentation workflow with a restricted set of algorithms and parameters, and an iterative deep learning segmentation workflow. We created a collection of classic image segmentation workflows based on a number of distinct and representative intracellular structure localization patterns, intended as a lookup table reference and starting point for users. The iterative deep learning workflow can take over when the classic segmentation workflow is insufficient. Two straightforward human-in-the-loop curation strategies convert a set of classic image segmentation workflow results into a set of 3D ground truth images for iterative model training, without the need for manual painting in 3D. The Allen Cell Structure Segmenter thus leverages state-of-the-art computer vision algorithms in an accessible way to facilitate their application by experimental biology researchers. More details, including algorithms, validations, and examples, can be found in our bioRxiv paper or at allencell.org/segmenter.

Note: This repository contains only the code for the "Classic Image Segmentation Workflow". The deep learning workflow can be found at https://github.com/AllenInstitute/aics-ml-segmentation

We welcome feedback and submission of issues. Users are encouraged to sign up on our Allen Cell Discussion Forum for questions and comments.

Installation

Our package is implemented in Python 3.6. Detailed installation instructions for each platform are below:

Installation on Linux (Ubuntu 16.04.5 LTS is the OS we used for development)

Installation on MacOS

Installation on Windows

Use the package

Our package is designed (1) to provide a simple tool for cell biologists to quickly obtain intracellular structure segmentation with reasonable accuracy and robustness over a large set of images, and (2) to facilitate advanced development and implementation of more sophisticated algorithms in a unified environment by more experienced programmers.

Visualization is a key component of algorithm development and qualitative validation of results. Our toolkit currently uses itk-jupyter-widgets, a powerful visualization tool aimed primarily at medical data that can be used inline in Jupyter notebooks. Some cool demo videos can be found here.
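For example, a 3D volume can be viewed directly in a notebook cell (a minimal sketch, assuming the itkwidgets package is installed in the Jupyter environment and using a randomly generated stack in place of a real image):

import numpy as np
from itkwidgets import view

# hypothetical 3D stack (Z, Y, X); in practice this would be a loaded microscopy image
volume = np.random.normal(loc=100, scale=10, size=(32, 256, 256)).astype(np.float32)

view(volume)  # renders an interactive slice/volume viewer inline in the notebook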

Part 1: Quick Start

After following the installation instructions above, users will find that the classic image segmentation workflow in the toolkit is:

  1. formulated as a simple 3-step workflow for solving 3D intracellular structure segmentation problems using a restricted number of selectable algorithms and tunable parameters
  2. accompanied by a "lookup table" of 20 representative structure localization patterns and their results as a reference, as well as the Jupyter notebooks for these workflows as a starting point. Pseudocode for all 20 workflows is also provided.

Typically, we use a Jupyter notebook as a "playground" to explore different algorithms and adjust the parameters. After determining the algorithms and parameters, we use Python scripts to do batch processing/validation on a large number of images; a minimal sketch of such a workflow is shown below.
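The sketch below illustrates the 3-step structure (pre-processing, core segmentation, post-processing), loosely following the style of the lookup-table notebooks; the specific algorithms and parameter values are placeholders and depend on the structure being segmented:

import numpy as np
from aicssegmentation.core.pre_processing_utils import intensity_normalization, image_smoothing_gaussian_3d
from aicssegmentation.core.vessel import filament_3d_wrapper
from skimage.morphology import remove_small_objects

# struct_img: 3D numpy array (Z, Y, X) of the structure channel; random data used here as a stand-in
struct_img = np.random.normal(loc=100, scale=10, size=(32, 256, 256))

# Step 1: pre-processing (intensity normalization + smoothing); example parameter values
norm_img = intensity_normalization(struct_img, scaling_param=[1, 40])
smooth_img = image_smoothing_gaussian_3d(norm_img, sigma=1)

# Step 2: core segmentation (a filament filter, as used for tubule-like structures)
bw = filament_3d_wrapper(smooth_img, [[1, 0.01]])

# Step 3: post-processing (remove tiny objects and cast to an 8-bit mask)
seg = remove_small_objects(bw > 0, min_size=20, connectivity=1)
seg = seg.astype(np.uint8) * 255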

You can find a demo on a real example on our tutorial page.

Part 2: API

The list of high-level wrappers/functions used in the package can be found HERE. We are working on additional documentation and examples for advanced users/developers.

Object Identification: Bridging the gap between binary image (segmentation) and analysis

The current version of the Allen Cell Segmenter focuses primarily on converting fluorescence images into binary images, i.e., the mask of the target structures separated from the background (a.k.a. segmentation). But the binary images themselves are not always useful, with perhaps the exception of visualizing the entire image, until they are converted into statistically sound numbers that can be used for downstream analysis. Often the desired numbers do not refer to all masked voxels in an entire image but instead to specific "objects" or groups of objects within the image. In our Python package, we provide functions to bridge the gap between binary segmentation and downstream analysis via object identification.

What is object identification?

See a real demo in a Jupyter notebook to learn how to use the object identification functions.
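As a generic illustration of the idea (not the package's own helpers, which are demonstrated in the notebook above), a labeled-object pass with scikit-image turns a binary mask into per-object measurements:

import numpy as np
from skimage.measure import label, regionprops

# hypothetical binary segmentation with two small objects
seg = np.zeros((16, 64, 64), dtype=np.uint8)
seg[4:8, 10:20, 10:20] = 1
seg[10:14, 40:50, 40:50] = 1

labeled = label(seg > 0, connectivity=3)      # connected components -> object labels
for obj in regionprops(labeled):
    print(obj.label, obj.area, obj.centroid)  # per-object numbers for downstream analysis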

Citing Segmenter

If you find our segmenter useful in your research, please cite our bioRxiv paper:

J. Chen, L. Ding, M.P. Viana, M.C. Hendershott, R. Yang, I.A. Mueller, S.M. Rafelski. The Allen Cell Structure Segmenter: a new open source toolkit for segmenting 3D intracellular structures in fluorescence microscopy images. bioRxiv. 2018 Jan 1:491035.

Level of Support

We are offering it to the community AS IS; we have used the toolkit within our organization. We are not able to provide guarantees of support.

aics-segmentation's People

Contributors

aetherunbound, basuc, derekthirstrup, jxchen01, kmitcham, vianamp

aics-segmentation's Issues

Failure to cite libraries used in paper

In the Segmenter bioRxiv paper there is no citation of the underlying libraries, such as NumPy, SciPy, and scikit-image. Would it be possible to rectify the situation by releasing a new version of the paper?

The authors of those libraries depend on citations to 1. know when their libraries are used and 2. request funding because those libraries play an important role in the ecosystem. Thank you for understanding!

Running aicssegmentation code with a dask array and compute?

I am trying to run a few aics-segmentation functions on a dask array so I can process a number of stacks in parallel.

For example aicssegmentation.core.vessel.filament_3d_wrapper ...
1) If I run it on a dask array of length 1, it completes one stack in ~20 seconds with minimal CPU usage. This is about the same as running without a wrapping dask array ... good.
2) If I run it on a dask array of length 4, it completes each stack in ~600 seconds with CPU usage looking like the 1x case. The 4 stacks run in parallel, but CPU usage does not increase and each stack is ~30 times slower than a single stack? [update] I ran it again with np.float and each call to filament_3d_wrapper, when run across 4 stacks, took ~1240 seconds, yikes!

I started looking at the source and after some tracing came up with no obvious reason; all I see is normal Python/NumPy/SciPy code. I seem to remember that aics-segmentation has a set of batch functions? Should I use those instead? Any links to example code?

Here is some sample code. In particular, scipy.ndimage.median_filter seems to work fine (runs in parallel and maxes out CPU) but filament_3d_wrapper runs >30x slower and does not max out the CPU (looks like usage at 1x stack).

import time
import numpy as np
import scipy.ndimage

import dask
import dask.array as da

from aicssegmentation.core.vessel import filament_3d_wrapper

def myRun(path, commonShape, common_dtype):

    # create fake data
    stackData = np.random.normal(loc=100, scale=10, size=commonShape)
    # stackData = stackData.astype(common_dtype)

    # takes about 9 seconds if we have 1x in dask array
    # and still 9 seconds if we have 4x in dask array
    medianKernelSize = (3, 4, 4)
    print('  median filter', path)
    startTime = time.time()
    smoothData = scipy.ndimage.median_filter(stackData, size=medianKernelSize)
    stopTime = time.time()
    print('    median filter done in', round(stopTime - startTime, 2), 'seconds', path)

    # takes about 19 seconds if we have 1x in dask array
    # but 500+ seconds if we have 4x in dask array
    print('  filament_3d_wrapper', path)
    startTime = time.time()
    f3_param = [[1, 0.01]]
    filamentData = filament_3d_wrapper(smoothData, f3_param)
    filamentData = filamentData.astype(np.uint8)
    stopTime = time.time()
    print('    filament_3d_wrapper done in', round(stopTime - startTime, 2), 'seconds', path)

    # return the segmentation so da.from_delayed() has an array to wrap
    return filamentData

if __name__ == '__main__':

    # if I feed dask 1x stack,
    # filament_3d_wrapper returns in about 19 seconds (per stack)
    filenames = ['1']

    # if I feed dask 4x stacks,
    # filament_3d_wrapper will run all 4 in parallel but CPU usage does not increase by 4x
    # (looks like I am running just 1x);
    # filament_3d_wrapper returns in about 550-650 seconds (per stack)
    filenames = ['1', '2', '3', '4']

    # da.from_delayed() needs to know the shape and dtype it will work with
    commonShape = (64, 512, 512)
    common_dtype = np.float64  # np.uint8

    # wrap the myRun() function as a dask.delayed()
    myRun_Dask = dask.delayed(myRun)

    lazy_arrays = [myRun_Dask(filename, commonShape, common_dtype) for filename in filenames]

    lazy_arrays = [da.from_delayed(x, shape=commonShape, dtype=common_dtype) for x in lazy_arrays]

    x = da.block(lazy_arrays)

    x.compute()
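One hedged diagnostic (not a confirmed fix): dask's default threaded scheduler runs all tasks in a single process, so heavily Python/NumPy-bound code can contend for the GIL or a shared thread pool, while the multiprocessing scheduler gives each task its own process:

# compare dask schedulers for the same task graph
x.compute(scheduler='threads')     # default: all tasks share one process
x.compute(scheduler='processes')   # one worker process per task; avoids GIL contention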

batch_processing fails to generate output when I lock the computer screen

Hi,

I have been using batch_processing.py (run with the run_toolkit.sh bash script) to perform segmentation on my data. Once I execute the bash script, I start getting output as expected.

However, when I lock the screen so as to wait for the segmentation overnight, I see that the processing has stalled at the very moment I locked the screen. I am NOT letting the computer go to sleep (just locking the screen).

Does anyone have any clues as to why this might be happening?

Note: I am using an iMac running macOS High Sierra version 10.13.6

batch_processing.py does not recognize tiff files in some cases

Hi,

I have been trying to segment some of my data using batch_processing.py from the classic code. I have been doing segmentation on two sets of data which have exactly the same file structure (same data.shape, same .tiff file extension). However, one set of files runs perfectly fine, while the run_toolkit.sh script shows the following error for the other set of files:

AICSImage module does not support this image file type: CellImage can only accept OME-TIFF, TIFF, and CZI file formats!

Surprisingly, when I read the same data file in an ipython console, I can read and segment the data just fine.

Not sure what's going wrong here. Could someone please shed some light on this?

NOTE: I don't think it should matter, but I will also mention that I am trying to run batch_processing.py for both datasets simultaneously, using their respective run_toolkit.sh bash scripts.
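(For context, a minimal direct read of one file looks roughly like the sketch below; this assumes the aicsimageio package that backs the AICSImage reader and uses a hypothetical path, so the exact call inside batch_processing.py may differ. Checking the extension casing is also worthwhile, since the error message is about recognized file formats.)

import os
from aicsimageio import AICSImage

path = '/Users/sharm261/Desktop/original_img_stack/example.tif'  # hypothetical file
print(os.path.splitext(path)[1])   # '.tif' vs '.tiff' vs '.TIF' casing can matter
img = AICSImage(path)
print(img.data.shape)              # succeeds if the reader itself accepts the file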

Here's my run_toolkit.sh bash script:


WFNAME=supercell
INPUTCH=2
OUTPUTDIR="/Users/sharm261/Desktop/stack_images_zdr/segmented_stacks/"
INPUTFILE_TYPE='.tif'

# script for processing a whole folder
INPUTDIR="/Users/sharm261/Desktop/original_img_stack/"

python batch_processing.py \
        --d \
        --workflow_name $WFNAME \
        --struct_ch $INPUTCH \
        --output_dir $OUTPUTDIR \
        per_dir \
        --input_dir $INPUTDIR \
        --data_type $INPUTFILE_TYPE

edge_preserving_smoothing_3d should account for voxel spacing

Edge preserving smoothing in 3D depends on voxel spacing. Below is the code you currently have for edge_preserving_smoothing_3d under aicssegmentation/core/pre_processing_utils:

[screenshot of the current edge_preserving_smoothing_3d implementation]

Rather, this is what I think should be incorporated:

[screenshot of the proposed change, accounting for voxel spacing]

What are your thoughts?
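(The original screenshots are not reproduced above. As a generic, hedged illustration of the point rather than the proposed patch itself: ITK's anisotropic-diffusion smoothing, which appears to back edge_preserving_smoothing_3d, honors the image spacing, so passing the real voxel size instead of the default 1x1x1 changes the result on anisotropic stacks.)

import itk
import numpy as np

# stand-in structure channel; a real image would be loaded and cast to float32
struct_img = np.random.normal(loc=100, scale=10, size=(32, 256, 256)).astype(np.float32)

itk_img = itk.GetImageFromArray(struct_img)
itk_img.SetSpacing([0.108, 0.108, 0.29])   # hypothetical (x, y, z) voxel size in microns

smoothed = itk.gradient_anisotropic_diffusion_image_filter(
    itk_img, number_of_iterations=10, time_step=0.0625, conductance_parameter=1.2
)
result = itk.GetArrayFromImage(smoothed)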
