dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

ai bioinformatics cancer-research cloud gpu tensorflow

deepcell-imaging's Introduction

Cloud DeepCell - Scaling Image Analysis

This working repo contains our notes, utilities, and info for our cloud DeepCell imaging project.

Here is the high-level workflow for using DeepCell:

High-level workflow diagram (Lucidchart source)

Note that DeepCell itself does not process TIFF files. The TIFF channels must first be extracted into NumPy arrays.
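For example, a minimal sketch of that extraction step, assuming tifffile is available; the file name and channel indices are hypothetical:

import numpy as np
import tifffile

# Read the multi-channel TIFF, e.g. shaped (channels, height, width).
raw = tifffile.imread("input.tiff")

# Mesmer expects channels last: (height, width, 2) for nuclear + membrane.
input_channels = np.stack([raw[0], raw[1]], axis=-1)
np.savez("input_channels.npz", input_channels=input_channels)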

Also note that DeepCell performs its own pre- and post-processing around the TensorFlow prediction. In particular, DeepCell divides the input into 512x512 tiles which it predicts in batches, then reconstructs the overall image.

tiling process
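A rough sketch of the tiling idea (not DeepCell's actual implementation, which lives in its pre/post-processing code): split the image into 512x512 tiles, zero-padding the edges.

import numpy as np

TILE = 512

def split_into_tiles(image):
    """Yield (row, col, tile) for each 512x512 tile; edges are zero-padded."""
    h, w = image.shape[:2]
    padded_shape = (-(-h // TILE) * TILE, -(-w // TILE) * TILE) + image.shape[2:]
    padded = np.zeros(padded_shape, dtype=image.dtype)
    padded[:h, :w] = image
    for r in range(0, padded.shape[0], TILE):
        for c in range(0, padded.shape[1], TILE):
            yield r, c, padded[r:r + TILE, c:c + TILE]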

Goal and Key Links

  • GOAL: Understand and optimize using DeepCell to perform cellular image segmentation on GCP at scale.

Findings

GPU makes a dramatic difference in model inference time.

Pixels vs inference time

Memory usage increases linearly with number of pixels.

Pixels vs mem usage

Optimization opportunities

Here are some areas we've identified:

Local development

Mac OS x86_64

Nothing special. You just need Python 3.10 at the latest.

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Mac OS arm64

Some incantations are needed to work on Apple silicon computers. You also need Python 3.9.

DeepCell depends on tensorflow, not tensorflow-macos. Unfortunately we need tensorflow-macos specifically to provide TF2.8 on arm64 chips.

The solution is to install the packages one at a time so that the DeepCell failure doesn't impact the other packages.

python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements-mac-arm64.txt
cat requirements.txt | xargs -n 1 pip install

# Let it fail to install DeepCell, then:
pip install -r requirements.txt --no-deps

# Lastly install our own library. Note --no-deps
pip install --editable . --no-deps

I think, but am not sure, that the first --no-deps invocation is unnecessary, since the per-package pip install step already installs the dependencies.

deepcell-imaging's People

Contributors

dchaley · lynnlangit · weihaoge1009 · langitlynn


deepcell-imaging's Issues

Run benchmark for `mesmer-sample-3` (1 MB)

Input channels path: gs://davids-genomics-data-public/cellular-segmentation/deep-cell/vanvalenlab-multiplex-20200810_tissue_dataset/mesmer-sample-3/input_channels.npz

Result spreadsheet.

  • n1-standard-1

  • n1-standard-1 + 1 P100

  • n1-standard-1 + 1 T4

  • n1-standard-1 + 1 V100

  • n1-standard-2

  • n1-standard-2 + 1 P100

  • n1-standard-2 + 1 T4

  • n1-standard-2 + 1 V100

  • n1-standard-4

  • n1-standard-4 + 1 P100

  • n1-standard-4 + 1 T4

  • n1-standard-4 + 1 V100

  • n1-standard-8

  • n1-standard-8 + 1 P100

  • n1-standard-8 + 1 T4

  • n1-standard-8 + 1 V100

  • n1-standard-16

  • n1-standard-16 + 1 P100

  • n1-standard-16 + 1 T4

  • n1-standard-16 + 1 V100

  • n1-standard-32

  • n1-standard-32 + 1 P100

  • n1-standard-32 + 1 T4

  • n1-standard-32 + 1 V100

  • n1-standard-64

  • n1-standard-64 + 1 P100

  • n1-standard-64 + 1 T4

  • n1-standard-64 + 1 V100

  • n1-standard-96

  • n1-standard-96 + 1 P100

  • n1-standard-96 + 1 T4

  • n1-standard-96 + 1 V100

  • n1-highmem-2

  • n1-highmem-2 + 1 P100

  • n1-highmem-2 + 1 T4

  • n1-highmem-2 + 1 V100

  • n1-highmem-4

  • n1-highmem-4 + 1 P100

  • n1-highmem-4 + 1 T4

  • n1-highmem-4 + 1 V100

  • n1-highmem-8

  • n1-highmem-8 + 1 P100

  • n1-highmem-8 + 1 T4

  • n1-highmem-8 + 1 V100

  • n1-highmem-16

  • n1-highmem-16 + 1 P100

  • n1-highmem-16 + 1 T4

  • n1-highmem-16 + 1 V100

  • n1-highmem-32

  • n1-highmem-32 + 1 P100

  • n1-highmem-32 + 1 T4

  • n1-highmem-32 + 1 V100

  • n1-highmem-64

  • n1-highmem-64 + 1 P100

  • n1-highmem-64 + 1 T4

  • n1-highmem-64 + 1 V100

  • n1-highmem-96

  • n1-highmem-96 + 1 P100

  • n1-highmem-96 + 1 T4

  • n1-highmem-96 + 1 V100

Build setup/instructions for cython fast-hybrid

The Cython fast-hybrid implementation is a bit "raw", requiring manually running cythonize in the right directory, etc.

The file should be repackaged into a proper module in deepcell-imaging, with an appropriate setup.py etc., so that pip knows to build the extension as part of installation (see the sketch below).

This could also be accomplished by publishing the fast-hybrid implementation as its own library, and including that as a dependency to this repo.
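A minimal sketch of that setup.py, following the standard Cython build pattern; the source path deepcell_imaging/fast_hybrid.pyx is a hypothetical location:

import numpy as np
from setuptools import setup
from Cython.Build import cythonize

setup(
    # Compile the Cython extension as part of pip install.
    ext_modules=cythonize("deepcell_imaging/fast_hybrid.pyx"),
    include_dirs=[np.get_include()],
)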

Parameterize batch_size for predict method

Adding GPUs is not improving performance. This is a bit surprising, considering how much a single GPU improves performance.

Image

Are we not leveraging multiple GPUs? Possibly. Even with 1 GPU, we aren't maxing out the GPU:

Image

The batch_size parameter defaults to 4; it controls the number of images we send to TensorFlow in parallel.

  • Add batch size to notebook parameters
  • Add batch size to benchmark output

Then we can run some benchmarks with increased batch size (with 1 and with several GPUs); make follow-up issues for these. A sketch of the parameterization follows.
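This sketch assumes the benchmark uses DeepCell's Mesmer application, whose predict() accepts a batch_size keyword; the input data here is a random placeholder:

import numpy as np
from deepcell.applications import Mesmer

app = Mesmer()

# 4D input: (num images, x, y, 2 channels); random placeholder data.
input_channels = np.random.rand(1, 512, 512, 2)

batch_size = 16  # notebook parameter, instead of the default of 4
labels = app.predict(input_channels, batch_size=batch_size)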

Build csv to chart

Now that we have the benchmarking from PR #42, which generates CSV output, gather a bunch of local data on various images and work out how to visualize it (see the sketch below).

  • test data
  • build visualization
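One possible visualization sketch, assuming the benchmark CSV has columns like "pixels" and "inference_time_s" (hypothetical names):

import pandas as pd
import matplotlib.pyplot as plt

# Load the benchmark CSV and scatter-plot input size against inference time.
df = pd.read_csv("benchmark_results.csv")
df.plot.scatter(x="pixels", y="inference_time_s")
plt.title("Pixels vs inference time")
plt.savefig("pixels_vs_inference_time.png")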

Run benchmark for human-prostate-cancer-20210727-725mb

Input channels path: gs://davids-genomics-data-public/cellular-segmentation/10x-genomics/human-prostate-cancer-20210727-725mb/input_channels.npz

Results spreadsheet.

  • n1-standard-1
    SKIP, expect out-of-memory

  • n1-standard-1 + 1 P100
    SKIP, expect out-of-memory

  • n1-standard-1 + 1 T4
    SKIP, expect out-of-memory

  • n1-standard-1 + 1 V100
    SKIP, expect out-of-memory

  • n1-standard-2
    SKIP, expect out-of-memory

  • n1-standard-2 + 1 P100
    SKIP, expect out-of-memory

  • n1-standard-2 + 1 T4
    SKIP, expect out-of-memory

  • n1-standard-2 + 1 V100
    SKIP, expect out-of-memory

  • n1-standard-4
    SKIP, expect out-of-memory

  • n1-standard-4 + 1 P100
    SKIP, expect out-of-memory

  • n1-standard-4 + 1 T4
    SKIP, expect out-of-memory

  • n1-standard-4 + 1 V100
    SKIP, expect out-of-memory

  • n1-standard-8
    SKIP, expect out-of-memory

  • n1-standard-8 + 1 P100
    SKIP, expect out-of-memory

  • n1-standard-8 + 1 T4
    FAIL, out of memory

  • n1-standard-8 + 1 V100
    SKIP, expect out-of-memory

  • n1-standard-16
    SKIP, expect out-of-memory

  • n1-standard-16 + 1 P100
    SKIP, expect out-of-memory

  • n1-standard-16 + 1 T4
    SKIP, expect out-of-memory

  • n1-standard-16 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-32
    SKIP, more CPU not helpful

  • n1-standard-32 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-32 + 1 T4

  • n1-standard-32 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-64
    SKIP, more CPU not helpful

  • n1-standard-64 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-64 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-standard-64 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-96
    SKIP, more CPU not helpful

  • n1-standard-96 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-96 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-standard-96 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-2
    SKIP, expect out-of-memory

  • n1-highmem-2 + 1 P100
    SKIP, expect out-of-memory

  • n1-highmem-2 + 1 T4
    SKIP, expect out-of-memory

  • n1-highmem-2 + 1 V100
    SKIP, expect out-of-memory

  • n1-highmem-4
    SKIP, expect out-of-memory

  • n1-highmem-4 + 1 P100
    SKIP, expect out-of-memory

  • n1-highmem-4 + 1 T4
    SKIP, expect out-of-memory

  • n1-highmem-4 + 1 V100
    SKIP, expect out-of-memory

  • n1-highmem-8
    FAIL, out of memory

  • n1-highmem-8 + 1 P100
    SKIP, expect out-of-memory

  • n1-highmem-8 + 1 T4
    SKIP, expect out-of-memory

  • n1-highmem-8 + 1 V100
    SKIP, expect out-of-memory

  • n1-highmem-16

  • n1-highmem-16 + 1 P100

  • n1-highmem-16 + 1 T4

  • n1-highmem-16 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-32
    SKIP, more CPU not helpful

  • n1-highmem-32 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-32 + 1 T4
    SKIP, more CPU not helpful

  • n1-highmem-32 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-64
    SKIP, more CPU not helpful

  • n1-highmem-64 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-64 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-highmem-64 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-96
    SKIP, more CPU not helpful

  • n1-highmem-96 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-96 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-highmem-96 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

Run benchmark for `preview-human-breast-20221103-418mb`

Input channels path: gs://davids-genomics-data-public/cellular-segmentation/10x-genomics/preview-human-breast-20221103-418mb/input_channels.npz

Result spreadsheet.

  • n1-standard-1
    SKIP, expect OOM

  • n1-standard-1 + 1 P100
    SKIP, expect OOM

  • n1-standard-1 + 1 T4
    SKIP, expect OOM

  • n1-standard-1 + 1 V100
    SKIP, expect OOM

  • n1-standard-2
    SKIP, expect OOM

  • n1-standard-2 + 1 P100
    SKIP, expect OOM

  • n1-standard-2 + 1 T4
    SKIP, expect OOM

  • n1-standard-2 + 1 V100
    SKIP, expect OOM

  • n1-standard-4
    FAIL, out of memory

  • n1-standard-4 + 1 P100
    SKIP, expect OOM

  • n1-standard-4 + 1 T4
    FAIL, out of memory

  • n1-standard-4 + 1 V100
    SKIP, expect OOM

  • n1-standard-8

  • n1-standard-8 + 1 P100

  • n1-standard-8 + 1 T4

  • n1-standard-8 + 1 V100

  • n1-standard-16
    SKIP, more CPU not helpful

  • n1-standard-16 + 1 P100
    SKIP, more CPU not helpful

  • n1-standard-16 + 1 T4

  • n1-standard-16 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-32
    SKIP, more CPU not helpful

  • n1-standard-32 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-32 + 1 T4
    SKIP, more CPU not helpful

  • n1-standard-32 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-64
    SKIP, more CPU not helpful

  • n1-standard-64 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-64 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-standard-64 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-standard-96
    SKIP, more CPU not helpful

  • n1-standard-96 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-standard-96 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-standard-96 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-2
    SKIP, expect OOM

  • n1-highmem-2 + 1 P100
    SKIP, expect OOM

  • n1-highmem-2 + 1 T4
    SKIP, expect OOM

  • n1-highmem-2 + 1 V100
    SKIP, expect OOM

  • n1-highmem-4

  • n1-highmem-4 + 1 P100

  • n1-highmem-4 + 1 T4

  • n1-highmem-4 + 1 V100

  • n1-highmem-8
    SKIP, more CPU not helpful

  • n1-highmem-8 + 1 P100
    SKIP, more CPU not helpful

  • n1-highmem-8 + 1 T4
    SKIP, more CPU not helpful

  • n1-highmem-8 + 1 V100
    SKIP, more CPU not helpful

  • n1-highmem-16
    SKIP, more CPU not helpful

  • n1-highmem-16 + 1 P100
    SKIP, more CPU not helpful

  • n1-highmem-16 + 1 T4
    SKIP, more CPU not helpful

  • n1-highmem-16 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-32
    SKIP, more CPU not helpful

  • n1-highmem-32 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-32 + 1 T4
    SKIP, more CPU not helpful

  • n1-highmem-32 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-64
    SKIP, more CPU not helpful

  • n1-highmem-64 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-64 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-highmem-64 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

  • n1-highmem-96
    SKIP, more CPU not helpful

  • n1-highmem-96 + 1 P100
    NOT AVAILABLE, 24 vCPUs at maximum per 1 P100

  • n1-highmem-96 + 1 T4
    NOT AVAILABLE, 48 vCPUs at maximum per 1 T4

  • n1-highmem-96 + 1 V100
    NOT AVAILABLE, 12 vCPUs at maximum per 1 V100

Fix peak RAM metric for Vertex AI

The current method for getting peak RAM usage locally does not work on Vertex AI notebooks:

peak_mem_b = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

Figure out a fix, or just abandon the metric.
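One possible fix, assuming the Vertex AI notebook runs on Linux: read the peak resident set size (VmHWM) from /proc/self/status instead of getrusage. (Note also that ru_maxrss is reported in kilobytes on Linux but bytes on macOS.)

def peak_rss_bytes():
    """Return peak resident set size in bytes, or None if unavailable."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1]) * 1024  # VmHWM is reported in kB
    return None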

Add Mesmer samples to repo

To support #10, let's at least add DeepCell's Mesmer data (multiplex_tissue) to the repo. This gives us an easily accessible starting point for test data (albeit quite small at 512 x 512).

See the DeepCell mesmer sample notebook

  • numpy inputs
  • rgb inputs
  • raw predictions
  • output predictions & rgb image

Get cost tables

Fill out the cost tables in this sheet.

Image

Method:

  • Start creating a notebook.
  • Select hardware configuration.
  • You should see a cost table like this:

Image

  • Copy these fields into the sheet:
    • TOTAL (put this into discounted $/mo)
    • Sustained use discount (put this into discount $/mo)
  • Then copy the other computed columns from previous rows

Define test configurations per sample file

For each sample file, specify the configurations to run for the benchmark.

  • mesmer sample 3
  • pick an ark-angelo sample
  • preview-human-breast-20221103-418mb
  • human-prostate-cancer-20210727-725mb blocked on #38

Expected output: a GitHub issue with a checklist of configurations (machine type, GPU type, GPU count).

See this list for machine types:
https://cloud.google.com/compute/docs/general-purpose-machines#n1_machine_types

Note that we can't run "interesting" GPU configurations (more than one GPU, or fancier GPUs); we're "calling a friend" (see also #6).

Create runnable 'predict' notebook

As a user I can: go to the GitHub repo, download it locally, get the IPython notebook and sample data, upload the notebook to a test environment (Vertex AI), and provide config info (instance type/size, GPUs, ...) to verify. (Part of the test is figuring out the config parameters.)

  • Notebook that runs prediction on parameterized input file
  • Script to create notebook execution w/ specified machine parameters (instance type, GPU)

Result of running the notebook: a timing matrix of input size vs. config parameters.

The benchmark will be: the whole-cell compartment. (Potentially add nuclear/both later; opportunity for parallelization later.)

Develop test suite for optimizations

I want to iterate rapidly while prepping my optimizations for merge. That means automated testing, not manually running notebooks.

Rather than write my own tests, let's reuse the tests already in scikit-image 😎

The initial version will need unsupported tests disabled; these will be created as issues in the milestone.

Document BigQuery benchmarks table

The end-to-end benchmark uploads its results to the BigQuery table: benchmarking.results

The table has a schema but otherwise no documentation. That should be changed 😎

Address result type mismatch

The test test_two_image_peaks asserts that out.dtype == _supported_float_type(mask.dtype). Meanwhile, the current reconstruct implementation does indeed create the result image as a float, even if the inputs were ints to begin with.

I'm not sure we need to (always?) do this. The core of the algorithm is to adjust to the neighborhood max. The max can't have more precision than any of the starting numbers, and the max can't be capped to more precision than the mask precision. So if ints are masking ints, why not have int results?

However, floats masked by ints are floats, and arguably ints masked by floats should be floats: for example, 10 masked by 0.51 should be 5.1, not 5. (Right?)

The current behavior is to always return floats. This is undesirable for performance, as it precludes updating in place. I wonder if we can simply ship this as new behavior; downstream usages could be affected if they assume floats and start getting ints. Can we control it via "Yet Another Parameter"™️?
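A sketch of the int-preserving rule proposed above, using NumPy's promotion rules (an assumption about desired behavior, not the current implementation):

import numpy as np

def result_dtype(image, mask):
    # int masked by int stays int; any float input promotes to float,
    # matching the "10 masked by 0.51" example above.
    return np.result_type(image.dtype, mask.dtype)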

Unexpected segments in prostate cancer results

Running these benchmarks: #73

We noticed that the output file looks a bit strange. Here's an example:

Screenshot 2023-12-13 at 1 52 43 PM

It appears to be circling artifacts outside the tissue, as well as not circling cells as we expect within the tissue.

DEBUG APPROACH:

Inconsistent array shape: preview-human-breast-20221103-418mb

The samples located here:

gs://davids-genomics-data-public/cellular-segmentation/10x-genomics/preview-human-breast-20221103-418mb

were generated with a previous convention, following the DeepCell API (one file == 4D array starting with num samples).

The other samples are 3D: x, y, channel. (One array == one input)

This creates problems because worksheets & people don't know which shape to expect, and therefore whether or not they need to add a new axis.

We should normalize one way or the other. My general thinking is that a thing is a single thing until it is a group of things, which we could represent as either a list of things or a NumPy vector of things. In other words, the shape of a single data example is not a list.
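A minimal sketch of normalizing to the 3D convention (x, y, channel), stripping a leading batch axis when present; the helper name is hypothetical:

import numpy as np

def normalize_channels(arr):
    # Squeeze a singleton batch axis: (1, x, y, c) -> (x, y, c).
    if arr.ndim == 4 and arr.shape[0] == 1:
        return np.squeeze(arr, axis=0)
    if arr.ndim != 3:
        raise ValueError(f"expected 3D or 4D array, got shape {arr.shape}")
    return arr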

Figure out kernel version mismatch

When running the e2e benchmark notebook on Vertex AI, there was a kernel warning:

2023-12-03 07:19:17.937327: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/jupyter/.local/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-12-03 07:19:17.937385: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-12-03 07:19:17.937413: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (72a01191f8f9): /proc/driver/nvidia/version does not exist

I think this is because we're using a TF 2.10 kernel but have installed TF 2.8 (DeepCell's dependency).

If the kernel is relevant: how can we fix this?

If the kernel is irrelevant: can we use something different? Like a basic python kernel?

I'm not sure how much to worry about this; perhaps it means we (and/or DeepCell??) aren't using modern Vertex AI kernels optimally…

Experiment with "warming up" benchmark script

Problem we observed: the first run is consistently ~30s slower than subsequent runs, even though we're restarting the kernel in between.

Prediction time (s)   First run?   Machine config
104.48                y            n1-highmem-4 + 1x Tesla T4
103.17                y            n1-highmem-4 + 1x Tesla T4
74.09                 n            n1-highmem-4 + 1x Tesla T4
73.41                 n            n1-highmem-4 + 1x Tesla T4
74.08                 n            n1-highmem-4 + 1x Tesla T4
78.2                  y            n1-highmem-4 + 1x Tesla P100-PCIE-16GB
78.75                 y            n1-highmem-4 + 1x Tesla P100-PCIE-16GB
43.55                 n            n1-highmem-4 + 1x Tesla P100-PCIE-16GB
44.4                  n            n1-highmem-4 + 1x Tesla P100-PCIE-16GB
44.17                 n            n1-highmem-4 + 1x Tesla P100-PCIE-16GB
43.47                 n            n1-highmem-4 + 1x Tesla P100-PCIE-16GB

MORE OBSERVATIONS:

  • GPU memory goes back to 0% after kernel restart. Probably no caching in GPU memory.
  • All documentation + stackoverflow posts suggest that GPU is cleared after process shutdown (kernel restart).
  • It seems quite consistently faster afterward.

IDEA:

  • Add a dummy 512x512 prediction in the benchmark notebook, but outside the timed portion (sketched after this list).
  • Does this warm up whatever needs to be warmed up, for the main prediction loop?
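A sketch of this idea, assuming `app` is the Mesmer application object used in the benchmark; the dummy data is a random placeholder:

import numpy as np
from deepcell.applications import Mesmer

app = Mesmer()

# Warm-up: one dummy 512x512 prediction, outside the timed portion.
dummy = np.random.rand(1, 512, 512, 2)
app.predict(dummy)

# ... the timed benchmark prediction runs after this point ...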

Create setup notebook

Create a separate setup notebook to install dependencies. This avoids restarting notebooks.

  • create setup.ipynb in the root directory that does one thing: the pip install from the top of the benchmark file
  • in the benchmark notebook, change the pip install to an attempt to import deepcell; if that fails, refer the user to the setup notebook (see the sketch below)
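A minimal sketch of that import-check cell (the setup notebook name follows the suggestion above):

try:
    import deepcell
except ImportError as e:
    # Don't pip install here; point the user at the setup notebook instead.
    raise RuntimeError("deepcell is not installed; run setup.ipynb first") from e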

Determine if CPU prediction is affected by warm-up

Relating to #94, we need to determine whether CPU prediction is affected by first vs. subsequent runs.

Task: run the ~230 MB sample through the benchmark on n1-standard-8 (no GPU, batch size 16), then restart the kernel (NOT create a new instance) and run it again.

Determine what happened to previous mesmer samples

The Mesmer samples we extracted used a previous commit's dataset, which is no longer available in the deepcell-tf main branch.

From the commit history it looks like the dataset may have been replaced with the tissue_net dataset. The expected hash values don't match, but this could just be due to the naming inside the .npz file.

Objective of this work: determine the difference between the old commit data (which is still available on s3 as of 2023-11-17 at least) and the newly available tissue net data.

Add cached model download to notebook

The model file is relatively large (100 MB). Cache the download to disk to avoid refetching. Also, the notebook doesn't support the model download in the first place 😬

Use this gs uri:
gs://davids-genomics-data-public/cellular-segmentation/deep-cell/vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation.tgz
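A hedged sketch of the caching logic, using the gs:// URI above; the local cache path and the use of gsutil are assumptions:

import os
import subprocess

MODEL_URI = (
    "gs://davids-genomics-data-public/cellular-segmentation/deep-cell/"
    "vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation.tgz"
)
MODEL_PATH = os.path.expanduser("~/.cache/deepcell-imaging/MultiplexSegmentation.tgz")

# Only download if we don't already have the file on disk.
if not os.path.exists(MODEL_PATH):
    os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
    subprocess.run(["gsutil", "cp", MODEL_URI, MODEL_PATH], check=True)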

Investigate grayreconstruct edge case test_zero_image_one_mask

This test is failing:

    def test_zero_image_one_mask():
        """Test reconstruction with an image of all zeros and a mask that's not"""
        result = reconstruction(np.zeros((10, 10)), np.ones((10, 10)))
>       assert_array_almost_equal(result, 0)
E       AssertionError:
E       Arrays are not almost equal to 6 decimals
E
E       Mismatched elements: 100 / 100 (100%)
E       Max absolute difference: 1.
E       Max relative difference: inf
E        x: array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
E              [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
E              [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],...
E        y: array(0)

test_reconstruction.py:113: AssertionError

I'm surprised to see max abs = 1 and max rel = inf; also, 100% of the elements differ, so something weird is happening.

Determine way to speed up benchmarks of large files

Larger-data testing is tedious because post-processing is Hella Slow™: 8 min or more for 1.3 GB inputs.

Note that infrastructure doesn’t seem to make a big difference for post-processing time. And GPU is not used at all during this phase (per observation of monitoring charts + knowledge of implementation).

This represents post-processing time broken down by machine type, GPU (or not), and input size. Note that post-processing doesn't vary too much.
Image

It would be really nice if we didn't have to wait for this.

(1) Skip post-processing in benchmarks.

  • Note in benchmark data if post-processing was run.
  • The output is a bit meaningless in terms of correctness.

(2) Speed up post-processing. #28

Option 1 could be something like: skip the post-processing by passing a no-op function as the postprocessing_fn in the constructor of the Application object (we may need to create a subclass of the Mesmer class to override the constructor), as sketched below.
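A sketch of that approach; it assumes the base Application stores the post-processing step as self.postprocessing_fn, which is an assumption about deepcell's internals:

from deepcell.applications import Mesmer

class MesmerNoPostprocess(Mesmer):
    """Mesmer variant that skips post-processing (for benchmarking only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Replace the post-processing step with a no-op. As noted above,
        # the resulting output is not meaningful for correctness.
        self.postprocessing_fn = lambda model_output, **kw: model_output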

Add Simple Diagram

Create and add a simple architecture diagram explaining our perf-testing work plan for DeepCell on GCP, at the top of the README file.

Attempt benchmark with small persistent disk

The persistent disk is a relatively small expenditure ($0.14 daily for a forgotten 100 GB persistent disk). We probably don't need to worry too much about this.

Still, it would be nice to know if we're vastly over-provisioned. Let's try running a benchmark with ~10GB persistent disk, or 50GB. Use one of the larger files.

Also consider simply not caring for now, assuming DevOps processes & cost monitoring would find the issue. (Really though?) It's still just a few cents.

Document exact benchmark process

Update the top-level readme, and/or e2e-deepcell benchmark readme, with precise benchmark steps.

Something like:

  • open notebook in specific kernel (but see also #59)
  • select input file, update in notebook
  • select hardware type (clicking in notebook)
  • restart kernel & run all cells
  • copy the csv from the bottom into a sheet (this one?)
