
chanzuckerberg / cellxgene

An interactive explorer for single-cell transcriptomics data

Home Page: https://chanzuckerberg.github.io/cellxgene/

License: MIT License

Languages: JavaScript 68.39%, HTML 0.37%, CSS 0.24%, Python 30.06%, Dockerfile 0.02%, Makefile 0.72%, Shell 0.10%, AppleScript 0.10%
Topics: dataviz, scientific, scrna-seq, transcriptomics, visualization

cellxgene's People

Contributors

ambrosejcarr, ashin-czi, atarashansky, atolopko-czi, bento007, bkmartinjr, blrnw3, colinmegill, csweaver, dependabot[bot], ebezzi, fionagriffin, freeman-lab, ihnorton, jakeyheath, lesliecodes, maniarathi, mattcai, mckinsel, mdunitz, millenniumfalconmechanic, mweiden, neuromusic, prete, roaga, seve, sidneymbell, signechambers1, snyk-bot, tihuan

cellxgene's Issues

Voronoi cluster selection

An invisible Voronoi overlay painted on the SVG layer can catch mouse events. We can build the overlay from cluster centroids to let people click on clusters. This will help when there are 100 clusters and 20 of them are different shades of green.

https://bl.ocks.org/mbostock/8033015
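The hit test behind such an overlay can be sketched without any rendering: a click falls in the Voronoi region of whichever centroid is nearest to it. This is a minimal sketch assuming each cell carries hypothetical `x`, `y` and `cluster` fields; `clusterCentroids` and `clusterAtPoint` are illustrative names, not existing cellxgene functions. The invisible SVG overlay from the bl.ocks example would handle the drawing and event capture.

```javascript
// Hypothetical sketch: the Voronoi region containing a point belongs to the
// centroid nearest that point, so a click on the overlay reduces to a
// nearest-centroid search.

// Compute the mean (x, y) position of each cluster.
function clusterCentroids(cells) {
  const sums = new Map();
  for (const { x, y, cluster } of cells) {
    const s = sums.get(cluster) || { x: 0, y: 0, n: 0 };
    s.x += x;
    s.y += y;
    s.n += 1;
    sums.set(cluster, s);
  }
  const centroids = new Map();
  for (const [cluster, s] of sums) {
    centroids.set(cluster, { x: s.x / s.n, y: s.y / s.n });
  }
  return centroids;
}

// Return the cluster whose Voronoi region contains (px, py).
// A linear scan is adequate for ~100 clusters.
function clusterAtPoint(centroids, px, py) {
  let best = null;
  let bestDist = Infinity;
  for (const [cluster, c] of centroids) {
    const d = (c.x - px) ** 2 + (c.y - py) ** 2; // squared distance suffices for comparison
    if (d < bestDist) {
      bestDist = d;
      best = cluster;
    }
  }
  return best;
}
```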

excessive GPU utilization

The current WebGL code runs the GPU constantly whenever the application is the active tab. This eats laptop batteries for lunch and generally thrashes my laptop.

We should look at ways to render only when necessary.
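One common approach is render-on-demand: keep a "dirty" flag and schedule a frame only when state actually changes, coalescing repeated invalidations into one draw. A minimal sketch follows; `createRenderLoop` and `invalidate` are hypothetical names, and the optional `schedule` parameter exists only so the loop can be driven manually outside a browser.

```javascript
// Hypothetical sketch of render-on-demand: draw only after something changed,
// and coalesce several invalidations into a single frame.
function createRenderLoop(draw, schedule) {
  // Default to requestAnimationFrame in the browser; fall back to setTimeout elsewhere.
  const scheduleFrame =
    schedule ||
    (typeof requestAnimationFrame === "function"
      ? (cb) => requestAnimationFrame(cb)
      : (cb) => setTimeout(cb, 16));
  let pending = false;

  function frame() {
    pending = false;
    draw();
  }

  return {
    // Call whenever state that affects the canvas changes.
    invalidate() {
      if (pending) return; // a frame is already queued
      pending = true;
      scheduleFrame(frame);
    },
  };
}
```

With this shape the GPU idles between interactions instead of redrawing every frame.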

categorical group check boxes state

The group checkboxes for categorical metadata don't quite work right and can show conflicting states:

Pick any category group, e.g., EM2Cluster:

  1. test 1 - deselect the group, and then select a few of the options --> the group-level checkbox is unselected
  2. test 2 - select the group, then deselect a few options --> the group-level checkbox is selected

This is confusing: the group-level checkbox can be in either state when some options are selected. If it is only present to provide group-level actions, perhaps it should be replaced by separate [all] and [clear] actions?
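One way to rule out conflicting states is to derive the group checkbox from the option states instead of storing it separately, using the DOM checkbox's `indeterminate` property for the partial case. A minimal sketch, assuming option selection is kept as a hypothetical `{ optionName: boolean }` map; `setAllOptions` covers the proposed [all] and [clear] actions.

```javascript
// Hypothetical sketch: derive the group-level checkbox from the options
// so the two can never disagree.
function groupCheckboxState(optionSelected) {
  const values = Object.values(optionSelected);
  const nSelected = values.filter(Boolean).length;
  if (nSelected === 0) return { checked: false, indeterminate: false };
  if (nSelected === values.length) return { checked: true, indeterminate: false };
  // Some but not all options selected: show the "mixed" state.
  return { checked: false, indeterminate: true };
}

// [all] -> setAllOptions(options, true); [clear] -> setAllOptions(options, false)
function setAllOptions(optionSelected, value) {
  const next = {};
  for (const key of Object.keys(optionSelected)) next[key] = value;
  return next;
}
```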

use _.get()

There are quite a few places in the code that look like:

  const vertices =
    state.cells.cells && state.cells.cells.data.graph
      ? state.cells.cells.data.graph
      : null;

This can be improved by using _.get(). The above code becomes:

  import _ from 'lodash';
  const vertices = _.get(state, 'cells.cells.data.graph', null);
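For reference, lodash's `_.get(object, path, defaultValue)` takes the root object first and returns `defaultValue` only when the resolved value is `undefined`. Where the toolchain supports ES2020, optional chaining with `??` gives the same null-safe lookup without a library. A minimal comparison; `getPath` is a hypothetical stand-in for `_.get` that handles dot-separated paths only.

```javascript
// Hypothetical stand-in for lodash's _.get, limited to dot-separated paths.
function getPath(object, path, defaultValue) {
  let current = object;
  for (const key of path.split(".")) {
    if (current == null) return defaultValue; // null or undefined: stop early
    current = current[key];
  }
  return current === undefined ? defaultValue : current;
}

const state = { cells: { cells: { data: { graph: [[0.1, 0.2]] } } } };

// Library-style lookup.
const viaHelper = getPath(state, "cells.cells.data.graph", null);

// Built-in equivalent: optional chaining plus nullish coalescing.
const viaChaining = state.cells?.cells?.data?.graph ?? null;
```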

Medium post on cellxgene

Or perhaps multiple Medium posts:

  • general announcement
  • technical design: architectural & technological approaches/patterns

gene expression count returning unexpected cell name

On the T. muris endpoint, the single-gene expression count request returns a cell that was not present in the original /cells response.

See the command-line example below. A POST to /expression returns 2446 of these "new" cell names, of which P3.D042103.3_11_M.1 is just one example.

# curl -s -d '{"genelist":["Anxa5"]}'  -H "Content-Type: application/json" -X POST http://tabulamuris.cxg.czi.technology/api/v0.1/expression | grep P3.D042103.3_11_M.1
        "cellname": "P3.D042103.3_11_M.1",
# curl -s http://tabulamuris.cxg.czi.technology/api/v0.1/cells | grep P3.D042103.3_11_M.1
#

Note: code was added to the front end to defend against this; see src/actions/index.js:cleanupExpressionResponse().
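The body of `cleanupExpressionResponse()` is not shown here, so the following is only a sketch of the general defense: drop expression records whose `cellname` does not appear in the /cells response, and report how many were dropped. The record shape (`{ cellname, ... }`) is assumed from the curl output above, and `dropUnknownCells` is a hypothetical name.

```javascript
// Hypothetical sketch of a defensive filter: keep only expression records
// whose cellname was present in the original /cells response.
function dropUnknownCells(expressionRecords, knownCellNames) {
  const known = new Set(knownCellNames);
  const kept = [];
  const unknown = [];
  for (const record of expressionRecords) {
    (known.has(record.cellname) ? kept : unknown).push(record);
  }
  if (unknown.length > 0) {
    console.warn(`Dropped ${unknown.length} expression records with unknown cell names`);
  }
  return { kept, unknown };
}
```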

Local Client: Tracking Issue

Migrate from a hosted backend to a local one: users can pip install cellxgene and then point it at a local data directory to launch a web server.

Memory error on GET

$ curl http://tabulamuris.cxg.czi.technology/api/v0.1/expression

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/__init__.py", line 39, in decorator
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 595, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/swagger.py", line 219, in inner
    return f(self, *args, **kwargs)
  File "/app/application.py", line 1028, in get
    data = parse_exp_data(limit=40, unexpressed_genes=unexpressed_genes)
  File "/app/application.py", line 294, in parse_exp_data
    expression = get_expression(cells, genes)
  File "/app/application.py", line 351, in get_expression
    raise error
  File "/app/application.py", line 345, in get_expression
    expression = e.getDenseExpressionMatrix("AllGenes", cellset)
MemoryError: std::bad_alloc
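The failing call builds a dense matrix over "AllGenes" for the whole cell set, so its size grows with the product of cells and genes. As a rough estimate, assuming one 4-byte floating-point value per entry (the actual element type is not visible in the traceback):

$$
\text{bytes} \approx N_{\text{cells}} \times N_{\text{genes}} \times 4
$$

For a hypothetical 50,000 cells and 20,000 genes, that is $5 \times 10^{4} \times 2 \times 10^{4} \times 4 = 4 \times 10^{9}$ bytes, about 4 GB for the matrix alone, before any copies made while serializing the response.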

One Million Cells: Tracking Issue

The app is "usable" on a modern laptop (e.g., a MacBook Pro) with 1M cells loaded, i.e., speed and interactive performance are reasonable. On more typical data sets (e.g., 250K cells), speed and interactive performance should be excellent (no jank, etc.).

Backend Concurrency: Tracking Issue

A single back-end, running on a modest AWS instance, can handle interactive serving for 10+ users. Larger instances can survive 100+ concurrent users.

Regraph needs full filter

When regraphing a second time, cxg needs to send the original filter plus the new filter so that the REST API receives the full subset to filter on. The REST API is stateless: it keeps no memory of previous filters.
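Because the server keeps no filter history, the client has to accumulate it. A minimal sketch, assuming a filter is modelled as a hypothetical `{ fieldName: [allowed values] }` map: when both filters constrain the same field, only values allowed by both survive. Serialization uses the standard `URLSearchParams` with one `key=value` pair per allowed value, the repeated-parameter style seen in /cells query strings; `combineFilters` and `toQueryString` are illustrative names.

```javascript
// Hypothetical sketch: merge the previous (cumulative) filter with a new one.
function combineFilters(previous, next) {
  const combined = {};
  for (const [field, values] of Object.entries(previous)) {
    combined[field] = [...values];
  }
  for (const [field, values] of Object.entries(next)) {
    if (Object.prototype.hasOwnProperty.call(combined, field)) {
      // Both filters constrain this field: keep only values allowed by both.
      const allowed = new Set(values);
      combined[field] = combined[field].filter((v) => allowed.has(v));
    } else {
      combined[field] = [...values];
    }
  }
  return combined;
}

// Serialize as repeated key=value pairs, e.g. EM2Cluster=1&EM2Cluster=2.
function toQueryString(filter) {
  const params = new URLSearchParams();
  for (const [field, values] of Object.entries(filter)) {
    for (const v of values) params.append(field, String(v));
  }
  return params.toString();
}
```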

front-end interactive performance improvement ideas - tracking issue

Tracking issue for interactive performance improvements in the front-end. Please add any ideas you may have to the issue. These are primarily "big" changes that would require some substantial refactoring.

Starter list:

  • We could roughly halve the memory used by dense metadata storage if it were organized so that the metadata field names were stored only once (right now they are stored for each cell). Sparse metadata would need to continue to be stored on a per-cell basis. E.g.,
      cells: {
        metadata: {
          fieldname1: [ ... values of all cells as an array, e.g., 'foo', 'bar', 'baz' ],
          fieldname2: [ ... same ... ],
          ...
        }
      }
  • The REST API is verbose and not tightly encoded, and download time will become a problem as data set sizes grow. There appear to be some low-hanging improvements we could make.
  • Most of the selection/deselection computation on categorical metadata boils down to set operations. Are there improvements if we start modelling the selection set as a collection, rather than with annotations on the individual cells?
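The dense-metadata idea in the first item amounts to switching from row-oriented to column-oriented storage. A minimal sketch, assuming cells currently arrive as an array of per-cell objects keyed by field name; `toColumnar` and `metadataValue` are hypothetical helpers, and sparse fields would stay per-cell as noted above.

```javascript
// Hypothetical sketch: convert row-oriented dense metadata (one object per
// cell, repeating every field name) into column-oriented storage where each
// field name appears once alongside an array of per-cell values.
function toColumnar(cells, fieldNames) {
  const metadata = {};
  for (const name of fieldNames) {
    metadata[name] = new Array(cells.length);
  }
  cells.forEach((cell, i) => {
    for (const name of fieldNames) {
      metadata[name][i] = cell[name];
    }
  });
  return { metadata };
}

// Cell i's value for a field is then a plain array lookup.
function metadataValue(columnar, fieldName, cellIndex) {
  return columnar.metadata[fieldName][cellIndex];
}
```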

status indicator needed during costly operations

During high-latency operations (e.g., calculation of differential expression), the UI would benefit from a busy status indicator (e.g., a 'computing...' footer or some similar signal). Given that we render each canvas asynchronously, we may need one per canvas (e.g., a red dot in the corner while it is being re-rendered).
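A busy indicator needs a single source of truth for "is anything in flight?". A minimal sketch, assuming nothing about the existing state management: a counter that notifies subscribers only on idle/busy transitions, with a `track` wrapper so start and end always pair up. `createBusyTracker` is a hypothetical name; one tracker per canvas would give the per-canvas indicator described above.

```javascript
// Hypothetical sketch: count in-flight operations and notify listeners
// when the app switches between idle and busy.
function createBusyTracker() {
  let inFlight = 0;
  const listeners = new Set();
  const notify = () => listeners.forEach((fn) => fn(inFlight > 0));

  const tracker = {
    subscribe(fn) {
      listeners.add(fn);
      return () => listeners.delete(fn); // unsubscribe
    },
    begin() {
      inFlight += 1;
      if (inFlight === 1) notify(); // idle -> busy
    },
    end() {
      if (inFlight === 0) return; // ignore unmatched end() calls
      inFlight -= 1;
      if (inFlight === 0) notify(); // busy -> idle
    },
    // Wrap a promise-returning operation so begin/end always pair up.
    async track(operation) {
      tracker.begin();
      try {
        return await operation();
      } finally {
        tracker.end();
      }
    },
    isBusy: () => inFlight > 0,
  };
  return tracker;
}
```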

splat when doing differential expression compute on large data set

Using the pbmc33k data set, I computed differential expression where both selections contained all cells. This generated an error:

POST http://pbmc33k.cxg.czi.technology/api/v0.1/expression 400 (BAD REQUEST)

d3.js:127 Uncaught (in promise) TypeError: Cannot read property 'length' of undefined
    at Object.extent (d3.js:127)
    at t.maybeSetupScalesAndDrawAxes ((index):1)
    at t.componentWillReceiveProps ((index):1)
    at c.updateComponent ((index):1)
    at c.receiveComponent ((index):1)
    at Object.receiveComponent ((index):1)
    at c._updateRenderedComponent ((index):1)
    at c._performComponentUpdate ((index):1)
    at c.updateComponent ((index):1)
    at c.performUpdateIfNecessary ((index):1)
    at Object.performUpdateIfNecessary ((index):1)
    at a ((index):1)
    at r.perform ((index):1)
    at o.perform ((index):1)
    at o.perform ((index):1)
    at Object.w [as flushBatchedUpdates] ((index):1)
    at r.closeAll ((index):1)
    at r.perform ((index):1)
    at Object.batchedUpdates ((index):1)
    at Object.e [as enqueueUpdate] ((index):1)
    at r ((index):1)
    at Object.enqueueSetState ((index):1)
    at i.r.setState ((index):1)
    at i.onStateChange ((index):1)
    at Object.notify ((index):1)
    at e.notifyNestedSubs ((index):1)
    at i.onStateChange ((index):1)
    at p ((index):1)
    at (index):1
    at (index):1
    at (index):1
    at (index):1
    at dispatch ((index):1)
    at (index):1
    at <anonymous>
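The trace suggests that after the 400 response, `d3.extent` received `undefined` and tried to read its `length`. Two defensive measures can be sketched: treat non-2xx responses as errors before their payload reaches rendering code, and make the extent computation tolerate missing data. Both functions are hypothetical; `safeExtent` mimics `d3.extent`'s skipping of `null`, `undefined` and `NaN` values but returns `null` instead of throwing.

```javascript
// Hypothetical sketch: compute [min, max] like d3.extent, but return null
// instead of throwing when the input is missing.
function safeExtent(values) {
  if (!Array.isArray(values)) return null;
  let min;
  let max;
  for (const v of values) {
    if (v == null || Number.isNaN(v)) continue; // skip missing values, as d3.extent does
    if (min === undefined || v < min) min = v;
    if (max === undefined || v > max) max = v;
  }
  return min === undefined ? null : [min, max];
}

// Hypothetical response check: reject non-2xx responses (such as a 400)
// instead of passing an undefined payload on to components.
async function fetchJsonOrThrow(url, options) {
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`${response.status} ${response.statusText} from ${url}`);
  }
  return response.json();
}
```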

REST API issues / potential areas of improvement - tracking issue

This is a tracking thread for ideas/debate about improvements to the cellxgene REST API. Please add your own thoughts and/or reactions.

Starter list of issues/ideas:

  • Semantics of /cells response object:

    1. The API mixes together the external (submitter) cell ID and the API-internal data structure cell ID. These should be separate concepts, and the API internals should use a much more compact ID.
    2. cellids are encoded into external metadata (with name 'CellName' - this pollutes the namespace)
  • /cells OTA performance:

    1. don't transmit cellids more than once per cell (currently /cells transmits each ID three times)
    2. The API-internal (response) data structure should use a much smaller encoding for a cellid (e.g., a number, or make it implicit)
    3. Current metadata encoding works best for sparse metadata, but wastes a lot of bandwidth if metadata is not sparse (the metadata field names are re-transmitted for each cell). We could cut response size dramatically by using different data structures for sparse and dense metadata.
  • /initialize:

    1. Better documentation of the schema model used for the schema field in /initialize, i.e., what the types are, what their characteristics are, etc.
    2. The schema sub-object should only contain data model info (include is a UI hint, and should not be in the data schema).
  • Naming is inconsistent, e.g., /cells uses CellName, and /expression uses cellname

  • It would be very helpful if all endpoints, when receiving an unqualified request, returned cells in the same order and with the same dimensionality, so that cellname / cellid doesn't have to be used explicitly to link the two (position in the array is sufficient)

  • For regraphing, there is a lot of overhead in using /cells, as it returns all metadata as well as the new graph. We should have /metadata, and /graph, and get rid of /cells. If we need the grouping information (options), that could be a separate endpoint as well. Much more flexible for the front-end.
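Several of the items above (transmit each ID once, make the internal ID implicit, link endpoints by position) combine into a positional payload. The sketch below compares two hypothetical /cells shapes; neither is the actual cellxgene API. In the positional form a cell's index in every array is its internal ID, each external name is sent once, and each field name appears once per field.

```javascript
// Hypothetical per-cell shape: repeats the ID and every field name per cell.
function perCellPayload(names, metadata) {
  return names.map((name, i) => {
    const cell = { CellName: name, cellid: name };
    for (const [field, values] of Object.entries(metadata)) cell[field] = values[i];
    return cell;
  });
}

// Hypothetical positional shape: index i is the internal cell ID everywhere.
function positionalPayload(names, metadata) {
  return { cellNames: names, metadata };
}

// Decoding: internal ID i links the name array and every metadata column.
function cellAt(payload, i) {
  const cell = { name: payload.cellNames[i] };
  for (const [field, values] of Object.entries(payload.metadata)) cell[field] = values[i];
  return cell;
}
```

The same positional convention would let /expression or a future /graph endpoint return plain arrays aligned with `cellNames`.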

Note on API chattiness (OTA bandwidth): this will only be an issue when the back-ends are much faster. Currently, most of the "download" time is actually waiting for the back-end to respond (time-to-first-byte). This is true for both the EM2 and ScanPy back-ends.

500 error from REST API

Load the T. muris endpoint. Select a set of cells (a few thousand) and put them into Selection 1. Select a large number of cells (25,000 or more, e.g., most of the cells) and put them into Selection 2.

Click Compute differential expression

This results in a 500 HTTP response from the back-end on the /diffexpression endpoint. It seems to be triggered by a very large number of cells in the selection.

graph.js: excessive state bound

src/components/graph/graph.js: the Graph component connects to more state than it uses. When the component stabilizes, it would be good to clean this up.

Examples: ranges

diff expression scatter plot disappears

To reproduce:

  1. select two cell sets and compute differential expression
  2. click on Expression tab, and select an X and Y gene.
    -- you should now see diffexp scatterplot --
  3. Click Metadata tab
  4. Click Expression tab

At this point the Expression tab shows up, but without the scatterplot: it simply isn't re-rendered.

server 500 error when regraphing

To reproduce:

  1. load http://tabulamuris.cxg.czi.technology/
  2. Open the categorical metadata selector for EM2Cluster
  3. Deselect category option 1
  4. Click regraph button
  5. console shows 500 error from server
GET http://tabulamuris.cxg.czi.technology/api/v0.1/cells?EM2Cluster=0&EM2Cluster=2&EM2Cluster=3&EM2Cluster=4&EM2Cluster=5&EM2Cluster=6&EM2Cluster=7&EM2Cluster=8&EM2Cluster=9&EM2Cluster=10&EM2Cluster=11&EM2Cluster=12&EM2Cluster=13&EM2Cluster=14&EM2Cluster=15&EM2Cluster=16&EM2Cluster=17&EM2Cluster=18&EM2Cluster=19&EM2Cluster=20&EM2Cluster=21&EM2Cluster=22&EM2Cluster=23&EM2Cluster=24&EM2Cluster=25&EM2Cluster=26&EM2Cluster=27&EM2Cluster=28&EM2Cluster=29&EM2Cluster=30&EM2Cluster=31&EM2Cluster=32&EM2Cluster=33&EM2Cluster=34&EM2Cluster=35&EM2Cluster=36&EM2Cluster=37&EM2Cluster=38&EM2Cluster=39&EM2Cluster=40&EM2Cluster=41&EM2Cluster=42&EM2Cluster=43&EM2Cluster=44&EM2Cluster=45&EM2Cluster=46&EM2Cluster=47&EM2Cluster=48&EM2Cluster=49&EM2Cluster=50&EM2Cluster=51&EM2Cluster=52&EM2Cluster=53&EM2Cluster=54&EM2Cluster=55&EM2Cluster=56&EM2Cluster=57&EM2Cluster=58&EM2Cluster=59&EM2Cluster=60&EM2Cluster=61&EM2Cluster=62&EM2Cluster=63&EM2Cluster=64&EM2Cluster=65&EM2Cluster=66&EM2Cluster=67&EM2Cluster=68&EM2Cluster=69&EM2Cluster=70&EM2Cluster=71&EM2Cluster=72&EM2Cluster=73&EM2Cluster=74&EM2Cluster=75&EM2Cluster=76&EM2Cluster=77&EM2Cluster=78&EM2Cluster=79&EM2Cluster=80&EM2Cluster=81&EM2Cluster=82&EM2Cluster=83&EM2Cluster=84&EM2Cluster=85&EM2Cluster=86&EM2Cluster=87&EM2Cluster=88&EM2Cluster=89&EM2Cluster=90&EM2Cluster=91&EM2Cluster=92&EM2Cluster=93&EM2Cluster=94&EM2Cluster=95&EM2Cluster=96&EM2Cluster=97&EM2Cluster=98&EM2Cluster=99&EM2Cluster=100&EM2Cluster=101&EM2Cluster=NoCluster 
500 (INTERNAL SERVER ERROR) 

posting junk to /diffexpression causes crash in server

Rather than posting a list of CellNames, I accidentally posted the stuff below, and the server crashed (returned a 502). The example below has been edited to keep it short.

{"celllist1":[
{"CellName":"AAAGAGACGGCATT","EM2Cluster":"0","cluster_id":"1","tSNE_1":"-3.52069103297318","tSNE_2":"-3.46894020526272","__cellIndex__":15,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.4337592343704319,"__y__":0.43125663054081603},
{"CellName":"AAATCAACCCTATT","EM2Cluster":"1","cluster_id":"5","tSNE_1":"-2.69271065101789","tSNE_2":"1.95420307068623","__cellIndex__":26,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.444891742821794,"__y__":0.5189382985520843},
{"CellName":"AAATCCCTCCACAA","EM2Cluster":"0","cluster_id":"1","tSNE_1":"0.840431446555436","tSNE_2":"-0.0234178421436222","__cellIndex__":30,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.49239617060571467,"__y__":0.48696401878339307}, 
... ], 
"celllist2": [ ... ], 
num_genes: 7}
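Instead of crashing with a 502, the server would ideally reject a malformed body with a 400 and a useful message; the same check can also run on the client before posting. A minimal sketch of such a validator, with field names taken from the example above; the rules themselves (non-empty lists of cell-name strings, a positive integer `num_genes`) are assumptions, and `validateDiffExpRequest` is a hypothetical name.

```javascript
// Hypothetical sketch: validate a /diffexpression request body and return a
// list of problems (empty when the body looks well-formed).
function validateDiffExpRequest(body) {
  if (body === null || typeof body !== "object") {
    return ["body must be a JSON object"];
  }
  const errors = [];
  for (const key of ["celllist1", "celllist2"]) {
    const list = body[key];
    if (!Array.isArray(list) || list.length === 0) {
      errors.push(`${key} must be a non-empty array`);
    } else if (!list.every((name) => typeof name === "string")) {
      // Catches the case above: whole cell objects instead of CellName strings.
      errors.push(`${key} must contain only cell-name strings`);
    }
  }
  if (!Number.isInteger(body.num_genes) || body.num_genes <= 0) {
    errors.push("num_genes must be a positive integer");
  }
  return errors;
}
```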
