chanzuckerberg / cellxgene
An interactive explorer for single-cell transcriptomics data
Home Page: https://chanzuckerberg.github.io/cellxgene/
License: MIT License
design args and behavior
An invisible voronoi overlay painted on the SVG layer can catch mouse events. We can create an overlay using cluster centroids to let people click on clusters. This will help when there are 100 clusters and 20 of them are different shades of green.
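As a rough illustration of the idea, clicking inside a Voronoi cell is equivalent to asking which centroid is nearest to the click. The sketch below computes cluster centroids and resolves a click that way. The input shape (an array of objects with x, y, and cluster) and both function names are assumptions made for this example, not the application's actual data model.

```javascript
// Sketch: resolve a click to a cluster by nearest centroid, which is the same
// answer a Voronoi overlay built on those centroids would give.
// Assumed input shape (hypothetical): cells = [{ x, y, cluster }, ...].

// Average the positions of the cells in each cluster.
function clusterCentroids(cells) {
  const sums = new Map();
  for (const { x, y, cluster } of cells) {
    const s = sums.get(cluster) || { x: 0, y: 0, n: 0 };
    s.x += x;
    s.y += y;
    s.n += 1;
    sums.set(cluster, s);
  }
  return [...sums].map(([cluster, s]) => ({ cluster, x: s.x / s.n, y: s.y / s.n }));
}

// Return the cluster whose centroid is closest to the click position,
// or null when there are no centroids.
function clusterAt(centroids, px, py) {
  let best = null;
  let bestDist = Infinity;
  for (const c of centroids) {
    const d = (c.x - px) ** 2 + (c.y - py) ** 2;
    if (d < bestDist) {
      bestDist = d;
      best = c.cluster;
    }
  }
  return best;
}
```

A linear scan is probably fine for around 100 clusters; a real SVG overlay would more likely use a Voronoi library so the cells can also be drawn or highlighted.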
The current WebGL code runs the GPU constantly whenever the application is in an active tab. This eats laptop batteries for lunch and generally thrashes my laptop.
We should look at ways to only render when necessary.
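One possible approach, sketched under assumptions: replace the continuous render loop with a request-driven one, so a frame is drawn only after something has changed. The names createRenderScheduler and requestRender are hypothetical; the schedule parameter defaults to the browser's requestAnimationFrame and is injectable so the logic can be exercised outside a browser.

```javascript
// Sketch of on-demand rendering: draw only after a caller asks for a frame.
// `draw` is the application's render function; `schedule` defaults to
// requestAnimationFrame in the browser and can be swapped out elsewhere.
function createRenderScheduler(draw, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = false;
  return function requestRender() {
    if (pending) return; // a frame is already queued; coalesce this request
    pending = true;
    schedule(() => {
      pending = false;
      draw();
    });
  };
}
```

Callers would invoke requestRender() from the places that change what is on screen (for example, after a state update or a pan/zoom event) instead of drawing on every animation frame.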
The group checkboxes for categorical metadata don't quite work right and can show conflicting states:
Pick any category group, e.g., EM2Cluster:
This is confusing: the group-level checkbox can be in either state when some options are selected. If it is only present to provide group-level actions, perhaps it should be replaced by separate [all] and [clear] actions?
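One way to avoid conflicting states, offered as a sketch rather than a description of the current component: derive the group-level state from the option states instead of storing it separately, and expose [all] and [clear] as explicit actions. The options shape (option name mapped to a boolean) and the function names are assumptions for illustration.

```javascript
// Sketch: derive the group-level state from its options instead of storing it.
// `options` is assumed to map option name -> selected flag (hypothetical shape).
function groupCheckboxState(options) {
  const values = Object.values(options);
  const selected = values.filter(Boolean).length;
  if (selected === 0) return "none";
  if (selected === values.length) return "all";
  return "some"; // could be rendered as an indeterminate checkbox
}

// Group-level action: return a copy with every option set to `value`.
// setAll(options, true) acts as [all]; setAll(options, false) acts as [clear].
function setAll(options, value) {
  return Object.fromEntries(Object.keys(options).map((k) => [k, value]));
}
```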
There are quite a few places in the code that look like:
const vertices =
state.cells.cells && state.cells.cells.data.graph
? state.cells.cells.data.graph
: null;
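This can be improved by using _.get(). Note that _.get() takes the object as its first argument and the path as its second, and falls back to the default only when the resolved value is undefined. The above code becomes:
const vertices = _.get(state, 'cells.cells.data.graph', null);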
Heroku button or equivalent
For users to share their data once they have it set up locally.
Warning: Accessing createClass via the main React package is deprecated, and will be removed in React v16.0. Use a plain JavaScript class instead. If you're not yet ready to migrate, create-react-class v15.* is available on npm as a temporary, drop-in replacement. For more info see https://fb.me/react-create-class
See updated component lifecycle at https://reactjs.org/docs/react-component.html
or perhaps multiple medium posts:
scanpy
em2
On the T. muris endpoint, the single-gene expression request returns a cell that was not present in the original /cells response.
See command line example below. There are 2446 of these "new" cell names returned by a POST to /expression, of which P3.D042103.3_11_M.1 is but one example.
# curl -s -d '{"genelist":["Anxa5"]}' -H "Content-Type: application/json" -X POST http://tabulamuris.cxg.czi.technology/api/v0.1/expression | grep P3.D042103.3_11_M.1
"cellname": "P3.D042103.3_11_M.1",
# curl -s http://tabulamuris.cxg.czi.technology/api/v0.1/cells | grep P3.D042103.3_11_M.1
#
Note: added code to the front-end to defend against this: see src/actions/index.js:cleanupExpressionResponse()
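For illustration only, a defensive filter of this kind might look like the sketch below. It is not the actual cleanupExpressionResponse() implementation. It assumes the /cells response yields an array of objects with a CellName field and the /expression response yields an array of objects with a cellname field; the function name dropUnknownCells is hypothetical.

```javascript
// Sketch only; not the actual cleanupExpressionResponse() implementation.
// Assumes `knownCells` comes from /cells (objects with CellName) and
// `expressionCells` comes from /expression (objects with cellname).
function dropUnknownCells(knownCells, expressionCells) {
  const known = new Set(knownCells.map((c) => c.CellName));
  // Keep only expression rows whose cell was present in the /cells response.
  return expressionCells.filter((c) => known.has(c.cellname));
}
```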
Migrate from a hosted backend to a local one. Users can pip install cellxgene and then point it at their directory locally to launch a webserver.
$ curl http://tabulamuris.cxg.czi.technology/api/v0.1/expression
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1997, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1985, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
return original_handler(e)
File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1540, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 273, in error_router
return original_handler(e)
File "/usr/local/lib/python3.5/dist-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 32, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 480, in wrapper
resp = resource(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/__init__.py", line 39, in decorator
return f(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/flask/views.py", line 84, in view
return self.dispatch_request(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/flask_restful/__init__.py", line 595, in dispatch_request
resp = meth(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/flask_restful_swagger_2/swagger.py", line 219, in inner
return f(self, *args, **kwargs)
File "/app/application.py", line 1028, in get
data = parse_exp_data(limit=40, unexpressed_genes=unexpressed_genes)
File "/app/application.py", line 294, in parse_exp_data
expression = get_expression(cells, genes)
File "/app/application.py", line 351, in get_expression
raise error
File "/app/application.py", line 345, in get_expression
expression = e.getDenseExpressionMatrix("AllGenes", cellset)
MemoryError: std::bad_alloc
The app is "usable" on a modern laptop (e.g., MacBook Pro) with 1M cells loaded, i.e., speed and interactive performance are reasonable. On more typical data sets (e.g., 250K cells), speed and interactive performance should be excellent (no jank, etc.).
Cell selection can contain cells that are not in the current "world" if you click regraph while cells are selected
Load a data set, scale the graph (in or out), and then use the brush selection. The wrong cells are selected. It appears that the graph's scaling is not applied to the brush selection coordinates.
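If that diagnosis is right, one fix is to map the brush rectangle back into data space with the inverse of the view transform before hit-testing. The sketch below assumes a transform of the form screen = data * k + t with k > 0, and a brush given as [[x0, y0], [x1, y1]] with x0 <= x1 and y0 <= y1. These shapes and function names are assumptions, not the application's actual state.

```javascript
// Sketch: map a brush rectangle from screen space back to data space before
// testing which cells fall inside it. The transform shape { k, tx, ty }
// (screen = data * k + t) is an assumption, not the app's actual state.
function brushToData([[x0, y0], [x1, y1]], { k, tx, ty }) {
  return [
    [(x0 - tx) / k, (y0 - ty) / k],
    [(x1 - tx) / k, (y1 - ty) / k],
  ];
}

// Select the cells whose data-space position lies inside the brush.
function cellsInBrush(cells, brush, transform) {
  const [[x0, y0], [x1, y1]] = brushToData(brush, transform);
  return cells.filter((c) => c.x >= x0 && c.x <= x1 && c.y >= y0 && c.y <= y1);
}
```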
A single back-end, running on a modest AWS instance, can handle interactive serving for 10+ users. Larger instances can survive 100+ concurrent users.
parent of #29
When regraphing a second time, cellxgene needs to send the original filter plus the new filter so that the REST API has the full subset to filter on. The REST API keeps no memory of previous filters.
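A minimal sketch of how the front-end could carry the cumulative filter itself, assuming a filter is an object that maps a metadata field to an array of allowed values. That shape, the field name tissue in the usage note, and the function name combineFilters are all hypothetical.

```javascript
// Sketch: combine the previous filter with a new one so that every regraph
// request carries the full subset. Assumed filter shape (hypothetical):
// { fieldName: [allowedValue, ...], ... }.
function combineFilters(previous, next) {
  const combined = { ...previous };
  for (const [field, values] of Object.entries(next)) {
    combined[field] = Object.prototype.hasOwnProperty.call(previous, field)
      ? previous[field].filter((v) => values.includes(v)) // narrow an existing constraint
      : values; // new constraint on a field not filtered before
  }
  return combined;
}
```

For example, combining { EM2Cluster: ["0", "2", "3"] } with { EM2Cluster: ["2", "3", "4"], tissue: ["Lung"] } yields { EM2Cluster: ["2", "3"], tissue: ["Lung"] }. An empty array means no value is allowed for that field, so the request builder must not treat it as "no filter".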
Tracking issue for interactive performance improvements in the front-end. Please add any ideas you may have to the issue. These are primarily "big" changes that would require some substantial refactoring.
Starter list:
cells: {
  metadata: {
    fieldname1: [ ... values of all cells as an array, eg, 'foo', 'bar', 'baz' ],
    fieldname2: [ ... same ... ],
    ...
  }
}
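To make the column-oriented layout above concrete, here is a sketch that converts row-oriented cell records into it. It assumes each cell is currently one object per record; toColumns is a hypothetical helper name.

```javascript
// Sketch: convert row-oriented cell records into a column-oriented layout,
// one array per metadata field, indexed by cell position.
function toColumns(records) {
  const metadata = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      // Allocate the column on first sight of the field.
      if (!metadata[field]) metadata[field] = new Array(records.length);
      metadata[field][i] = value;
    }
  });
  return { metadata };
}
```

With this layout, operations over a single field touch one contiguous array instead of every cell object, and a cell's position in the array serves as its index.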
During high-latency operations (e.g., calculation of differential expression), the UI would benefit from a busy status indicator (e.g., a 'computing...' footer or some such signal). Given that we render each canvas asynchronously, we may need one per canvas (e.g., a red dot in the corner while it is being re-rendered).
Using the pbmc33k data set, computing differential expression with both selections containing all cells generates an error:
POST http://pbmc33k.cxg.czi.technology/api/v0.1/expression 400 (BAD REQUEST)
d3.js:127 Uncaught (in promise) TypeError: Cannot read property 'length' of undefined
at Object.extent (d3.js:127)
at t.maybeSetupScalesAndDrawAxes ((index):1)
at t.componentWillReceiveProps ((index):1)
at c.updateComponent ((index):1)
at c.receiveComponent ((index):1)
at Object.receiveComponent ((index):1)
at c._updateRenderedComponent ((index):1)
at c._performComponentUpdate ((index):1)
at c.updateComponent ((index):1)
at c.performUpdateIfNecessary ((index):1)
at Object.performUpdateIfNecessary ((index):1)
at a ((index):1)
at r.perform ((index):1)
at o.perform ((index):1)
at o.perform ((index):1)
at Object.w [as flushBatchedUpdates] ((index):1)
at r.closeAll ((index):1)
at r.perform ((index):1)
at Object.batchedUpdates ((index):1)
at Object.e [as enqueueUpdate] ((index):1)
at r ((index):1)
at Object.enqueueSetState ((index):1)
at i.r.setState ((index):1)
at i.onStateChange ((index):1)
at Object.notify ((index):1)
at e.notifyNestedSubs ((index):1)
at i.onStateChange ((index):1)
at p ((index):1)
at (index):1
at (index):1
at (index):1
at (index):1
at dispatch ((index):1)
at (index):1
at <anonymous>
This is a tracking thread for ideas/debate about improvements to the cellxgene REST API. Please add your own thoughts and/or reactions.
Starter list of issues/ideas:
Semantics of the /cells response object:
/cells OTA performance: /cells transmits each ID three times.
/initialize:
The schema field in /initialize, i.e., what are the types, what are their characteristics, etc.
The schema sub-object should only contain data model info (include is a UI hint, and should not be in the data schema).
Naming is inconsistent, e.g., /cells uses CellName, and /expression uses cellname.
It would be very helpful if all endpoints, when receiving an unqualified request, returned cells in the same order and with the same dimensionality, so that cellname / cellid doesn't have to be explicitly used to link the two (position in the array is sufficient).
For regraphing, there is a lot of overhead in using /cells, as it returns all metadata as well as the new graph. We should have /metadata and /graph, and get rid of /cells. If we need the grouping information (options), that could be a separate endpoint as well. Much more flexible for the front-end.
Note on API chattiness (OTA bandwidth): this will only be an issue when the back-ends are much faster. Currently, most of the "download" time is actually waiting for the back-end to respond (time-to-first-byte). This is true for both the EM2 and ScanPy back-ends.
Load the T. Muris endpoint. Select a set of cells (a few thousand) and put into Selection 1. Select a large number of cells (25,000 or larger - eg, most of the cells) and put in Selection 2.
Click Compute differential expression
Results in a 500 HTTP response from the back-end, on the /diffexpression endpoint. Seems to be triggered by very large number of cells in the selection.
ignore!
Create single cell view
src/components/graph/graph.js: the Graph component connects to more state than it uses. When the component stabilizes, it would be good to clean this up.
Examples: ranges
To reproduce:
At this point, the expression tab shows up, missing the scatterplot. It just doesn't get re-rendered.
To reproduce:
GET http://tabulamuris.cxg.czi.technology/api/v0.1/cells?EM2Cluster=0&EM2Cluster=2&EM2Cluster=3&EM2Cluster=4&EM2Cluster=5&EM2Cluster=6&EM2Cluster=7&EM2Cluster=8&EM2Cluster=9&EM2Cluster=10&EM2Cluster=11&EM2Cluster=12&EM2Cluster=13&EM2Cluster=14&EM2Cluster=15&EM2Cluster=16&EM2Cluster=17&EM2Cluster=18&EM2Cluster=19&EM2Cluster=20&EM2Cluster=21&EM2Cluster=22&EM2Cluster=23&EM2Cluster=24&EM2Cluster=25&EM2Cluster=26&EM2Cluster=27&EM2Cluster=28&EM2Cluster=29&EM2Cluster=30&EM2Cluster=31&EM2Cluster=32&EM2Cluster=33&EM2Cluster=34&EM2Cluster=35&EM2Cluster=36&EM2Cluster=37&EM2Cluster=38&EM2Cluster=39&EM2Cluster=40&EM2Cluster=41&EM2Cluster=42&EM2Cluster=43&EM2Cluster=44&EM2Cluster=45&EM2Cluster=46&EM2Cluster=47&EM2Cluster=48&EM2Cluster=49&EM2Cluster=50&EM2Cluster=51&EM2Cluster=52&EM2Cluster=53&EM2Cluster=54&EM2Cluster=55&EM2Cluster=56&EM2Cluster=57&EM2Cluster=58&EM2Cluster=59&EM2Cluster=60&EM2Cluster=61&EM2Cluster=62&EM2Cluster=63&EM2Cluster=64&EM2Cluster=65&EM2Cluster=66&EM2Cluster=67&EM2Cluster=68&EM2Cluster=69&EM2Cluster=70&EM2Cluster=71&EM2Cluster=72&EM2Cluster=73&EM2Cluster=74&EM2Cluster=75&EM2Cluster=76&EM2Cluster=77&EM2Cluster=78&EM2Cluster=79&EM2Cluster=80&EM2Cluster=81&EM2Cluster=82&EM2Cluster=83&EM2Cluster=84&EM2Cluster=85&EM2Cluster=86&EM2Cluster=87&EM2Cluster=88&EM2Cluster=89&EM2Cluster=90&EM2Cluster=91&EM2Cluster=92&EM2Cluster=93&EM2Cluster=94&EM2Cluster=95&EM2Cluster=96&EM2Cluster=97&EM2Cluster=98&EM2Cluster=99&EM2Cluster=100&EM2Cluster=101&EM2Cluster=NoCluster
500 (INTERNAL SERVER ERROR)
Rather than posting a list of CellNames, I accidentally posted the stuff below, and the server crashed (returned a 502). The example below has been edited to keep it short.
{"celllist1":[
{"CellName":"AAAGAGACGGCATT","EM2Cluster":"0","cluster_id":"1","tSNE_1":"-3.52069103297318","tSNE_2":"-3.46894020526272","__cellIndex__":15,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.4337592343704319,"__y__":0.43125663054081603},
{"CellName":"AAATCAACCCTATT","EM2Cluster":"1","cluster_id":"5","tSNE_1":"-2.69271065101789","tSNE_2":"1.95420307068623","__cellIndex__":26,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.444891742821794,"__y__":0.5189382985520843},
{"CellName":"AAATCCCTCCACAA","EM2Cluster":"0","cluster_id":"1","tSNE_1":"0.840431446555436","tSNE_2":"-0.0234178421436222","__cellIndex__":30,"__selected__":true,"__color__":"rgba(0,0,0,1)","__colorRGB__":[0,0,0],"__x__":0.49239617060571467,"__y__":0.48696401878339307},
... ],
"celllist2": [ ... ],
num_genes: 7}
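On the client side, the intended request can be built by sending only the cell names. The sketch below takes the field names celllist1, celllist2, and num_genes from the example above; the function name buildDiffExpBody and the assumption that each selected cell object carries a CellName field are illustrative. It does not address the server crash itself, which would still need input validation on the back-end.

```javascript
// Sketch: send only cell names, not the full front-end cell objects.
// Field names celllist1, celllist2, and num_genes come from the example above;
// the function name and input shape are assumptions.
function buildDiffExpBody(selection1, selection2, numGenes) {
  const names = (cells) => cells.map((c) => c.CellName);
  return JSON.stringify({
    celllist1: names(selection1),
    celllist2: names(selection2),
    num_genes: numGenes,
  });
}
```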
If you select a region in the cluster graph, it requires a double left-click to clear it. IMHO, a single click should be enough.