graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

License: BSD 3-Clause "New" or "Revised" License

Python 98.84% Shell 0.90% Dockerfile 0.26%

graph visualization gpu graphistry python rapids cugraph networkx neo4j tigergraph

pygraphistry's Issues

clearer error message when no node/source/destination are provided

list class methods in documentation's table of contents

Unassigned -- @thibaudh and I are puzzled.

reset_index on input frames

If the named columns are indices, they get missed. Treatable by running df = df.reset_index() on input frames.

Resizable cell

(Meta: trying to do via a more structured spec process)

What & Why

Problem: The notebook exploration cell is generally too small for exploration, and the current process of opening the visualization in a new tab is slow and loses in-notebook state.

High-Level Proposal: Change the behavior of the 'pop out' button so that the iframe takes up most of the width and height of the parent frame. It should not disrupt the scroll state, nor the scrollability, of the parent notebook/frame.

Tasks

☐ Change icon: pop out -> maximize
☐ postMessage protocol, communication: Manually implement parent/child frame communication. We expect the postMessage protocol to change as the graphistryjs interactive embedding API advances, but the usability benefit here outweighs waiting.
☐ Resize logic: basic intent is to take most of the screen while still allowing scrolling between cells. So, the frame should be anchored on the same absolute Y position. To allow scrolling, the width will be screen - 100px, and centered. We may need to play some tricks on the DOM and CSS to make it do that.
☐ Minimize logic: the parent frame will ack expansion to the child frame, at which point the maximize icon switches to minimize, and clicking it restores the frame size

Out of Scope

GraphistryJS-based implementation

Who

Implementation: @thibaudh @lmeyerov
Design feedback: @padentomasello
Implementation feedback: @briantrice @quinnhj @trxcllnt

Questions

If the user really does want a new tab, how should they now achieve that? Keep the pop-out button, or some other way? One thought already: may be nice to expand the control options on the static loader screen.
Should the resizing be an animated tween or immediate?

Warning when uploading large graphs

@thibaudh @briantrice

We have two interesting inflection points for graph size:

~1MM nodes + edges: client quality degrades
~8MM nodes or ~8MM edges: client hit testing becomes undefined

I'm wondering when IPython should warn vs. error. The second is definitely an error for now -- should the first also be?
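A minimal sketch of the warn-vs-error split, assuming the first threshold only warns and the second raises. The function and constant names are hypothetical and not part of pygraphistry; the numbers are the approximate ones quoted above.

```python
import warnings

# Approximate thresholds quoted in the issue; names are illustrative only.
WARN_TOTAL_ELEMENTS = 1_000_000  # ~1MM nodes + edges: client quality degrades
ERROR_PER_KIND = 8_000_000       # ~8MM nodes or ~8MM edges: hit testing undefined


def check_graph_size(num_nodes, num_edges):
    """Raise past the hard limit, warn past the soft one, else return True."""
    if num_nodes >= ERROR_PER_KIND or num_edges >= ERROR_PER_KIND:
        raise ValueError(
            "Graph too large: %d nodes, %d edges" % (num_nodes, num_edges)
        )
    if num_nodes + num_edges >= WARN_TOTAL_ELEMENTS:
        warnings.warn(
            "Large graph: client quality may degrade", stacklevel=2
        )
        return False
    return True
```

If the first inflection point should also be an error, only the second branch needs to change.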

SSL proxy for prod

( + @thibaudh )

Matt, we need to get the https proxy up as part of the API 1.0 release. This will, in turn, trigger some code changes. Can you let me know when that happens?

(HTTPS proxy should be up for both prod & staging. They currently use diff code paths, and I'd like to unify.)

unbound nodes?

Getting an error on the below:

    .edges(base).nodes(nodes).plot()

ERROR: Node identifier must be bound when using node dataframe.

Way to pass in filters

( + @thibaudh + @briantrice )

This will have to wait for the API to support it first (...), but I'd like to write something like

(g
   .data(nodes=..., edges=...)
   .bind(...)
   .filter('correlation > 10 & degree > 7')
   .plot())

The key is that all data should be passed into the viz so I can interactively change the filters later. (If I didn't want that behavior, I could filter on the data argument.)

Networkx / Python3 encoding issue

From https://gist.github.com/ResidentMario/87c282ea4ebded91ee31 :

import networkx as nx
import graphistry
graphistry.register(key='...[key obfuscated]...')
graph = nx.path_graph(4)
graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)

Causes

Traceback (most recent call last):
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\IPython\core\interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-5eadd70251a7>", line 1, in <module>
    graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\graphistry\plotter.py", line 311, in plot
    info = PyG._etl1(dataset)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\graphistry\pygraphistry.py", line 185, in _etl1
    headers=headers, params=params)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 296, in prepare
    self.prepare_body(data, files, json)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 447, in prepare_body
    body = self._encode_params(data)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 84, in _encode_params
    return to_native_string(data)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\utils.py", line 700, in to_native_string
    out = string.decode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)

When run in Python2, a graph generates successfully. I do not have a working Python3 build right now, so I could not reproduce it there. But assuming the error reproduces on Python3, that narrows down the scope.
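A note on the traceback: byte 0x8b at position 1 is what the start of a gzip stream looks like (gzip data begins with 0x1f 0x8b). So one plausible reading, not confirmed in this thread, is that a compressed request body is being handled as text. The sketch below only reproduces that symptom with the standard library; the payload is made up.

```python
import gzip
import json

# Hypothetical payload standing in for the uploaded dataset.
payload = json.dumps({"name": "example", "graph": []}).encode("utf-8")
body = gzip.compress(payload)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
magic = body[:2]

try:
    body.decode("ascii")
    decode_error = None
except UnicodeDecodeError as err:
    # 0x1f is valid ASCII but 0x8b is not, so decoding fails at position 1.
    decode_error = err

# Kept as bytes end to end, the body round-trips cleanly.
restored = gzip.decompress(body)
```

If that reading is right, the fix would be to keep the compressed body as bytes rather than passing it through a string conversion.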

Keep column names instead of renaming to Source and Destination.

While doing bipartite graphs of IP address -> Alert name for example, it'd be nice to keep the column names so you can refer to them in histograms and filters. (I found myself confused for a few seconds when I didn't see alert names in the histograms).

Food for thought: this will get more complicated as we move past bipartite. I imagine wanting to have a histogram for each type of node. For bipartite graphs, we have this functionality because we have the concept of source and destination.

@thibaudh: I'm assigning myself, since I thought this might be a nice project to start with this code base. Maybe we can chat to see if this is appropriate (and something we even want to do) and what the right way to do this is.

Extend color palette so that cluster > 12 are not all black

control url & iframe style settings

something like graphistry.settings(urlOpts={'scene': 'uber', ...}, style={'border-color': 'green', ...})

"See palette definitions" link broken in docs

http://pygraphistry.readthedocs.org/en/latest/graphistry.html has several links to https://github.com/graphistry/pygraphistry/blob/master/graphistry.com/palette.html which doesn't exist.

Expose external hook for embedders

@thibaudh @briantrice

I'm trying to externally control a generated graphistry iframe, and so it'd be really dandy to work with it. A few options:

-- allow specifying HTML attributes (id, name, class); I think we only expose style and url_params
-- provide default class and gensym id
-- wrap into whatever widget system IPython and others are using and match that interface
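A rough sketch of the first two options together: caller-supplied HTML attributes, with a default class and a generated id. `iframe_html` and the `graphistry` class name are hypothetical, not existing pygraphistry API.

```python
import html
import uuid


def iframe_html(url, attrs=None, style=None):
    """Build an <iframe> tag with caller-supplied attributes and inline style."""
    # Defaults: a generated unique id and a fixed class embedders can select on.
    merged = {"id": "graphistry-" + uuid.uuid4().hex, "class": "graphistry"}
    merged.update(attrs or {})
    if style:
        merged["style"] = "; ".join(f"{k}: {v}" for k, v in style.items())
    rendered = " ".join(
        f'{name}="{html.escape(str(value), quote=True)}"'
        for name, value in merged.items()
    )
    return f'<iframe src="{html.escape(url, quote=True)}" {rendered}></iframe>'
```

An embedder could then pass `attrs={"id": "viz1"}` and look the frame up with `document.getElementById("viz1")`.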

Fix dependencies of pip package

We should probably depend on Pandas since nobody is going to use the direct json API.

Bind call required before most others

I expected pygraphistry.settings(...) and pygraphistry.edges(..) to do something, but it looks like they are only callable as pygraphistry.bind(..).settings(..) and pygraphistry.bind(..).edges(..).

Make an anaconda package

Stronger warning when dataset is too big

edge colors can't be set

get errors when I do edge_color = ...

Support Python 3

make default height 600 to prevent modal window sizing issues

Cannot bind nodes/edges in Plotter

pygraphistry.bind(...).edges(..) fails because there's both a field edges and method edges.

Suggestion 1: make the fields pygraphistry.bindings.edges.
Suggestion 2: make the methods return self on non-undefined set, and return the binding when no value is passed in.
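A sketch of what Suggestion 2 could look like, with the data moved to underscore-prefixed fields so the field and method names no longer collide. The class here is a stand-in, not the real Plotter.

```python
class Plotter:
    """Stand-in plotter: fields are private, methods double as get/set."""

    def __init__(self):
        self._edges = None
        self._nodes = None

    def edges(self, value=None):
        # No value passed: act as a getter and return the current binding.
        if value is None:
            return self._edges
        # Value passed: act as a setter and return self so calls can chain.
        self._edges = value
        return self

    def nodes(self, value=None):
        if value is None:
            return self._nodes
        self._nodes = value
        return self
```

One limitation of this shape: `None` cannot be used to clear a value, because it is indistinguishable from "no argument".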

Key check

Whenever setting the key, check against /api/ for validity.

Hint to set notebook to Trusted

( @thibaudh : can you take, or should I?)

When opening a third-party notebook, our viz won't be shown because our JS won't run by default.

I propose either:

A) Print out warning/hint HTML to do File -> Trusted Notebook and then have JS delete that warning

B) Load an iframe URL and then have our existing iframe js logic overwrite it.

Leaning towards A due to embedding issues motivating the JS logic.
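Option A could be sketched roughly as follows: emit the hint as plain HTML together with a script that removes it. In an untrusted notebook the script does not run, so the hint stays visible. The function name, id prefix, and wording are made up for illustration.

```python
import uuid


def trusted_hint_html():
    """Return a warning <div> plus a <script> that deletes it when JS runs."""
    element_id = "graphistry-hint-" + uuid.uuid4().hex
    return (
        f'<div id="{element_id}">Visualization hidden: mark this notebook as '
        "trusted (File -> Trusted Notebook) to display it.</div>"
        "<script>"
        f'var hint = document.getElementById("{element_id}");'
        "if (hint) { hint.parentNode.removeChild(hint); }"
        "</script>"
    )
```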

Python keywords cannot be used as fields of edges.

When I try to create a graph of the following single row dataframe (converted to dictionary):

{'count(1)': {19834: 2}, 'destinationAddress': {19834: u'63.240.185.216'}, 'destinationUserName': {19834: u'constructor'}, 'min(startTime)': {19834: u'19 Oct 2015 23:37:32 EDT'}, 'name': {19834: u'An account failed to log on.'}, 'sourceAddress': {19834: None}, 'timeUniform': {19834: 19686.0}}

I receive the following error:

ValueError Traceback (most recent call last)
in ()
1 plotter = graphistry.bind(source='destinationUserName', destination='destinationAddress')
----> 2 plotter.plot(hsldf[19834:19835])

/home/DCSAC/g.paden/.local/lib/python2.7/site-packages/graphistry/plotter.pyc in plot(self, graph, nodes)
303
304 PyG = pygraphistry.PyGraphistry
--> 305 dataset = PyG._etl(dataset)
306 viz_url = PyG._viz_url(dataset['name'], dataset['viztoken'], self._url_params)
307

/home/DCSAC/g.paden/.local/lib/python2.7/site-packages/graphistry/pygraphistry.pyc in _etl(dataset)
160 jres = response.json()
161 if jres['success'] is not True:
--> 162 raise ValueError('Server reported error:', jres['msg'])
163 else:
164 return {'name': jres['dataset'], 'viztoken': jres['viztoken']}

ValueError: ('Server reported error:', u'Illegal value for Message.Field .VectorGraph.edges: .VectorGraph.Edge (Error: Illegal value for Message.Field .VectorGraph.Edge.src of type uint32: function (not an integer))')

I believe it's because the destinationUserName is 'constructor', which I think is interpreted as a function and not a string. The bindings are created using

plotter = graphistry.bind(source='destinationUserName', destination='destinationAddress')

FAQ / guide on exporting

cc @thibaudh @padentomasello @briantrice

Replace NaNs with nulls since node cannot parse JSON with NaNs
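For reference, Python's `json.dumps` writes the bare token `NaN` by default, which is not valid JSON. Below is a minimal sketch of a pre-serialization pass that swaps NaN for `None` (serialized as `null`). The helper name is hypothetical, and it only handles plain dicts, lists, and floats, not dataframes.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None in dicts, lists, and tuples."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(v) for v in value]
    return value


record = {"id": 1, "score": float("nan"), "tags": [1.5, float("nan")]}

# Default behavior: NaN leaks into the output as a non-JSON token.
raw = json.dumps(record)

# After cleaning, allow_nan=False confirms no NaN is left to serialize.
clean = json.dumps(nan_to_none(record), allow_nan=False)
```

Infinity values would need the same treatment, since `allow_nan=False` rejects them as well.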

igraph conversion misses binding

In cell 15 of rawdata/forwardjs/twitter/ForwardJS%20Community%20Analysis.ipynb#

ig = g.bind(source='friend', destination='speaker').pandas2igraph(edges)
clusters = ig.community_infomap()
num_clusters = len(list(set(clusters.membership)))
print('#clusters', str(len(list(set(clusters.membership)))))
clusters.membership[:10]

As is, it works, returning many clusters.

However, if I remove .bind(...), everything gets put into one cluster, suggesting it didn't remember the bindings from earlier calls.

Report server-side ETL errors to client

Generally, add meaningful error messages when API is misused (missing edges, nodes, source, etc)

Also ask user to upgrade pygraphistry when API version has changed

highlights within notebooks often swallowed

Hard to reproduce (more common on @padentomasello )

Likelihood: JS-level, not CSS, as label hovers work.

Guess of fix: set z-index of pygraphistry's generated iframe.

Document API with pydoc

Protobuf 3 (required for Python 3) does not seem to accept Numpy types.

Currently converting to Python numeric types, potentially losing range/precision

Graphs with spaces in names won't render

Given a call to plot() like the following:

graph.plot(edges, nodes, name="somename")

The above code works and renders a graph in Jupyter Notebook. However, the following does not:

graph.plot(edges, nodes, name="some name")

It just hangs at "Herding stray GPUs" forever. The only difference is the space in the graph name.
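The report does not identify the cause. One thing worth ruling out (an assumption, not a diagnosis) is that the name reaches a URL unencoded. The sketch below shows percent-encoding a name with the standard library; the `dataset` parameter name is hypothetical.

```python
from urllib.parse import quote, urlencode


def dataset_query(name):
    """Encode a dataset name for use in a URL query string."""
    # quote_via=quote encodes a space as %20 rather than the default '+'.
    return urlencode({"dataset": name}, quote_via=quote)
```

With this, "some name" becomes `dataset=some%20name`, while "somename" passes through unchanged.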

Don't rename columns; use `bindings` to refer to them instead

settings does not return graphistry?

use case: graphistry.settings('staging').settings(height=600)

(+ @thibaudh )

Set API key based on local user profile, if available

IPython already has a notion of user built in, so rather than each user baking in their API key (and the setting getting confused when notebooks get shared), dynamically lookup the API key based on who is logged into IPython.

This has been becoming a bit of an issue in practice in team settings.

( + @thibaudh @padentomasello @briantrice )

encoding issues

See notebook etl/BestbuySignals.ipynb at end (trying to merge in community detection):

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-33-2dbdcded78d1> in <module>()
----> 1 g.plot(graph=subset, nodes=nodes.merge(nodeid, on='id').drop_duplicates('id'))

/usr/local/lib/python2.7/site-packages/graphistry/plotter.pyc in plot(self, graph, nodes)
     93             util.error('Expected Pandas dataframe or Igraph graph.')
     94 
---> 95         json_dataset = json.dumps(dataset, ensure_ascii=False).encode('utf8')
     96         dataset_name = pygraphistry.PyGraphistry._etl(json_dataset)
     97         viz_url = pygraphistry.PyGraphistry._viz_url(dataset_name, self.url_params)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10132878: ordinal not in range(128)

Also, unclear why I had to add the drop_duplicates() when going pandas->igraph->pandas.

Send libname and version to avoid API breakage with server

document url_params and height settings

parameter graph is confusing

graphistry.plot(graph=..., nodes=...) confuses me.

A bit of bike shedding, but, to me,

G = (V, E)

So, I expect parameters nodes/vertices to be V, edges to be E, and if there's a G, to be combined (V, E). For example, maybe igraph/networkx objects are a G.

When ETLing without key, error message should state that key is needed

Right now we get: ValueError: No JSON object could be decoded
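A sketch of a client-side guard that would run before the upload, so the user sees a readable message instead of a JSON decoding failure. The helper name is hypothetical; `graphistry.register(key=...)` is the call used elsewhere in these issues.

```python
def require_api_key(key):
    """Fail early with a readable message when no API key has been set."""
    if not key:
        raise ValueError(
            "API key required: call graphistry.register(key=...) before plotting."
        )
    return key
```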

Default src/dst/node names for igraph/networkx

Low priority, but it'd be convenient if we can do graphistry.plot(nx.lobster(..)) without needing any name binding.

how to unbind an attribute?

(+ @thibaudh @padentomasello )

I bound point_size for one graph, and wanted to unbind for another, and couldn't: setting g.bind(point_size=None) is ignored. I don't think this is the expected behavior.
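One way this could work, as a sketch: collect the arguments as keyword arguments, so an omitted name is simply absent and an explicit `None` can mean "clear this binding". The `Bindings` class is a hypothetical stand-in, not the real implementation.

```python
class Bindings:
    """Stand-in for a plotter's binding state."""

    def __init__(self, values=None):
        self.values = dict(values or {})

    def bind(self, **kwargs):
        """Return a new Bindings; an explicit None removes that binding."""
        updated = dict(self.values)
        for name, value in kwargs.items():
            if value is None:
                updated.pop(name, None)  # explicit None clears the binding
            else:
                updated[name] = value
        return Bindings(updated)
```

Usage would then look like `g2 = g.bind(point_size=None)`, leaving `g` itself untouched.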

register() errors are silent

When calling register(server='non-existent-ip'), the failure is silent. Expected either a printed warning or an exception thrown.

ERROR. standard_library.install_aliases()

Hi, I got an error when import graphistry

import graphistry
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-ace3a50e6850> in <module>()
----> 1 import graphistry

build/bdist.linux-x86_64/egg/graphistry/__init__.py in <module>()

build/bdist.linux-x86_64/egg/graphistry/pygraphistry.py in <module>()

AttributeError: 'module' object has no attribute 'install_aliases'

How can I fix this?

better error when NaN in src/dst/node

Make the default plotted graph exportable by default

@thibaudh @briantrice @padentomasello @int3h

When plotting, we should make the default 'more friendly' for exporting as HTML. For example, imagine laying out for a few seconds & auto-saving, and the next time the same URL is loaded, start "offline" from that point. Likewise, an in-visualization save (new camera angle, added filters, ...) would update that visualization. The result is that, when I'm "done" with a notebook, I can just hit "export as html" without having to do anything special and it'll "just work".

Does this seem feasible, ideas on ~workarounds, ...? We may need StreamGL to cooperate a bit and be a bit smarter, but similar to the auto-splashscreen & play, seems doable.

print out current set of bindings

Now that the bindings are not explicitly set at each call, it'd help to be able to print the set of bindings active at a particular point.

Proposal 1:

graphistry.bind(...) : setter that returns graphistry with updated bindings
graphistry.bind() : getter that returns {<k> : <v>}

Proposal 2:
A top-level field bindings that is the dictionary of current bindings.

graphistry.bindings : returns {<k>: <v>}

I'm leaning towards Proposal 2.
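Proposal 2 might look roughly like this: `bind(...)` stays a setter that returns an updated object, and a read-only `bindings` property exposes the current dictionary. The `Graph` class is a hypothetical stand-in for the real object.

```python
class Graph:
    """Stand-in object carrying the active bindings."""

    def __init__(self, bindings=None):
        self._bindings = dict(bindings or {})

    def bind(self, **kwargs):
        """Setter: return a new Graph with the bindings merged in."""
        merged = dict(self._bindings)
        merged.update(kwargs)
        return Graph(merged)

    @property
    def bindings(self):
        """Getter: return a copy so callers cannot mutate internal state."""
        return dict(self._bindings)


g = Graph().bind(source="src", destination="dst")
print(g.bindings)
```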

Visualizing DNA variation graphs

@lmeyerov

vg is a system for working with sequence graphs that represent populations of genomes. There is wide support for this idea in genomics, and it is on the cusp of use in production contexts that are not well served by existing approaches based around a single reference genome sequence (such as the human MHC or in species with high diversity, like mosquitoes).

I have developed techniques to visualize variation graphs, but these rely on graph subsetting operations to visualize larger graphs, and eventually meaningful examination of an entire graph breaks down. I'd be interested in seeing what graphistry is doing to handle this kind of use!