graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

License: BSD 3-Clause "New" or "Revised" License

Python 98.84% Shell 0.90% Dockerfile 0.26%
graph visualization gpu graphistry python rapids cugraph networkx neo4j tigergraph

pygraphistry's Issues

reset_index on input frames

If the named columns are indices, they get missed. Treatable by running df = df.reset_index() on input frames.
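A minimal sketch of the workaround, assuming a hypothetical edge frame whose `src` column has ended up as the index (the frame and column names are illustrative, not taken from the issue):

```python
import pandas as pd

# Hypothetical edge list whose source column became the index,
# for example after a set_index() or groupby() step.
edges = pd.DataFrame({"src": ["a", "b"], "dst": ["b", "c"]}).set_index("src")

# 'src' is no longer a regular column, so a binding that names it misses it.
assert "src" not in edges.columns

# reset_index() moves the index back into an ordinary column.
edges = edges.reset_index()
print(list(edges.columns))
```

After the reset, both `src` and `dst` are plain columns and can be named in a binding.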

Resizable cell

(Meta: trying to do via a more structured spec process)

What & Why

Problem: The notebook exploration cell is generally too small for exploration, and the current process of opening the visualization in a new tab is slow and loses in-notebook state.

High-Level Proposal: Change the behavior of the 'pop out' button so that the iframe takes up most of the width and height of the parent frame. It should not disrupt the scroll state, nor the scrollability, of the parent notebook/frame.

Tasks

  • Change icon: pop out -> maximize
  • postMessage protocol, communication: Manually implement parent/child frame communication. We expect the postMessage protocol to change as the graphistryjs interactive embedding API advances, but the usability benefit here outweighs waiting.
  • Resize logic: basic intent is to take most of the screen while still allowing scrolling between cells. So, the frame should be anchored on the same absolute Y position. To allow scrolling, the width will be screen - 100px, and centered. We may need to play some tricks on the DOM and CSS to make it do that.
  • Minimize logic: the parent frame will ack expansion to the child frame, at which point the maximize icon switches to minimize, and clicking it restores the frame size
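The resize arithmetic above can be sketched as follows. This only models the geometry (same Y anchor, width of screen minus 100px, centered); the function name, the dataclass, and the 90% height are assumptions for illustration, since the proposal only says "most of" the height.

```python
from dataclasses import dataclass

SIDE_MARGIN_TOTAL_PX = 100  # from the proposal: width is screen - 100px


@dataclass
class FrameGeometry:
    left: int
    top: int
    width: int
    height: int


def maximized_geometry(screen_width, screen_height, current_top):
    """Hypothetical helper: compute the expanded frame box in pixels."""
    width = max(screen_width - SIDE_MARGIN_TOTAL_PX, 0)
    left = (screen_width - width) // 2   # centered horizontally
    height = screen_height * 9 // 10     # assumption: 90% of the screen height
    # The frame stays anchored on the same absolute Y position.
    return FrameGeometry(left=left, top=current_top, width=width, height=height)


box = maximized_geometry(screen_width=1280, screen_height=800, current_top=340)
print(box)
```

Restoring on minimize would then amount to re-applying the geometry saved before expansion.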

Out of Scope

  • GraphistryJS-based implementation

Who

Implementation: @thibaudh @lmeyerov
Design feedback: @padentomasello
Implementation feedback: @briantrice @quinnhj @trxcllnt

Questions

  • If the user really does want a new tab, how should they now achieve that? Keep the pop-out button, or some other way? One thought already: may be nice to expand the control options on the static loader screen.
  • Should the resizing be an animated tween or immediate?

Warning when uploading large graphs

We have two interesting inflection points for graph size:

~1MM nodes + edges: client quality degrades
~8MM nodes or ~8MM edges: client hit testing becomes undefined

I'm wondering when IPython should warn vs. error. The second is definitely an error for now. Should the first also be?
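One way the two inflection points could be checked before upload is sketched below. It takes the "warn at the first, error at the second" option; the function name and the exact cutoffs (rounded from the approximate figures above) are assumptions.

```python
import warnings

WARN_TOTAL = 1_000_000      # ~1MM nodes + edges: client quality degrades
ERROR_PER_KIND = 8_000_000  # ~8MM nodes or ~8MM edges: hit testing undefined


def check_graph_size(num_nodes, num_edges):
    """Hypothetical pre-upload check; returns 'ok' or 'warn', or raises."""
    if num_nodes >= ERROR_PER_KIND or num_edges >= ERROR_PER_KIND:
        raise ValueError(
            "Graph too large: %d nodes, %d edges" % (num_nodes, num_edges)
        )
    if num_nodes + num_edges >= WARN_TOTAL:
        warnings.warn("Large graph: client quality may degrade")
        return "warn"
    return "ok"


print(check_graph_size(1_000, 5_000))
```

Turning the first threshold into an error as well would only require replacing the `warnings.warn` branch with a second `raise`.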

SSL proxy for prod

( + @thibaudh )

Matt, we need to get the HTTPS proxy up as part of the API 1.0 release. This will, in turn, trigger some code changes. Can you let me know when that happens?

(HTTPS proxy should be up for both prod & staging. They currently use diff code paths, and I'd like to unify.)

unbound nodes?

Getting an error on the below:

    .edges(base).nodes(nodes).plot()

ERROR: Node identifier must be bound when using node dataframe.

Way to pass in filters

( + @thibaudh + @briantrice )

This will have to wait for the API to support it first (...), but I'd like to write something like

g
   .data(nodes=..., edges=...)
   .bind(...)
   .filter('correlation > 10 & degree > 7')
   .plot()

The key is that all data should be passed into the viz so I can interactively change the filters later. (If I didn't want that behavior, I could filter on the data argument.)
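Until the API supports it, the "filter on the data argument" fallback mentioned above can be sketched with pandas `DataFrame.query`, which accepts the same kind of expression string. The frame and its values are hypothetical; the column names come from the example filter.

```python
import pandas as pd

# Hypothetical edge table carrying both attributes used in the filter.
edges = pd.DataFrame({
    "src": ["a", "b", "c"],
    "dst": ["b", "c", "a"],
    "correlation": [12, 5, 30],
    "degree": [9, 8, 3],
})

# Same expression as in the proposed .filter(...) call, applied up front.
filtered = edges.query("correlation > 10 & degree > 7")
print(filtered)
```

The drawback is the one the issue describes: rows dropped here never reach the visualization, so the filter cannot be relaxed interactively later.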

Networkx / Python3 encoding issue

From https://gist.github.com/ResidentMario/87c282ea4ebded91ee31 :

import networkx as nx
import graphistry
graphistry.register(key='...[key obfuscated]...')
graph = nx.path_graph(4)
graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)

Causes

Traceback (most recent call last):
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\IPython\core\interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-5eadd70251a7>", line 1, in <module>
    graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\graphistry\plotter.py", line 311, in plot
    info = PyG._etl1(dataset)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\graphistry\pygraphistry.py", line 185, in _etl1
    headers=headers, params=params)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 296, in prepare
    self.prepare_body(data, files, json)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 447, in prepare_body
    body = self._encode_params(data)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\models.py", line 84, in _encode_params
    return to_native_string(data)
  File "C:\Users\Alex\Anaconda3\envs\watson-graph\lib\site-packages\requests\utils.py", line 700, in to_native_string
    out = string.decode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)

When run in Python 2, the graph generates successfully. I do not have a working Python 3 build right now, so I could not reproduce it there. Assuming the error does reproduce under Python 3, that narrows down the scope.
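The failing byte, 0x8b at position 1, matches the second byte of the gzip magic number (0x1f 0x8b). One plausible reading, not confirmed in this thread, is that a gzip-compressed request body is being treated as text somewhere on the Python 3 path. The standalone sketch below reproduces the same error shape without graphistry or requests; the payload is made up.

```python
import gzip

# Any gzip stream starts with the magic bytes 0x1f 0x8b.
payload = gzip.compress(b'{"graph": "example"}')
assert payload[:2] == b"\x1f\x8b"

# Decoding those bytes as ASCII fails on the second byte: 0x1f is valid
# ASCII, 0x8b is not.
failed_at = None
try:
    payload.decode("ascii")
except UnicodeDecodeError as err:
    failed_at = err.start
    print(err)

print(failed_at, hex(payload[failed_at]))  # 1 0x8b
```

If this reading is right, the fix would be to hand the compressed body to the HTTP layer as bytes rather than as a native string.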

Keep column names instead of renaming to Source and Destination.

While doing bipartite graphs of IP address -> Alert name for example, it'd be nice to keep the column names so you can refer to them in histograms and filters. (I found myself confused for a few seconds when I didn't see alert names in the histograms).

Food for thought: this will get more complicated as we move past bipartite. I imagine wanting to have a histogram for each type of node. For bipartite graphs, we have this functionality because we have the concept of source and destination.

@thibaudh: I'm assigning myself, since I thought this might be a nice project to start with this code base. Maybe we can chat to see if this is appropriate (and something we even want to do) and what the right way to do this is.

Expose external hook for embedders

I'm trying to externally control a generated graphistry iframe, and so it'd be really dandy to work with it. A few options:

-- allow specifying HTML attributes (id, name, class); I think we only expose style and url_params
-- provide default class and gensym id
-- wrap into whatever widget system IPython and others are using and match that interface

Bind call required before most others

I expected pygraphistry.settings(...) and pygraphistry.edges(..) to do something, but it looks like they are only callable as pygraphistry.bind(..).settings(..) and pygraphistry.bind(..).edges(..).

Cannot bind nodes/edges in Plotter

pygraphistry.bind(...).edges(..) fails because there's both a field edges and method edges.

  • Suggestion 1: make the fields pygraphistry.bindings.edges.
  • Suggestion 2: make the methods return self on non-undefined set, and return the binding when no value is passed in.
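Suggestion 2 can be sketched as below. The class is a stand-in, not the real Plotter: it keeps the data in underscore-prefixed fields so the field and method names no longer collide, and uses a sentinel so that `None` remains a legal value to set.

```python
_UNSET = object()  # sentinel: distinguishes "no argument" from None


class SketchPlotter:
    """Hypothetical stand-in illustrating combined getter/setter methods."""

    def __init__(self):
        self._edges = None
        self._nodes = None

    def edges(self, value=_UNSET):
        if value is _UNSET:
            return self._edges  # getter: no value passed in
        self._edges = value
        return self             # setter: return self for chaining

    def nodes(self, value=_UNSET):
        if value is _UNSET:
            return self._nodes
        self._nodes = value
        return self


plotter = SketchPlotter().edges("edge table").nodes("node table")
print(plotter.edges(), plotter.nodes())
```

This sketch mutates the object in place; a variant that returns a modified copy from each setter would chain the same way.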

Key check

Whenever setting the key, check against /api/ for validity.

Hint to set notebook to Trusted

( @thibaudh : can you take, or should I?)

When opening a third-party notebook, our viz won't be shown because our JS won't run by default.

I propose either:

A) Print out warning/hint HTML to do File -> Trusted Notebook and then have JS delete that warning

B) Load an iframe URL and then have our existing iframe js logic overwrite it.

Leaning towards A due to embedding issues motivating the JS logic.
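A rough sketch of option A: emit the hint as HTML together with a script that deletes it, so the hint only stays visible when the notebook is untrusted and the script never runs. The function name and element id prefix are hypothetical, and the hint wording is a placeholder.

```python
import uuid


def trusted_hint_html():
    """Hypothetical helper: warning markup plus JS that removes it."""
    element_id = "graphistry-trust-hint-" + uuid.uuid4().hex
    warning = (
        '<div id="' + element_id + '">'
        "Visualization not shown? Use File -> Trusted Notebook."
        "</div>"
    )
    # If JS is allowed to run, the notebook is trusted and the hint goes away.
    cleanup = (
        "<script>"
        'var hint = document.getElementById("' + element_id + '");'
        "if (hint) { hint.parentNode.removeChild(hint); }"
        "</script>"
    )
    return warning + cleanup


print(trusted_hint_html())
```

The unique id keeps several plots in one notebook from removing each other's hints.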

Python keywords cannot be used as fields of edges.

When I try to create a graph of the following single row dataframe (converted to dictionary):

{'count(1)': {19834: 2}, 'destinationAddress': {19834: u'63.240.185.216'}, 'destinationUserName': {19834: u'constructor'}, 'min(startTime)': {19834: u'19 Oct 2015 23:37:32 EDT'}, 'name': {19834: u'An account failed to log on.'}, 'sourceAddress': {19834: None}, 'timeUniform': {19834: 19686.0}}

I receive the following error:

ValueError Traceback (most recent call last)
in ()
1 plotter = graphistry.bind(source='destinationUserName', destination='destinationAddress')
----> 2 plotter.plot(hsldf[19834:19835])

/home/DCSAC/g.paden/.local/lib/python2.7/site-packages/graphistry/plotter.pyc in plot(self, graph, nodes)
303
304 PyG = pygraphistry.PyGraphistry
--> 305 dataset = PyG._etl(dataset)
306 viz_url = PyG._viz_url(dataset['name'], dataset['viztoken'], self._url_params)
307

/home/DCSAC/g.paden/.local/lib/python2.7/site-packages/graphistry/pygraphistry.pyc in _etl(dataset)
160 jres = response.json()
161 if jres['success'] is not True:
--> 162 raise ValueError('Server reported error:', jres['msg'])
163 else:
164 return {'name': jres['dataset'], 'viztoken': jres['viztoken']}

ValueError: ('Server reported error:', u'Illegal value for Message.Field .VectorGraph.edges: .VectorGraph.Edge (Error: Illegal value for Message.Field .VectorGraph.Edge.src of type uint32: function (not an integer))')

I believe it's because the destinationUserName is 'constructor', which I think is interpreted as a function and not a string. The bindings are created using

plotter = graphistry.bind(source='destinationUserName', destination='destinationAddress')

igraph conversion misses binding

In cell 15 of rawdata/forwardjs/twitter/ForwardJS%20Community%20Analysis.ipynb#

ig = g.bind(source='friend', destination='speaker').pandas2igraph(edges)
clusters = ig.community_infomap()
num_clusters = len(list(set(clusters.membership)))
print('#clusters', str(len(list(set(clusters.membership)))))
clusters.membership[:10]

As is, it works, returning many clusters.

However, if I remove .bind(...), everything gets put into one cluster, suggesting it didn't remember the bindings from earlier calls.

Report server-side ETL errors to client

Generally, add meaningful error messages when API is misused (missing edges, nodes, source, etc)

Also ask user to upgrade pygraphistry when API version has changed

Graphs with spaces in names won't render

Given a call to plot() like the following:

graph.plot(edges, nodes, name="somename")

The above code works and renders a graph in Jupyter Notebook. However, the following does not:

graph.plot(edges, nodes, name="some name")

It just hangs at "Herding stray GPUs" forever. The only difference is the space in the graph name.
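If the name ends up in a URL unescaped, which is an assumption and not something the report confirms, percent-encoding it before building the URL would be the usual fix. The sketch uses only the standard library; the `dataset` parameter name is hypothetical.

```python
from urllib.parse import quote, urlencode

name = "some name"

# Percent-encode the name for use in a URL path segment.
encoded_path = quote(name)
print(encoded_path)   # some%20name

# Or let urlencode build a whole query string (spaces become '+').
encoded_query = urlencode({"dataset": name})
print(encoded_query)  # dataset=some+name
```

As a workaround on the caller's side, avoiding spaces in `name` sidesteps the problem entirely.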

encoding issues

See notebook etl/BestbuySignals.ipynb at end (trying to merge in community detection):

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-33-2dbdcded78d1> in <module>()
----> 1 g.plot(graph=subset, nodes=nodes.merge(nodeid, on='id').drop_duplicates('id'))

/usr/local/lib/python2.7/site-packages/graphistry/plotter.pyc in plot(self, graph, nodes)
     93             util.error('Expected Pandas dataframe or Igraph graph.')
     94 
---> 95         json_dataset = json.dumps(dataset, ensure_ascii=False).encode('utf8')
     96         dataset_name = pygraphistry.PyGraphistry._etl(json_dataset)
     97         viz_url = pygraphistry.PyGraphistry._viz_url(dataset_name, self.url_params)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10132878: ordinal not in range(128)

Also, unclear why I had to add the drop_duplicates() when going pandas->igraph->pandas.

parameter graph is confusing

graphistry.plot(graph=..., nodes=...) confuses me.

A bit of bike shedding, but, to me,

G = (V, E)

So, I expect parameters nodes/vertices to be V, edges to be E, and if there's a G, to be combined (V, E). For example, maybe igraph/networkx objects are a G.

register() errors are silent

When calling register(server='non-existent-ip'), failure is silent. Expected either a printed warning or an exception thrown.

ERROR. standard_library.install_aliases()

Hi, I got an error when import graphistry

import graphistry
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-ace3a50e6850> in <module>()
----> 1 import graphistry

build/bdist.linux-x86_64/egg/graphistry/__init__.py in <module>()

build/bdist.linux-x86_64/egg/graphistry/pygraphistry.py in <module>()

AttributeError: 'module' object has no attribute 'install_aliases'

How can I fix this?

Make the default plotted graph exportable by default

When plotting, we should make the default 'more friendly' for exporting as HTML. For example, imagine laying out for a few seconds & auto-saving, and the next time the same URL is loaded, start "offline" from that point. Likewise, an in-visualization save (new camera angle, added filters, ...) would update that visualization. The result is that, when I'm "done" with a notebook, I can just hit "export as html" without having to do anything special and it'll "just work".

Does this seem feasible, ideas on ~workarounds, ...? We may need StreamGL to cooperate a bit and be a bit smarter, but similar to the auto-splashscreen & play, seems doable.

print out current set of bindings

Now that the bindings are not explicitly set at each call, it'd help to be able to print the set of bindings active at a particular point.

Proposal 1:

graphistry.bind(...) : setter that returns graphistry with updated bindings
graphistry.bind() : getter that returns {<k> : <v>}

Proposal 2:
A top-level field bindings that is the dictionary of current bindings.

graphistry.bindings : returns {<k>: <v>}

I'm leaning towards Proposal 2.
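Both proposals fit in one small sketch. The class below is a hypothetical stand-in for the `graphistry` object: `bind(...)` with arguments returns an updated copy, `bind()` with none returns the current dictionary (Proposal 1), and the `bindings` field exposes the same dictionary directly (Proposal 2).

```python
class SketchBindings:
    """Hypothetical stand-in for an object that tracks active bindings."""

    def __init__(self, bindings=None):
        # Proposal 2: a top-level field holding the current bindings.
        self.bindings = dict(bindings or {})

    def bind(self, **kwargs):
        if not kwargs:
            # Proposal 1 getter: return a copy of the current bindings.
            return dict(self.bindings)
        # Proposal 1 setter: return a new object with updated bindings.
        merged = dict(self.bindings)
        merged.update(kwargs)
        return SketchBindings(merged)


g = SketchBindings().bind(source="src", destination="dst")
print(g.bind())
print(g.bindings)
```

With Proposal 2 alone, the getter branch of `bind()` becomes unnecessary and `bind` can stay a pure setter.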

Visualizing DNA variation graphs

@lmeyerov

vg is a system for working with sequence graphs that represent populations of genomes. There is wide support for this idea in genomics, and it is on the cusp of use in production contexts that are not well served by existing approaches based around a single reference genome sequence (such as the human MHC or in species with high diversity, like mosquitoes).

I have developed techniques to visualize variation graphs, but these rely on graph subsetting operations to visualize larger graphs, and eventually meaningful examination of an entire graph breaks down. I'd be interested in seeing what graphistry is doing to handle this kind of use!
