thomasp85 / tidygraph

A tidy API for graph manipulation

Home Page: https://tidygraph.data-imaginist.com

License: Other

R 98.99% C++ 0.97% Rez 0.04%
r network-analysis igraph tidyverse graph-algorithms graph-manipulation

tidygraph's Introduction

tidygraph

This package provides a tidy API for graph/network manipulation. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data. tidygraph provides a way to switch between the two tables and provides dplyr verbs for manipulating them. Furthermore, it provides access to a lot of graph algorithms with return values that facilitate their use in a tidy workflow.

An example

library(tidygraph)

play_gnp(10, 0.5) %>% 
  activate(nodes) %>% 
  mutate(degree = centrality_degree()) %>% 
  activate(edges) %>% 
  mutate(centrality = centrality_edge_betweenness()) %>% 
  arrange(centrality)
#> # A tbl_graph: 10 nodes and 51 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Edge Data: 51 × 3 (active)
#>     from    to centrality
#>    <int> <int>      <dbl>
#>  1     2     7       1.25
#>  2     6     5       1.33
#>  3     1     3       1.4 
#>  4     2    10       1.53
#>  5     2     8       1.58
#>  6     8     9       1.65
#>  7     2     3       1.67
#>  8     2     5       1.73
#>  9     3     5       1.73
#> 10     8     5       1.73
#> # ℹ 41 more rows
#> #
#> # Node Data: 10 × 1
#>   degree
#>    <dbl>
#> 1      6
#> 2      7
#> 3      6
#> # ℹ 7 more rows

Overview

tidygraph is a huge package that exports 280 different functions and methods. It more or less wraps the full functionality of igraph in a tidy API giving you access to almost all of the dplyr verbs plus a few more, developed for use with relational data.

More verbs

tidygraph adds some extra verbs for specific use in network analysis and manipulation. The activate() function defines whether one is manipulating node or edge data at the moment as shown in the example above. bind_edges(), bind_nodes(), and bind_graphs() let you expand the graph structure you’re working with, while graph_join() lets you merge two graphs on some node identifier. reroute(), on the other hand, lets you change the terminal nodes of the edges in the graph.
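A minimal sketch of these verbs on toy graphs. The graphs and the `name` column are illustrative assumptions, not taken from the package documentation:

```r
library(tidygraph)

# A ring of five nodes, labelled so that graphs can be joined by name.
ring <- create_ring(5) %>%
  mutate(name = letters[1:5])

# bind_edges() adds edges between existing nodes (row indices here).
ring_plus <- ring %>%
  bind_edges(data.frame(from = 1L, to = 3L))

# graph_join() merges two graphs on a shared node column;
# node "e" appears in both graphs and is matched.
path <- create_path(3) %>%
  mutate(name = c("e", "f", "g"))
joined <- graph_join(ring, path, by = "name")

# reroute() rewrites the terminal nodes of edges; here every edge is reversed.
reversed <- create_path(3, directed = TRUE) %>%
  activate(edges) %>%
  reroute(from = to, to = from)
```

Note that reroute() is evaluated in the edge context, which is why the edges are activated first.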

More algorithms

tidygraph wraps almost all of the graph algorithms from igraph and provides a consistent interface and output that always matches the sequence of nodes and edges. All tidygraph algorithm wrappers are intended for use inside verbs where they know the context they are being called in. In the example above it is not necessary to supply the graph nor the node/edge IDs to centrality_degree() and centrality_edge_betweenness() as they are aware of them already. This leads to much clearer code and less typing.
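To illustrate the context awareness, the sketch below computes node degree once through the tidygraph wrapper and once through igraph directly. The star graph is an arbitrary choice for the example:

```r
library(tidygraph)

# An undirected star: node 1 is the hub, connected to four leaves.
g <- create_star(5)

# Inside mutate(), the wrapper picks up the graph and the node order itself.
tidy_deg <- g %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble()

# The plain igraph call needs the graph passed explicitly.
raw_deg <- igraph::degree(g)
```

Both results follow the node order of the graph, so they can be compared element by element.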

More maps

tidygraph goes beyond dplyr and also implements graph-centric versions of the purrr map functions. You can now call a function on the nodes in the order of a breadth-first or depth-first search while getting access to the results of the previous calls.
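A small sketch of a breadth-first map, assuming a 7-node binary tree as the input. The function passed as `.f` receives, among other arguments, the table of nodes on the path back to the root, so the number of rows in `path` is the depth of the current node:

```r
library(tidygraph)

# A directed binary tree with 7 nodes; node 1 is the root.
tree <- create_tree(7, children = 2) %>%
  mutate(depth = map_bfs_dbl(node_is_root(), .f = function(node, path, ...) {
    # `path` holds one row per ancestor of the current node.
    nrow(path)
  }))

depths <- as_tibble(tree)$depth
```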

More morphs

tidygraph lets you temporarily change the representation of your graph, do some manipulation of the node and edge data, and then change back to the original graph with the changes being merged in automatically. This is powered by the new morph()/unmorph() verbs that let you e.g. contract nodes, work on the linegraph representation, split communities to separate graphs etc. If you wish to continue with the morphed version, the crystallise() verb lets you freeze the temporary representation into a proper tbl_graph.
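As a sketch of the morph/unmorph cycle, the example below splits a graph into its connected components, computes a value per component, and merges it back. The two-ring graph and the column names are assumptions made for illustration:

```r
library(tidygraph)

# Two disconnected rings in a single graph, labelled for readability.
g <- bind_graphs(
  create_ring(3) %>% mutate(ring = "small"),
  create_ring(4) %>% mutate(ring = "large")
)

# While morphed, each component is treated as its own graph, so
# graph_order() reports the size of the component, not of the whole graph.
g <- g %>%
  morph(to_components) %>%
  mutate(component_size = graph_order()) %>%
  unmorph()

sizes <- as_tibble(g)$component_size
```

After unmorph() the new column is part of the original node table, in the original node order.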

More data structure support

While tidygraph is powered by igraph underneath, it wants everyone to join the fun. The as_tbl_graph() function can easily convert relational data from all your favourite objects, such as network, phylo, dendrogram, data.tree, graph, etc. More conversions will be added in the order I become aware of them.
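Two conversions as a short sketch, one from an edge-list data frame and one from an igraph object. The input data are made up for the example:

```r
library(tidygraph)

# An edge list in a data frame; nodes are inferred from the from/to columns.
edges <- data.frame(
  from = c("a", "b"),
  to = c("b", "c"),
  stringsAsFactors = FALSE
)
from_df <- as_tbl_graph(edges)

# An existing igraph object converts directly.
from_igraph <- as_tbl_graph(igraph::make_ring(4))
```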

Visualisation

tidygraph itself does not provide any means of visualisation, but it works flawlessly with ggraph. This division makes it easy to develop the visualisation and manipulation code at different speeds depending on where the needs arise.

Installation

tidygraph is available on CRAN and can be installed simply, using install.packages('tidygraph'). For the development version available on GitHub, use the pak package for installation:

# install.packages('pak')
pak::pak('thomasp85/tidygraph')

Thanks

tidygraph stands on the shoulders of the igraph and dplyr/tidyverse teams in particular. It would not have happened without them, so thanks so much to them.

Code of Conduct

Please note that the tidygraph project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

tidygraph's People

Contributors

agila5, chrmongeau, eferos93, flpezet, jamesm131, jdfoote, jjchern, jonmcalder, krlmlr, lionel-, luisdza, maelle, malcolmbarrett, michaelchirico, oliverbeagley, ramorel, rmflight, romainfrancois, thomasp85

tidygraph's Issues

Cannot set sender and receiver in tbl_graph

Regardless of column names or order, I can't seem to set which is the "from" and "to" node in an edgelist.

suppressPackageStartupMessages({
  library(tidyverse)
  library(tidygraph)
})
el <- tibble(from = c("b", "c"), to = c("a", "b"))
nodes <- tibble(name = letters[1:3])
# Cannot reverse ordering. First edge is a -> b no matter what:
tbl_graph(nodes, el)
#> # A tbl_graph: 3 nodes and 2 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 3 x 1 (active)
#>    name
#>   <chr>
#> 1     a
#> 2     b
#> 3     c
#> #
#> # Edge Data: 2 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     3     1
tbl_graph(nodes, el[, 2:1])
#> # A tbl_graph: 3 nodes and 2 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 3 x 1 (active)
#>    name
#>   <chr>
#> 1     a
#> 2     b
#> 3     c
#> #
#> # Edge Data: 2 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     3     1
tbl_graph(nodes, rename(el, from = to, to = from))
#> # A tbl_graph: 3 nodes and 2 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 3 x 1 (active)
#>    name
#>   <chr>
#> 1     a
#> 2     b
#> 3     c
#> #
#> # Edge Data: 2 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     2     3
# Works with igraph
igraph::graph_from_edgelist(as.matrix(el))
#> IGRAPH ed9b43f DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from ed9b43f (vertex names):
#> [1] b->a c->b
igraph::graph_from_edgelist(as.matrix(el[, 2:1]))
#> IGRAPH 1002839 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 1002839 (vertex names):
#> [1] a->b b->c

coercion of phylo objects drops branch length data?

Consider this example:

In ape, an example phylogeny has branch length data, which is reflected in the plot:

library(ape)
data("bird.orders")
plot(bird.orders)

Note edge lengths differ based on the evolutionary distance separating the orders.

library(tidygraph)
library(ggraph)
bird <- as_tbl_graph(bird.orders) 

It appears that the bird edge table has no length attributes.

intergraph package can handle igraph <-> network conversions

It's all in the title: if you need a quick way to support network::network, the intergraph package can handle that. It supports both uni- and bipartite graphs, if memory serves.

P.S. Your work on ggraph and tidygraph is awesome and inspiring. Thumbs up!

distance from leaf is not working

I think there's an error in node_distance_from(), as it returns wrong results when applied to node_is_leaf(). Here's an example:

library(tidygraph)

graph <- create_tree(20,2, directed = TRUE, mode="in") %>% 
  activate(nodes) %>% mutate(name=1:n())

The following code returns:

graph %>% 
  mutate(dist = node_distance_from(node_is_leaf())) %>% 
  as_tibble() %>% head

# A tibble: 20 x 2
    name  dist
   <int> <dbl>
 1     1     3
 2     2   Inf
 3     3     2
 4     4   Inf
 5     5   Inf
 6     6   Inf

While it should return:

# A tibble: 20 x 2
    name  dist
   <int> <dbl>
 1     1     3
 2     2     2
 3     3     2
 4     4     2
 5     5     1
 6     6     1

The way to fix it is in the file pair_measures.R (line 132) replace:
unlist(Map(function(s, t) {dist[s, t]}, s = match(source, source_unique), t = target))
with:
unlist(Map(function(s, t) {min(dist[ , t])}, s = match(source, source_unique), t = target))

Not sure what was the intention of "diagonal" picking from matrix, but here we are actually interested in minimum distance from all sources to each individual target.
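The idea behind the proposed fix, taking the minimum distance over all sources for each target, can be sketched with igraph directly. This is only an illustration on an assumed 7-node tree, not the package's actual implementation:

```r
library(igraph)

# A binary tree whose edges point from children to parents.
g <- make_tree(7, children = 2, mode = "in")

# Leaves are the nodes that no edge points into.
leaves <- which(degree(g, mode = "in") == 0)

# Rows are sources (leaves), columns are targets (all nodes).
d <- distances(g, v = leaves, mode = "out")

# For each target, keep the shortest distance from any leaf.
min_dist <- apply(d, 2, min)
```

Picking `d[s, t]` for a single matched source, as in the original line, yields Inf whenever that particular leaf cannot reach the target, even if another leaf can.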

root and leaf in 2-1 tree

When a tree consists of one root and one leaf, node_is_root() and node_is_leaf() get lost:

library(tidygraph)
create_tree(2, 1) %>% mutate(node_is_root(), node_is_leaf())
#> # A tbl_graph: 2 nodes and 1 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 2 x 2 (active)
#>   `node_is_root()` `node_is_leaf()`
#>   <lgl>            <lgl>           
#> 1 T                T               
#> 2 F                F               
#> #
#> # Edge Data: 1 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2

Results are the opposite (but also wrong) if the tree grows inward:

library(tidygraph)

create_tree(2, 1, mode = "in") %>% mutate(node_is_root(), node_is_leaf())
#> # A tbl_graph: 2 nodes and 1 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 2 x 2 (active)
#>   `node_is_root()` `node_is_leaf()`
#>   <lgl>            <lgl>           
#> 1 F                F               
#> 2 T                T               
#> #
#> # Edge Data: 1 x 2
#>    from    to
#>   <int> <int>
#> 1     2     1

These simple trees may be part of the forest or may end up as such after pruning, so it is a realistic example.

The issue is in how node_is_root() and node_is_leaf() decide the "mode" of the graph: by simply comparing degree() calculated inwards and outwards.

Filter Components

Apologies in advance but I cannot figure out how to filter, say, the largest component of a graph.
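One possible approach, sketched on an assumed two-ring graph: group_components() numbers components by decreasing size, so the largest one has index 1 and can be selected with filter():

```r
library(tidygraph)

# Two disconnected rings; the `ring` label is only there for readability.
g <- bind_graphs(
  create_ring(3) %>% mutate(ring = "small"),
  create_ring(5) %>% mutate(ring = "large")
)

# Keep only the nodes (and their edges) of the largest component.
largest <- g %>%
  activate(nodes) %>%
  filter(group_components() == 1)
```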

'Layout_igraph_matrix' function not found

First off, thank you for making this package available, looking forward to using it.

Running through the example on your blog, I run into the following issue after calling:

> graph <- create_notable('zachary') %>% 
+  mutate(ranking = node_rank_leafsort())
> ggraph(graph, 'matrix') + 
+  geom_edge_point(mirror = TRUE) + 
+  theme_graph() + 
+  coord_fixed()
Error in layout_igraph_matrix(list(34, FALSE, c(1, 2, 3, 4, 5, 6, 7, 8,  : 
  could not find function "layout_igraph_matrix"

Did I miss something?

Session Info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2    ggraph_1.0.1    ggplot2_2.2.1   tidygraph_1.1.0

loaded via a namespace (and not attached):
 [1] gtools_3.5.0       modeltools_0.2-21  kernlab_0.9-25     purrr_0.2.4        lattice_0.20-35   
 [6] colorspace_1.3-2   stats4_3.4.3       viridisLite_0.3.0  yaml_2.1.16        rlang_0.1.6       
[11] pillar_1.1.0       prabclus_2.2-6     glue_1.2.0         tweenr_0.1.5       registry_0.5      
[16] fpc_2.1-11         foreach_1.4.4      bindr_0.1          plyr_1.8.4         robustbase_0.92-8 
[21] munsell_0.4.3      gtable_0.2.0       mvtnorm_1.0-7      caTools_1.17.1     codetools_0.2-15  
[26] seriation_1.2-3    class_7.3-14       flexmix_2.3-14     DEoptimR_1.0-8     trimcluster_0.1-2 
[31] Rcpp_0.12.15       KernSmooth_2.23-15 udunits2_0.13      diptest_0.75-7     scales_0.5.0      
[36] gdata_2.18.0       gplots_3.0.1       gridExtra_2.3      ggforce_0.1.1      digest_0.6.15     
[41] gclus_1.3.1        dplyr_0.7.4        ggrepel_0.7.0      grid_3.4.3         tools_3.4.3       
[46] bitops_1.0-6       magrittr_1.5       lazyeval_0.2.1     tibble_1.4.2       cluster_2.0.6     
[51] tidyr_0.8.0        whisker_0.3-2      pkgconfig_2.0.1    dendextend_1.7.0   MASS_7.3-47       
[56] assertthat_0.2.0   viridis_0.5.0      iterators_1.0.9    R6_2.2.2           TSP_1.1-5         
[61] mclust_5.4         nnet_7.3-12        units_0.5-1        igraph_1.1.2       compiler_3.4.3

graph alternative to .

The . shortcut to access the current context will behave as expected in tidygraph; that is, it will reference the currently active context. It might make sense to also provide a reference to the complete graph, such as .gr
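For reference, released versions of tidygraph expose the accessor .G() for the full graph inside verbs (alongside .N() and .E() for the node and edge tables). A minimal sketch, with the ring graph and column name chosen for illustration:

```r
library(tidygraph)

# With edges active, .G() still gives access to the whole graph.
g <- create_ring(5) %>%
  activate(edges) %>%
  mutate(n_nodes = igraph::gorder(.G()))

edge_table <- as_tibble(g)
```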

Suggestion for tidygraph about db and viz.

There are already many graph packages in R, but converting between them is tedious.

Graphs are very useful in industrial anti-fraud and recommendation systems. I am looking for tools to do better graph analysis, but most of the available solutions are neither elegant nor unified.

  • DB: rOpenSci has developed nodbi, a NoSQL interface for database connections. It may unify Elasticsearch, HBase, MongoDB, Redis, and Druid in the future, so it would be good if tidygraph could integrate with nodbi.

  • Viz: DiagrammeR is browser-based and integrates very well with Shiny, so I look forward to DiagrammeR support in tidygraph.

Thanks for your work on unifying the separate graph systems.

references

https://cran.r-project.org/web/packages/ggCompNet/vignettes/comparing-graph-drawing-speed.html

Should activate use NSE

There is really no prior art in this. On an abstract level, activate references symbols in the data if tbl_graph is thought of as a named list with a nodes element containing the node tibble and an edges element containing the edge tibble. The normal verbs reference the columns in the tibbles depending on which part is active, while the activate verb references the list of tibbles...

All in all, should the expected format be:

gr %>% activate(nodes)

or

gr %>% activate("nodes")

As this could affect future multi-table APIs, I hope @hadley will chip in...

`filter` broken when result should be empty

Given some graph like this

library(tidyverse)
library(tidygraph)

g <- tbl_graph(
    nodes = data_frame(id = letters[1:5]),
    edges = data_frame(from = c("a", "a", "b", "b", "c", "c"), 
                       to = c("b", "d", "c", "e", "d", "e"))
)

I expect the following three statements to yield identical graphs (namely, empty ones).
Currently, they don't. g_false and g_missing_id both instead contain g unmodified.

g_false <- g %>% filter(FALSE)
g_missing_id <- g %>% filter(id == "f")
g_empty <- tbl_graph(
    nodes = data_frame(id = character()),
    edges = data_frame(from = character(), to = character())
)

If you provide an existing id (e.g. g %>% filter(id == "c")) filter works correctly.
Even g %>% filter(sample(c(T,F), size = 5, replace = T)) works (except in the all-FALSE case, which occurs with probability 2^-5).

I didn't find the root cause yet so no PR atm :(

focus/unfocus verbs

The purpose of these would be to let one only calculate on a subset of nodes/edges without actually removing the underlying graph structure as would be done with morph(to_subgraph).

E.g. we might have a long-running operation that we would like to perform only for a single node, but which depends on the full graph. The current setup requires us to calculate it for all nodes, as everything is vectorised. focus would limit the nodes/edges that get referenced during vectorised computations.

iterate verb

something like

gr %>%
  iterate(5, function() {
    # modify the graph
  })

and/or

gr %>%
  iterate(graph_order() < 100, function() {
    # modify the graph
  })
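The fixed-count form of the idea can be sketched with a plain loop. `iterate_sketch()` is a hypothetical helper written for this illustration and, unlike the proposal above, it passes the graph to the function explicitly:

```r
library(tidygraph)

# Hypothetical helper: apply `.f` to the graph `n` times in a row.
iterate_sketch <- function(graph, n, .f) {
  for (i in seq_len(n)) {
    graph <- .f(graph)
  }
  graph
}

# Start with a node column and double it on every iteration.
gr <- create_ring(3) %>%
  mutate(value = 1)

result <- iterate_sketch(gr, 5, function(g) {
  mutate(g, value = value * 2)
})

values <- as_tibble(result)$value
```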

Version mismatch

  • CRAN is version 1.1 (released 10th Feb)
  • Github is version 1.0.0.9999 (dated 12th Dec but most recent commit 22nd Mar)

Might be worthwhile tagging the Github release and bumping the version to avoid misleading package installations?

activate/active in a separate package?

I'd like to have activate and active in a separate unrelated package that tidygraph and others can import. Do you see any problems with that? I'm happy to do it if you don't see any obstacles. I'm using this for activating variables within NetCDF files, though I toyed with using a different name for the active part.

It's such a simple concept and the perfect name, so an activate package with two functions, using NSE would be of wider use I think. It seems this would be used for database connections, and it's the perfect match for GDAL data sources with multiple layers, i.e. we could deploy it with sf::read_sf to pick layers in vector GIS data sources, or with raster in place of its varname = argument.

not all dplyr actions are imported

First of all, I love this package! it makes a lot of sense and it's very tidy!

However:
I couldn't use the row_number() function from dplyr without explicitly loading dplyr.

This works:

library(tidygraph)
library(ggraph)
library(dplyr)
play_erdos_renyi(10, 0.5, directed = FALSE) %>% 
  activate(nodes) %>% 
  mutate( rownbr = row_number())

This does not

library(tidygraph)
library(ggraph)
play_erdos_renyi(10, 0.5, directed = FALSE) %>% 
  activate(nodes) %>% 
  mutate( rownbr = row_number())

And this also doesn't seem to work

library(tidygraph)
library(ggraph)
play_erdos_renyi(10, 0.5, directed = FALSE) %>% 
  activate(nodes) %>% 
  mutate( rownbr = dplyr::row_number())

Is this on purpose?

Add split_by method

Like group_by except temporary subgraphs are created thus affecting all graph based calculations

Temporary graph representations

Having thought about my ideas for split_by a bit I think it might be worthwhile to consider extensions to the idea. In general a temporary graph is a change in the graph topology for the sake of calculating different node and edge properties, which will then get propagated back to the main graph.

An example: Calculate community membership, split each community into induced subgraphs and calculate a centrality score for the nodes based on the new topology should be something along the lines of:

gr %>%
  mutate(cluster = group_infomap()) %>%
  split_by(cluster) %>%
  mutate(cluster_cent = centrality_closeness()) %>%
  unsplit()

I think this is quite a powerful abstraction of a complicated procedure. Question is if there are other interesting temporary representations. The main constraint is that the representation should not alter the nodes or edges depending on which is active. This means that if nodes are active it is ok to remove edges during split_by as the graph is unsplit and ungrouped prior to changing activation.

Other temporary representation:

  • edge subset
  • graph complement
  • ... (please add if something pops up)

Error in add_vertices(gr, nrow(nodes) - gorder(gr)) with high numbers

Hi,
I encountered an error trying to create a tbl_graph. The function tbl_graph fails when large numbers are passed as ids.

library(tidygraph)
library(dplyr)

nodes <- structure(list(user_id = c(16222469L, 18856867L, 289485255L, 
                                    381289719L, 5988062L, 25073877L, 37677496L, 44196397L, 276934698L), 
                        name = c("Christophe Barbier", "zerohedge", "Emanuel Derman", 
                                 "Nassim Nicholas Taleb", "The Economist", "Donald J. Trump", 
                                 "Russell Roberts", "Elon Musk", "Valérie Boyer")), 
                   .Names = c("user_id", "name"), row.names = c(NA, -9L), class = c("data.frame"))

edges <- structure(list(from = c(381289719L, 381289719L, 381289719L, 381289719L, 
                                 381289719L), to = c(44196397L, 25073877L, 18856867L, 37677496L, 289485255L
                                 )), class = "data.frame", row.names = c(NA, -5L), .Names = c("from", 
                                                                                              "to"))

graph_reseau <- tbl_graph(nodes = nodes, 
                          edges = edges)
#> Error in add_vertices(gr, nrow(nodes) - gorder(gr)): At type_indexededgelist.c:369 : cannot add negative number of vertices, Invalid value

After recoding ids, the function works.

key_value <- nodes %>% 
  select(user_id) %>% 
  mutate(new_id = rank(user_id))

new_nodes <- nodes %>% 
  left_join(key_value, by = "user_id") %>% 
  select(user_id = new_id, name)

new_edges <- edges %>% 
  left_join(key_value, by = c("from" = "user_id")) %>% 
  left_join(key_value, by = c("to" = "user_id"))  %>% 
  select(from = new_id.x, to = new_id.y)
  

tbl_graph(nodes = new_nodes, 
                          edges = new_edges)
#> # A tbl_graph: 9 nodes and 5 edges
#> #
#> # A rooted forest with 4 trees
#> #
#> # Node Data: 9 x 2 (active)
#>   user_id                  name
#>     <dbl>                 <chr>
#> 1       2    Christophe Barbier
#> 2       3             zerohedge
#> 3       8        Emanuel Derman
#> 4       9 Nassim Nicholas Taleb
#> 5       1         The Economist
#> 6       4       Donald J. Trump
#> # ... with 3 more rows
#> #
#> # Edge Data: 5 x 2
#>    from    to
#>   <int> <int>
#> 1     9     6
#> 2     9     4
#> 3     9     3
#> # ... with 2 more rows

Since I am new to tidygraph, I don't really know whether this is a bug or whether I misused the function.
But I didn't find anything in the tbl_graph documentation advising the use of low integers as ids.
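For context, tbl_graph() reads integer from/to values as row positions in the node table, which is why large ids fail. A shorter recoding than the one above, sketched on a reduced, assumed subset of the data, translates ids to row positions with match():

```r
library(tidygraph)

# A reduced version of the data from the report.
nodes <- data.frame(user_id = c(16222469L, 381289719L, 44196397L))
edges <- data.frame(
  from = c(381289719L, 381289719L),
  to = c(44196397L, 16222469L)
)

# Translate each id to the row position of that id in `nodes`.
edges$from <- match(edges$from, nodes$user_id)
edges$to <- match(edges$to, nodes$user_id)

graph <- tbl_graph(nodes = nodes, edges = edges)
```

The original ids stay available in the `user_id` column of the node table.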

igraph's min_separators() and is.separator() behave incorrectly

min_separators() and is.separator() behave incorrectly on some tidygraph objects.
example:

#### Reproduce separator error
library(tidygraph)
library(igraph)

edges <- data.frame(list('from' = c("A", "B"),
                         'to' = c("B","C")))
g<-as_tbl_graph(edges)
plot(g)

min_separators(g) #should return B, empty list instead

is.separator(g, "B") # correct
is.separator(g,'A') # correct
is.separator(g,c("A", "C")) # incorrectly returns true.

mutate semantics

Great package! I've noticed a problem with the way mutate is implemented, which makes it break with the usual convention that in a sequence like
mutate(data,a=...,b=a+...)
variables are defined sequentially and "a" is therefore available when defining "b". The problem occurs when using graph traversal functions like map_bfs_dbl:

#Works: variable r is created in its own call to mutate 
create_lattice(5) %>% activate(nodes) %>% mutate(r=1:5) %>% mutate(val=map_bfs_dbl(1,.f=function(node,path,...) .N()$r[node])) 

#Doesn't work: r is not yet available in .N()
create_lattice(5) %>% activate(nodes) %>% mutate(r=1:5,val=map_bfs_dbl(1,.f=function(node,path,...) .N()$r[node])) 

Perhaps set_graph_data should be called after each new variable is defined?

as_symbol from rlang

Sorry for the terse message, I had previously worked around this by changing the imports for tidygraph, then put it aside and forgot.

I hope to get back to this and report better, but in case it's an obvious fix here's what I see:

Error : object 'as_symbol' is not exported by 'namespace:rlang'

Compute rowmeans with dplyr does not work

It seems that df %>% dplyr::mutate(mean=rowMeans(.)) works for a tibble, but the same command does not work with a tidygraph object (with the node data frame activated). It reports "Error in mutate_impl(.data, dots) :
Evaluation error: 'x' must be an array of at least two dimensions."

It would be great if same thing works directly on the graph object.

Create a tidygraph from a data.frame with edge and node data

How can I add node data when creating a tidygraph from a data.frame?

The current implementation is quite obscure to me.

It ignores everything passed in ..., and I don't understand which columns go to nodes and which to edges.

edge_data <- structure(list(from = c("Shewanella", "Shewanella", "Shewanella", "Shewanella", "Shewanella", "Shewanella"), 
to = c("OST4", "FBXL3", "SSBP1", "PSMA5", "KBTBD6", "SMIM26"), 
Correlation = c(-0.297932186956912, 0.298093379023602, -0.298112615989259, -0.298133596457624, 0.298125313026478, -0.298239510052464), 
pvalue = c(0.0498882907742567, 0.0496801649043651, 0.0496583960728696, 0.049652029635132, 0.049652029635132, 0.0494966797853996
)), .Names = c("from", "to", "Correlation", "pvalue"), row.names = 42:47, 
class = "data.frame")

nodes_data <- structure(list(name = structure(c(3L, 1L, 6L, 4L, 2L, 5L), 
.Label = c("FBXL3", "KBTBD6", "OST4", "PSMA5", "SMIM26", "SSBP1"), class = "factor"), 
    pathway = structure(c(1L, 1L, 2L, NA, NA, NA), .Label = c("path1", 
    "path2"), class = "factor")), .Names = c("name", "pathway"
), row.names = c("OST4", "FBXL3", "SSBP1", "PSMA5", "KBTBD6", 
"SMIM26"), class = "data.frame")

tbg <- as_tbl_graph(edges = edge_data, nodes = nodes_data)
## Error in c("to", "from") %in% names(x) : 
##   argument "x" is missing, with no default
(tbg <- as_tbl_graph(edge_data, nodes = nodes_data))
## # A tbl_graph: 7 nodes and 6 edges
## #
## # A rooted tree
## #
## # Node Data: 7 x 1 (active)
##   name      
##   <chr>     
## 1 Shewanella
## 2 OST4      
## 3 FBXL3     
## 4 SSBP1     
## 5 PSMA5     
## 6 KBTBD6    
## # ... with 1 more row
## #
## # Edge Data: 6 x 4
##    from    to Correlation pvalue
##   <int> <int>       <dbl>  <dbl>
## 1     1     2      -0.298 0.0499
## 2     1     3       0.298 0.0497
## 3     1     4      -0.298 0.0497
## # ... with 3 more rows

Are there some accessors to the node tibble to modify it afterwards?
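One way to attach node data at construction time is tbl_graph(), which takes the node and edge tables as separate arguments. A sketch on a reduced, assumed version of the data above (the `is_hub` column is made up for illustration):

```r
library(tidygraph)

nodes <- data.frame(
  name = c("Shewanella", "OST4", "FBXL3"),
  pathway = c(NA, "path1", "path1"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("Shewanella", "Shewanella"),
  to = c("OST4", "FBXL3"),
  Correlation = c(-0.298, 0.298),
  stringsAsFactors = FALSE
)

# Character from/to values are matched against the `name` column of nodes.
g <- tbl_graph(nodes = nodes, edges = edges)

# Node data can be modified later by activating nodes and using dplyr verbs.
g <- g %>%
  activate(nodes) %>%
  mutate(is_hub = name == "Shewanella")

# as_tibble() extracts the currently active table.
node_table <- g %>% activate(nodes) %>% as_tibble()
edge_table <- g %>% activate(edges) %>% as_tibble()
```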

add wrappers for graph constructors

A lot of different graph constructors are available through a lot of different unrelated names. Could be a case for simplifying the API a bit...

E.g. using the create_ pronoun, so we'll have create_ring, create_erdos, create_smallworld etc

Naming scheme for algorithm functions

The master plan is to have tidy helpers for all algorithms that calculate properties of vertices and edges. These should behave kind of like n() in dplyr, that is, know the context in which they are called so it is not necessary to specify the graph object nor the nodes/edges to compute on. They should all return vectors in the correct order and length so that they fit naturally in a mutate call.

The above is set in stone. However, the naming of the functions is not. One big problem is that igraph has taken all the good/obvious names (who can blame them) and I want to retain full cross-compatibility with igraph.

Current idea: Create a small two-level ontology of properties for nodes and edges and let that guide the naming. E.g. a node can have degree, centrality etc., so names could be degree_in() and centrality_alpha(). This does not read nicely, as words are often flipped around, but it aids in searching for related algorithms with autocomplete, which is a huge boon.

Input from interested users is very welcome...

head/tail

Not a major problem, but head and tail behave in a counter-intuitive manner, I think. I'd expect:

create_ring(10) %>% activate(nodes) %>% head

to return information on the first few nodes (head() defaults to six rows)

create_ring(10) %>% activate(edges) %>% head

to return information on the first few edges, and finally

create_ring(10) %>% head

to maybe say that a graph doesn't have a head. tidygraph reverts to igraph's default in all cases. I thought I'd add a head.tbl_graph function myself, but noticed that a tbl_graph doesn't seem to have an unactivated state - it's nodes by default. Wouldn't it make sense to have such a state?

On an unrelated note (sorry), the initial table for nodes created by create_* is empty. Perhaps it would make sense to initialise it with a single index column?
