fslaborg / graphoscope Goto Github PK
View Code? Open in Web Editor NEWA pragmatic approach to network science.
Home Page: http://fslab.org/Graphoscope/
License: MIT License
A pragmatic approach to network science.
Home Page: http://fslab.org/Graphoscope/
License: MIT License
As mentioned in #33, the network density on FGraph works with the calculations for maximum edge count in reference to undirected graphs. Update this to match the directed character of the FGraph.
Tooling stack:
Description
The Triangle Count algorithm counts the number of triangles for each node in the graph. A triangle is a set of three nodes where each node has a relationship to the other two. In graph theory terminology, this is sometimes referred to as a 3-clique. The Triangle Count algorithm in the GDS library only finds triangles in undirected graphs.
Pointers
The clustering coefficient makes use of triangles, which might give some ideas about the implementation.
References
https://neo4j.com/docs/graph-data-science/current/algorithms/triangle-count/
This should involve comparing different options in terms of performance and usability.
Further info is here
Atm., it is tedious to simply retrieve all nodes from an FGraph. It'd be nice to have a built-in function for this.
Current workaround (provided by @LibraChris):
let getNodes (graph : FGraph<'Nk,'Nd,'Ed) =
graph
|> Seq.map (
fun kvp ->
let nodeKey = kvp.Key
let p,nd,s = kvp.Value
nodeKey, nd
)
.Using the “Hierarchy of truth” -> Atlas, Networkx, Konnect, Wikipedia
Dijkstra for FGraph assumes that 'EdgeData = float.
Rewrite the function and add parameter function that converts 'EdgeData to float.
Ideally Erdos-Renyi and Watts–Strogatz on the core structure.
Following the sprint 2 discussion and any further input from Timo please set up the project with the following principles:
Overloaded methods for basic operations so users don't need to open different modules for different data structures
Different namespaces and/or modules for directed and undirected graphs
For now we can treat unweighted graphs as having a weight of 1 in every edge.
Originally posted by general-rishkin August 29, 2023
For a DiGraph
G, to retrieve the incoming neighbours of a node (say, node 1), you write:
1 |> DiGraph.getInEdges G
whereas to retrieve the outgoing neighbours, you write:
G |> DiGraph.getOutEdges 1
It would be more consistent to select either:
1 |> DiGraph.getInEdges G
1 |> DiGraph.getOutEdges G
OR
G |> DiGraph.getInEdges 1
G |> DiGraph.getOutEdges 1
```</div>
Might be a matter of taste but when using FSX files I find immutability very helpful.
Should we update operations like removeNode to create, update and return a copy of the passed in graph.
This keeps the usage closer to a functional F# style and removes cognitive overhead of mutating objects in scope of interactive sessions. Any thoughts?
discuss with Benedict to ensure alignment with fsharp.stats
Current options include using libtorch
The Watts-Strogatz model, introduced by Duncan J. Watts and Steven Strogatz in 1998, is a fundamental concept in network science and complex systems. It allows us to simulate and understand small-world networks, which possess unique characteristics including high local clustering and short average path lengths between nodes.
We need to implement the Watts-Strogatz model to create small-world networks within our project. This model will be invaluable for studying network structures in various applications such as social networks, transportation systems, and more.
Implement a random rewiring process with a parameter 'p' to introduce randomness.
Ensure that, as 'p' varies from 0 to 1, the network transitions from a regular lattice to a small-world network with shorter average path lengths while maintaining high local clustering.
Code implementation of the Watts-Strogatz model.
Ability to configure the 'p' parameter to control the level of randomness.
Verify that the model generates networks with appropriate small-world characteristics.
The Watts-Strogatz model has wide-ranging applications in fields such as sociology, epidemiology, and network analysis. Implementing this model will enhance our project's capability to simulate and analyze complex systems and networks.
Parallelise and produce benchmarks is possible.
I have tried it for small graphs but it never seems to finish.
#r "nuget: Graphoscope, 0.2.0"
open Graphoscope.RandomModels
open Graphoscope
open FGraph
let N = 10
let edgesPerIteration = 2
let myBarabasiAlbert = BarabasiAlbert.initFGraph N edgesPerIteration id id (fun x -> 1.0) FGraph.empty
Do you have sample which works?
thanks
Using type alias approach with functions and mutable objects. Key decision is whether to use dictionarys or resize arrays. The module is likely to be combination of FGraph and DiGraph. Need to determine if an IdMap object significantly improves performance and on which operations.
Aim to double test coverage to about 60% for both LGraph and FGraph
Do this for both for DiGraph and FGraph.
This is a preliminary step for the Floyd-Warshall algorithm.
Further to this discussion implement agreed syntax for
And produce benchmarks.
We need one or more graphs of sufficient size and complexity to be used for testing. Find such a graph and source of reliable metrics for test assertions.
Description
Our ultimate goal is to leave the decision to use mutation or not to the user. Mutation makes sense for heavy duty operations that run on the cloud. But while developing and working with smaller graphs mutation gets in the way and can be unwieldy to use. The initial step for this is to have a deep copy functionality.
Pointers
https://learn.microsoft.com/en-us/dotnet/api/system.object.memberwiseclone?view=net-7.0
Description
The Wedge Count algorithm, like the Triangle Count algorithm, is a graph analysis technique used to identify certain structural patterns within a graph. In this case, it focuses on detecting wedges. A wedge (sometimes called Triad) is a collection of three nodes in the graph where two of these nodes share a direct edge or relationship, and the third node serves as a common neighbour to the other two. This concept is often referred to as a "2-clique" in graph theory.
Pointers
The clustering coefficient makes use of triangles, which might give some ideas about the implementation.
References
Description
Star graph is a special type of graph in which n-1 vertices have degree 1 and a single vertex have degree n – 1. This looks like n – 1 vertex is connected to a single central vertex.
Pointers
Assume graph is undirected.
References
https://www.geeksforgeeks.org/check-star-graph/
https://networkx.org/documentation/stable/reference/generated/networkx.generators.classic.star_graph.html
We need to process for automating the publishing of nuget packages from sprint releases.
This will make Michele's testing feedback easier (he can reference the current version in fsx rather than having to build locally) and I think rapid iterative publication will improve quality faster.
I think we could publish v0.1 to the public nuget feed. Clearly we will need to caveat it with "alpha version - unfit for consumption" etc. Alternatively we could use a private package manager.
Happy to discuss.
The following code to obtain the loop count of a digraph fails. none of the suggestions from the error words. the FGraph version works fine.
let mutable lg: DiGraph<int, float> =
DiGraph.createFromEdges [| (0, 2, 1.0); (0, 1, 1.0)|]
|> DiGraph.addEdge (1, 2, 1.0)
|> DiGraph.addEdge (0, 1, 1.0)
Measures.Loop.loopCount lg
typecheck error Value restriction. The value 'it' has been inferred to have generic type
val it: '_a
Either define 'it' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
In the current state of the GraphoscopeRestructure branch, the compute functions for directed and undirected FContextMaps are facing a challenge where they cannot share the same name. Consequently, they have been temporarily separated into compute and computeUndirected. This distinction was introduced as a workaround until a more sustainable solution can be implemented.
Hello,
I picked up about the existence of this library during the Data Science in F# conference!
There are two ways to determine the shortest path in a graph and I'd like to know if it would be difficult to get a longest path algorithm.
My use case is the following: recently, a new F# compiler flag was added (--test: GraphBasedChecking
). This allows your files to be type-checked in parallel where possible. In short, a graph of a project is being constructed that lists all the dependencies (edges) between files.
In the theoretical scenario that you have infinite cores for maximum paralyzation, the times it will take to type-check a code base would be the longest chain in the graph.
As I'm not quite familiar (and clever) enough to figure this out, I'm wondering if this could be added to this library by someone. I naively hope this isn't much work since the shortest path is already available. In return, I'd be happy to help out with something else. A blog post about the compiler flag comes to mind and there I could showcase this package and expose it outside the pure data science scene. Or anything else really.
Happy to hear your thoughts!
Cheers,
Florian
Rename Degree measures of FGraph to match DiGraph implementation as seen in #33
Description
Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected communities. In the worst case, communities may even be disconnected, especially when running the algorithm iteratively. In our experimental analysis, we observe that up to 25% of the communities are badly connected and up to 16% are disconnected. Leiden algorithm overcomes these problems.
Pointers
Work your way up from Louvain as it is described in the Appendix of the paper to Leiden.
References
https://www.nature.com/articles/s41598-019-41695-z
https://static-content.springer.com/esm/art%3A10.1038%2Fs41598-019-41695-z/MediaObjects/41598_2019_41695_MOESM1_ESM.pdf
Tests should include
node/edge count
average degree
Graph partitioning refers to dividing a graph into smaller, non-overlapping subgraphs while optimizing some objective function or constraint.
It can be used to optimise graph reduction and fast component parting.
Since FGraph is based on a dictionary structure, graph partitioning can be based on dictionary partitioning.
At the moment the create functions work with simple collections of tupel or tripel.
Add a create functions that work of type lists:
NodSeq as seq seq and EdgeSeq as seq.
Chose a canonical example from the reference graphs for the following graph types:
unweighted undirected
unweighted directed
weighted directed
weight undirected
Add create graph helper methods to the test lib. Graphs should be built from edgelist txt files
This should include section on creating graphs from edge lists and running basic operations and algorithms in an fsx environment.
Description
Our ultimate goal is to leave the decision to use mutation or not to the user. Mutation makes sense for heavy duty operations that run on the cloud. But while developing and working with smaller graphs mutation gets in the way and can be unwieldy to use. The initial step for this is to have a deep copy functionality.
Pointers
https://learn.microsoft.com/en-us/dotnet/api/system.object.memberwiseclone?view=net-7.0
Expand existing tests to cover end to end use cases from creation to running algorithms.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.