Coder Social home page Coder Social logo

exascaleinfolab / lfr-benchmark_undirweightovp Goto Github PK

View Code? Open in Web Editor NEW
78.0 3.0 14.0 50 KB

Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms using generated ground-truth communities

License: GNU General Public License v2.0

C++ 99.80% Makefile 0.20%
clustering-benchmark synthetic-data network-generator

lfr-benchmark_undirweightovp's Introduction

LFR-Benchmark_UndirWeightOvp

Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms

Description

This program is an implementation of the algorithm described in the paper "Directed, weighted and overlapping benchmark graphs for community detection algorithms", written by Andrea Lancichinetti and Santo Fortunato. In particular, this program is to produce undirected weighted networks with overlapping nodes.
Each feedback is very welcome. If you have found a bug or have problems, or want to give advises, please contact us:

[email protected]
[email protected]

Turin, 29 October 2009

Original sources:

Reference: A. Lancichinetti, S. Fortunato, and F. Radicchi.(2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78.

This is an extended version of the original LFR framework (see the changelog) by Artem Lutov [email protected]

Content

Compilation

In order to compile, type: $ make.
This command builds the binary named TAG=lfrbench_udwo as specified in the ./makefile. In the following description, the application (lfrbench_udwo) is referred to as benchmark for convenience.

Usage

To run the program, type:

./benchmark [FLAG] [P]

To set the parameters, type:
-N		[number of nodes]
-k		[average degree]
-maxk		[maximum degree]
-mut		[mixing parameter for the topology]
-muw		[mixing parameter for the weights]
-beta		[exponent for the weight distribution]
-t1		[minus exponent for the degree sequence]
-t2		[minus exponent for the community size distribution]
-minc		[minimum for the community sizes]
-maxc		[maximum for the community sizes]
-on		[number of overlapping nodes]
-om		[number of memberships of the overlapping nodes]
-C		[Average clustering coefficient]
-cnl		[output communities as strings of nodes (input format for NMI evaluation)]
-name		[base name for the output files]. It is used for the network, communities and statistics; files extensions are added automatically:
	.nsa  - network, represented by space/tab separated arcs
	.nse  - network, represented by space/tab separated edges
	{.cnl, .nmc}  - communities, represented by nodes lists '.cnl' if '-cnl' is used, otherwise as a nodes membership in communities '.nmc')
	.nst  - network statistics
-seed		[file name of the random seed, default: seed.txt]
-a		[{0, 1} yield directed network (1 - arcs) rather than undirected (0 - edges), default: 0 - edges]

In this program you can assign the number of overlapping nodes (option -on) and assign the number of memberships for them (option -om). The other nodes will have only one membership. For instance, typing

./benchmark [flags...] -N 1000 -on 20 -om 2

will produce a network with 1000 nodes, 980 with only one membership and 20 nodes with two memberships.

It is also possible to set the parameters writing flags and relative numbers in a file (look at flags.dat to see an example). To specify the file, use the option: -f [filename]

You can set the parameters both writing some of them in the file, and using flags from the command line for others (if you set a parameter twice, the latter one will be taken).

-N, -k, -maxk, -muw have to be specified. For the others, the program can use default values: t1=2, t2=1, on=0, om=0, beta=1.5, mut=muw, minc and maxc will be chosen close to the degree sequence extremes.

The flag -C is not mandatory. If you use it, the program will perform a number of rewiring steps to increase the average cluster coefficient up to the wished value. Since other constraints must be fulfilled, if the wished value will not be reached after a certain time, the program will stop (displaying a warning).

Random Seed

In the specified seed file (default is seed.txt) you can edit the seed which generates the random numbers. After reading, the program will increase this number by 1 (this is done to generate different networks running the program again and again). If the file is erased, it will be produced by the program again.

Accessory Options

To have a random network use: -rand
Using this option will set mut=0, muw=0 and minc=maxc=N, i.e. there will be only one community.

Use option: -sup (-inf)

if you want to produce a benchmark whose distribution of the ratio of external degree/total degree is superiorly (inferiorly) bounded by the mixing parameter (only for the topology). In other words, if you use one of these options, the mixing parameter is not the average ratio of external degree/total degree (as it used to be) but the maximum (or the minimum) of that distribution. When using one of these options, what the program essentially does is to approximate the external degree always by excess (or by defect) and if necessary to modify the degree distribution. Nevertheless, this last possibility occurs for a few nodes and numerical simulations show that it does not affect the degree distribution appreciably.

Examples

Example1: ./benchmark -N 1000 -k 15 -maxk 50 -muw 0.1 -minc 20 -maxc 50
Example2: ./benchmark -f flags.dat -t1 3

If you want to produce a kind of Girvan-Newman benchmark, you can type:
./benchmark -N 128 -k 16 -maxk 16 -muw 0.1 -minc 32 -maxc 32 -beta 1

Output

Please note that the community size distribution can be modified by the program to satisfy several constraints (a warning will be displayed).

The program will produce three files:

  1. Network file Network.<nsl> contains the list of edges (nodes are labelled from 1 to the number of nodes; the edges are ordered and repeated twice for nsa format and unordered (not repeated) for the nse format, i.e. source-target and target-source), with the relative weight. The NSL format, the network specified by (arcs / edges), is a generalization of the .snap, .ncol and Edge/Arcs Graph formats.
  2. Community file Network.nmc contains a list of the nodes and their membership (memberships are labeled by integer numbers >=1).
  3. Statistics file Network.nst contains the degree distribution (in logarithmic bins), the community size distribution, the distribution of the mixing parameter for the topology and the weights, and the internal and external weight distribution.

Acknowledgement

Thanks to:

  • Peter Ronhovde and Conrad Lee, for many usuful advises
  • Rodrigo Rocha Gomes e Souza, for reporting bugs (and also fixing them)
  • Filippo Radicchi and Jos Ramasco for testing the program

Changelog

Additionally implemented features on top of the original LFR Benchmark are the following:

  • Parameter -a added to specify directed (arcs) / undirected (edges) output network
  • Parameter -seed added for the custom seed filename
  • Parameter -cnl (community nodes list) added to output communities (clusters) as lists of nodes to be compatible with NMI evaluation input format (.cnl)
  • Parameter -name added to give custom name for the output files
  • maxk is automatically decreased if required for the network generation with the specified k (instead of the generation process termination)

Related Projects

  • Clubmark - A parallel isolation framework for benchmarking and profiling clustering (community detection) algorithms considering overlaps (covers).
  • PyNetConvert - Network (graph, dataset) converter from Pajek, Metis and .nsl formats (including .ncol, Stanford SNAP and Edge/Arcs Graph) to .nsl (.nse/a that are more common than .snap and .ncol) and .rcg (Readable Compact Graph, former .hig; used by DAOC / HiReCS libs) formats.

Note: Please, star this project if you use it.

lfr-benchmark_undirweightovp's People

Contributors

benjamin-loison avatar luav avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

lfr-benchmark_undirweightovp's Issues

Weight Mixing Parameter Clarification

Hello,

Based on my comprehension, it appears that the weight distribution adheres to a power law distribution, and you can adjust it by manipulating the weight exponent and the weight mixing parameter. I'm curious to know which specific type of power law distribution you are employing. Are you using any of the following:

  1. Lognormal
  2. Exponential
  3. Truncated power law
  4. Stretched exponential
  5. Lognormal (positive)

Best,
Ali

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.