Coder Social home page Coder Social logo

ewaldis's Introduction

EWALDIS

An algorithm for mining discriminative patterns in dynamic graphs. Accompanying paper was submitted to Discovery Science (DS) 2018 conference.

Building EWALDIS (in the project directory) in Linux

  • Requires cmake of version >= 3.9 and c++17 compiler
cmake .
make

Running the experiments for DS 2018 paper

  • Results will be written into results/* directories
./ewaldis -v --vertices data/dblp1/dblp_vertices.csv --edges data/dblp1/dblp_edges.csv --train_pos data/dblp1/dblp_train_positive_events.csv --train_neg data/dblp1/dblp_train_negative_events.csv -o results/dblp1 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp1/dblp_test_positive_events.csv --test_neg data/dblp1/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/dblp2/dblp_vertices.csv --edges data/dblp2/dblp_edges.csv --train_pos data/dblp2/dblp_train_positive_events.csv --train_neg data/dblp2/dblp_train_negative_events.csv -o results/dblp2 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp2/dblp_test_positive_events.csv --test_neg data/dblp2/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/dblp3/dblp_vertices.csv --edges data/dblp3/dblp_edges.csv --train_pos data/dblp3/dblp_train_positive_events.csv --train_neg data/dblp3/dblp_train_negative_events.csv -o results/dblp3 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp3/dblp_test_positive_events.csv --test_neg data/dblp3/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/enron/enron_vertices.csv --edges data/enron/enron_edges.csv --train_pos data/enron/enron_train_positive_events.csv --train_neg data/enron/enron_train_negative_events.csv -o results/enron -e edges -a nominal -p 10 -n 10 --test_pos data/enron/enron_test_positive_events.csv --test_neg data/enron/enron_test_negative_events.csv -m 5 -w 3000 -r 0.3 -pu 1728000 -su 345600 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

Description of EWALDIS output files

  • For each experiment, there is a directory with the following three files for each discriminative pattern:
  • results_K.txt with information about the pattern itself:
    • DATA - name of the experiment
    • TOTAL_TIME_PREPARATION - running time of data preparation in seconds (this phase is common for all patterns from one experiment and thus should be counted only once in total time)
    • TOTAL_TIME_MINING - running time of pattern mining in seconds
    • TOTAL_TIME_EVALUATION - running time of pattern evaluation in seconds
    • TIME_UNIT_PRIMARY - primary time unit parameter
    • TIME_UNIT_SECONDARY - secondary time unit parameter
    • RANDOM_WALKS - number of random walks
    • PROB_RESTART - probability of restart
    • MAX_PATTERN_EDGES - maximum number of pattern edges to be found
    • EVOLUATION_EPOCHS - number of epochs in the genetic algorith
    • EVOLUATION_SUBEPOCHS - number of subepochs in the genetic algorith
    • EVALUATION_RANDOM_WALKS - number of random walks used for evaluation
    • PATTERN_EDGES - list of pattern edges in the form of (source_vertex, destination_vertex)
    • PATTERN_SCORES - scores of pattern edges (their fitness from the genetic algorithm)
    • PATTERN_TIMESTAMPS - distribution of relative timestamps for each pattern edge
    • PATTERN_ATTRIBUTES - distribution of attributes for each pattern edge
    • PATTERN_DIRECTIONS - True = edge directed from source to destination, False = the opposite direction; makes sense only for directed graph
    • PATTERN_UNDIRECTED - whether the patten is an undirected graph
    • SAMPLE_EVALUATION_POSITIVE - matching scores on positive events from sample set
    • SAMPLE_EVALUATION_NEGATIVE - matching scores on negative events from sample set
    • TRAIN_EVALUATION_POSITIVE - matching scores on positive events from train set
    • TRAIN_EVALUATION_NEGATIVE - matching scores on negative events from train set
    • TEST_EVALUATION_POSITIVE - matching scores on positive events from test set
    • TEST_EVALUATION_NEGATIVE - matching scores on negative events from test set
  • results_K_positive_population.txt
    • For each pattern edge, there are summary statistics of fitness score for each epoch
    • For example, if there are 20 epochs, then first 20 rows belong to the first pattern edge, etc.
    • Each row contains 0%, 10%, 50%, 90%, and 100% percentile of the fitness
    • There are exactly MAX_PATTERN_EDGES * EVOLUATION_EPOCHS rows in the file; if the algorithm finished earlier for a given pattern, then there are missing value
  • results_K_negative_population.txt
    • Similar information as in the previous file, but now for each pattern edge and each epoch, there are summary statistics of negative population fitness for each subepoch
    • Thus, there are MAX_PATTERN_EDGES * EVOLUATION_EPOCHS * EVOLUATION_SUBEPOCHS rows
    • Again, if the computation finishes earlier, there may be missing values
  • Moreover, there are two csv files used for creating baseline dataset by R scripts: baseline_edges_data_train.csv and baseline_edges_data_test.csv

ewaldis's People

Contributors

karelvaculik avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.