Mutually exciting point process graphs for modelling dynamic networks

This repository contains a Python library supporting the paper Sanna Passino, F. and Heard, N. A. (2023) "Mutually exciting point process graphs for modelling dynamic networks", Journal of Computational and Graphical Statistics, 32:1, 116-130 (link, arXiv preprint).

The library meg can be installed in edit mode as follows:

pip install -e lib/

The library can then be imported in any Python session:

import meg

The repository contains multiple directories:

lib contains the Python library;
notebooks contains Jupyter notebooks with examples on how to use the library;
scripts contains Python scripts to reproduce the results in the paper;
results contains some of the results described in the paper;
plots contains Python scripts for reproducing the plots in the paper;
tikz_process contains .tex files for reproducing Figure 1;
fox contains additional scripts for implementing the methodology of Fox et al. (2016).

Update [December 2023] - If Python 3.12 is used, then the instructions at this link should be followed to install numba for Python 3.12, which would then avoid errors with the installation of sparse (required for the meg installation).

Methodology

The model and datasets are described in Sanna Passino, F. and Heard, N. A. (2023).

Understanding and running the code

The main part of the code is contained in the file meg.py, which contains a Python class for the MEG model and inference using gradient ascent and expectation-maximisation methods.

For the simulation in Section 5.2, the file simulation_erdos.py is used, using the arguments in simulation_erdos.sh in scripts. For fitting the model on the Enron and ICL data, the files enron.py and icl.py are used. Details about the possible options are given by the help function for each file. For example, running python3 scripts/simulation_erdos.py --help returns:

-f: name of the destination folder for the output files,
-m: Boolean variable for the main effects (default: FALSE),
-i: Boolean variable for the interactions (default: FALSE),
-pm: Boolean variable (used only if -m TRUE), if TRUE a Poisson process is fitted for the main effects (default: FALSE),
-pi: Boolean variable (used only if -i TRUE), if TRUE a Poisson process is fitted for the interactions (default: FALSE),
-hm: Boolean variable (used only if -m TRUE), if TRUE a Hawkes process is fitted for the main effects, otherwise a Markov process is used (default: FALSE),
-hi: Boolean variable (used only if -i TRUE), if TRUE a Hawkes process is fitted for the interactions, otherwise a Markov process is used (default: FALSE),
-d: number of latent features for the interaction term (default: 1),
-n: number of nodes of the graph in the simulation (default: 10),
-T: maximum time of observation for each simulated graph (default: 1000000),
-M: number of simulated events for each graph (default: 10000),
-p: probability of a link in the Erdős–Rényi graph (default: 0.5).

For example, the first simulation is obtained running the following command line:

python3 scripts/simulation_erdos.py -f simulation_1 -M 5000 -p 0.25 -n 10 -d 1 -m -i -hm -hi &

Similar commands are used for the application on the Enron and ICL data. Running ./enron.py --help gives two additional options:

-z: Boolean variable, if TRUE for , and if (default: set to its MLE);
-fl: Boolean variable, if TRUE, for all links (default: FALSE).

For example, to obtain the best performing model on the Enron data, the following command line should be run:

python3 scripts/enron.py -m -hm -i -d 5 -z -f 'enron_results/tau_Aij/mi_hm_wi_5' &

Reproducing the results in the paper

Since many of the simulations are computationally expensive to run, the output has been stored in the repository in the directories simulation_main, simulation_inter, simulation_1 and simulation_2 in results. Details on how to obtain such outputs are given in the following paragraphs.

The results, tables and figures in the paper could be reproduced using the following files:

Figures

Figure 1 - Source .tex files to reproduce the figures are in the directory tikz_process.
Figure 2 - It can be reproduced running the following three files in succession:
- simulation_main_effects.sh (WARNING: computationally demanding), which uses simulation_main_effects.py with argument -s SEED, and stores the simulated graphs in .npy files in simulation_main, with name meg_simulate_SEED.npy;
- after simulating the grahs, the parameter estimation procedure is run using estimate_simulation_main_effects.py, which takes as argument -n NUMBEREVENTS the number of events to use for estimation. The output is saved in a directory simulation_main/estimate_NUMBEREVENTS. For reproducing Figure 2, the argument -n 3000 should be used;
- plots are obtained from plots_simulation_main.py, run with -M NUMBEREVENTS corresponding to the number of events used for inference (for Figure 2, -M 3000).
Figure 3 - The procedure is similar to Figure 2:
- simulation_interaction.sh (WARNING: computationally demanding), which uses simulation_interaction.py with argument -s SEED, and stores the simulated graphs in .npy files in simulation_inter, with name meg_simulate_SEED.npy;
- after simulating the grahs, the parameter estimation procedure is run using estimate_simulation_interactions.py, which takes as argument -n NUMBEREVENTS the number of events to use for estimation. The output is saved in a directory simulation_inter/estimate_NUMBEREVENTS. For reproducing Figure 3, the argument -n 3000 should be used;
- plots are obtained from plots_simulation_inter.py, run with -M NUMBEREVENTS corresponding to the number of events used for inference (for Figure 3, -M 3000).
Figure 4 - The plots can be obtained running estimate_simulation_main_effects.py and plots_simulation_main.py multiple times with arguments -n and -M 250, 500, 1000, and 2000.
Figure 5 - The boxplots can be reproduced running ./simulation_erdos.sh, followed by estimate_simulation_erdos.sh (both computationally expensive). The plot is then obtained by running followed by boxplots.py.

Tables

Table 1 - The results can be reproduced running ./enron_calls.sh (running the entire file is not recommended, since the file contains command lines for all the 117 combinations of models in Table 1), which uses the file enron.py. Comparisons with the model of Fox et al. (2016) can be run using the files fox_model.py and fox_enron.py.

Data

The Enron data can be downloaded running scripts/enron_filter.sh;
For security reasons, the ICL network data have not been made available, but the code to run the model on such networks (scripts/icl.py) is available.

tlyim / meg Goto Github PK

meg's Introduction

Mutually exciting point process graphs for modelling dynamic networks

Methodology

Understanding and running the code

Reproducing the results in the paper

Figures

Tables

Data

meg's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent