
dario-github / causal-discovery


A causal discovery toolkit. The algorithms were originally written in MATLAB and have been ported to Python for ease of use, so that non-specialists can use them too.

License: MIT License

Python 99.54% Shell 0.46%
causal-discovery linear-models non-gaussian toolkit

causal-discovery's Introduction


Index

1. Introduction

2. Usage

Installation

  Installing via pip

  GPU Support

Quick Start

  Command Line Usage

  Explanation of Output

Performance

Parameter Description

  Simplified Command Line Version

  Complete Parameter Configuration

   Local_NG_CD

3. Development

Environment Setup

  Creating a Virtual Environment

Building Documentation

Method of Invocation

  Python

  Command Line (see above for parameter details)

1. Introduction

The toolkit currently includes the following causal discovery algorithm:

  • local_ng_cd: see docs/algo/Local_NG_CD.doc for details

Note that:

  • local_ng_cd is a linear model that does not distinguish between discrete and continuous data, and treats them uniformly as continuous values.

We recommend trying local_ng_cd on your dataset first: it is the fastest and newest algorithm in the toolkit, its results are asymptotically correct, and it takes unknown confounding factors into account.

Detailed usage instructions follow.

2. Usage

Installation

Installing via pip

python3.7 -m pip install causal-discovery

GPU Support

GPU support requires checking your CUDA version manually and installing the matching CuPy build. If CuPy is not installed, NumPy is used for CPU computation by default.

# Check which CUDA version is installed
ls /usr/local/ | grep cuda

# Install the matching CuPy build, e.g. for CUDA 10.0
python3.7 -m pip install cupy-cuda100
# or, inside a Poetry-managed project:
python3.7 -m poetry add cupy-cuda100
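The CPU fallback described above follows a common pattern. A minimal sketch of such a backend switch, assuming only that CuPy mirrors the NumPy calls used here; this is illustrative and not the package's own code, and invert is a hypothetical helper.

```python
import numpy as np

try:
    import cupy as xp  # GPU backend, if a CuPy build matching the local CUDA is installed
except ImportError:
    xp = np  # CPU fallback: plain NumPy


def invert(matrix):
    """Invert a square matrix on whichever backend was imported.

    Always returns a NumPy array, copying back from GPU memory when CuPy is used.
    """
    inv = xp.linalg.inv(xp.asarray(matrix))
    return xp.asnumpy(inv) if xp is not np else inv


print("backend:", xp.__name__)
print(invert([[2.0, 0.0], [0.0, 4.0]]))
```

Returning a NumPy array in both cases keeps downstream code (saving results, printing) independent of where the computation ran.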

Quick Start

Command Line Usage

# Show the parameter help
python3.7 -m causal_discovery fast-simul-data --help
python3.7 -m causal_discovery run-local-ng-cd --help

# Example of parameters for generating simulated data
python3.7 -m causal_discovery fast-simul-data --cov-matrix '[[0, 1, 0], [0, 0, 0.5], [1, 0, 0]]' --sample-size 10

# Generate the default simulated data set (the first row is a header holding the variable names; each subsequent row is one sample)
python3.7 -m causal_discovery fast-simul-data

# Run Local-NG-CD on the default simulated data set (target variable: 3, data type: matrixT)
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT
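The matrixT layout used above (a header row of variable names, then one sample per row) can also be produced by your own generator. A minimal sketch, assuming a linear non-Gaussian structural model X = BX + E with uniform noise; simulate_linear_ng and the example weights are hypothetical, and this is not necessarily how fast-simul-data generates its data.

```python
import numpy as np


def simulate_linear_ng(weights, sample_size, seed=0):
    """Sample from the linear model X = B X + E with uniform (non-Gaussian) noise.

    weights[i][j] is the direct effect of variable j on variable i. The graph is
    assumed acyclic, which guarantees that (I - B) is invertible. Returns an array
    of shape (sample_size, n_vars): one row per sample, one column per variable.
    """
    b = np.asarray(weights, dtype=float)
    n_vars = b.shape[0]
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=(n_vars, sample_size))
    x = np.linalg.solve(np.eye(n_vars) - b, noise)  # X = (I - B)^{-1} E
    return x.T


# Hypothetical example: chain 0 -> 1 -> 2 with direct effects 0.8 and -0.5.
B = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [0.0, -0.5, 0.0]]
data = simulate_linear_ng(B, sample_size=1000)

# Write a CSV whose first row holds the variable names (uncomment to save):
# np.savetxt("my_simul_data.csv", data, delimiter=",", header="0,1,2", comments="")
```

Uniform noise is used because the algorithm family relies on non-Gaussian disturbances; with Gaussian noise the causal direction is not identifiable from the data alone.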

Explanation of Output

The last line of the console log is the path where the results are saved. If no output directory is given, results are written to ./output under the current directory.

Running local_ng_cd on the simulated data set simul_data.csv produces two result files:

  1. Trusted edges (edges_trust.json): direct paths from a cause to its effect (1 hop).

    • Three columns: cause, effect, and causal effect strength.

    • The larger the absolute strength, the stronger the direct causal relationship. The sign indicates whether the effect is positive or negative.

cause   effect  strength
2       3       0.7705689874891608
1       3       0.5863603810291644
5       1       0.0993025854935757
3       4       0.5015018174923119
3       5       0.7071753114627015
6       5       0.6977965771255858

  2. Composite weights (synthesize_effect.json): the composite weight from a cause to an effect sums the contributions of all directed paths between them, each path contributing the product of its edge weights. The n-hop contribution is given by the n-th power of the adjacency matrix B.

    • Three columns: cause, effect, and composite causal effect strength (within 5 hops).

cause   effect  strength
2       3       0.7700866938213671
1       3       0.6950546424688089
3       3       0.34082384182310194
5       3       -0.19710467189008646
4       3       0.06902072305646559
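The matrix-power relation above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the toolkit's implementation: it assumes the row-is-cause convention B[cause, effect] and builds B only from the trusted edges in the first table, so its numbers differ from the composite weights in synthesize_effect.json. The helpers adjacency and composite_effects are hypothetical.

```python
import numpy as np

# Trusted edges copied from the edges_trust.json table above: (cause, effect, strength).
EDGES = [
    (2, 3, 0.7705689874891608),
    (1, 3, 0.5863603810291644),
    (5, 1, 0.0993025854935757),
    (3, 4, 0.5015018174923119),
    (3, 5, 0.7071753114627015),
    (6, 5, 0.6977965771255858),
]


def adjacency(edges, n_vars):
    """B[i, j] = direct effect of variable i on variable j (row = cause)."""
    b = np.zeros((n_vars, n_vars))
    for cause, effect, strength in edges:
        b[cause, effect] = strength
    return b


def composite_effects(b, max_hops=5):
    """B^1 + ... + B^max_hops: every directed path of length <= max_hops,
    each weighted by the product of its edge weights."""
    total = np.zeros_like(b)
    power = np.eye(b.shape[0])
    for _ in range(max_hops):
        power = power @ b
        total += power
    return total


B = adjacency(EDGES, n_vars=7)  # variables are labelled 1..6; index 0 is unused
C = composite_effects(B, max_hops=5)
print(f"2 -> 3 within 5 hops: {C[2, 3]:.4f}")
```

These trusted edges contain the cycle 1 → 3 → 5 → 1, so the path sum does not stop after finitely many hops; a hop limit (5 here, matching the description above) keeps it finite.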

Performance

We recommend the NumPy build shipped with conda, which is linked against Intel MKL and greatly speeds up matrix operations (roughly 50 times faster for matrix inversion).

Time to invert a 500 x 500 random matrix with numpy, cupy, and torch:

Function           Mean      Std
numpy.linalg.inv   71.8 ms   64.9 ms
cupy.linalg.inv    1.39 ms   41.5 µs
torch.inverse      6.02 ms   6.26 µs
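To check what your own NumPy build achieves, a small timing sketch along the lines of the table above may help. It is an assumption-laden illustration: timings depend heavily on hardware and on the BLAS/LAPACK library NumPy is linked against, and time_inverse is a hypothetical helper.

```python
import timeit

import numpy as np


def time_inverse(n=500, repeat=5, seed=0):
    """Time numpy.linalg.inv on an n x n random matrix; returns seconds per call."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    timer = timeit.Timer(lambda: np.linalg.inv(a))
    return np.array(timer.repeat(repeat=repeat, number=1))


times = time_inverse()
print(f"numpy.linalg.inv: {times.mean() * 1e3:.2f} ms ± {times.std() * 1e3:.2f} ms")
```

Calling np.show_config() prints which BLAS/LAPACK library NumPy uses, a quick way to confirm whether an MKL build is installed.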

Parameter Description

Simplified Command Line Version (output of run-local-ng-cd --help)

Usage: __main__.py [OPTIONS] INPUT_FILE TARGET
                   DATA_TYPE:[triple|matrix|matrixT]

  [Causal Discovery Algorithm: Local-NG-CD, Author: Kun Zhang, Year: 2020]
  
Args:
    input_file (str): [Input file address in csv format]
    target (str): [Name of the target variable]
    data_type (DataType): [Data type: triple (triplet [sample index, variable name, value]), 
                           matrix (matrix, row index as variable name, column index as sample index),
                           matrixT (matrix, row index as sample index, column index as variable name)]
    sep (str, optional): [Csv delimiter]. Defaults to ",".
    index_col (str, optional): [Index column for reading the csv]. Defaults to None.
    header (str, optional): [Header row index for reading the csv]. Defaults to None.
    output_dir (str, optional): [Output directory]. Defaults to "./output".
    log_root (str, optional): [Log directory]. Defaults to "./logs".
    verbose (bool, optional): [Whether to print logs to the console]. Defaults to True.
    candidate_two_step (bool, optional): [Whether to enable 2-step relationship filtering]. Defaults to False.

Raises:
    DataTypeError: [Data type error]

Arguments:
  INPUT_FILE                      [required]
  TARGET                          [required]
  DATA_TYPE:[triple|matrix|matrixT]
                                  [required]

Options:
  --sep TEXT                      [default: ,]
  --index-col TEXT
  --header INTEGER
  --output-dir TEXT               [default: ./output]
  --log-root TEXT                 [default: ./logs]
  --verbose / --no-verbose        [default: True]
  --candidate-two-step / --no-candidate-two-step
                                  [default: False]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
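The three DATA_TYPE layouts hold the same table arranged differently. A minimal sketch of converting between them with pandas, assuming a triple table with hypothetical column names sample, variable, and value; the toy values are made up for illustration.

```python
import pandas as pd

# triple: one row per (sample index, variable name, value)
triple = pd.DataFrame({
    "sample": ["s0", "s0", "s1", "s1", "s2", "s2"],
    "variable": ["a", "b"] * 3,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# triple -> matrixT: rows are samples, columns are variables
matrix_t = triple.pivot(index="sample", columns="variable", values="value")

# matrixT -> matrix: rows are variables, columns are samples
matrix = matrix_t.T

print(matrix_t)
print(matrix)
```

Since the CLI exposes sep, index_col, and header options, the saved file's delimiter and index/header rows should be chosen to match those settings.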

Complete Parameter Configuration

Local_NG_CD

# Import
from causal_discovery.parameter.algo import LocalNgCdParam

# Parameter details
target_index: int = Field(0, ge=0)               # Index of the target variable; default 0, normally no need to change it
candidate_two_step: bool = True                  # Whether to use 2-step correlation filtering to pull in more candidate variables
alpha: float = Field(5e-2, ge=0, le=1)           # Significance level (p-value threshold) for correlation filtering; smaller is stricter. Typically 0.05 or 0.01 (95% or 99% confidence)
mb_beta_threshold: float = Field(5e-2, ge=0)     # Threshold for deciding whether an edge is undirected when estimating factor weights with adaptive Lasso (ALasso) regression; larger is stricter
ica_regu: float = Field(1e-3, gt=0)              # Penalty term constraining sparsity in ICA; the smaller the value, the sparser the resulting graph
b_orig_trust_value: float = Field(5e-2, gt=0)    # Weight threshold for further filtering of the adjacency matrix B; default 0.05, larger is stricter
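To make alpha and candidate_two_step concrete, here is a hypothetical sketch of correlation-based candidate screening around a target variable. It assumes a Fisher z-test for zero Pearson correlation and a simple two-step rule; the toolkit's actual test and rule are not documented here, so corr_pvalue and screen_candidates are illustrations only.

```python
import math

import numpy as np


def corr_pvalue(x, y):
    """Two-sided p-value for zero Pearson correlation, via the Fisher z-transform."""
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def screen_candidates(data, target, alpha=0.05, two_step=False):
    """Hypothetical candidate screening around column `target` of `data`.

    Step 1 keeps columns significantly correlated with the target. If two_step
    is set, step 2 also keeps columns correlated with any step-1 candidate.
    """
    others = [j for j in range(data.shape[1]) if j != target]
    step1 = {j for j in others
             if corr_pvalue(data[:, j], data[:, target]) < alpha}
    if not two_step:
        return step1
    step2 = {k for k in others if k not in step1
             and any(corr_pvalue(data[:, k], data[:, j]) < alpha for j in step1)}
    return step1 | step2


# Target t and an independent u are both causes of c; u alone is uncorrelated with t.
rng = np.random.default_rng(0)
t, u = rng.uniform(-1, 1, size=(2, 2000))
data = np.column_stack([t, t + u, u])
print(screen_candidates(data, target=0), screen_candidates(data, target=0, two_step=True))
```

With two_step=True the sketch is expected to also pick up u (column 2), the other cause of the common effect c: it is uncorrelated with the target but strongly correlated with c, so single-step filtering alone would miss it.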

3. Development

Environment Setup

Creating a Virtual Environment

# Python version: >=3.7
cd $PROJECT_DIR
python3.7 -m pip install -U pip setuptools
python3.7 -m pip install poetry
python3.7 -m poetry install


Building Documentation

poetry install --extras doc
invoke doc


Method of Invocation

Python

# Main algorithm functions
from causal_discovery.algorithm import local_ng_cd, fges_mb, mab_lingam

# Parameter classes
from causal_discovery.parameter.algo import LocalNgCdParam, FgesMbParam, MabLingamParam

Command Line (see above for parameter details)

# Show parameter descriptions
python3.7 -m causal_discovery run-local-ng-cd --help

# Example run
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT
