Installation

Installing via pip

GPU Support

Quick Start

Command Line Usage

Explanation of Output

Performance

Parameter Description

Simplified Command Line Version

Complete Parameter Configuration

Local_NG_CD

III. Development

Environment Setup

Creating a Virtual Environment

Building Documentation

Method of Invocation

Python

Command Line (see above for parameter details)

I. Introduction [Top]

The causal discovery algorithm toolkit currently includes:

local_ng_cd: see docs/algo/Local_NG_CD.doc for details

Note that:

local_ng_cd is a linear model that does not distinguish between discrete and continuous data, and treats them uniformly as continuous values.

It is recommended to try local_ng_cd on your dataset first: it is the fastest and newest algorithm in the toolkit, its results are asymptotically correct, and it takes unknown confounding factors into account.

See the following text for detailed usage instructions.

II. Usage [Top]

Installation [Top]

pip Installation [Top]

python3.7 -m pip install causal-discovery

GPU Support [Top]

It is necessary to check the CUDA version manually and install the corresponding version of CuPy. If CuPy is not installed, NumPy will be used for CPU computation by default.

# Check the supported CUDA version
ls /usr/local/ | grep cuda

# Install the corresponding version of CuPy, for example, CUDA 10.0
python3.7 -m poetry add cupy-cuda100
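
The CPU fallback described above is commonly implemented with a guarded import. The following is a minimal sketch of that general pattern, not the package's actual code; the alias `xp` and the variable `backend` are names chosen here for illustration.

```python
# Sketch of a CuPy-to-NumPy fallback (assumed pattern, not taken from causal-discovery).
try:
    import cupy as xp  # GPU arrays, available only if a matching CuPy build is installed
    backend = "cupy"
except ImportError:
    import numpy as xp  # CPU fallback
    backend = "numpy"

# Both libraries expose the same linalg.inv call, so downstream code can stay backend-agnostic.
a = xp.eye(3) * 2.0
inv = xp.linalg.inv(a)
print(f"backend: {backend}, inv[0, 0] = {float(inv[0, 0])}")
```

Because CuPy mirrors much of the NumPy API, code written against `xp` runs unchanged on either backend.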

Quick Start [Top]

Command Line Usage [Top]

# Check the parameter instructions
python3.7 -m causal_discovery fast-simul-data --help
python3.7 -m causal_discovery run-local-ng-cd --help

# Example of parameters for generating simulated data
python3.7 -m causal_discovery fast-simul-data --cov-matrix '[[0, 1, 0], [0, 0, 0.5], [1, 0, 0]]' --sample-size 10

# Generate the default simulated data set (the first row is the column index, i.e. the variable names; each following row is one sample)
python3.7 -m causal_discovery fast-simul-data

# Run local_ng_cd on the default simulated data set
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT

The last line of the console log is the path where the results are saved. If no output directory is specified, it defaults to ./output under the current directory.

Calculation Results Description [Top]

After calling local_ng_cd with the simulation dataset simul_data.csv, the result is divided into two files:

Trustworthy edges (edges_trust.json): a trustworthy edge is a direct path (1 hop) from a cause to its effect.
- Three columns: cause, effect, and causal effect strength.
- The larger the magnitude of the causal effect strength, the stronger the direct causal relationship. Positive and negative values indicate positive and negative effects, respectively.

cause   effect  causal effect strength
2       3       0.7705689874891608
1       3       0.5863603810291644
5       1       0.0993025854935757
3       4       0.5015018174923119
3       5       0.7071753114627015
6       5       0.6977965771255858

Composite weights (synthesize_effect.json): the composite weight sums the contributions of all directed paths from the cause to the effect. The n-step composite weight can be obtained from the n-th power of the adjacency matrix B.
- Three columns: cause, effect, and composite causal effect strength (within 5 hops).

cause   effect  composite causal effect strength
2       3       0.7700866938213671
1       3       0.6950546424688089
3       3       0.34082384182310194
5       3       -0.19710467189008646
4       3       0.06902072305646559
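
The relationship between the adjacency matrix B and the composite weight can be illustrated with a small NumPy sketch. It rests on two assumptions that the text above does not confirm: that `B[i, j]` holds the weight of the edge from variable i to variable j, and that "within 5 hops" means summing the 1-step through 5-step weights. The function name `composite_effect` is hypothetical and not part of the package.

```python
import numpy as np

def composite_effect(B, max_hops=5):
    """Sum B^1 + B^2 + ... + B^max_hops.

    Entry (i, j) of B^n is the sum, over all n-step directed paths from i to j,
    of the product of the edge weights along each path.
    """
    B = np.asarray(B, dtype=float)
    total = np.zeros_like(B)
    power = np.eye(B.shape[0])
    for _ in range(max_hops):
        power = power @ B   # power now holds the next power of B
        total += power
    return total

# Toy graph (illustrative values): 0 -> 1 (0.5), 1 -> 2 (0.4), 0 -> 2 (0.1)
B = np.array([
    [0.0, 0.5, 0.1],
    [0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0],
])
total = composite_effect(B)
# Effect of 0 on 2: 0.1 (direct) + 0.5 * 0.4 (via variable 1) = 0.3, up to rounding
print(total[0, 2])
```

In a graph with cycles, the diagonal of the result can be non-zero, which would be consistent with a row such as `3 3` appearing in the composite table.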

Performance [Top]

It is recommended to use the NumPy build provided by conda, which includes Intel's MKL and greatly speeds up matrix operations (matrix inversion is about 50 times faster).

Performance comparison of numpy, cupy, and torch for inverting a 500 x 500 random matrix

Function	mean	std
numpy.linalg.inv	71.8 ms	± 64.9 ms
cupy.linalg.inv	1.39 ms	± 41.5 µs
torch.inverse	6.02 ms	± 6.26 µs
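
A comparison like the one above can be reproduced for the NumPy row with the standard `timeit` module. This is only a sketch of one way to measure it; the seed, the repeat count, and the variable names are arbitrary choices, and the timings depend on the machine and on the BLAS library NumPy is linked against.

```python
import timeit

import numpy as np

# 500 x 500 random matrix, matching the size used in the table above
rng = np.random.default_rng(0)
matrix = rng.standard_normal((500, 500))

# Time one inversion per run, repeated 7 times
runs = timeit.repeat(lambda: np.linalg.inv(matrix), number=1, repeat=7)
mean_ms = 1e3 * np.mean(runs)
std_ms = 1e3 * np.std(runs)
print(f"numpy.linalg.inv: {mean_ms:.2f} ms ± {std_ms:.2f} ms")
```

Running the same script in a conda environment and in a plain pip environment is a simple way to check the MKL speedup locally.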

Parameter Description [Top]

Simplified Command Line Version [Top]

Usage: __main__.py [OPTIONS] INPUT_FILE TARGET
                   DATA_TYPE:[triple|matrix|matrixT]

  [Causal Discovery Algorithm: Local-NG-CD, Author: Kun Zhang, Year: 2020]
  
Args:
    input_file (str): [Path to the input file, in CSV format]
    target (str): [Name of the target variable]
    data_type (DataType): [Data layout: triple (triplets of [sample index, variable name, value]), 
                           matrix (rows are variables, columns are samples),
                           matrixT (rows are samples, columns are variables)]
    sep (str, optional): [CSV delimiter]. Defaults to ",".
    index_col (str, optional): [Index column to use when reading the CSV]. Defaults to None.
    header (str, optional): [Header row to use when reading the CSV]. Defaults to None.
    output_dir (str, optional): [Output directory]. Defaults to "./output".
    log_root (str, optional): [Log directory]. Defaults to "./logs".
    verbose (bool, optional): [Whether to print logs to the console]. Defaults to True.
    candidate_two_step (bool, optional): [Whether to enable 2-step relationship filtering]. Defaults to False.

Raises:
    DataTypeError: [Data type error]

Arguments:
  INPUT_FILE                      [required]
  TARGET                          [required]
  DATA_TYPE:[triple|matrix|matrixT]
                                  [required]

Options:
  --sep TEXT                      [default: ,]
  --index-col TEXT
  --header INTEGER
  --output-dir TEXT               [default: ./output]
  --log-root TEXT                 [default: ./logs]
  --verbose / --no-verbose        [default: True]
  --candidate-two-step / --no-candidate-two-step
                                  [default: False]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
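
The three DATA_TYPE layouts can be illustrated with pandas. This is a sketch for orientation only: the column names `sample`, `variable`, and `value` and the toy values are made up here, and the package's own CSV parsing may differ in detail.

```python
import pandas as pd

# triple: one row per (sample index, variable name, value)
triple = pd.DataFrame(
    {
        "sample": [0, 0, 1, 1],
        "variable": ["x", "y", "x", "y"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# matrixT: rows are samples, columns are variables
matrix_t = triple.pivot(index="sample", columns="variable", values="value")

# matrix: rows are variables, columns are samples (the transpose of matrixT)
matrix = matrix_t.T

print(matrix_t)
print(matrix)
```

The default simulated data set is passed to `run-local-ng-cd` as `matrixT` in the examples above, i.e. with one sample per row.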

Complete Version of Parameter Configuration [Top]

Local_NG_CD [Top]

# Import
from causal_discovery.parameter.algo import LocalNgCdParam

# Parameter details
target_index: int = Field(0, ge=0)               # Index of the target variable (default 0). Change it only if necessary.
candidate_two_step: bool = True                  # Whether to apply 2-step correlation filtering, which brings in more candidate variables.
alpha: float = Field(5e-2, ge=0, le=1)           # Significance level (p-value threshold) used in correlation filtering. The smaller the value, the more stringent. 0.05 and 0.01 correspond to 95% and 99% confidence levels.
mb_beta_threshold: float = Field(5e-2, ge=0)     # Threshold used to decide whether an edge is undirected when obtaining factor weights with ALasso regression. The larger the value, the more stringent.
ica_regu: float = Field(1e-3, gt=0)              # Penalty term used to constrain sparsity in ICA. The smaller the value, the sparser the resulting graph.
b_orig_trust_value: float = Field(5e-2, gt=0)    # Weight threshold used for further filtering after the adjacency matrix B is obtained (default 0.05). The larger the value, the more stringent.
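
The `Field(..., ge=..., le=..., gt=...)` declarations above look like pydantic field constraints. Assuming that is the case, the stand-in model below shows how such bounds behave; `LocalNgCdParamSketch` is a hypothetical class written for this example and is not the package's `LocalNgCdParam`.

```python
from pydantic import BaseModel, Field, ValidationError

class LocalNgCdParamSketch(BaseModel):
    """Hypothetical stand-in that mirrors the fields listed above."""
    target_index: int = Field(0, ge=0)
    candidate_two_step: bool = True
    alpha: float = Field(5e-2, ge=0, le=1)
    mb_beta_threshold: float = Field(5e-2, ge=0)
    ica_regu: float = Field(1e-3, gt=0)
    b_orig_trust_value: float = Field(5e-2, gt=0)

# Override a single field; the others keep their defaults
params = LocalNgCdParamSketch(alpha=0.01)

# A value outside the declared bounds (alpha must satisfy 0 <= alpha <= 1) is rejected
try:
    LocalNgCdParamSketch(alpha=1.5)
    rejected = False
except ValidationError:
    rejected = True

print(params.alpha, rejected)
```

With constraints of this kind, an out-of-range setting fails when the parameter object is created rather than partway through a run.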

III. Development [Top]

Environment Setup [Top]

Creating a Virtual Environment [Top]

# python version: >=3.7
cd $PROJECT_DIR
python3.7 -m pip install -U pip setuptools
python3.7 -m pip install poetry
python3.7 -m poetry install

Building the Documentation [Top]

poetry install --extras doc
invoke doc

Calling Method [Top]

Python [Top]

# Algorithm Main Function
from causal_discovery.algorithm import local_ng_cd, fges_mb, mab_lingam  

# Parameter Class
from causal_discovery.parameter.algo import LocalNgCdParam, FgesMbParam, MabLingamParam

Command Line (see above for details on the parameters) [Top]

# Viewing Parameter Descriptions
python3.7 -m causal_discovery run-local-ng-cd --help

# Calling Example
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT
