zhy5186612 / ils-csl

This project is forked from tymadara/ils-csl.

Supplementary materials and codes for the paper "Causal Structure Learning Supervised by Large Language Model"

Causal Structure Learning Supervised by Large Language Model

This repository contains the code and supplementary materials for the paper "Causal Structure Learning Supervised by Large Language Model".

Installation

Follow these steps to install and set up the necessary environments for running the project:

  1. Download repository: Download all the necessary files to your local machine.

  2. Install Graphviz: If not installed, run the following command:

    sudo apt-get install graphviz libgraphviz-dev gcc

  3. Navigate to the project folder:

    cd ./ILS-CSL

  4. Install the required packages: Run the installation script with the following command:

    sh install.sh

  5. Run the main code: Execute the primary script by running:

    python main.py
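
Before running the steps above, it can help to confirm that the external tools they rely on are visible on the PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the tool list assumes that Graphviz provides the `dot` executable, that `gcc` comes from the apt command above, and that `java` stands in for the JRE needed by BI-CaMML.

```python
import shutil


def missing_tools(tools=("dot", "gcc", "java"), which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment; by default it is shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


# Report anything that still needs installing (hypothetical usage).
absent = missing_tools()
if absent:
    print("Not found on PATH:", ", ".join(absent))
else:
    print("All external tools found.")
```

An empty result only shows that the executables exist; it does not verify versions or that `libgraphviz-dev` headers are installed.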

Project Structure

Here's an explanation of key directories and files in this repository:

  • suppl/: Supplementary results of the experiments in the paper.
  • BN-structure/: Contains the true Bayesian networks used in the experiments.
  • data/: Holds the data used in the experiments in txt and csv formats, along with the score files. Due to the upload size limit, only the Asia and Child data are kept here for testing. For other datasets, download the data from the open-source repository at https://github.com/andrewli77/MINOBS-anc/tree/master/data/csv and use calculate_score.py to generate the local scores.
  • out/: Stores the output and statistics from the experiments. The key outputs are:
    • adj-matrix/: Contains the adjacency matrices of the learned structures at every iteration.
    • gpt_history/: Contains the history of GPT queries on all pairwise variables in the experiment.
    • results-all.csv: Contains all the related statistics of the experiment, including time, extra, missing, reverse, shd, norm_shd, nnz, fdr, tpr, fpr, precision, recall, F1, gscore, etc.
    • llm_extimation.json: Details of parameter estimation in Table 6 and Table 7.
    • prior-iter: Detailed data of prior constraints in the iterative process, as reported in Figure 6.
  • img/: Contains the visual results generated from the experiments.
  • pairwise_samples/: Includes the sampled pairs of nodes used for study.
  • perform/: Contains the implemented or utilized prior constraint-based algorithms.
  • hc/: Holds the extended HC search algorithm for prior constraints.
  • prompt/: Contains meta materials used to generate prompts for LLMs.
  • minobsx/: Contains the extended MINOBSx search algorithm for soft constraints.
  • BI-CaMML/: Contains CaMML; refer to install.sh for installation details. Requires a Java Runtime Environment (JRE) to run.
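
Several of the statistics listed for results-all.csv (extra, missing, reverse, shd, precision, recall) compare a learned adjacency matrix with the true one. The sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the code in evaluation_DAG.py: it assumes 0/1 matrices where entry [i, j] = 1 means i -> j, it counts a reversed edge once in SHD, and the function name `structural_errors` is hypothetical.

```python
import numpy as np


def structural_errors(true_adj, learned_adj):
    """Count extra, missing, and reversed edges between two DAGs.

    Both inputs are square 0/1 matrices; entry [i, j] == 1 means i -> j.
    """
    t = np.asarray(true_adj, dtype=bool)
    l = np.asarray(learned_adj, dtype=bool)

    # Reversed: learned i -> j where the truth has j -> i instead.
    reverse = l & t.T & ~t
    # Extra: learned edge between a pair that is unconnected in the truth.
    extra = l & ~t & ~t.T
    # Missing: true edge between a pair that is unconnected in the learned graph.
    missing = t & ~l & ~l.T

    counts = {
        "extra": int(extra.sum()),
        "missing": int(missing.sum()),
        "reverse": int(reverse.sum()),
    }
    # Assumption: each reversal contributes 1 to SHD (some papers count 2).
    counts["shd"] = counts["extra"] + counts["missing"] + counts["reverse"]

    true_positives = int((l & t).sum())   # edges with the correct direction
    n_learned, n_true = int(l.sum()), int(t.sum())
    counts["precision"] = true_positives / n_learned if n_learned else 0.0
    counts["recall"] = true_positives / n_true if n_true else 0.0
    return counts


# Truth: 0 -> 1 -> 2. Learned: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra).
truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
learned = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(structural_errors(truth, learned))
```

Whether a reversal counts once or twice changes shd and norm_shd, so check the convention used in the paper before comparing numbers.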

Code Overview

  • main.py: The main script for executing the ILS-CSL experiment.
  • config.py: Manages the project configuration settings.
  • evaluation_DAG.py: Responsible for the evaluation of Directed Acyclic Graphs (DAGs).
  • GPT.py: Provides access to the GPT model.
  • statics_plot.py: Includes scripts for generating experiment results and plotting figures.
  • graph_vis.py: Used for visualizing the topological graph.
  • graph_sampler.py: Samples pairs from the graph for study.
  • utils.py: Contains utility functions for the project.
  • calculate_score.py: Holds code for calculating local scores such as BIC and BDeu. Not needed once the scores have been calculated.
  • para_estimate.ipynb: Estimates the parameters related to LLM inference and CSL structures in Section 4.2 and Appendix A.4.
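
To make the idea of a local score concrete, here is a minimal sketch of the standard BIC local score for one variable and a candidate parent set on complete discrete data: the maximised log-likelihood minus 0.5 · log(N) times the number of free parameters. This is the textbook formula, not the implementation in calculate_score.py; the function name `bic_local_score` and the row format (a list of dicts) are assumptions made for illustration.

```python
import math
from collections import Counter


def bic_local_score(rows, child, parents, cardinality):
    """BIC local score of `child` given `parents` for complete discrete data.

    rows: list of dicts mapping variable name -> observed state.
    cardinality: dict mapping variable name -> number of states.
    """
    n = len(rows)
    # N_jk: count of (parent configuration j, child state k).
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    # N_j: count of parent configuration j.
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Maximised log-likelihood: sum over observed (j, k) of N_jk * log(N_jk / N_j).
    log_lik = sum(
        n_jk * math.log(n_jk / parent_counts[cfg])
        for (cfg, _state), n_jk in joint.items()
    )

    # Free parameters: (r - 1) per parent configuration, q configurations.
    q = math.prod(cardinality[p] for p in parents)  # equals 1 with no parents
    n_params = q * (cardinality[child] - 1)
    return log_lik - 0.5 * math.log(n) * n_params


# B copies A exactly, so the parent set {A} should score higher than no parents.
rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
card = {"A": 2, "B": 2}
print(bic_local_score(rows, "B", ["A"], card))
print(bic_local_score(rows, "B", [], card))
```

Score-based search then sums such local scores over all variables, which is why they can be computed once and stored in score files.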

Running the Core Experiment

Quick Start

To run the core ILS-CSL code, execute the following command:

    python main.py --dataset=asia --alg=HC --score=bdeu --data_index=1 --datasize_index=0

This command saves the raw output in the out/ folder and figures in the img/ folder. To compute further statistics from that output, run:

    python statics_plot.py

Configuration Options

  • dataset: Specify the dataset name (e.g., asia, alarm, child, cancer, insurance, mildew, water, barley).
  • alg: Define the algorithm name (e.g., CaMML, HC, softHC, hardMINOBSx, softMINOBSx).
  • score: Indicate the score name (e.g., bic, bdeu).
  • data_index: Set the index of the dataset (e.g., 1, 2, 3, 4, 5, 6).
  • datasize_index: Set the index of the dataset size (0 for smaller size, 1 for larger size).
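
When several configurations need to be run, the options above can be combined programmatically. The sketch below only builds the command lines from the documented flags and prints them; the helper name `build_commands` is hypothetical, and it assumes every combination passed in is one that main.py accepts, which the source does not confirm (for example, whether every algorithm honours every score).

```python
import itertools
import shlex


def build_commands(datasets, algs, scores, data_indices, datasize_indices):
    """Return one argument list per combination of the documented options."""
    commands = []
    for dataset, alg, score, data_index, datasize_index in itertools.product(
        datasets, algs, scores, data_indices, datasize_indices
    ):
        commands.append([
            "python", "main.py",
            f"--dataset={dataset}",
            f"--alg={alg}",
            f"--score={score}",
            f"--data_index={data_index}",
            f"--datasize_index={datasize_index}",
        ])
    return commands


# Print the commands for the two bundled datasets; to execute them instead,
# pass each list to subprocess.run(cmd, check=True).
for cmd in build_commands(["asia", "child"], ["HC"], ["bic", "bdeu"], [1], [0, 1]):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems if a dataset name ever contains unusual characters.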
