causy-dev / causy

Causal discovery made easy.

Home Page: https://causy-dev.github.io/causy/

License: MIT License

Languages: Python 99.93%, Makefile 0.07%
Topics: causality, pc-algorithm, python, pytorch, causal-discovery, causal-inference

causy's Introduction

Warning

causy is currently at a very early, experimental stage of development. It supports only one algorithm (PC, in a few variants). We do not recommend using it in production.

causy

Causal discovery made easy. Causy lets you use and implement causal discovery algorithms with pipelines that are easy to use, extend, and maintain. It is built on PyTorch, which allows you to run the algorithms seamlessly on CPUs as well as GPUs.

Background

Current causal discovery algorithms are often designed primarily for research. They tend to be implemented monolithically, which makes them hard to understand and extend. Causy aims to solve this problem by providing a framework that splits algorithms into smaller logical steps, which can be stacked together to form a pipeline. This makes it easy to understand, extend, optimize, and experiment with the algorithms.

By shipping causy with sensibly configured default pipelines, we also aim to provide a tool that non-experts can use to get started with causal discovery.

Thanks to the PyTorch backend, causy is considerably faster than serial CPU-based implementations.

In the future, causy aims to provide interactive visualizations which allow you to understand the causal discovery process.

Installation

Currently we only support Python 3.11. To install causy, run:

pip install causy

Usage

Causy can be used via CLI or via code.

Usage via CLI

Run causy with one of the default algorithms:

causy execute --help
causy execute your_data.json --algorithm PC --output-file output.json

The input data should be a JSON file containing a list of dictionaries. Each dictionary represents one data point: its keys are the variable names and its values are the corresponding observations, which can be numeric or categorical.

[
    {"a": 1, "b": 0.3},
    {"a": 0.5, "b": 0.2}
]
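
If your data lives in a CSV file, a small script can produce this format. This is a hypothetical helper (not part of causy) that assumes all columns are numeric; categorical values could be kept as strings instead:

# Hypothetical helper (not part of causy): convert a CSV file into the
# expected JSON format, one dictionary per row, column names as variable names.
import csv
import json

with open("your_data.csv", newline="") as f:
    # assumes all columns are numeric; leave values as strings for categoricals
    rows = [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

with open("your_data.json", "w") as f:
    json.dump(rows, f)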

You can customize your causy pipeline by ejecting and modifying the pipeline file.

causy eject PC pc.json
# edit pc.json
causy execute your_data.json --pipeline pc.json --output-file output.json

This might be useful if you want to configure a custom algorithm or customize the pipeline of a default algorithm.

Causy UI (experimental)

To visualize the causal discovery process, we are working on a web-based UI. It is at a very early stage of development and not yet ready for general use. If you want to try it out, run:

causy ui output.json

This starts a web server on port 8000 that visualizes the causal discovery process. The UI is currently read-only; you cannot yet interact with the discovery process through it.

Usage via Code

Use a default algorithm

from causy.algorithms import PC
from causy.graph_utils import retrieve_edges

model = PC()
model.create_graph_from_data(
    [
        {"a": 1, "b": 0.3},
        {"a": 0.5, "b": 0.2}
    ]
)
model.create_all_possible_edges()
model.execute_pipeline_steps()
edges = retrieve_edges(model.graph)

for edge in edges:
    print(
        f"{edge[0].name} -> {edge[1].name}: {model.graph.edges[edge[0]][edge[1]]}"
    )
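
With only two data points, as above, the statistical tests cannot find anything meaningful. A quick sanity check (a sketch using only the API shown above) is to generate a larger synthetic dataset with a known dependency and confirm that PC recovers it:

import random

from causy.algorithms import PC
from causy.graph_utils import retrieve_edges

random.seed(0)
# synthetic data in which "a" drives "b"
data = []
for _ in range(1000):
    a = random.gauss(0, 1)
    data.append({"a": a, "b": 2 * a + random.gauss(0, 0.1)})

model = PC()
model.create_graph_from_data(data)
model.create_all_possible_edges()
model.execute_pipeline_steps()

for edge in retrieve_edges(model.graph):
    print(f"{edge[0].name} -> {edge[1].name}")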

Use a custom algorithm

from causy.common_pipeline_steps.exit_conditions import ExitOnNoActions
from causy.graph_model import graph_model_factory
from causy.common_pipeline_steps.logic import Loop
from causy.common_pipeline_steps.calculation import CalculatePearsonCorrelations
from causy.independence_tests.common import (
  CorrelationCoefficientTest,
  PartialCorrelationTest,
  ExtendedPartialCorrelationTestMatrix,
)
from causy.orientation_rules.pc import (
  ColliderTest,
  NonColliderTest,
  FurtherOrientTripleTest,
  OrientQuadrupleTest,
  FurtherOrientQuadrupleTest,
)
from causy.graph_utils import retrieve_edges

CustomPC = graph_model_factory(
  pipeline_steps=[
    CalculatePearsonCorrelations(),
    CorrelationCoefficientTest(threshold=0.1),
    PartialCorrelationTest(threshold=0.01),
    ExtendedPartialCorrelationTestMatrix(threshold=0.01),
    ColliderTest(),
    Loop(
      pipeline_steps=[
        NonColliderTest(),
        FurtherOrientTripleTest(),
        OrientQuadrupleTest(),
        FurtherOrientQuadrupleTest(),
      ],
      exit_condition=ExitOnNoActions(),
    ),
  ]
)

model = CustomPC()

model.create_graph_from_data(
  [
    {"a": 1, "b": 0.3},
    {"a": 0.5, "b": 0.2}
  ]
)
model.create_all_possible_edges()
model.execute_pipeline_steps()
edges = retrieve_edges(model.graph)

for edge in edges:
  print(
    f"{edge[0].name} -> {edge[1].name}: {model.graph.edges[edge[0]][edge[1]]}"
  )

Supported algorithms

Currently causy supports the following algorithms:

  • PC (Peter-Clark)
    • PC - the original PC algorithm without any modifications (causy.algorithms.PC)
    • ParallelPC - a parallelized version of the PC algorithm (causy.algorithms.ParallelPC)
    • PCStable - a stable version of the PC algorithm (causy.algorithms.PCStable)
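
All three variants are exposed under causy.algorithms and, as far as the examples in this README go, share the same interface, so one can be swapped in for another:

from causy.algorithms import PC, ParallelPC, PCStable

model = PCStable()  # drop-in replacement for PC() in the examples above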

Supported pipeline steps

Detailed information about the pipeline steps can be found in the API Documentation.

Dev usage

Setup

We recommend using poetry to manage the dependencies. To install poetry, follow the instructions at https://python-poetry.org/docs/#installation.

Install dependencies

poetry install

Execute tests

poetry run python -m unittest

Funded by Prototype Fund from March 2024 until September 2024


causy's People

Contributors

lilithwittmann, this-is-sofia


causy's Issues

Implement edge type enum per algorithm

Currently we give our edges meaning implicitly, based on the context of the algorithms they are used in.

But edges have specific, different meanings in different algorithms. One option would be to find a common superset across those algorithms. Another would be to have one EdgeType enum class per algorithm.

This could look something like this:

class PCEdgeTypes(EdgeType):
    DIRECTED_EDGE = "directed"
    UNDIRECTED_EDGE = "undirected"

    # Design sketch: a guard that runs before an undirected edge is upgraded
    # to a directed one. The decorators are illustrative, and referencing
    # PCEdgeTypes inside its own class body would not resolve as written.
    @pre
    @on_updated([PCEdgeTypes.UNDIRECTED_EDGE], [PCEdgeTypes.DIRECTED_EDGE])
    @classmethod
    def check_update_of_undirected_edge_possible(cls, node_a, node_b, graph, operations):
        pass

This also means that a PipelineStep needs to state explicitly which edge types it requires, and that the edge type enum can be configured when a model is created.

Assumptions: Test how to best introduce warnings / guides

Example: The RKI data in the project is not i.i.d. (independent and identically distributed), because postal codes (PLZs) that lie close together are highly correlated. Therefore, the results of the PC algorithm can be highly biased.

Test how to best integrate this information. Ideas:

  • Use available tests for assumptions (e.g. whether the data are i.i.d. or stationary) and throw warnings
  • Suggest algorithms that do not need the violated assumption, if available
  • If no algorithm is available: offer different heuristics to account for the assumption violation, but indicate that the results are no longer reliable. Intuitively, this could be done by showing the outputs of different heuristics and documenting their weaknesses, along with robustness tests wherever possible.

Implement Skeleton Generator Concept

Currently, the graph is initialised with one hard-coded skeleton (create_all_possible_edges). This should be configurable so that including prior knowledge becomes easy. Also, when initialising the pre-configured algorithms, you should not have to initialise the graph explicitly anymore. A sketch of what such a configurable generator could look like follows.
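
A minimal sketch of a pluggable skeleton generator; all class names and the graph-mutation call are hypothetical, not existing causy API:

# Hypothetical sketch, not existing causy API.
class AllEdgesSkeletonGenerator:
    def create(self, model):
        model.create_all_possible_edges()  # today's hard-coded behavior


class PriorKnowledgeSkeletonGenerator:
    def __init__(self, allowed_edges):
        self.allowed_edges = allowed_edges  # e.g. [("a", "b"), ("b", "c")]

    def create(self, model):
        for node_a, node_b in self.allowed_edges:
            model.graph.add_edge(node_a, node_b)  # assumed graph method, illustration only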

Pre-knowledge: Allow edges to be protected

Currently our data structure does not support protecting edges from deletion.

Protecting edges is needed so that we can incorporate pre-knowledge into our graphs.

Therefore we need to

  • add a protected field to our Edge class
  • check, before an edge is modified or deleted, whether the change is allowed
  • show the user a warning and record the attempt in the edge history if we try to remove a protected edge
  • add an option to incorporate pre-knowledge #9
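
A minimal, self-contained sketch of the first three items above (all names hypothetical; this is not existing causy code):

# Hypothetical sketch of the proposed behavior; not existing causy code.
from dataclasses import dataclass, field


@dataclass
class Edge:
    node_a: str
    node_b: str
    protected: bool = False  # proposed field for pre-knowledge edges
    history: list = field(default_factory=list)


def remove_edge(edges, edge):
    if edge.protected:
        # proposed behavior: warn and record the attempt instead of deleting
        edge.history.append("removal attempted on protected edge")
        print(f"warning: edge {edge.node_a} - {edge.node_b} is protected")
        return
    edges.remove(edge)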

Move from serialize methods everywhere to a serializer mixin

Currently we hack a serialize method into every graph to allow users to eject and modify them in JSON (and soon YAML) format. But it would be so much cooler if we just had a generic Mixin that makes every part of our pipeline serializable; a sketch follows.
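
A minimal sketch of such a mixin (hypothetical, assuming each pipeline component keeps its configuration in public attributes):

# Hypothetical sketch, not existing causy code.
import json


class SerializableMixin:
    """Serialize any pipeline component from its public attributes."""

    def serialize(self) -> dict:
        return {
            "type": type(self).__name__,
            "params": {k: v for k, v in vars(self).items() if not k.startswith("_")},
        }

    def to_json(self) -> str:
        return json.dumps(self.serialize())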

Create loops over pipeline steps

Clean up create_pipeline and add the following features:

  • using different generators for each rule
  • iterating over pipeline steps until exit condition

Update config accordingly.

Test that fails in current setup

Check why:

# Excerpted from a test class; imports were omitted in the original issue.
def test_second_toy_model_example(self):
    rdnv = self.seeded_random.normalvariate
    model = IIDSampleGenerator(
        edges=[
            SampleEdge(NodeReference("A"), NodeReference("C"), 1),
            SampleEdge(NodeReference("B"), NodeReference("C"), 2),
            SampleEdge(NodeReference("A"), NodeReference("D"), 3),
            SampleEdge(NodeReference("B"), NodeReference("D"), 1),
            SampleEdge(NodeReference("C"), NodeReference("D"), 1),
            SampleEdge(NodeReference("B"), NodeReference("E"), 4),
            SampleEdge(NodeReference("E"), NodeReference("F"), 5),
            SampleEdge(NodeReference("B"), NodeReference("F"), 6),
            SampleEdge(NodeReference("C"), NodeReference("F"), 1),
            SampleEdge(NodeReference("D"), NodeReference("F"), 1),
        ],
        random=lambda: rdnv(0, 1),
    )

    sample_size = 100000
    test_data, sample_graph = model.generate(sample_size)

    tst = PCStable()
    tst.create_graph_from_data(test_data)
    tst.create_all_possible_edges()
    tst.execute_pipeline_steps()

Fix IID Sample generator bug

It currently generates the data based on the initial value rather than the current step. Also, we eventually don't want initial values at all; instead we will dynamically compute the order so that no variable depends on a variable that has not yet been assigned a value. For now, initial values are fine, but generation should first work properly. A sketch of that ordering step follows.
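
The ordering mentioned above is a topological sort of the sample graph. A sketch using only the standard library (hypothetical helper, not existing causy code; graphlib requires Python 3.9+, so it is available under the supported Python 3.11):

# Hypothetical helper: compute a sampling order in which every variable
# is generated after all of its parents.
from graphlib import TopologicalSorter


def sampling_order(edges):
    """edges: iterable of (parent, child) pairs; returns nodes parents-first."""
    predecessors = {}
    for parent, child in edges:
        predecessors.setdefault(child, set()).add(parent)
        predecessors.setdefault(parent, set())
    return list(TopologicalSorter(predecessors).static_order())


# sampling_order([("A", "C"), ("B", "C"), ("C", "D")]) -> ["A", "B", "C", "D"] (one valid order)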
