
causy-dev / causy


Causal discovery made easy.

Home Page: https://causy-dev.github.io/causy/

License: MIT License

Python 99.61% Makefile 0.04% Smarty 0.35%
causality pc-algorithm python pytorch causal-discovery causal-inference

causy's Issues

Pre-knowledge: Allow edges to be protected

Currently, our data structure does not support protecting edges from deletion.

Protecting edges is needed so that we can incorporate pre-knowledge into our graphs.

Therefore we need to

  • add a protected field to our Edge class
  • check before modifying or deleting an edge whether the operation is allowed
  • show the user a warning and add the information to our edge history if we try to remove a protected edge
  • add an option to incorporate pre-knowledge #9
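The first three points can be sketched as follows. This is a minimal illustration, not causy's actual data structure: the `Edge` and `Graph` classes, the `protected` flag, and the `edge_history` format are all hypothetical names chosen for the example.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Hypothetical edge with a `protected` flag (not causy's real Edge class)."""
    u: str
    v: str
    protected: bool = False


@dataclass
class Graph:
    """Toy graph that refuses to delete protected edges."""
    edges: dict = field(default_factory=dict)
    edge_history: list = field(default_factory=list)

    def add_edge(self, u, v, protected=False):
        self.edges[(u, v)] = Edge(u, v, protected)

    def remove_edge(self, u, v):
        edge = self.edges.get((u, v))
        if edge is None:
            return False
        if edge.protected:
            # Warn the user and record the rejected attempt in the history.
            warnings.warn(f"Edge {u} -> {v} is protected and was not removed.")
            self.edge_history.append(("remove_rejected", u, v))
            return False
        del self.edges[(u, v)]
        self.edge_history.append(("removed", u, v))
        return True


graph = Graph()
graph.add_edge("A", "B", protected=True)  # pre-knowledge: A -> B must stay
graph.add_edge("B", "C")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    removed_protected = graph.remove_edge("A", "B")  # rejected, warning recorded
removed_plain = graph.remove_edge("B", "C")          # allowed
```

Returning `False` instead of raising keeps a pipeline running when a rule tries to delete a protected edge; whether that is the desired behaviour is an open design choice.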

Create loops over pipeline steps

Clean up create_pipeline and add the following features:

  • using different generators for each rule
  • iterating over pipeline steps until exit condition

Update config accordingly.
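One possible shape for the loop feature is sketched below. It assumes a hypothetical `Loop` element that wraps pipeline steps and repeats them until an exit condition holds; the names `Loop`, `run_pipeline`, and the convention that a step returns its number of changes are assumptions for illustration, and per-rule generators are not covered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Loop:
    """Hypothetical pipeline element: repeat `steps` until `exit_condition` holds."""
    steps: List[Callable[[dict], int]]          # each step returns how many changes it made
    exit_condition: Callable[[int, int], bool]  # (iteration, changes) -> stop?
    max_iterations: int = 100                   # safety net against endless loops


def run_pipeline(graph: dict, pipeline: list) -> int:
    """Run plain steps once and Loop elements repeatedly; return the loop iteration count."""
    iterations = 0
    for element in pipeline:
        if isinstance(element, Loop):
            for i in range(element.max_iterations):
                changes = sum(step(graph) for step in element.steps)
                iterations += 1
                if element.exit_condition(i, changes):
                    break
        else:
            element(graph)
    return iterations


def init_edges(graph):
    graph["edges"] = {("A", "B"): 0.9, ("B", "C"): 0.1, ("A", "C"): 0.2}
    return len(graph["edges"])


def drop_weakest_below_threshold(graph, threshold=0.5):
    """Remove at most one edge per call, so the loop needs several iterations."""
    weak = [edge for edge, weight in graph["edges"].items() if weight < threshold]
    if not weak:
        return 0
    del graph["edges"][min(weak, key=graph["edges"].get)]
    return 1


graph = {}
pipeline = [
    init_edges,
    Loop(steps=[drop_weakest_below_threshold],
         exit_condition=lambda i, changes: changes == 0),
]
loop_iterations = run_pipeline(graph, pipeline)
```

Here the loop stops on the first iteration that changes nothing, which is a common fixed-point style exit condition; `max_iterations` guards against rules that keep undoing each other.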

Allow multiple edges of different types between nodes

The output of the FCI algorithm is a MAG with at most one edge between two nodes. However, to properly test the algorithm, it is helpful to test the inducing_path_exists function on ADMGs with possibly two different edge types between two nodes, a directed edge representing a direct effect and a bidirected edge representing a hidden confounder. (In a MAG, there would just be a directed edge in this case.)

Therefore, we should think about whether to implement this option. For now, we exclude tests that would need such an option in order to return the desired results, for example:

    def test_is_path_inducing_multiple_edges(self):
        graph = GraphManager()
        node1 = graph.add_node("test1", [1, 2, 3])
        node2 = graph.add_node("test2", [1, 2, 3])
        node3 = graph.add_node("test3", [1, 2, 3])
        graph.add_bidirected_edge(node1, node2, {"test": "test"})
        graph.add_bidirected_edge(node2, node3, {"test": "test"})
        graph.add_directed_edge(node2, node3, {"test": "test"})
        path = [(node1, node2), (node2, node3)]
        self.assertTrue(graph._is_path_inducing(path, node1, node3))
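If the option were implemented, one way to store several edge types between the same pair of nodes is to map each ordered pair to a set of edge kinds. The sketch below is independent of causy's GraphManager; `EdgeKind` and `MultiEdgeGraph` are hypothetical names.

```python
from collections import defaultdict
from enum import Enum


class EdgeKind(Enum):
    DIRECTED = "directed"      # u -> v, direct effect
    BIDIRECTED = "bidirected"  # u <-> v, hidden confounder


class MultiEdgeGraph:
    """Toy ADMG store: several edge kinds may coexist between two nodes."""

    def __init__(self):
        self._edges = defaultdict(set)  # (u, v) -> set of EdgeKind

    def add_directed_edge(self, u, v):
        self._edges[(u, v)].add(EdgeKind.DIRECTED)

    def add_bidirected_edge(self, u, v):
        # Store both directions so lookups are symmetric.
        self._edges[(u, v)].add(EdgeKind.BIDIRECTED)
        self._edges[(v, u)].add(EdgeKind.BIDIRECTED)

    def edge_kinds(self, u, v):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._edges.get((u, v), set()))


g = MultiEdgeGraph()
g.add_bidirected_edge("test1", "test2")
g.add_bidirected_edge("test2", "test3")
g.add_directed_edge("test2", "test3")

kinds_forward = g.edge_kinds("test2", "test3")   # directed and bidirected
kinds_backward = g.edge_kinds("test3", "test2")  # bidirected only
```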

Implement edge type enum per algorithm

Currently, our edges get their meaning implicitly from the context of the algorithm they are used in.

But edges have specific, different meanings in different algorithms. One option would be to find a common superset of edge types across those algorithms. Another option would be to have one EdgeType enum class per algorithm.

This could look something like this:

    class PCEdgeTypes(EdgeType):
        DIRECTED_EDGE = "directed"
        UNDIRECTED_EDGE = "undirected"

        @pre
        @on_updated([PCEdgeTypes.UNDIRECTED_EDGE], [PCEdgeTypes.DIRECTED_EDGE])
        @classmethod
        def check_update_of_undirected_edge_possible(cls, node_a, node_b, graph, operations):
            pass

This also means that a PipelineStep needs to declare explicitly which edge types it requires, and that the edge type enum can be configured when a model is created.
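A runnable variant of this idea, without the decorators, could declare the allowed transitions as data. Everything here is an assumption for illustration: the `allowed_updates` and `can_update_to` helpers, the `OrientCollidersStep` class, and its `required_edge_types` attribute are hypothetical and not part of causy.

```python
from enum import Enum


class EdgeType(Enum):
    """Base class for algorithm-specific edge types (hypothetical)."""

    @classmethod
    def allowed_updates(cls):
        # Maps an edge type to the set of types it may be updated to.
        return {}

    def can_update_to(self, new_type):
        return new_type in type(self).allowed_updates().get(self, set())


class PCEdgeTypes(EdgeType):
    DIRECTED_EDGE = "directed"
    UNDIRECTED_EDGE = "undirected"

    @classmethod
    def allowed_updates(cls):
        # In this sketch, an undirected edge may be oriented but never un-oriented.
        return {cls.UNDIRECTED_EDGE: {cls.DIRECTED_EDGE}}


class OrientCollidersStep:
    """Hypothetical pipeline step that declares the edge types it needs."""
    required_edge_types = {PCEdgeTypes.DIRECTED_EDGE, PCEdgeTypes.UNDIRECTED_EDGE}


def step_is_compatible(step, edge_type_enum):
    """A step fits a model if the model's enum provides every required edge type."""
    return step.required_edge_types <= set(edge_type_enum)


can_orient = PCEdgeTypes.UNDIRECTED_EDGE.can_update_to(PCEdgeTypes.DIRECTED_EDGE)
can_unorient = PCEdgeTypes.DIRECTED_EDGE.can_update_to(PCEdgeTypes.UNDIRECTED_EDGE)
compatible = step_is_compatible(OrientCollidersStep, PCEdgeTypes)
```

Keeping the transitions in a plain mapping makes them easy to inspect and serialize, at the cost of being less expressive than per-transition hook functions.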

Assumptions: Test how to best introduce warnings / guides

Example: The RKI data in the project is not i.i.d. (independent and identically distributed) because postal code areas (PLZs) that lie close together are highly correlated. Therefore, the results of the PC algorithm can be highly biased.

Test how to best integrate this information. Ideas:

  • Use available tests for assumptions (e.g. whether the data is i.i.d. or stationary) and throw warnings
  • Suggest algorithms that do not need the violated assumption, if available
  • If no algorithm is available: offer different heuristics to account for the assumption violation, but indicate that the results are no longer reliable. This could intuitively be done by showing outputs of different heuristics and documenting their weaknesses, as well as robustness tests whenever possible.
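The first idea could be prototyped with a very crude check. The sketch below uses the lag-1 autocorrelation of a column as a stand-in for a proper independence test; it assumes the row order is meaningful (e.g. time or spatial adjacency), the threshold of 0.2 is arbitrary, and the function names are hypothetical.

```python
import random
import warnings


def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1; values near 0 are compatible with independence."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def warn_if_not_iid(name, values, threshold=0.2):
    """Warn when consecutive samples look dependent (heuristic, not a formal test)."""
    rho = lag1_autocorrelation(values)
    if abs(rho) > threshold:
        warnings.warn(
            f"{name}: lag-1 autocorrelation is {rho:.2f}; the i.i.d. assumption "
            "may be violated and the results may be biased."
        )
        return True
    return False


rng = random.Random(0)
iid = [rng.gauss(0, 1) for _ in range(2000)]

# A random walk: each value depends strongly on the previous one.
walk, position = [], 0.0
for _ in range(2000):
    position += rng.gauss(0, 1)
    walk.append(position)

flag_iid = warn_if_not_iid("iid", iid)     # no warning expected
flag_walk = warn_if_not_iid("walk", walk)  # warning expected
```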

Test that fails in current setup

Check why the following test fails.

    def test_second_toy_model_example(self):
        rdnv = self.seeded_random.normalvariate
        model = IIDSampleGenerator(
            edges=[
                SampleEdge(NodeReference("A"), NodeReference("C"), 1),
                SampleEdge(NodeReference("B"), NodeReference("C"), 2),
                SampleEdge(NodeReference("A"), NodeReference("D"), 3),
                SampleEdge(NodeReference("B"), NodeReference("D"), 1),
                SampleEdge(NodeReference("C"), NodeReference("D"), 1),
                SampleEdge(NodeReference("B"), NodeReference("E"), 4),
                SampleEdge(NodeReference("E"), NodeReference("F"), 5),
                SampleEdge(NodeReference("B"), NodeReference("F"), 6),
                SampleEdge(NodeReference("C"), NodeReference("F"), 1),
                SampleEdge(NodeReference("D"), NodeReference("F"), 1),
            ],
            random=lambda: rdnv(0, 1),
        )

        sample_size = 100000
        test_data, sample_graph = model.generate(sample_size)

        tst = PCStable()
        tst.create_graph_from_data(test_data)
        tst.create_all_possible_edges()
        tst.execute_pipeline_steps()

Move from serialize methods everywhere to a serializer mixin

Currently, we hack a serialize method into every graph to allow users to eject and modify them in JSON (soon YAML) format. It would be much nicer to have a generic mixin that makes every part of our pipeline serializable.
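A generic mixin could look roughly like this. The sketch assumes the pipeline parts are dataclasses, which may not match the current code; `SerializeMixin`, `_to_plain`, `Step`, and `Pipeline` are hypothetical names.

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass


class SerializeMixin:
    """Hypothetical mixin: gives any dataclass a generic serialize() method."""

    def serialize(self):
        if not is_dataclass(self):
            raise TypeError("SerializeMixin expects a dataclass")
        return {f.name: _to_plain(getattr(self, f.name)) for f in fields(self)}


def _to_plain(value):
    """Recursively convert nested serializable objects and containers."""
    if isinstance(value, SerializeMixin):
        return value.serialize()
    if isinstance(value, (list, tuple)):
        return [_to_plain(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _to_plain(v) for k, v in value.items()}
    return value


@dataclass
class Step(SerializeMixin):
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class Pipeline(SerializeMixin):
    name: str
    steps: list = field(default_factory=list)


pipeline = Pipeline("pc", [Step("skeleton", {"threshold": 0.01}), Step("orient")])
as_json = json.dumps(pipeline.serialize(), sort_keys=True)
```

Because `serialize()` returns plain dicts and lists, the same output could be passed to a YAML dumper later without changing the mixin.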

Implement Skeleton Generator Concept

Currently, the graph is initialised with one hard-coded skeleton (create_all_possible_edges). This should be configurable so that including prior knowledge becomes easy. Also, when initialising the pre-configured algorithms, it should no longer be necessary to initialise the graph explicitly.
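A minimal sketch of the concept, assuming a skeleton generator is any object with a `generate(nodes)` method that returns the initial undirected edges. The class names and the `init_graph` helper are hypothetical.

```python
from itertools import combinations


class FullyConnectedSkeleton:
    """Same idea as create_all_possible_edges: every pair of nodes is connected."""

    def generate(self, nodes):
        return set(combinations(sorted(nodes), 2))


class PriorKnowledgeSkeleton:
    """Fully connected skeleton minus the pairs that prior knowledge rules out."""

    def __init__(self, forbidden):
        self.forbidden = {tuple(sorted(pair)) for pair in forbidden}

    def generate(self, nodes):
        return FullyConnectedSkeleton().generate(nodes) - self.forbidden


def init_graph(nodes, skeleton_generator=None):
    """Default to the fully connected skeleton so callers need no explicit setup."""
    generator = skeleton_generator or FullyConnectedSkeleton()
    return {"nodes": set(nodes), "edges": generator.generate(nodes)}


default_graph = init_graph(["A", "B", "C"])
informed_graph = init_graph(["A", "B", "C"], PriorKnowledgeSkeleton([("C", "A")]))
```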

Fix misleading naming of path and edge types

Currently, we use the following functions for these checks:

directed_edge_exists(v, w): checks if there is a directed edge from node v to node w or a bidirected edge between two nodes v and w
only_directed_edge_exists(v, w): checks if there is a directed edge from node v to node w
directed_path_exists(v, w): checks if a directed path from node v to node w exists, not containing any bidirected edges
path_exists(v, w): checks if a path exists between node v and node w on the underlying undirected graph, ignoring edge types

Think about better, more coherent naming. First ideas:

path_exists -> orientation_agnostic_path_exists
directed_edge_exists -> directed_from_to_or_bidirected_edge_exists
only_directed_edge_exists -> directed_edge_exists
directed_path_exists - ok.

Also add better documentation of the concept of inducing paths.
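To make the proposed names concrete, here is a toy implementation of the four checks under the new naming. It uses a plain dictionary as the edge store rather than causy's graph classes, so the storage layout and the `_reachable` helper are assumptions for illustration only.

```python
from collections import deque

# Toy edge store: (v, w) -> "directed" means v -> w;
# a bidirected edge is stored in both directions.
EDGES = {
    ("A", "B"): "directed",
    ("B", "C"): "bidirected",
    ("C", "B"): "bidirected",
    ("D", "C"): "directed",
}


def directed_edge_exists(v, w):
    """Proposed meaning: a directed edge v -> w exists (bidirected does not count)."""
    return EDGES.get((v, w)) == "directed"


def directed_from_to_or_bidirected_edge_exists(v, w):
    return EDGES.get((v, w)) in ("directed", "bidirected")


def _reachable(start, goal, neighbours):
    """Breadth-first search from start to goal using the given neighbour function."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours(node):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def directed_path_exists(v, w):
    """Follow directed edges only; bidirected edges are ignored."""
    return _reachable(
        v, w, lambda n: [b for (a, b), kind in EDGES.items() if a == n and kind == "directed"]
    )


def orientation_agnostic_path_exists(v, w):
    """Treat every edge as undirected, ignoring edge types."""
    return _reachable(
        v, w, lambda n: [b for (a, b) in EDGES if a == n] + [a for (a, b) in EDGES if b == n]
    )
```

On this toy graph, A and D are connected when orientation is ignored, but no directed path leads from A to C because the edge between B and C is bidirected.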

Fix IID Sample generator bug

The generator currently computes the data from the initial values instead of the values of the current step. Later, we do not want initial values at all; instead, we will compute the order dynamically so that no variable depends on a variable that has not been assigned a value yet. For now, initial values are fine; the priority is to make the generator work correctly.
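The planned ordering can be sketched with a topological sort, so that every variable is computed after all of its parents within the same sample. This is not the existing IIDSampleGenerator: the `generate_samples` function and its `(cause, effect, weight)` edge format are assumptions, and the linear-Gaussian model is chosen only for illustration. It relies on `graphlib` from the standard library (Python 3.9+).

```python
import random
from graphlib import TopologicalSorter


def generate_samples(edges, n_samples, seed=0):
    """edges: list of (cause, effect, weight). Returns (order, {variable: [values]})."""
    rng = random.Random(seed)
    parents = {}
    for cause, effect, weight in edges:
        parents.setdefault(cause, {})
        parents.setdefault(effect, {})[cause] = weight

    # static_order() yields each node only after all of its predecessors (its parents).
    sorter = TopologicalSorter({v: set(p) for v, p in parents.items()})
    order = list(sorter.static_order())

    data = {v: [] for v in order}
    for _ in range(n_samples):
        current = {}
        for v in order:
            # Value = own noise + weighted parent values from the *same* sample.
            noise = rng.gauss(0, 1)
            current[v] = noise + sum(w * current[p] for p, w in parents[v].items())
            data[v].append(current[v])
    return order, data


order, data = generate_samples(
    [("A", "C", 1.0), ("B", "C", 2.0), ("C", "D", 1.0)], n_samples=5
)
```

A cyclic edge list makes `static_order()` raise `graphlib.CycleError`, which doubles as a sanity check that the model is a DAG.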
