causy-dev / causy
Causal discovery made easy.
Home Page: https://causy-dev.github.io/causy/
License: MIT License
Currently our data structure does not support protecting edges from being deleted.
Protecting edges is needed so that we can incorporate pre-knowledge into our graphs.
Therefore we need to add a `protected` field to our Edge class.

Clean up create_pipeline and add the following features:
Update config accordingly.
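A minimal sketch of what the `protected` field could look like; the `Edge` fields besides `protected` and the `remove_edge` helper are illustrative, not the current causy API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Edge:
    # existing edge payload (illustrative names)
    u: str
    v: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    # new: protected edges carry pre-knowledge and must survive deletion steps
    protected: bool = False

def remove_edge(edges, edge):
    """Remove an edge unless it is protected by pre-knowledge."""
    if edge.protected:
        raise ValueError(f"Edge {edge.u} -> {edge.v} is protected and cannot be deleted")
    edges.remove(edge)
```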
The output of the FCI algorithm is a MAG with at most one edge between two nodes. However, to properly test the algorithm, it is helpful to test the inducing_path_exists function on ADMGs with possibly two different edge types between two nodes, a directed edge representing a direct effect and a bidirected edge representing a hidden confounder. (In a MAG, there would just be a directed edge in this case.)
Therefore, we should think about whether to implement this option. For now, we exclude tests that would need such an option in order to return the desired results, for example:
def test_is_path_inducing_multiple_edges(self):
    graph = GraphManager()
    node1 = graph.add_node("test1", [1, 2, 3])
    node2 = graph.add_node("test2", [1, 2, 3])
    node3 = graph.add_node("test3", [1, 2, 3])
    graph.add_bidirected_edge(node1, node2, {"test": "test"})
    graph.add_bidirected_edge(node2, node3, {"test": "test"})
    graph.add_directed_edge(node2, node3, {"test": "test"})
    path = [(node1, node2), (node2, node3)]
    self.assertTrue(graph._is_path_inducing(path, node1, node3))
Currently we give our edges meaning implicitly, based on the context of the algorithms they are used in.
But edges have specific, different meanings in different algorithms. One option would be to find a common superset of edge types across those algorithms. Another option would be to have one EdgeType enum class per algorithm.
This could look something like this:
class PCEdgeTypes(EdgeType):
    DIRECTED_EDGE = "directed"
    UNDIRECTED_EDGE = "undirected"

@pre
@on_updated([PCEdgeTypes.UNDIRECTED_EDGE], [PCEdgeTypes.DIRECTED_EDGE])
@classmethod
def check_update_of_undirected_edge_possible(cls, node_a, node_b, graph, operations):
    pass
This also means that a PipelineStep needs to state explicitly which edge types it requires, and that the edge type enum can be configured when a model is created.
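One way a PipelineStep could declare its required edge types and have them checked at model creation; all names here (`required_edge_types`, `OrientCollidersStep`, `validate_pipeline`) are hypothetical sketches, not the current causy API:

```python
from enum import Enum

class EdgeType(str, Enum):
    """Base class for per-algorithm edge type enums (no members itself)."""
    pass

class PCEdgeTypes(EdgeType):
    DIRECTED_EDGE = "directed"
    UNDIRECTED_EDGE = "undirected"

class PipelineStep:
    # edge types this step reads or writes; checked at model creation
    required_edge_types: list = []

class OrientCollidersStep(PipelineStep):
    required_edge_types = [PCEdgeTypes.UNDIRECTED_EDGE, PCEdgeTypes.DIRECTED_EDGE]

def validate_pipeline(steps, configured_edge_types):
    """Reject a pipeline whose steps need edge types the model was not configured with."""
    for step in steps:
        missing = [t for t in step.required_edge_types if t not in configured_edge_types]
        if missing:
            raise ValueError(f"{step.__class__.__name__} requires edge types {missing}")
```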
Use analytic results for the mean and variance (such that the process stays stationary) and a normal distribution.
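For illustration with an AR(1) process x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2), |phi| < 1: the analytic stationary distribution is N(0, sigma^2 / (1 - phi^2)), so drawing the initial value from it keeps every step marginally identical. A minimal sketch (function and parameter names are illustrative):

```python
import math
import random

def sample_stationary_ar1(phi, sigma, steps, rng=random.Random(0)):
    """Sample an AR(1) path whose initial value is drawn from the analytic
    stationary distribution N(0, sigma^2 / (1 - phi^2)), so the process
    has the same marginal distribution at every step."""
    assert abs(phi) < 1, "stationarity requires |phi| < 1"
    stationary_std = sigma / math.sqrt(1 - phi ** 2)
    x = rng.normalvariate(0, stationary_std)  # analytic stationary draw
    path = [x]
    for _ in range(steps - 1):
        x = phi * x + rng.normalvariate(0, sigma)
        path.append(x)
    return path
```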
Example: the RKI data in the project is not i.i.d. (independent and identically distributed) because postal codes (PLZs) that lie close together are highly correlated. Therefore, the results of the PC algorithm can be highly biased.
Test how to best integrate this information. Ideas:
Currently our entire codebase works with PyTorch, except for this one little function (scipy_stats.t.ppf),
which exists only in SciPy; the whole world relies on >30-year-old C code, so no one needs to implement it again. But we want to run it on the GPU.
So we should either do a nice fake implementation, as Amazon did in their GluonTS project, or implement it properly in PyTorch.
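One option for the fake-implementation route is the classic expansion of the Student-t quantile around the normal quantile (Abramowitz & Stegun 26.7.5). The sketch below is plain Python for readability; in PyTorch the same polynomial runs elementwise on the GPU, with `torch.special.ndtri` supplying the normal quantile. Accuracy degrades for very small degrees of freedom (roughly nu < 5):

```python
from statistics import NormalDist

def student_t_ppf(p, nu):
    """Approximate the Student-t quantile via an expansion in 1/nu around
    the normal quantile (Abramowitz & Stegun 26.7.5).  Accurate to a few
    1e-3 for nu >= ~5; degrades for very small nu."""
    z = NormalDist().inv_cdf(p)  # in torch: torch.special.ndtri(p)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384
    return z + g1 / nu + g2 / nu ** 2 + g3 / nu ** 3
```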
The following test needs investigation; check why:
def test_second_toy_model_example(self):
    rdnv = self.seeded_random.normalvariate
    model = IIDSampleGenerator(
        edges=[
            SampleEdge(NodeReference("A"), NodeReference("C"), 1),
            SampleEdge(NodeReference("B"), NodeReference("C"), 2),
            SampleEdge(NodeReference("A"), NodeReference("D"), 3),
            SampleEdge(NodeReference("B"), NodeReference("D"), 1),
            SampleEdge(NodeReference("C"), NodeReference("D"), 1),
            SampleEdge(NodeReference("B"), NodeReference("E"), 4),
            SampleEdge(NodeReference("E"), NodeReference("F"), 5),
            SampleEdge(NodeReference("B"), NodeReference("F"), 6),
            SampleEdge(NodeReference("C"), NodeReference("F"), 1),
            SampleEdge(NodeReference("D"), NodeReference("F"), 1),
        ],
        random=lambda: rdnv(0, 1),
    )
    sample_size = 100000
    test_data, sample_graph = model.generate(sample_size)
    tst = PCStable()
    tst.create_graph_from_data(test_data)
    tst.create_all_possible_edges()
    tst.execute_pipeline_steps()
Currently we hack a serialize method into every graph to let users eject and modify them in JSON (soon YAML) format. It would be much nicer to have a generic mixin that makes every part of our pipeline serializable.
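A minimal sketch of such a generic mixin; the class and method names are hypothetical, and a real version would also need to handle deserialization and non-JSON-native types:

```python
import json
from dataclasses import asdict, is_dataclass

class SerializableMixin:
    """Hypothetical generic mixin: serializes public attributes (or
    dataclass fields) to a dict, recursing into nested serializables."""

    def to_dict(self):
        if is_dataclass(self):
            return asdict(self)
        return {
            key: value.to_dict() if isinstance(value, SerializableMixin) else value
            for key, value in vars(self).items()
            if not key.startswith("_")
        }

    def serialize(self):
        return json.dumps(self.to_dict())
```

Any pipeline part would then just inherit from the mixin instead of carrying its own serialize method.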
Implement strategies for interpolating missing data points on the graph.
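One simple candidate strategy, sketched here for a single node's series (the function is hypothetical; edge gaps fall back to the nearest observed value):

```python
def interpolate_linear(values):
    """Fill None gaps in a numeric series by linear interpolation between
    the nearest observed neighbours; gaps at the ends are filled with the
    nearest observed value."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    if not known:
        return result
    for i, v in enumerate(result):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            result[i] = result[right]
        elif right is None:
            result[i] = result[left]
        else:
            frac = (i - left) / (right - left)
            result[i] = result[left] + frac * (result[right] - result[left])
    return result
```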
Currently, the graph is initialised with one hard-coded skeleton (create_all_possible_edges). This should be configurable so that including prior knowledge becomes easy. Also, when initialising the pre-configured algorithms, you should not have to initialise the graph explicitly any more.
Currently, we use the following functions to check the following tasks:
directed_edge_exists(v, w): checks if there is a directed edge from node v to node w or a bidirected edge between two nodes v and w
only_directed_edge_exists(v, w): checks if there is a directed edge from node v to node w
directed_path_exists(v, w): checks if a directed path from node v to node w exists, not containing any bidirected edges
path_exists(v, w): checks if a path exists between node v and node w on the underlying undirected graph, ignoring edge types
Think about a better and coherent naming. First ideas:
path_exists -> orientation_agnostic_path_exists
directed_edge_exists -> directed_from_to_or_bidirected_edge_exists
only_directed_edge_exists -> directed_edge_exists
directed_path_exists - ok.
Also add better documentation of the concept of inducing paths.
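For reference, the orientation-agnostic check (path_exists) boils down to a BFS on the skeleton, ignoring edge types. A minimal sketch over a plain adjacency mapping (not the causy graph API):

```python
from collections import deque

def orientation_agnostic_path_exists(adjacency, v, w):
    """BFS over the undirected skeleton: edge types are ignored, only
    adjacency matters.  `adjacency` maps a node to the set of nodes it
    shares any edge with (directed, undirected, or bidirected)."""
    seen = {v}
    queue = deque([v])
    while queue:
        current = queue.popleft()
        if current == w:
            return True
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False
```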
Think of an example that runs into the situations in the pictures, and test whether the current quadruple orientation rules also work as intended in bigger examples.
(Pictures taken from https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/CausalInference/2019/Introduction_to_Constraint-Based_Causal_Structure_Learning.pdf)
It currently generates the data based on the initial value rather than on the current step. Also, we later don't want initial values at all, but will dynamically compute an order such that no variable depends on a variable that has not yet been assigned a value. For now, it's OK with initial values; it should first work properly.
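Computing that order is a topological sort of the sample graph. A minimal Kahn's-algorithm sketch over (parent, child) pairs (a hypothetical edge representation, not the SampleEdge API):

```python
from collections import defaultdict, deque

def assignment_order(edges):
    """Kahn's algorithm: return node names in an order where every node
    appears after all of its parents.  `edges` is a list of
    (parent, child) pairs; raises ValueError on cycles."""
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
        nodes.update((parent, child))
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; no valid assignment order")
    return order
```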