A fairly minimal (at the moment) Julia package for causal inference, graphical models and structure learning. The package currently contains the classical PC algorithm and some related functionality.
The algorithms use the Julia package LightGraphs. Graphs are represented by sorted adjacency lists (vectors in the implementation). CPDAGs are simply DiGraphs in which unoriented edges are represented by a pair of directed edges, one in each direction.
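As a minimal sketch of this representation (the three-vertex graph below is just an illustration, not part of the package):

```julia
using LightGraphs

# A CPDAG on three vertices: an oriented edge 1 → 3 and an
# unoriented edge 2 — 3, the latter stored as both 2 → 3 and 3 → 2.
g = DiGraph(3)
add_edge!(g, 1, 3)   # oriented edge: only the forward direction
add_edge!(g, 2, 3)   # unoriented edge ...
add_edge!(g, 3, 2)   # ... represented by edges in both directions
```

An edge is then unoriented exactly when both `has_edge(g, v, w)` and `has_edge(g, w, v)` hold.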
D. M. Chickering: Learning Equivalence Classes of Bayesian-Network Structures. Journal of Machine Learning Research 2 (2002), 445-498.
D. Colombo, M. H. Maathuis: Order-Independent Constraint-Based Causal Structure Learning. Journal of Machine Learning Research 15 (2014), 3921-3962.
Just in time for JuliaCon, we have added, with the help of @RobertGregg, the parallel Greedy Equivalence Search (GES) as a score-based alternative to the PC algorithm!
Marcel Wienöbst at the same time added an extensive suite of adjustment set search functions, which I believe is matched only by DAGitty in functionality (though perhaps not in performance).
Finally, CausalInference now uses Threads at two crucial steps.
# Generate some sample data to use with the GES algorithm
N = 2000 # number of data points
# define simple linear model with added noise
x = randn(N)
v = x + randn(N)*0.25
w = x + randn(N)*0.25
z = v + w + randn(N)*0.25
s = z + randn(N)*0.25
df = (x=x, v=v, w=w, z=z, s=s)
With this data ready, we can now see to what extent we can back out the underlying causal structure from the data using the GES algorithm. Under the hood, GES uses a score to determine the causal relationships between different variables in a given data set. By default, ges uses a Gaussian BIC to score different causal models.
est_g, score = ges(df; penalty=1.0, parallel=true)
tp = plot_pc_graph_tikz(est_g, [String(k) for k in keys(df)])
We can conclude from observational data that v and w are causes of z, which in turn causes s, but we are less certain about the relationship between x and v, and between x and w.
Adjustment set search
The causal model we are going to study can be represented by the following DAG over a set of variables numbered 1 to 8:
We are interested in the average causal effect (ACE) of a treatment (variable nr. 6) on an outcome (variable nr. 8). Variables nr. 1 and nr. 2 are unobserved.
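Since the DAG figure is not reproduced here, a graph consistent with the description (treatment 6, outcome 8, observed confounder 4, unobserved variables 1 and 2) can be built with the package's `digraph` helper; the edge list below is assumed for illustration:

```julia
using CausalInference

# Hypothetical edge list consistent with the description:
# 6 = treatment, 8 = outcome, 4 = observed confounder,
# 1 and 2 = unobserved parents of the confounder.
dag = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8,
               6 => 7, 7 => 8, 1 => 4, 2 => 4,
               4 => 6, 4 => 8])
```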
Ordinary regression will fail to measure the effect because of the presence of a confounder (variable nr. 4). The first intuition is to control for the confounder, but that is not so straightforward here because of the presence of variables 1 and 2. The new function list_covariate_adjustment tells us what to do:
Zs = list_covariate_adjustment(dag, 6, 8, Int[], setdiff(Set(1:8), [1, 2]))
# exclude variables nr. 1 and nr. 2 here because they are unobserved
lists possible adjustment sets,
Set([5, 4, 3])
tells us to control either for variables 3 and 4, or for 4 and 5, etc. With these control variables in the regression we are able to measure the causal effect.
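As a sketch of that last step (the simulated data and the use of GLM.jl below are assumptions for illustration, not part of the package):

```julia
using DataFrames, GLM

# Simulate observational data where x6 is the treatment, x8 the
# outcome, and {x3, x4} one valid adjustment set. The true causal
# effect of x6 on x8 is 2.0.
N = 1000
x3 = randn(N)
x4 = x3 + randn(N)            # x4 confounds treatment and outcome
x6 = 0.5 .* x4 + randn(N)     # treatment depends on the confounder
x8 = 2.0 .* x6 + x4 + randn(N)
df = DataFrame(x3 = x3, x4 = x4, x6 = x6, x8 = x8)

# Regressing the outcome on the treatment while controlling for the
# adjustment set recovers the causal coefficient (close to 2.0).
fit = lm(@formula(x8 ~ x6 + x3 + x4), df)
```

Omitting x4 from the formula would instead yield a biased estimate, which is exactly why the adjustment set matters.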
With that, the performance of CausalInference.jl is fast and compares well with that of the C implementation in the R package pcalg: causal model discovery is computationally tractable as long as the causal graph is sparse.
PS: Could someone with rights edit the thread title to "ANN: CausalInference.jl - Causal Inference in Julia" so it becomes searchable?
Although (as the author of https://github.com/nilshg/SynthControl.jl and someone who works mostly in the Rubin/Imbens tradition of causal inference) I gripe about the very general name of the package, these are some cool updates!