ANN:CausalInference.jl - Causal Inference in Julia


A fairly vanilla (at the moment) Julia package for causal inference, graphical models and structure learning. For now, the package contains the classical PC algorithm and some related functionality.

See the documentation for details and perhaps issue #1 (Roadmap/Contribution) if you are interested.

The algorithms use the Julia package LightGraphs. Graphs are represented by sorted adjacency lists (vectors in the implementation). CPDAGs are just DiGraphs in which an unoriented edge is represented by a pair of directed edges, one in each direction.
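To make the convention concrete, here is a minimal sketch in plain Julia (no packages, and not the package's actual code) of how an unoriented CPDAG edge can be encoded as two directed edges in sorted adjacency lists:

```julia
# Adjacency lists, kept sorted as in the representation described above.
adj = Dict{Int,Vector{Int}}(i => Int[] for i in 1:3)

# Insert a directed edge u -> v, keeping the neighbor list sorted.
function add_directed!(adj, u, v)
    pos = searchsortedfirst(adj[u], v)
    (pos > length(adj[u]) || adj[u][pos] != v) && insert!(adj[u], pos, v)
end

add_directed!(adj, 1, 2)  # oriented edge 1 -> 2

# Unoriented edge 2 – 3: both directions are present.
add_directed!(adj, 2, 3)
add_directed!(adj, 3, 2)

isoriented(adj, u, v)   = (v in adj[u]) && !(u in adj[v])
isundirected(adj, u, v) = (v in adj[u]) && (u in adj[v])

@show isoriented(adj, 1, 2)    # true
@show isundirected(adj, 2, 3)  # true
```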


  • D. M. Chickering: Learning Equivalence Classes of Bayesian-Network Structures. Journal of Machine Learning Research 2 (2002), 445-498.
  • D. Colombo, M. H. Maathuis: Order-Independent Constraint-Based Causal Structure Learning. Journal of Machine Learning Research 15 (2014), 3921-3962.

CausalInference.jl 0.9

Just tagged a new version; in short, the package is alive and kicking. Next plans are perhaps adding the GES algorithm, the groundwork for that is in place. GitHub - mschauer/CausalInference.jl: Causal inference, graphical models and structure learning with the PC algorithm.


This looks awesome! I’ve actually implemented a version of FGES which may be useful for this package. Maybe it would make sense to combine the two?


I would love to! Integration with RobertGregg/FGES.jl · Issue #77 · mschauer/CausalInference.jl · GitHub

ANN: Causal Inference 0.11.1

Just in time for JuliaCon we have added, with the help of @RobertGregg, the parallel Greedy Equivalence Search (GES) as a score-based alternative to the PC algorithm!

At the same time, Marcel Wienöbst added an extensive suite of adjustment set search functions, matched I believe only by DAGitty in functionality (but perhaps not in performance).

Finally, CausalInference now uses Threads at two crucial steps.

GES Example

using CausalInference
using TikzGraphs
using Random

Random.seed!(1) # seed the RNG so the example is reproducible

# Generate some sample data to use with the GES algorithm

N = 2000 # number of data points

# define simple linear model with added noise

x = randn(N)
v = x + randn(N)*0.25
w = x + randn(N)*0.25
z = v + w + randn(N)*0.25
s = z + randn(N)*0.25

df = (x=x, v=v, w=w, z=z, s=s)

With this data ready, we can now see to what extent we can back out the underlying causal structure from the data using the GES algorithm. Under the hood, GES uses a score to determine the causal relationships between different variables in a given data set. By default, ges uses a Gaussian BIC to score different causal models.

est_g, score = ges(df; penalty=1.0, parallel=true)
tp = plot_pc_graph_tikz(est_g, [String(k) for k in keys(df)])


From the observational data we can conclude that v and w are causes of z, which causes s, but we are less sure about the relationship between x and v, and between x and w: those edges remain unoriented in the estimated CPDAG.
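To give an idea of the Gaussian BIC scoring mentioned above, here is a hedged sketch of a local BIC score for one node given a candidate parent set (one common formulation; the exact constants and API in CausalInference.jl may differ):

```julia
using LinearAlgebra

# Local Gaussian BIC score for a node `y` with candidate parents given as
# the columns of `X`: fit a linear model by least squares and reward fit
# while penalizing the number of fitted parameters.
function local_bic(y::Vector, X::Matrix; penalty=1.0)
    n = length(y)
    A = hcat(ones(n), X)          # intercept plus parent columns
    β = A \ y                     # least-squares coefficients
    rss = sum(abs2, y .- A * β)   # residual sum of squares
    k = size(A, 2)                # number of fitted parameters
    return -n/2 * log(rss / n) - penalty * k/2 * log(n)
end

# Toy check: including the true parent should improve the score
# over the empty parent set.
x = randn(1000)
y = x .+ 0.25 .* randn(1000)
score_with    = local_bic(y, reshape(x, :, 1))
score_without = local_bic(y, zeros(1000, 0))
@show score_with > score_without  # expected: true
```

GES greedily adds (and later removes) edges whenever such a local score difference is positive.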

Adjustment set search

The causal model we are going to study can be represented by the following DAG on a set of variables numbered 1 to 8:


using CausalInference

dag = digraph([1 => 3, 3 => 6, 2 => 5, 5 => 8, 6 => 7, 7 => 8, 1 => 4, 2 => 4, 4 => 6, 4 => 8])

We are interested in the average causal effect (ACE) of a treatment (variable nr. 6) on an outcome (variable nr. 8). Variables nr. 1 and nr. 2 are unobserved.

Ordinary regression will fail to measure the effect because of the presence of a confounder (variable nr. 4). The first intuition is to control for the confounder, but that is not so straightforward here because of the presence of variables 1 and 2. The new function list_covariate_adjustment tells us what to do:

Zs = list_covariate_adjustment(dag, 6, 8, Int[], setdiff(Set(1:8), [1, 2]))
# here exclude variables nr. 1 and nr. 2 because they are unobserved.

This lists the possible adjustment sets:

    Set([4, 3])
    Set([5, 4])
    Set([5, 4, 3])

That is, we can control either for variables 4 and 3, or for 5 and 4, etc. With these control variables in the regression we are able to measure the causal effect.
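As a hedged illustration (with hypothetical edge coefficients, all set to 1.0), we can simulate a linear model consistent with the DAG above and compare the naive regression of the outcome (8) on the treatment (6) with a regression that also controls for the adjustment set {3, 4}:

```julia
# Simulate the linear structural model for the DAG
# 1→3, 1→4, 2→4, 2→5, 3→6, 4→6, 4→8, 5→8, 6→7, 7→8
# (all coefficients 1.0 — a hypothetical choice for illustration).
n = 100_000
x1 = randn(n); x2 = randn(n)                # unobserved
x3 = x1 .+ 0.25 .* randn(n)
x4 = x1 .+ x2 .+ 0.25 .* randn(n)           # the confounder
x5 = x2 .+ 0.25 .* randn(n)
x6 = x3 .+ x4 .+ 0.25 .* randn(n)           # treatment
x7 = x6 .+ 0.25 .* randn(n)
x8 = x4 .+ x5 .+ x7 .+ 0.25 .* randn(n)     # outcome; true effect of x6 is 1.0

# OLS helper: regress y on an intercept plus the given columns.
ols(y, cols...) = hcat(ones(n), cols...) \ y

naive    = ols(x8, x6)[2]          # biased by the confounder x4
adjusted = ols(x8, x6, x3, x4)[2]  # controls for the adjustment set {3, 4}
@show naive adjusted               # adjusted ≈ 1.0, naive noticeably larger
```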


I have to say, just adding Threads.@threads in the right place feels like magic. I did this for the PC algorithm and the GES algorithm. GES also uses Memoization.jl (GitHub - marius311/Memoization.jl) with a thread-safe LRU cache from LRUCache.jl (GitHub - JuliaCollections/LRUCache.jl).
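The kind of loop where this works is sketched below (a schematic, not the package's code): many independent tests, here stubbed by a dummy function, each writing to its own preallocated slot, so the parallel loop is race-free.

```julia
using Base.Threads

# Stand-in for an expensive, independent computation such as a
# conditional-independence test (hypothetical, for illustration only).
dummy_test(i) = sum(abs2, sin.(1:1000) .* i)

results = Vector{Float64}(undef, 64)
Threads.@threads for i in 1:64
    results[i] = dummy_test(i)  # each i writes its own slot: no data races
end
@show length(results), Threads.nthreads()
```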


With that, CausalInference.jl is fast, and its performance compares with that of the C implementation in the R package pcalg. Keep in mind that causal model discovery is NP-hard in general, so good performance can only be expected when the causal graph is sparse.

PS: Also, can someone with rights edit the thread title to ANN:CausalInference.jl - Causal Inference in Julia so it becomes searchable?


Although (as the author of and someone who works mostly in Rubin/Imbens causal inference) I gripe about the very general name of the package, these are some cool updates!