Synthetic control in Julia

I'm currently trying to work through a question using synthetic control in R. Link for context: https://economics.mit.edu/files/17847

I've found that the documentation is lacking, and I have a syntax error. Has anyone put together a file to use synthetic control in Julia?


I have the start of a package here https://github.com/nilshg/SynthControl.jl but I've shamefully neglected it over the last few months. It should work for getting a point estimate of the treatment effect, but inference isn't really operational (you can do permutation of placebos, but you have to process the results of the permutation yourself). Also, it currently uses only the pretreatment outcomes to find the weights, rather than also using covariates as in the original Abadie et al. paper.
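To make "weights from pretreatment outcomes only" concrete: the classic synthetic control problem picks nonnegative donor weights that sum to one and best reproduce the treated unit's pretreatment path. The sketch below solves that with projected gradient descent in plain Julia. It is not SynthControl.jl's implementation; `project_simplex` and `synth_weights` are hypothetical names, and the layout of the control matrix (J controls in rows, T₀ pretreatment periods in columns) is an assumption.

```julia
using LinearAlgebra

# Euclidean projection of v onto the probability simplex {w : w .>= 0, sum(w) == 1},
# using the standard sort-and-threshold method.
function project_simplex(v::AbstractVector{<:Real})
    u = sort(v; rev = true)
    css = cumsum(u)
    # largest k with u[k] - (css[k] - 1) / k > 0 (k = 1 always qualifies)
    k = findlast(i -> u[i] - (css[i] - 1) / i > 0, eachindex(u))
    θ = (css[k] - 1) / k
    return max.(v .- θ, 0.0)
end

# Hypothetical helper: donor weights from pretreatment outcomes only, i.e.
# minimise ‖y₁₀ - Y₀' * w‖² subject to w .>= 0 and sum(w) == 1.
# Y₀ is assumed to be J controls × T₀ pretreatment periods; y₁₀ has length T₀.
function synth_weights(y₁₀::AbstractVector, Y₀::AbstractMatrix; iters = 5_000)
    J = size(Y₀, 1)
    A = Matrix(transpose(Y₀))       # T₀ × J: one column per control unit
    w = fill(1 / J, J)              # start from equal weights
    step = 1 / opnorm(A)^2          # 1/L for the gradient of ½‖A*w - y₁₀‖²
    for _ in 1:iters
        grad = A' * (A * w - y₁₀)
        w = project_simplex(w - step * grad)   # gradient step, then back onto the simplex
    end
    return w
end

# Toy check: the treated path is exactly half of control 1 plus half of control 2.
Y₀ = [1.0 2.0 3.0; 3.0 2.0 1.0; 0.0 5.0 0.0]   # 3 controls × 3 periods
y₁₀ = [2.0, 2.0, 2.0]
w = synth_weights(y₁₀, Y₀)                       # ≈ [0.5, 0.5, 0.0]
```

A projected-gradient loop is slower than a dedicated quadratic-programming solver, but it needs nothing beyond the LinearAlgebra standard library and makes the two constraints (nonnegativity, summing to one) explicit.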

Finally, the last change I made was to spin out the construction of the data object that the estimation runs on into a separate TreatmentPanel type, which I'm planning to use for other causal inference packages as well. The README doesn't reflect this yet, so check the test file to see how to fit and plot the model.

First, thank you for putting this together.

But I'm having some trouble using it. I found the test file and managed to get your example to work. However, my code has an error that I don't understand. Could you help me?

julia> # format data so it's easier to compare to the example
       formatdata  = DataFrame(state = mydata.name, year = mydata.year , Y = mydata.Y )
809×3 DataFrame
 Row │ state    year        Y         
     │ String   Date        Float64   
─────┼────────────────────────────────
   1 │ alabama  2000-01-01  5.27832
   2 │ alabama  2001-01-01  7.63268
   3 │ alabama  2002-01-01  6.51773
   4 │ alabama  2003-01-01  6.55047
   5 │ alabama  2004-01-01  5.58409
   6 │ alabama  2005-01-01  7.15567
   7 │ alabama  2006-01-01  7.73388
   8 │ alabama  2007-01-01  8.3889
   9 │ alabama  2008-01-01  7.24852
  10 │ alabama  2009-01-01  6.68357
  11 │ alabama  2010-01-01  3.99604
  12 │ alabama  2012-01-01  0.041532
  13 │ alabama  2013-01-01  0.0414039
  14 │ alabama  2014-01-01  0.0206506
  ⋮  │    ⋮         ⋮           ⋮
 797 │ wyoming  2005-01-01  3.30638
 798 │ wyoming  2006-01-01  1.72194
 799 │ wyoming  2007-01-01  3.36527
 800 │ wyoming  2008-01-01  2.01449
 801 │ wyoming  2009-01-01  2.14343
 802 │ wyoming  2010-01-01  1.41938
 803 │ wyoming  2011-01-01  3.52594
 804 │ wyoming  2012-01-01  2.95001
 805 │ wyoming  2013-01-01  2.92035
 806 │ wyoming  2014-01-01  2.40324
 807 │ wyoming  2015-01-01  3.07341
 808 │ wyoming  2016-01-01  3.9364
 809 │ wyoming  2017-01-01  2.41824

julia> # this is what I can't get to work
       tp = TreatmentPanel("colorado" => Date(2014), formatdata; outcome = :Y, id_var = :state, t_var = :year)
ERROR: DimensionMismatch("new dimensions (44, 14) must be consistent with array size 615")
Stacktrace:
 [1] (::Base.var"#throw_dmrsa#213")(::Tuple{Int64,Int64}, ::Int64) at ./reshapedarray.jl:41
 [2] reshape at ./reshapedarray.jl:45 [inlined]
 [3] reshape(::Array{Float64,1}, ::Int64, ::Int64) at ./reshapedarray.jl:116
 [4] TreatmentPanel(::Pair{String,Date}, ::DataFrame; outcome::Symbol, id_var::Symbol, t_var::Symbol, predictors::Nothing) at /Users/modelt/.julia/packages/SynthControl/nHELX/src/treatmentpanel.jl:117
 [5] top-level scope at REPL[10]:2

I wasn't sure if I needed to tag you in my reply.

FWIW, I’ve never been able to use Synthetic Control in R, also encountering indecipherable errors, but I’ve found Stata’s package to work well, if you have access to Stata.

I do have access; I suppose it will be my last resort.

Can you share the data? I'd be happy to take a look tomorrow. The error suggests something is going wrong in forming the matrix of pretreatment outcomes, which uses reshape.

Off the top of my head: is your panel balanced? I haven't looked at the code in a while, but I think it currently assumes a balanced panel, mainly because I didn't know how to treat missing observations for some i/t combinations in the minimization step. So one thing you can try is restricting your data so that all units have the same number of pretreatment periods.
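One quick way to check balance before building the TreatmentPanel is to count the periods observed for each unit and find the periods every unit shares. The sketch below works on plain id and time vectors so it has no package dependencies; `panel_balance` is a hypothetical helper, not part of SynthControl.jl or DataFrames.jl, and duplicate id/time rows are counted only once.

```julia
# Hypothetical helper: for a long-format panel given as an id column and a
# time column, report periods per unit, periods shared by all units, and
# the units observed in fewer periods than the best-covered unit.
function panel_balance(ids::AbstractVector, times::AbstractVector)
    periods = Dict{eltype(ids), Set{eltype(times)}}()
    for (i, t) in zip(ids, times)
        push!(get!(() -> Set{eltype(times)}(), periods, i), t)
    end
    counts = Dict(i => length(s) for (i, s) in periods)
    common = sort(collect(reduce(intersect, values(periods))))
    maxT = maximum(values(counts))
    short_units = sort([i for (i, n) in counts if n < maxT])
    return (counts = counts, common = common, short_units = short_units)
end

# Toy panel: unit "c" is missing 2011.
ids   = ["a", "a", "a", "b", "b", "b", "c", "c"]
times = [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012]
bal = panel_balance(ids, times)
bal.short_units   # ["c"]
bal.common        # [2010, 2012]
```

With the data in this thread the call would be `panel_balance(formatdata.state, formatdata.year)`. From there you can either drop the units in `short_units`, or keep only the rows whose period appears in `common`, for example `formatdata[in.(formatdata.year, Ref(Set(bal.common))), :]`.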

Here is a link to my data: GitHub - efhart4/Syntheticdata

If my treatment period is 2014, should I restrict the data to 2010 through 2017 to balance the dataset?

Sorry, maybe I wasn't clear: by balanced panel I mean that each unit of observation is observed in the same time periods. From your data I see that Alabama is missing one observation, 2011 (all other states have 18 years in the data, Alabama only 17). The package internally builds a matrix of size (n_control_units, n_pretreatment_periods), which doesn't work if the control units don't all have the same number of pretreatment outcomes.
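The numbers in the original DimensionMismatch line up with this explanation. Using the counts visible in the thread (45 states observed 2000 to 2017, Colorado treated from 2014, Alabama missing 2011), the reshape needs 44 × 14 = 616 values but only 615 are available. The snippet below just reproduces that arithmetic and the resulting error in base Julia; the stand-in vector `y` is illustrative, not the package's internal data.

```julia
# Counts taken from the thread: 45 states observed 2000-2017 (18 years),
# Colorado treated from 2014, Alabama missing 2011.
total_rows = 45 * 18 - 1        # 809, the number of rows in formatdata
n_controls, T₀ = 44, 14         # 44 control states, pretreatment years 2000-2013
expected = n_controls * T₀      # 616 values needed for a 44 × 14 matrix
available = expected - 1        # 615 values actually present

y = collect(1.0:available)      # stand-in for the stacked control outcomes
caught = try
    reshape(y, n_controls, T₀)
    nothing
catch err
    err
end
println(caught)                 # a DimensionMismatch, as in the original error
```

Because `reshape` only reinterprets the data, it cannot know which state/year cell is missing; it can only see that 615 values do not fill a 44 × 14 matrix.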

If you exclude Alabama it works:

julia> tp = TreatmentPanel("colorado" => 2014, df[df.name .!= "alabama", :], outcome = :Y, id_var = :name, t_var = :year)
TreatmentPanel
  data: DataFrame
  outcome: Symbol Y
  id_var: Symbol name
  t_var: Symbol year
  treatment: Pair{Union{String, Symbol}, Union{Int64, Date}}
  y₁₀: Array{Float64}((14,)) [3.09689037539627, 3.41189966665062, 4.03081592176743, 4.19543483694774, 4.7212980596995, 3.82133592176667, 3.72847941805215, 3.35146594369371, 3.27216431173091, 3.3184539222617, 2.88316462512099, 3.26059091669659, 2.98437286099889, 3.52908899034282]
  y₁₁: Array{Float64}((4,)) [3.08341016942311, 3.4665497210528, 4.02460168625396, 4.14893279832875]
  yⱼ₀: Array{Float64}((43, 14)) [7.46039306388848 3.64459424983568 … 1.75225436285053 2.78619300619598; 8.36260402766524 3.72130409380663 … 1.75987869949795 3.20970259891833; … ; 2.58698745311085 1.30826593198556 … 2.60933747015937 2.95000607354192; 3.43348512728692 2.50007197176888 … 3.11916245120324 2.92034501299554]
  yⱼ₁: Array{Float64}((43, 4)) [4.64816508856759 4.63807710133236 2.25612353902884 8.56456115413491; 5.03395284122737 3.80565515825976 2.3988631474649 7.82536527615169; … ; 4.36500816978301 3.03268597015624 1.89306815234655 3.93640144448818; 3.4692846369428 3.33067760256769 6.86182014858432 2.41823765748773]
  x₁₀: Nothing nothing
  xⱼ₀: Nothing nothing
  predictors: Nothing nothing
  J: Int64 43
  T₁: Int64 4
  T₀: Int64 14
  comp_labels: Array{Union{String, Symbol}}((43,))

julia> s = SynthControlModel(tp)

Synthetic Control Model
	Outcome variable: Y
	Time dimension: year with 18 unique values
	Treatment period: 2014
	ID variable: name with 44 unique values
	Treatment ID: colorado

	Model is not fitted

julia> SynthControl.fit!(s)

Synthetic Control Model
	Outcome variable: Y
	Time dimension: year with 18 unique values
	Treatment period: 2014
	ID variable: name with 44 unique values
	Treatment ID: colorado
	Model is fitted
	Impact estimates: [-2.211, -2.506, -0.674, -0.451]


julia> using Plots

julia> plot(s)

produces: [figure: plot of the fitted synthetic control model]

Just going on gut feel, that looks like a slightly surprising result: the jaggedness of the pre-treatment fit suggests to me that it might not survive inference, if I got inference working 🙂
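On the inference point: one common way to summarise a set of placebo runs, following Abadie, Diamond & Hainmueller (2010), is to compare each unit's ratio of post-treatment to pretreatment RMSPE with the treated unit's ratio. The sketch below assumes you have already collected the gaps (observed minus synthetic) for the treated unit and every placebo into one matrix; that layout and the names `rmspe` and `placebo_pvalue` are assumptions, not SynthControl.jl's output format or API.

```julia
using Statistics

# Root mean squared prediction error of a vector of gaps.
rmspe(g) = sqrt(mean(abs2, g))

# Hypothetical helper. `gaps` holds observed minus synthetic outcomes:
# row 1 = treated unit, remaining rows = placebo runs (one per control unit),
# columns = periods, the first T₀ of which are pretreatment.
function placebo_pvalue(gaps::AbstractMatrix, T₀::Integer)
    ratios = [rmspe(gaps[u, T₀+1:end]) / rmspe(gaps[u, 1:T₀]) for u in axes(gaps, 1)]
    # share of units whose post/pre ratio is at least as large as the treated unit's
    p = count(>=(ratios[1]), ratios) / length(ratios)
    return (ratios = ratios, p = p)
end

# Toy example: 3 units, 2 pretreatment + 2 post-treatment periods.
gaps = [0.1 -0.1 2.0 2.0;
        0.2 -0.2 0.2 -0.2;
        0.5 0.5 0.5 0.5]
res = placebo_pvalue(gaps, 2)   # res.p == 1/3: the treated ratio (≈ 20) is the largest
```

Dividing by the pretreatment RMSPE penalises units whose synthetic control fits poorly before treatment, which is exactly the jaggedness concern above. The smallest attainable p-value from this ranking is 1/N for N units (treated plus placebos), so with a few dozen states the test is fairly coarse.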

I also realize that the weight plot I thought I'd implemented doesn't seem to work properly, but you can do

julia> weightdf = DataFrame(State = s.treatment_panel.comp_labels, Weight = s.w);

julia> bar(weightdf[weightdf.Weight .> 0.01, :State], weightdf[weightdf.Weight .> 0.01, :Weight], label = "Weight", xrot = 45)

to get: [figure: bar chart of the donor states with weight above 0.01]

So the weights aren't exactly sparse, which I believe can be a problem when fitting purely on pre-treatment trends. I really should get covariates working to help with this, but unfortunately I probably won't have time over the next few weeks…
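If you want a single number for how spread out the weights are, two simple summaries are the count of donors above a cutoff (0.01, matching the filter used for the bar chart) and the inverse Herfindahl index 1/∑wⱼ². These are generic diagnostics written as hypothetical helpers, not SynthControl.jl functions; on the fitted model in this thread you would apply them to `s.w`.

```julia
# Hypothetical helpers for summarising how concentrated a weight vector is.
# Number of donors whose weight exceeds a cutoff (0.01 mirrors the bar chart filter).
n_active(w; cutoff = 0.01) = count(>(cutoff), w)

# Inverse Herfindahl index: J for equal weights over J donors, 1 for a single donor.
effective_donors(w) = 1 / sum(abs2, w)

w = [0.5, 0.3, 0.15, 0.05]      # toy weights
n_active(w)                      # 4
effective_donors(w)              # ≈ 2.74
```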


Thank you!

I wrote it up for R here:

https://stackoverflow.com/questions/67627348/synthetic-control-using-librarysynth-error-your-panel-as-described-by-unit-v/67658264#67658264

Hi @Evan, I wrote up Synthetic Control code in Julia to study the impact of foreign buyer taxes (15% in 2016, 20% in 2018) on house price growth in Vancouver.

I use Elastic Net (as argued in Doudchenko & Imbens), but you can easily modify my code to use your favorite ML model or ensemble…
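For readers unfamiliar with the Doudchenko & Imbens variant: instead of nonnegative weights that sum to one, it allows an intercept and unrestricted weights and regularises them with an elastic net penalty. Below is a minimal coordinate-descent sketch in plain Julia with a glmnet-style objective. It is not the code in the linked repo, `enet_synth` is a hypothetical name, and in practice λ and α would need tuning, for example by cross-validation over pretreatment periods.

```julia
using LinearAlgebra, Statistics

# Soft-thresholding operator used by lasso / elastic net coordinate descent.
soft(z, γ) = sign(z) * max(abs(z) - γ, 0.0)

# Hypothetical helper. X is T₀ × J (pretreatment periods × control units),
# y the treated unit's pretreatment outcomes. Minimises, glmnet-style,
#   (1/2n)‖y - a - X*β‖² + λ * (α‖β‖₁ + (1-α)/2 ‖β‖²)
# with an unpenalised intercept a and unrestricted weights β.
function enet_synth(y::AbstractVector, X::AbstractMatrix; λ = 0.1, α = 0.5, iters = 1_000)
    n, J = size(X)
    xm = vec(mean(X; dims = 1))
    ym = mean(y)
    Xc = X .- xm'                          # centre so the intercept drops out
    yc = y .- ym
    β = zeros(J)
    r = copy(yc)                           # current residual yc - Xc * β
    colsq = vec(sum(abs2, Xc; dims = 1)) ./ n
    for _ in 1:iters, j in 1:J
        colsq[j] == 0 && continue          # skip constant columns
        xj = view(Xc, :, j)
        z = dot(xj, r) / n + colsq[j] * β[j]
        bj = soft(z, λ * α) / (colsq[j] + λ * (1 - α))
        r .-= xj .* (bj - β[j])            # keep the residual in sync
        β[j] = bj
    end
    a = ym - dot(xm, β)                    # recover the intercept
    return a, β
end

# Toy check: treated = 1 + 2 × donor 1, donor 2 is irrelevant.
X = [1.0 0.0; 2.0 1.0; 3.0 0.0; 4.0 1.0]
y = 1 .+ 2 .* X[:, 1]
a, β = enet_synth(y, X; λ = 1e-6, α = 1.0)   # a ≈ 1, β ≈ [2, 0]
```

The counterfactual for the post-treatment periods would then be `a .+ X_post * β`, where `X_post` (a hypothetical name) holds the control units' post-treatment outcomes in the same period × unit layout.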

I just created a GitHub repo with data and parsimonious code:
https://github.com/azev77/Synthetic_Control_in_Julia

Good luck!


Below are the four figures produced by my code.

  1. Year-over-year house price growth in Vancouver
  2. Observed house price growth in Vancouver and counterfactual house price growth in “Synthetic Vancouver” had there been no foreign-buyer tax
  3. Unit placebo tests
  4. The ATT (average treatment effect on the treated) with 95% CIs (using Jackknife+)
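For reference, the ATT over the post-treatment window is just the average of the per-period gaps between the treated unit and its synthetic counterpart. The sketch below assumes the same field layout as the TreatmentPanel printout earlier in the thread (y₁₁ of length T₁, Yⱼ₁ as J × T₁) and that the package's impact estimates are observed minus synthetic; `att` is a hypothetical helper, and the Jackknife+ intervals from item 4 are a separate step not sketched here.

```julia
using Statistics

# Hypothetical helper: ATT over the post-treatment window, assuming
# y₁₁ is the treated unit's post-treatment outcomes (length T₁),
# Yⱼ₁ is the J × T₁ matrix of control outcomes and w the donor weights.
att(y₁₁, Yⱼ₁, w) = mean(y₁₁ .- Yⱼ₁' * w)

# Per-period impacts printed for the Colorado fit earlier in the thread;
# averaging them gives the ATT over 2014-2017.
impact = [-2.211, -2.506, -0.674, -0.451]
mean(impact)   # ≈ -1.4605
```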

PS: I actually have code for at least 8 inference methods for SCM (all in Julia), so stay tuned…