Sorry, maybe I wasn’t clear: by balanced panel I mean that every unit of observation is observed in the same time periods. In your data, Alabama is missing one observation (all other states have 18 years in the data, Alabama only 17). The package internally builds a matrix of size (n_control_units, n_pretreatment_periods), which doesn’t work if the control units don’t all have the same number of pretreatment outcomes.
If you exclude Alabama, it works:
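To illustrate why a missing year breaks things, here is a minimal sketch in plain Julia of reshaping long-format data into a (n_control_units, n_pretreatment_periods) matrix. The function `pretreatment_matrix` is hypothetical and is not SynthControl.jl's internal code; it only shows that a missing unit-period pair leaves a hole in the matrix that has to be reported as an error.

```julia
# Hypothetical helper (not part of SynthControl.jl): reshape long-format panel
# data into a (n_control_units, n_pretreatment_periods) outcome matrix.
# Assumes outcomes are never NaN and each (unit, period) pair appears at most once.
function pretreatment_matrix(ids::AbstractVector, periods::AbstractVector,
                             y::AbstractVector{<:Real}, treated_id, treatment_period)
    # All pre-treatment periods that occur anywhere in the data, in order
    pre = sort(unique(p for p in periods if p < treatment_period))
    # All control units, i.e. every unit except the treated one
    controls = sort(unique(i for i in ids if i != treated_id))
    col = Dict(p => k for (k, p) in enumerate(pre))
    row = Dict(u => j for (j, u) in enumerate(controls))
    Y = fill(NaN, length(controls), length(pre))
    for (i, p, v) in zip(ids, periods, y)
        # Skip the treated unit and post-treatment observations
        (i == treated_id || p >= treatment_period) && continue
        Y[row[i], col[p]] = v
    end
    # A NaN left over means some control unit lacks an outcome in some period
    for (j, u) in enumerate(controls)
        missing_periods = pre[isnan.(Y[j, :])]
        isempty(missing_periods) ||
            error("unbalanced panel: unit $u has no outcome for periods $missing_periods")
    end
    return Y
end
```

With a balanced panel this returns a dense matrix; with an unbalanced one (like the data including Alabama) there is no sensible way to fill the gap, so the sketch errors out.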
julia> tp = TreatmentPanel("colorado" => 2014, df[df.name .!= "alabama", :], outcome = :Y, id_var = :name, t_var = :year)
TreatmentPanel
data: DataFrame
outcome: Symbol Y
id_var: Symbol name
t_var: Symbol year
treatment: Pair{Union{String, Symbol}, Union{Int64, Date}}
y₁₀: Array{Float64}((14,)) [3.09689037539627, 3.41189966665062, 4.03081592176743, 4.19543483694774, 4.7212980596995, 3.82133592176667, 3.72847941805215, 3.35146594369371, 3.27216431173091, 3.3184539222617, 2.88316462512099, 3.26059091669659, 2.98437286099889, 3.52908899034282]
y₁₁: Array{Float64}((4,)) [3.08341016942311, 3.4665497210528, 4.02460168625396, 4.14893279832875]
yⱼ₀: Array{Float64}((43, 14)) [7.46039306388848 3.64459424983568 … 1.75225436285053 2.78619300619598; 8.36260402766524 3.72130409380663 … 1.75987869949795 3.20970259891833; … ; 2.58698745311085 1.30826593198556 … 2.60933747015937 2.95000607354192; 3.43348512728692 2.50007197176888 … 3.11916245120324 2.92034501299554]
yⱼ₁: Array{Float64}((43, 4)) [4.64816508856759 4.63807710133236 2.25612353902884 8.56456115413491; 5.03395284122737 3.80565515825976 2.3988631474649 7.82536527615169; … ; 4.36500816978301 3.03268597015624 1.89306815234655 3.93640144448818; 3.4692846369428 3.33067760256769 6.86182014858432 2.41823765748773]
x₁₀: Nothing nothing
xⱼ₀: Nothing nothing
predictors: Nothing nothing
J: Int64 43
T₁: Int64 4
T₀: Int64 14
comp_labels: Array{Union{String, Symbol}}((43,))
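Rather than hardcoding the state to drop, the units that are not observed in every period can be found from the data itself. A minimal sketch; `incomplete_units` is a hypothetical helper, not part of SynthControl.jl, and it treats "complete" as "observed in every period that appears anywhere in the data":

```julia
# Hypothetical helper (not part of SynthControl.jl): return the units that are
# not observed in every period appearing anywhere in the data.
function incomplete_units(ids::AbstractVector, periods::AbstractVector)
    all_periods = Set(periods)
    # Collect the set of periods in which each unit is observed
    seen = Dict{eltype(ids), Set{eltype(periods)}}()
    for (i, p) in zip(ids, periods)
        push!(get!(() -> Set{eltype(periods)}(), seen, i), p)
    end
    return sort([u for (u, ps) in seen if length(ps) < length(all_periods)])
end

# Possible usage with the DataFrame above (assumes columns :name and :year):
#   bad = incomplete_units(df.name, df.year)   # expected to flag only "alabama" here
#   tp = TreatmentPanel("colorado" => 2014, df[.!in(bad).(df.name), :],
#                       outcome = :Y, id_var = :name, t_var = :year)
```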
julia> s = SynthControlModel(tp)
Synthetic Control Model
Outcome variable: Y
Time dimension: year with 18 unique values
Treatment period: 2014
ID variable: name with 44 unique values
Treatment ID: colorado
Model is not fitted
julia> SynthControl.fit!(s)
Synthetic Control Model
Outcome variable: Y
Time dimension: year with 18 unique values
Treatment period: 2014
ID variable: name with 44 unique values
Treatment ID: colorado
Model is fitted
Impact estimates: [-2.211, -2.506, -0.674, -0.451]
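In the standard synthetic control setup, the impact estimates are the gap between the treated unit's post-treatment outcomes and the weighted average of the control units' post-treatment outcomes. A minimal sketch of that calculation follows; it assumes (without checking the package source) that SynthControl.jl computes its estimates this way, and `impact_estimates` is a hypothetical name.

```julia
# Sketch of the standard synthetic-control impact estimate (assumed, not taken
# from SynthControl.jl's source).
#   y11: treated unit's post-treatment outcomes, length T₁
#   yj1: control units' post-treatment outcomes, size (J, T₁)
#   w:   donor weights, length J
function impact_estimates(y11::AbstractVector, yj1::AbstractMatrix, w::AbstractVector)
    size(yj1) == (length(w), length(y11)) ||
        throw(DimensionMismatch("expected yj1 of size (J, T₁) = ($(length(w)), $(length(y11)))"))
    synthetic = yj1' * w   # synthetic counterfactual, one value per post-treatment period
    return y11 .- synthetic
end
```

If that assumption holds, `impact_estimates(s.treatment_panel.y₁₁, s.treatment_panel.yⱼ₁, s.w)` should reproduce the printed estimates up to rounding.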
julia> using Plots
julia> plot(s)
produces:
Now, just based on gut feel, that looks like a slightly surprising result: the jaggedness of the pre-treatment fit suggests to me that this might not survive inference if I got inference working.
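One way to put a number on that jaggedness is the pre-treatment root mean squared prediction error (RMSPE) between the treated unit and its synthetic control. In placebo-based inference, the ratio of post- to pre-treatment RMSPE is commonly compared across units, each refitted as if it were treated. A minimal sketch with hypothetical helper names, using the same array layout as the TreatmentPanel fields above (controls in rows, periods in columns):

```julia
# Root mean squared prediction error between a treated unit's outcomes and the
# weighted synthetic control over the same periods.
#   y_treated:  length T
#   y_controls: size (J, T)
#   w:          length J
function rmspe(y_treated::AbstractVector, y_controls::AbstractMatrix, w::AbstractVector)
    gap = y_treated .- y_controls' * w
    return sqrt(sum(abs2, gap) / length(gap))
end

# Post- to pre-treatment RMSPE ratio for one unit and its own weights. For
# placebo inference this would be computed for every unit (refitting the
# weights each time) and the treated unit's rank among them examined.
rmspe_ratio(y10, yj0, y11, yj1, w) = rmspe(y11, yj1, w) / rmspe(y10, yj0, w)
```

A large pre-treatment RMSPE relative to the size of the estimated effect is a sign that the effect may not stand out from placebo runs.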
I also realize that the weight plot I thought I’d implemented doesn’t seem to work properly, but you can do
julia> weightdf = DataFrame(State = s.treatment_panel.comp_labels, Weight = s.w);
julia> topweights = weightdf[weightdf.Weight .> 0.01, :];  # keep only donors with more than 1% weight
julia> bar(topweights.State, topweights.Weight, label = "Weight", xrot = 45)
to get
So the weights aren’t exactly sparse, which I believe can be a problem when fitting purely on pre-treatment trends. I really should get covariates working to help with this, but unfortunately I probably won’t have time over the next few weeks…
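To make "not exactly sparse" concrete, two simple summaries of the weight vector can help: the number of donors above a threshold (the same 1% cutoff as in the bar plot) and the "effective number of donors" 1 / Σw², which equals k when the weight is spread evenly over k donors and 1 when a single donor gets everything. These are generic diagnostics sketched here with hypothetical names, not something SynthControl.jl reports itself.

```julia
# Number of donors whose weight exceeds the threshold (default 1%).
n_above(w::AbstractVector, threshold::Real = 0.01) = count(>(threshold), w)

# Effective number of donors: inverse of the sum of squared weights.
# Equals k for k equal weights summing to 1, and 1 for a single donor.
effective_donors(w::AbstractVector) = 1 / sum(abs2, w)
```

With the fitted model above, these could be applied as `n_above(s.w)` and `effective_donors(s.w)`.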