I think the problem is that you can’t stack multiple groups of variables at once in DataFrames.jl. There is an issue for it here. Until that’s resolved, I don’t think it’s possible in a single call.
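To make the limitation concrete, here is a minimal sketch assuming DataFrames.jl 1.x and hypothetical wide columns `inc80`, `inc81`, `inc82`, `ue80`, `ue81`, `ue82` (the names and values are illustrative, not from the original question). A single `stack` call has only one variable column and one value column, so both measures end up mixed together:

```julia
using DataFrames

# Hypothetical wide data: income (inc*) and unemployment (ue*) for 1980-1982.
df = DataFrame(id = [1, 2],
               inc80 = [5000, 2000], inc81 = [5500, 2200], inc82 = [6000, 3300],
               ue80 = [0, 1], ue81 = [1, 0], ue82 = [0, 0])

# One stack call: every measure column goes into the same `value` column,
# with the original column name (e.g. "inc80" or "ue80") in `variable`.
long = stack(df, Not(:id), :id; variable_eltype = String)
println(first(long, 4))
```

The result has 12 rows (2 ids × 6 columns) instead of 6 rows with separate `inc` and `ue` columns, which is what Stata’s `reshape long` would produce.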
It also doesn’t automatically promote the column-name suffixes to numbers (giving 80, 81, and 82), which is a nice feature of Stata.
So yeah, your best bet is probably to do two separate `stack` calls and then join the results together. Please comment on the linked issue so we can keep track of this use case.
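A sketch of the two-stacks-and-a-join workaround, under the same assumptions (DataFrames.jl 1.x, hypothetical `inc*`/`ue*` columns). The `year_suffix` helper is a hypothetical name; it does by hand the suffix-to-number conversion that Stata performs automatically:

```julia
using DataFrames

# Hypothetical wide data: income (inc*) and unemployment (ue*) for 1980-1982.
df = DataFrame(id = [1, 2],
               inc80 = [5000, 2000], inc81 = [5500, 2200], inc82 = [6000, 3300],
               ue80 = [0, 1], ue81 = [1, 0], ue82 = [0, 0])

# Hypothetical helper: turn a column name such as "inc80" into the integer 80.
year_suffix(name) = parse(Int, match(r"[0-9]+$", name).match)

# First stack: the inc* columns into a single `inc` column.
inc = stack(select(df, :id, r"^inc"), r"^inc", :id;
            variable_name = :year, value_name = :inc, variable_eltype = String)
inc.year = year_suffix.(inc.year)

# Second stack: the ue* columns into a single `ue` column.
ue = stack(select(df, :id, r"^ue"), r"^ue", :id;
           variable_name = :year, value_name = :ue, variable_eltype = String)
ue.year = year_suffix.(ue.year)

# Join the two long tables on (id, year); join output order is not guaranteed, so sort.
long = innerjoin(inc, ue; on = [:id, :year])
sort!(long, [:id, :year])
```

If some units are missing a year for one of the measures, using `outerjoin` instead of `innerjoin` keeps those rows, with `missing` in the absent column.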
If you find a more performant approach (or run into other problems), I’d appreciate it if you filed an issue on the repo. I’m sure there’s plenty of low-hanging fruit.
Here is an attempt at a more general function, though it’s much slower than Douglass’s approach:
using DataFrames

# Reshape `df` from wide to long, Stata-style: for each stub (e.g. "inc"),
# columns such as inc80, inc81, inc82 are stacked into one column `inc`,
# and their numeric suffix goes into an integer `year` column.
#   id    : column identifying each unit (a Symbol)
#   cons  : time-invariant column(s) to carry along (a Symbol or vector of Symbols)
#   stubs : prefixes of the time-varying columns (a vector of Strings)
function reshape_long(df::AbstractDataFrame, id, cons, stubs::AbstractVector{<:AbstractString})
    pieces = DataFrame[]
    for stub in stubs
        # Only columns named <stub><digits>, so "inc" does not also match "income"
        pattern = Regex("^" * stub * "[0-9]+\$")
        # One stack per stub: (id, variable, value), with the value column named after the stub
        long = stack(select(df, id, pattern), pattern, id;
                     variable_name = :variable, value_name = Symbol(stub),
                     variable_eltype = String)
        # "inc80" -> 80, mimicking Stata's numeric j() variable
        long.year = [parse(Int, match(r"[0-9]+$", v).match) for v in long.variable]
        push!(pieces, select!(long, Not(:variable)))
    end
    # Join on (id, year) rather than on row position, so stubs observed in
    # different years still line up; reduce also handles a single stub
    res = reduce((a, b) -> outerjoin(a, b; on = [id, :year]), pieces)
    # Attach the time-invariant columns and put the result in a stable order
    res = leftjoin(res, select(df, id, cons); on = id)
    sort!(res, [id, :year])
    return select!(res, id, cons, :year, Symbol.(stubs))
end