I was looking for a straightforward way to do common manipulations in panel data within Julia. For instance, let’s say I have a panel of many countries and years. I would like to calculate the growth rate for a variable country by country. What is the most convenient way to do this? Is there a way to do this using packages such as Query.jl?
I just found something that seems pretty close in R: A new package for panel data analysis in R | R-bloggers
Would be great if Julia offered something similar!
Thanks a lot!
Is there any reason you would rather chose
panelr over using
dplyr directly? For Julia, I would look into DataFramesMeta.
At the moment I mainly need a lead/lag operator that respects the panel structure, i.e., applies the lag operator for each id in the panel separately, generates NA where necessary, and puts everything back together. If that is possible with DataFramesMeta directly, that would be great! I guess that is also the added convenience of panelr over deplyr.
julia> using DataFrames, ShiftedArrays
julia> df = DataFrame(stock = ["a", "a", "a", "b", "b", "b"], time = repeat(1:3, 2), price = rand(100:120, 6))
Row │ stock time price
│ String Int64 Int64
1 │ a 1 108
2 │ a 2 105
3 │ a 3 107
4 │ b 1 101
5 │ b 2 106
6 │ b 3 109
julia> transform!(groupby(df, :stock), :price => (x -> log.(x) .- log.(lag(x))) => :return)
Row │ stock time price return
│ String Int64 Int64 Float64?
1 │ a 1 108 missing
2 │ a 2 105 -0.0281709
3 │ a 3 107 0.0188685
4 │ b 1 101 missing
5 │ b 2 106 0.0483186
6 │ b 3 109 0.0279088
Thanks, that worked
Just one last question: This method does not work when there are gaps in the variable time, correct?
Well, it assumes that subsequent rows represent one time step. If you have missing observations which aren’t in the data (i.e. not represented by
missing but absent altogether) you can construct the range of time steps first (something like
minimum(df.date):Day(1):maximum(df.date)) and then
leftjoin your data onto that, which will generate the missing observations. Any return (in my example) where one of the two days used to compute it is missing will then be missing.