Manipulating Panel Data in Julia

IljaK91 · June 21, 2022, 8:52am

Hey everyone,

I was looking for a straightforward way to do common manipulations in panel data within Julia. For instance, let’s say I have a panel of many countries and years. I would like to calculate the growth rate for a variable country by country. What is the most convenient way to do this? Is there a way to do this using packages such as Query.jl?

I just found something that seems pretty close in R: A new package for panel data analysis in R | R-bloggers

Would be great if Julia offered something similar!

Thanks a lot!

Ilja

marcpabst · June 21, 2022, 9:20am

Is there any reason you would rather chose panelr over using dplyr directly? For Julia, I would look into DataFramesMeta.

IljaK91 · June 21, 2022, 9:33am

At the moment I mainly need a lead/lag operator that respects the panel structure, i.e., applies the lag operator for each id in the panel separately, generates NA where necessary, and puts everything back together. If that is possible with DataFramesMeta directly, that would be great! I guess that is also the added convenience of panelr over deplyr.

nilshg · June 21, 2022, 9:50am

julia> using DataFrames, ShiftedArrays

julia> df = DataFrame(stock = ["a", "a", "a", "b", "b", "b"], time = repeat(1:3, 2), price = rand(100:120, 6))
6×3 DataFrame
 Row │ stock   time   price
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           1    108
   2 │ a           2    105
   3 │ a           3    107
   4 │ b           1    101
   5 │ b           2    106
   6 │ b           3    109

julia> transform!(groupby(df, :stock), :price => (x -> log.(x) .- log.(lag(x))) => :return)
6×4 DataFrame
 Row │ stock   time   price  return
     │ String  Int64  Int64  Float64?
─────┼───────────────────────────────────────
   1 │ a           1    108  missing
   2 │ a           2    105       -0.0281709
   3 │ a           3    107        0.0188685
   4 │ b           1    101  missing
   5 │ b           2    106        0.0483186
   6 │ b           3    109        0.0279088

IljaK91 · June 21, 2022, 10:12am

Thanks, that worked

Just one last question: This method does not work when there are gaps in the variable time, correct?

nilshg · June 21, 2022, 3:00pm

Well, it assumes that subsequent rows represent one time step. If you have missing observations which aren’t in the data (i.e. not represented by missing but absent altogether) you can construct the range of time steps first (something like minimum(df.date):Day(1):maximum(df.date)) and then leftjoin your data onto that, which will generate the missing observations. Any return (in my example) where one of the two days used to compute it is missing will then be missing.

eirikeb · September 13, 2022, 6:30pm

At the dangers of over-promoting, but I’ve written a package that takes care of some the issues you mention @IljaK91: PanelDataTools.jl

It is at it’s core just a wrapper around some existing packages and solutions, but from my tests it deals well with for example missing times, use DateTime to keep track of time instead of row numbers (which is what shifted arrays basically does).

Topic		Replies	Views
Lag/lead in panel data Data dataframes	18	3378	March 22, 2022
Create lead and lag variable in DataFrame General Usage question	14	8141	October 22, 2019
[ANN] PanelShift.jl: Time-aware lags and leads in panel data Package Announcements economics , data , dataframes	0	293	January 21, 2023
Taking panel and temporal data seriously Data proposal , jsoc	11	3219	February 23, 2020
Lag vector by group using another vector as the grouping variable General Usage question	4	337	February 20, 2022

Manipulating Panel Data in Julia

Related topics