I was looking for a straightforward way to do common manipulations in panel data within Julia. For instance, let's say I have a panel of many countries and years. I would like to calculate the growth rate for a variable country by country. What is the most convenient way to do this? Is there a way to do this using packages such as Query.jl?
At the moment I mainly need a lead/lag operator that respects the panel structure, i.e., one that applies the lag operator to each id in the panel separately, generates NA where necessary, and puts everything back together. If that is possible with DataFramesMeta directly, that would be great! I guess that is also the added convenience of panelr over dplyr.
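To make the request concrete, here is a minimal sketch of such a panel-aware lag in plain Julia, without any packages. The helper name panel_lag and the toy data are made up for illustration, and the sketch assumes rows are already sorted by time within each id and that consecutive rows are exactly one period apart. It is not how DataFramesMeta or any other package implements this.

```julia
# Hypothetical helper (not from any package): lag `x` by `n >= 0` rows within each panel unit.
# Assumes rows are sorted by time within each id and that there are no gaps in time.
function panel_lag(id::AbstractVector, x::AbstractVector, n::Integer = 1)
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    for g in unique(id)
        rows = findall(==(g), id)          # row positions of this unit, in original order
        for k in (n + 1):length(rows)
            out[rows[k]] = x[rows[k - n]]  # value n rows earlier within the same unit
        end
    end
    return out
end

# Toy panel: two countries, three years each, sorted by year within country.
country = ["A", "A", "A", "B", "B", "B"]
year    = [2000, 2001, 2002, 2000, 2001, 2002]
gdp     = [100.0, 110.0, 121.0, 50.0, 55.0, 66.0]

gdp_lag = panel_lag(country, gdp)
growth  = gdp ./ gdp_lag .- 1   # missing propagates to the first year of each country
```

The first observation of each country gets missing for both the lag and the growth rate, so values never leak from one country into the next.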
Well, it assumes that subsequent rows represent one time step. If you have missing observations which aren't in the data (i.e. not represented by missing but absent altogether), you can construct the range of time steps first (something like minimum(df.date):Day(1):maximum(df.date)) and then leftjoin your data onto that, which will generate the missing observations. Any return (in my example) where one of the two days used to compute it is missing will then be missing.
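A rough sketch of that gap-filling idea, using only Base and the Dates standard library: instead of an actual leftjoin on a DataFrame, it uses a Dict lookup as a stand-in, which has the same effect for a single series. The helper name fill_date_gaps and the toy prices are invented for illustration.

```julia
using Dates

# Hypothetical helper: expand (dates, values) onto a complete daily grid,
# inserting `missing` for days that are absent from the data.
# A Dict lookup stands in for the leftjoin described above.
function fill_date_gaps(dates::AbstractVector{Date}, values::AbstractVector)
    lookup = Dict(zip(dates, values))
    grid = collect(minimum(dates):Day(1):maximum(dates))
    filled = Union{Missing, eltype(values)}[get(lookup, d, missing) for d in grid]
    return grid, filled
end

dates  = [Date(2024, 1, 1), Date(2024, 1, 2), Date(2024, 1, 4)]  # Jan 3 is absent
prices = [100.0, 102.0, 105.0]

grid, filled = fill_date_gaps(dates, prices)

# Simple daily returns; any return that touches the missing day is itself missing.
returns = filled[2:end] ./ filled[1:end-1] .- 1
```

Here only the return from Jan 1 to Jan 2 is defined. The two returns that involve Jan 3 come out as missing instead of silently treating Jan 2 to Jan 4 as a single step.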
At the risk of over-promoting, I've written a package that takes care of some of the issues you mention, @IljaK91: PanelDataTools.jl
It is at its core just a wrapper around some existing packages and solutions, but from my tests it deals well with, for example, missing time periods, and it uses DateTime to keep track of time instead of row numbers (which is basically what ShiftedArrays does).