Suppose I wanna create some routine econ variables:
using FredData;
const f = Fred("Personal api_key");
#
d = get_data(f, "GDP"); y=d.df[:,4]
d = get_data(f, "PCEC"); c=d.df[:,4]
d = get_data(f, "GPDI"); i=d.df[:,4]
d = get_data(f, "GCE"); g=d.df[:,4]
d = get_data(f, "NETEXP"); nx=d.df[:,4]
Q: what is the best Julian way to automate this in a loop or something?
I had something in mind along the lines of:
vars = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
labs = [y, c, i, g, nx]
for (ii, dd) in enumerate(vars)
d = get_data(f, dd) # get data, eg "GDP"
labs[ii] = d.df[:,4] # create var, eg y="GDP"
filter!(x -> ! isnan(x), labs[ii]) # remove NaN from var
end
This clearly doesn't work ...
It's better to use dictionaries in this case
dict = Dict{Symbol, Any}()
vars = Dict{String, Symbol}("GDP" => :y,
"PCEC" => :c,
"GPDI" => :i,
"GCE" => :g,
"NETEXP" => :nx)
for (k, v) in vars
    d = get_data(f, k)
    dict[v] = d.df[:, 4] # fourth column, as in the original snippet
end
# example of using variable
dict[:nx] # some value
# Or even
dict[vars["NETEXP"]] # the same as dict[:nx]
If you know the types in advance, it's better to use them instead of Any.
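As a rough sketch of what a concretely typed dictionary could look like: `fake_series` below is a made-up stand-in for `get_data(f, id).df[:, 4]` (so this runs without an API key), and `Vector{Float64}` is only an assumption about what the column holds.

```julia
# Hypothetical stand-in for get_data(f, id).df[:, 4]; returns made-up numbers.
fake_series(id::String) = Float64[length(id), 2 * length(id)]

# Map each FRED series ID to the short name we want to use.
names_by_id = Dict("GDP" => :y, "PCEC" => :c, "GPDI" => :i, "GCE" => :g, "NETEXP" => :nx)

# Concrete value type instead of Any, so lookups return a known type.
series = Dict{Symbol, Vector{Float64}}()
for (id, name) in names_by_id
    series[name] = fake_series(id)
end

series[:nx] # a Vector{Float64}
```

If the real column can contain missing values, the value type would have to be widened accordingly.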
@Skoffer thank you, this is great!
Right now the data is in dict[:y]. Is it possible to create the variable y=d.df[:,4] directly inside the loop?
I suppose it can be done with some macro magic, but it is not a recommended way of doing things.
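For illustration only, one form this could take is `@eval` at the top level of a script or the REPL. The data here is made up, and this is a sketch of the not-recommended route rather than a suggested pattern: it creates global variables and cannot create locals inside a function.

```julia
# Made-up data standing in for the downloaded columns.
data = Dict(:y => [1.0, 2.0], :c => [0.6, 1.2], :nx => [-0.1, 0.1])

for (name, vals) in data
    # Builds the expression `name = vals` and evaluates it at global (module) scope.
    @eval $name = $vals
end

y # now a global variable holding [1.0, 2.0]
```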
is that mainly for speed?
Yes, for type stability and performance.
Does anyone here know how to generate (non-dictionary) variables in a loop? (my original question)
It's very very easy in STATA where we do it all the time
clear*
set obs 10
gen yr = _n
gen treated = 1*(yr >= 3)
* dummies for each year post treatment
forvalues y = 0(1)7 {
gen treated_p`y' = 1*(yr == 3 + `y' )
}
* dummies for each year pre treatment
forvalues y = 1(1)2 {
gen treated_m`y' = 1*(yr == 3 - `y' )
}
Remember that Stata doesn't have the same constraints as Julia. There is never any dispute about what the variable x represents: it's always a column in the data set.
Of course, it is easy to do this in DataFrames. Part of the switch to using Strings as column names is to make this kind of Stata workflow easier.
for y in 0:7
    # same as Stata's: gen treated_p`y' = 1*(yr == 3 + `y')
    df[!, "treated_p$y"] = 1 .* (df.yr .== 3 + y)
end
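A fuller sketch that mirrors the whole Stata snippet, assuming a recent DataFrames.jl (one that accepts String column names) is installed. The column names follow the Stata example; everything else is just one way to write it.

```julia
using DataFrames # third-party package, assumed installed

df = DataFrame(yr = 1:10)
df.treated = Int.(df.yr .>= 3)

# Dummies for each year after treatment: treated_p0 ... treated_p7
for y in 0:7
    df[!, "treated_p$y"] = Int.(df.yr .== 3 + y)
end

# Dummies for each year before treatment: treated_m1, treated_m2
for y in 1:2
    df[!, "treated_m$y"] = Int.(df.yr .== 3 - y)
end
```

The new "variables" are columns of `df`, so `df.treated_p0` plays the role that `treated_p0` plays in Stata.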
I think that, in this case, writing a macro may be your best option, unless you are satisfied with a solution like:
function get_fourth_column(d) # unnecessary but simplifies
return d.df[:, 4]
end
fields = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
y, c, i, g, nx = get_fourth_column.(get_data.((f,), fields))
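The `(f,)` wrapper is there so that broadcasting reuses the same `f` for every element of `fields` instead of trying to iterate over it. A self-contained sketch of the same pattern, with `FakeSource` and `fetch_column` as made-up stand-ins for the Fred object and for `get_data` plus column extraction; `Ref(source)` is used here as an equivalent alternative to the one-element tuple.

```julia
# Hypothetical stand-ins so the example runs without FredData or an API key.
struct FakeSource
    scale::Float64
end
fetch_column(src::FakeSource, id::String) = [src.scale * length(id)]

source = FakeSource(2.0)
ids = ["GDP", "PCEC", "NETEXP"]

# Ref(source) makes broadcasting treat `source` as a scalar, like (source,) does.
y, c, nx = fetch_column.(Ref(source), ids)
```

The destructuring on the left only works when the number of names matches what you actually use from `ids`, so the two lists have to be kept in sync by hand.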
Also, I take it that this is a feature you might be interested in seeing in DataFramesMeta? I prototyped this just now.
julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])
6×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 2     │
│ 3   │ 1     │ 3     │
│ 4   │ 2     │ 100   │
│ 5   │ 2     │ 200   │
│ 6   │ 2     │ 300   │
julia> @with df begin
for i in 1:10
@gen cols("y" * string(i)) = :a .+ i
end
nothing
end
julia> df
6×12 DataFrame
│ Row │ a     │ b     │ y1    │ y2    │ y3    │ y4    │ y5    │ y6    │ y7    │ y8    │ y9    │ y10   │
│     │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 1     │ 1     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 2   │ 1     │ 2     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 3   │ 1     │ 3     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 4   │ 2     │ 100   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 5   │ 2     │ 200   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 6   │ 2     │ 300   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
EDIT: Although realistically I would probably rather have a more general solution so that it's easier to construct many different arguments inside a transform call.
Here's something you can do on DataFramesMeta master right now
julia> @time transform(df, [DataFramesMeta.@col cols("y" * "$i") = :a .+ i for i in 1:10]...)
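For comparison, a sketch of the same "splat a comprehension into transform" idea using only DataFrames.jl's `source => function => target` pairs, so it does not rely on the `@col` macro from DataFramesMeta master. It assumes a DataFrames.jl version that accepts String target names; `new_cols` and `out` are just illustrative names.

```julia
using DataFrames # third-party package, assumed installed

df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])

# One `source => function => target name` pair per new column, splatted into transform.
new_cols = [:a => (a -> a .+ i) => "y$i" for i in 1:10]
out = transform(df, new_cols...)
```

`transform` returns a new data frame and leaves `df` untouched; `transform!` would add the columns in place.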
This is pretty good! It can probably be tweaked a bit to make it cleaner.