Create variables in a loop

Suppose I wanna create some routine econ variables:

using FredData;
const f = Fred("Personal api_key");  
#
d = get_data(f, "GDP");     y=d.df[:,4]
d = get_data(f, "PCEC");    c=d.df[:,4]
d = get_data(f, "GPDI");    i=d.df[:,4]
d = get_data(f, "GCE");     g=d.df[:,4]
d = get_data(f, "NETEXP");  nx=d.df[:,4]

Q: what is the best Julian way to automate this in a loop or something?

I had something in mind along the lines of:

vars = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
labs = [y, c, i, g, nx]
for (ii, dd) in enumerate(vars)
    d = get_data(f, dd)                # get data, eg "GDP"
    labs[ii] = d.df[:,4]               # create var, eg y="GDP"
    filter!(x -> ! isnan(x), labs[ii]) # remove NaN from var
end 

This clearly doesn’t work …

It’s better to use dictionaries in this case

dict = Dict{Symbol, Any}()
vars = Dict{String, Symbol}("GDP" => :y,
                            "PCEC" => :c,
                            "GPDI" => :i,
                            "GCE" => :g,
                            "NETEXP" => :nx)
for (k, v) in vars
    d = get_data(f, k)
    dict[v] = d.df[:, 4]   # same column as in the question
end

# example of using variable
dict[:nx] # some value

# Or even
dict[vars["NETEXP"]] # the same as dict[:nx]

If you know types in advance it’s better to use them instead of Any.
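
A minimal sketch of the typed version, assuming every series is a `Vector{Float64}`. `fetch_series` is a hypothetical stand-in for `get_data(f, k).df[:, 4]` that returns made-up numbers, so the snippet runs without a FRED API key; it is not part of FredData.

```julia
# Hypothetical stand-in for `get_data(f, id).df[:, 4]`; returns made-up numbers.
fetch_series(id::String) = Float64[length(id), NaN, 2 * length(id)]

ids = ["GDP" => :y, "PCEC" => :c, "GPDI" => :i, "GCE" => :g, "NETEXP" => :nx]

# Concrete value type instead of Any.
series = Dict{Symbol, Vector{Float64}}()
for (id, name) in ids
    x = fetch_series(id)
    series[name] = filter(!isnan, x)   # drop NaN, as in the original loop
end

series[:y]   # the cleaned series stored under :y
```

With a concrete value type, the compiler knows that `series[:y]` is a `Vector{Float64}`, which is not the case with `Dict{Symbol, Any}`.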


@Skoffer thank you, this is great!
Right now the data is in dict[:y].
Is it possible to create the variable y=d.df[:,4] directly inside the loop?

I suppose it can be done with some macro magic, but it is not a recommended way of doing things.


is that mainly for speed?

Yes, for type stability and performance.
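
For completeness, here is a sketch of what that kind of metaprogramming could look like, using `@eval` at top level to define one global per name. The data are made up and the names are only for illustration. It works, but it has the drawbacks mentioned above: the results are untyped globals whose names only exist at run time.

```julia
# Made-up data standing in for the downloaded series.
data = Dict(:y => [1.0, 2.0], :c => [0.6, 1.2], :nx => [-0.1, 0.1])

for (name, vals) in data
    # Interpolate the Symbol as the left-hand side and the vector as the value,
    # then evaluate `name = vals` in the current module's global scope.
    @eval $name = $vals
end

y   # now a global bound to the vector [1.0, 2.0]
```

Note that `@eval` always assigns in global scope, so this does not create local variables inside a function.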


Does anyone here know how to generate (non-dictionary) variables in a loop? (my original question)

It’s very very easy in STATA where we do it all the time

clear*
set obs 10
gen yr = _n
gen treated = 1*(yr >= 3)
* dummies for each year post treatment
forvalues y = 0(1)7 {
	gen treated_p`y' = 1*(yr == 3 + `y' )
}
* dummies for each year pre treatment
forvalues y = 1(1)2 {
	gen treated_m`y' = 1*(yr == 3 - `y' )
}

Remember that Stata doesn’t have the same constraints as Julia. There is never any dispute about what the variable x represents: it’s always a column in the data set.

Of course, it is easy to do this in DataFrames. Part of the switch to using Strings as column names is to make this kind of Stata workflow easier.

for y in 0:7
    # same as Stata's: gen treated_p`y' = 1*(yr == 3 + `y')
    df[!, "treated_p$y"] = 1 .* (df.yr .== 3 + y)
end

I think that, in this case, writing a macro may be your best option. Unless you are satisfied by some solution like:

function get_fourth_column(d) # unnecessary but simplifies
    return d.df[:, 4]
end
fields = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]

y, c, i, g, nx = get_fourth_column.(get_data.((f,), fields))
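
A related sketch, if named access without a dictionary is enough: collect the results in a `NamedTuple` and unpack only the fields you need. `demo_series` is a hypothetical stand-in for `get_fourth_column(get_data(f, id))` that returns made-up numbers, and the property destructuring `(; y, nx) = nt` assumes Julia 1.7 or later.

```julia
# Hypothetical stand-in for `get_fourth_column(get_data(f, id))`.
demo_series(id) = fill(Float64(length(id)), 3)

fields = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
labels = (:y, :c, :i, :g, :nx)

# Build a NamedTuple: one field per short name, in the same order as `fields`.
nt = NamedTuple{labels}(Tuple(demo_series.(fields)))

nt.c               # access by name
(; y, nx) = nt     # unpack selected fields into variables (Julia 1.7+)
```

Unlike positional destructuring, this does not depend on remembering the order of `fields` on the left-hand side.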

Also, I take it that this is a feature you might be interested in seeing in DataFramesMeta? I prototyped this just now.

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])
6×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 2     │
│ 3   │ 1     │ 3     │
│ 4   │ 2     │ 100   │
│ 5   │ 2     │ 200   │
│ 6   │ 2     │ 300   │

julia> @with df begin 
       for i in 1:10
           @gen cols("y" * string(i)) = :a .+ i 
       end
       nothing
       end

julia> df
6×12 DataFrame
│ Row │ a     │ b     │ y1    │ y2    │ y3    │ y4    │ y5    │ y6    │ y7    │ y8    │ y9    │ y10   │
│     │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 1     │ 1     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 2   │ 1     │ 2     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 3   │ 1     │ 3     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 4   │ 2     │ 100   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 5   │ 2     │ 200   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 6   │ 2     │ 300   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │

EDIT: Although realistically I would probably rather have a more general solution so that it’s easier to construct many different arguments inside a transform call.


Here’s something you can do on DataFramesMeta master right now

julia> @time transform(df, [DataFramesMeta.@col cols("y" * "$i") = :a .+ i for i in 1:10]...)

This is pretty good! It can probably be tweaked a bit to make it cleaner.