Create variables in a loop

Albert_Zevelev · November 3, 2020, 6:56pm

Suppose I wanna create some routine econ variables:

using FredData;
const f = Fred("Personal api_key");  
#
d = get_data(f, "GDP");     y=d.df[:,4]
d = get_data(f, "PCEC");    c=d.df[:,4]
d = get_data(f, "GPDI");    i=d.df[:,4]
d = get_data(f, "GCE");     g=d.df[:,4]
d = get_data(f, "NETEXP");  nx=d.df[:,4]

Q: what is the best Julian way to automate this in a loop or something?

I had something in mind along the lines of:

vars = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
labs = [y, c, i, g, nx]
for (ii, dd) in enumerate(vars)
    d = get_data(f, dd)                # get data, eg "GDP"
    labs[ii] = d.df[:,4]               # create var, eg y="GDP"
    filter!(x -> ! isnan(x), labs[ii]) # remove NaN from var
end

This clearly doesn’t work …

Skoffer · November 3, 2020, 7:18pm

It’s better to use dictionaries in this case

dict = Dict{Symbol, Any}()
vars = Dict{String, Symbol}("GDP" => :y,
                            "PCEC" => :c,
                            "GPDI" => :i,
                            "GCE" => :g,
                            "NETEXP" => :nx)
for (k, v) in vars
    d = get_data(f, k)
    dict[v] = d.df[:, 4]  # fourth column, as in the original post
end

# example of using variable
dict[:nx] # some value

# Or even
dict[vars["NETEXP"]] # the same as dict[:nx]

If you know types in advance it’s better to use them instead of Any.
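
As a minimal sketch of that last point, assuming every series is a vector of Float64, the dictionary can be given a concrete value type. The `fetch_series` function below is a hypothetical stand-in for `get_data(f, k).df[:, 4]` so the snippet runs without an API key, and its numbers are made up.

```julia
# Hypothetical stand-in for get_data(f, code).df[:, 4]; returns made-up numbers.
fetch_series(code::String) = code == "GDP" ? [1.0, NaN, 3.0] : [0.5, 0.7, NaN]

codes = Dict("GDP" => :y, "PCEC" => :c)

# Concrete value type instead of Any.
series = Dict{Symbol, Vector{Float64}}()
for (code, name) in codes
    # Drop NaN entries, as the loop in the original question tried to do with filter!.
    series[name] = filter(!isnan, fetch_series(code))
end

series[:y]  # a Vector{Float64} with the NaN removed
```

With the value type fixed, anything read out of `series` is known to be a `Vector{Float64}`, which is the information the compiler loses with `Any`.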

Albert_Zevelev · November 3, 2020, 7:29pm

@Skoffer thank you, this is great!
Right now the data is in dict[:y].
Is it possible to create the variable y=d.df[:,4] directly inside the loop?

Skoffer · November 3, 2020, 7:37pm

I suppose it can be done with some macro magic, but it is not a recommended way of doing things.

Albert_Zevelev · November 3, 2020, 7:50pm

is that mainly for speed?

Skoffer · November 3, 2020, 8:26pm

Yes, for type stability and performance.
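
For completeness, here is a sketch of what such "macro magic" could look like, using `@eval` to define variables whose names are computed in a loop. The names and values are made up for illustration. Note that this only creates global variables (it cannot create locals inside a function), and non-constant globals are exactly what hurts type stability, so the dictionary remains the usual recommendation.

```julia
# Made-up data standing in for the downloaded series.
data = Dict(:y => [1.0, 2.0], :c => [0.5, 0.7])

for (name, vals) in data
    # Interpolate the symbol as the variable name and the vector as its value;
    # this defines a global variable in the current module.
    @eval $name = $vals
end

y  # now a global variable holding the first made-up series
```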

Albert_Zevelev · November 3, 2020, 9:18pm

Does anyone here know how to generate (non-dictionary) variables in a loop? (my original question)

It’s very very easy in STATA where we do it all the time

clear*
set obs 10
gen yr = _n
gen treated = 1*(yr >= 3)
* dummies for each year post treatment
forvalues y = 0(1)7 {
	gen treated_p`y' = 1*(yr == 3 + `y' )
}
* dummies for each year pre treatment
forvalues y = 1(1)2 {
	gen treated_m`y' = 1*(yr == 3 - `y' )
}

pdeffebach · November 3, 2020, 9:30pm

Remember that Stata doesn’t have the same constraints as Julia. There is never any dispute about what the variable x represents: it’s always a column in the data set.

Of course, it is easy to do this in DataFrames. Part of the switch to using Strings as column names is to make this kind of Stata workflow easier.

for y in 0:7
    # dummy for each year post treatment, mirroring the Stata loop above
    df[!, "treated_p$y"] = 1 .* (df.yr .== 3 + y)
end

Henrique_Becker · November 3, 2020, 9:31pm

I think that, in this case, writing a macro may be your best option. Unless you are satisfied by some solution like:

function get_fourth_column(d) # unnecessary but simplifies
    return d.df[:, 4]
end
fields = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]

y, c, i, g, nx = get_fourth_column.(get_data.((f,), fields))
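
A related non-dictionary option, sketched here with a hypothetical `fake_column` in place of the FredData calls, is to collect the series in a NamedTuple. Fields are then accessed as `nt.y`, and on Julia 1.7 or later they can be unpacked by name.

```julia
# Hypothetical stand-in for get_fourth_column(get_data(f, code)); made-up numbers.
fake_column(code) = fill(Float64(length(code)), 2)

fields = ["GDP", "PCEC", "GPDI", "GCE", "NETEXP"]
labels = (:y, :c, :i, :g, :nx)

# Build a NamedTuple whose field names come from `labels`.
nt = NamedTuple{labels}(Tuple(fake_column.(fields)))

nt.nx            # access a series by field name
(; y, c) = nt    # property destructuring (requires Julia 1.7+)
```

Unlike positional destructuring, this keeps each name attached to its series, so reordering `fields` and `labels` together cannot silently swap variables.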

pdeffebach · November 3, 2020, 10:34pm

Also, I take it that this is a feature you might be interested in seeing in DataFramesMeta? I prototyped this just now.

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])
6×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 2     │
│ 3   │ 1     │ 3     │
│ 4   │ 2     │ 100   │
│ 5   │ 2     │ 200   │
│ 6   │ 2     │ 300   │

julia> @with df begin 
       for i in 1:10
           @gen cols("y" * string(i)) = :a .+ i 
       end
       nothing
       end

julia> df
6×12 DataFrame
│ Row │ a     │ b     │ y1    │ y2    │ y3    │ y4    │ y5    │ y6    │ y7    │ y8    │ y9    │ y10   │
│     │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ 1   │ 1     │ 1     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 2   │ 1     │ 2     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 3   │ 1     │ 3     │ 2     │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │
│ 4   │ 2     │ 100   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 5   │ 2     │ 200   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │
│ 6   │ 2     │ 300   │ 3     │ 4     │ 5     │ 6     │ 7     │ 8     │ 9     │ 10    │ 11    │ 12    │

EDIT: Although realistically I would probably rather have a more general solution so that it’s easier to construct many different arguments inside a transform call.

pdeffebach · November 3, 2020, 11:14pm

Here’s something you can do on DataFramesMeta master right now

julia> @time transform(df, [DataFramesMeta.@col cols("y" * "$i") = :a .+ i for i in 1:10]...)

This is pretty good! It can probably be tweaked a bit to make it cleaner.
