Using current variable values in the naming of DataFrame column(s)

Hi there,

The following R code names a variable using the current values of other variables. I would like to replicate this in Julia, where those values change on each pass through a loop:

> mu = 0
> sigma = 1
> assign(paste("normal",mu,sigma,sep="_"),5)
> normal_0_1
[1] 5

Here is a Julia example, with the line I would write in R given in the comments:

using Random,Distributions,DataFrames

df = DataFrame(Matrix{Float64}(undef,100,5),:auto)
mu = 1.5:0.5:3.5
sigma = 2:0.5:4

for i in 1:5
    df[:,i] = rand(Normal(mu[i],sigma[i]),100)
    ## here is the following line of code that I would like but 
    ## given in R syntax:
    # propertynames(df)[i] = paste("mu",mu[i],"sigma",sigma[i],sep="_")
end

This would give the following output afterwards:

julia> propertynames(df)
5-element Vector{Symbol}:
 :mu_1.5_sigma_2.0
 :mu_2.0_sigma_2.5
 :mu_2.5_sigma_3.0
 :mu_3.0_sigma_3.5
 :mu_3.5_sigma_4.0

Any help would be appreciated. Thanks!

If the DataFrame column in question already exists, you can use rename!:

julia> var = 1.0
1.0

julia> df = DataFrame(var=var)
1×1 DataFrame
 Row │ var
     │ Float64
─────┼─────────
   1 │     1.0

julia> rename!(df, :var => "var_$var")
1×1 DataFrame
 Row │ var_1.0
     │ Float64
─────┼─────────
   1 │     1.0

For the creation of new columns in an existing DataFrame, you can use:

julia> df = DataFrame()
0×0 DataFrame

julia> df[!, "var_$var"] = [1.0, 2.0]
2-element Vector{Float64}:
 1.0
 2.0

julia> df
2×1 DataFrame
 Row │ var_1.0
     │ Float64
─────┼─────────
   1 │     1.0
   2 │     2.0
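
Applied to the loop from the question, the same indexing pattern might look like the sketch below. It assumes you are free to build the DataFrame column by column from an empty one, rather than pre-allocating it with :auto names; the loop variables m and s are just illustrative names.

```julia
using DataFrames, Distributions

mu = 1.5:0.5:3.5
sigma = 2:0.5:4

# Start from an empty DataFrame and add one named column per (mu, sigma) pair.
df = DataFrame()
for (m, s) in zip(mu, sigma)
    # String interpolation builds the column name from the current values.
    df[!, "mu_$(m)_sigma_$(s)"] = rand(Normal(m, s), 100)
end

names(df)  # column names as strings
```

Since every column gets its final name at creation, no separate renaming step is needed.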

To add to the answer above, rename! also accepts integers to specify the columns, so your loop would read like this:

for i in 1:5
    rand!(Normal(mu[i],sigma[i]), df[!,i])
    rename!(df, i => string("mu_", mu[i], "_sigma_", sigma[i]))
end

I also changed the first line in the loop: your version allocates a new vector with each rand call, whereas rand! writes directly into the vector stored in the DataFrame. For this to work you need df[!,i] rather than df[:,i], because df[:,i] returns a copy of the column instead of the column itself. rand! is part of the Random stdlib.
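
To illustrate the difference between the two indexing forms, here is a small sketch on a toy DataFrame (the column name a and the variable names stored and copied are made up for the illustration):

```julia
using DataFrames

df = DataFrame(a = [1.0, 2.0, 3.0])

stored = df[!, 1]   # the vector held by the DataFrame, no copy
copied = df[:, 1]   # a newly allocated copy of that vector

stored[1] = 10.0    # this change shows up in df
copied[2] = 20.0    # this change does not

@assert df.a == [10.0, 2.0, 3.0]
@assert stored === df.a
@assert copied !== df.a
```

This is why an in-place function such as rand! only modifies the DataFrame when it is given df[!,i].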

Edit: fixed wrong order of arguments for rand! (see below)


@sostock, the in-place rand!() call produces:
ERROR: StackOverflowError

Using Julia 1.7, DataFrames v1.2.2
Any idea about what is going on?

DataFramesMeta can handle this, too (unlike dplyr).

for i in 1:5
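    # @transform! mutates df in place; plain @transform would return a new DataFrame that the loop discards
    @transform! df $"va$i" = rand(Normal(mu[i], sigma[i]), 100)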
end

Oh, I didn’t see that the rand! implementation in Distributions wants the distribution first and then the array. The arguments have to be switched:

rand!(Normal(mu[i],sigma[i]), df[!,i])

Edit: There is an issue for switching the argument order to be consistent with Random, so this might change in a future Distributions.jl release.
