Juan
November 4, 2020, 1:18am
1
If I want to create a DataFrame with just 2 columns I can do:
DataFrame(a=rand(Normal(0, 1), 10), b=rand(Normal(0, 1), 10))
But what if I want to create a DataFrame with hundreds or thousands of columns with a given name?
For example, with the following column names (just 48 for this example), stored in a vector or whatever other container you prefer:
vec(map(Iterators.product('a':'d', 'a':'d', string.(2001:2003))) do (x, y, z)
    x * y * '_' * z
end)
And for this example we will also fill each column with
rand(Normal(0, 1), 10)
I will use it later to create an example for benchmarking purposes.
You can use .=> to broadcast the Pair operator:
julia> DataFrame(col_names .=> [randn(10) for _ in eachindex(col_names)])
10×48 DataFrame. Omitted printing of 42 columns
│ Row │ aa2001    │ ba2001    │ ca2001    │ da2001    │ ab2001     │ bb2001    │
│     │ Float64   │ Float64   │ Float64   │ Float64   │ Float64    │ Float64   │
├─────┼───────────┼───────────┼───────────┼───────────┼────────────┼───────────┤
│ 1   │ 0.187453  │ 0.922989  │ -1.8337   │ 1.10598   │ 0.2527     │ -1.26025  │
│ 2   │ -0.698976 │ 0.0275297 │ -0.779797 │ 0.325134  │ 0.184392   │ -0.541654 │
│ 3   │ -0.533002 │ 0.617138  │ -0.721395 │ -1.4459   │ -0.109285  │ 0.943458  │
│ 4   │ 2.59956   │ -0.24512  │ -0.556589 │ -1.33378  │ 2.12868    │ 0.856728  │
│ 5   │ 1.99969   │ 1.69739   │ -0.374264 │ 0.269507  │ -0.604224  │ 0.612185  │
│ 6   │ -0.136302 │ 0.922046  │ 1.21671   │ 1.17714   │ 0.90012    │ -0.58445  │
│ 7   │ 0.431472  │ 1.08326   │ -1.8062   │ -1.42047  │ 0.990874   │ -2.76279  │
│ 8   │ -0.602431 │ -0.300705 │ -0.184261 │ 0.613706  │ 0.232971   │ -0.548315 │
│ 9   │ -0.437662 │ -0.808732 │ -0.714415 │ 1.16602   │ 0.00941199 │ -0.352265 │
│ 10  │ 1.10546   │ 1.38544   │ -0.329173 │ -0.765127 │ 0.605886   │ 1.60454   │
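For anyone who wants to run the broadcasting approach end to end, here is a minimal sketch. It assumes a recent DataFrames.jl and builds col_names with Base's Iterators.product; the variable name col_names and the underscore in the names follow the question, so the resulting names ("aa_2001", ...) differ slightly from the output printed above.

```julia
using DataFrames

# Build the 48 column names: every combination of two letters and a year.
col_names = vec(map(Iterators.product('a':'d', 'a':'d', string.(2001:2003))) do (x, y, z)
    x * y * '_' * z
end)

# `.=>` pairs each name with its own freshly generated vector of 10 values;
# the DataFrame constructor accepts the resulting vector of Pairs.
df = DataFrame(col_names .=> [randn(10) for _ in eachindex(col_names)])

size(df)  # (10, 48)
```

The comprehension matters here: writing col_names .=> Ref(randn(10)) instead would pair every name with the same vector, which is usually not what you want for benchmarking data.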
nilshg
November 4, 2020, 10:39am
3
There is a DataFrames constructor for this:
julia> using DataFrames, Random
julia> names = [randstring(5) for _ ∈ 1:10]
10-element Array{String,1}:
"dzRQt"
"kKXW7"
"0JSL6"
"Ns7VN"
"uZyLz"
"70f0T"
"A3PrD"
"Od9Lz"
"Guazy"
"pTw48"
julia> data = randn(10, 10)
10×10 Array{Float64,2}:
0.309686 -1.1974 -0.0187716 -0.303907 1.32897 -0.277437 1.33409 1.88879 0.603044 -1.4253
-0.338923 -1.03677 1.01156 -1.74512 -0.87579 -0.060289 0.643243 -1.37126 0.400429 0.689121
-0.140837 0.193948 -0.411703 -0.260852 0.789106 0.842438 0.679892 0.834983 -1.18727 -0.178523
0.0755439 1.50667 -0.0136337 -0.462559 -0.191108 -1.10486 2.57489 -0.682026 1.65719 1.08617
0.403895 2.62865 0.257171 0.39861 1.11401 1.30457 0.767682 0.60543 0.449838 0.354192
0.704756 1.01318 -1.47469 0.0364399 0.906231 -1.05733 0.169764 -0.142383 -1.41441 -0.861899
0.833152 1.14731 1.2926 -0.913615 0.957537 1.25694 0.01692 -1.75855 -0.665406 -1.43099
0.106316 0.833295 -0.269914 -0.867696 0.763117 0.651651 0.317162 -0.882739 0.139936 0.174196
0.53614 0.346916 -0.541661 -1.94401 0.542825 0.882737 0.240241 -1.3405 -1.46032 -0.883309
-0.315214 -1.39484 -1.02137 1.91367 0.965089 1.52959 -1.46762 0.435068 1.80926 -0.502492
julia> DataFrame(data, names)
10×10 DataFrame
│ Row │ dzRQt     │ kKXW7    │ 0JSL6      │ Ns7VN     │ uZyLz     │ 70f0T     │ A3PrD    │ Od9Lz     │ Guazy     │ pTw48     │
│     │ Float64   │ Float64  │ Float64    │ Float64   │ Float64   │ Float64   │ Float64  │ Float64   │ Float64   │ Float64   │
├─────┼───────────┼──────────┼────────────┼───────────┼───────────┼───────────┼──────────┼───────────┼───────────┼───────────┤
│ 1   │ 0.309686  │ -1.1974  │ -0.0187716 │ -0.303907 │ 1.32897   │ -0.277437 │ 1.33409  │ 1.88879   │ 0.603044  │ -1.4253   │
│ 2   │ -0.338923 │ -1.03677 │ 1.01156    │ -1.74512  │ -0.87579  │ -0.060289 │ 0.643243 │ -1.37126  │ 0.400429  │ 0.689121  │
│ 3   │ -0.140837 │ 0.193948 │ -0.411703  │ -0.260852 │ 0.789106  │ 0.842438  │ 0.679892 │ 0.834983  │ -1.18727  │ -0.178523 │
│ 4   │ 0.0755439 │ 1.50667  │ -0.0136337 │ -0.462559 │ -0.191108 │ -1.10486  │ 2.57489  │ -0.682026 │ 1.65719   │ 1.08617   │
│ 5   │ 0.403895  │ 2.62865  │ 0.257171   │ 0.39861   │ 1.11401   │ 1.30457   │ 0.767682 │ 0.60543   │ 0.449838  │ 0.354192  │
│ 6   │ 0.704756  │ 1.01318  │ -1.47469   │ 0.0364399 │ 0.906231  │ -1.05733  │ 0.169764 │ -0.142383 │ -1.41441  │ -0.861899 │
│ 7   │ 0.833152  │ 1.14731  │ 1.2926     │ -0.913615 │ 0.957537  │ 1.25694   │ 0.01692  │ -1.75855  │ -0.665406 │ -1.43099  │
│ 8   │ 0.106316  │ 0.833295 │ -0.269914  │ -0.867696 │ 0.763117  │ 0.651651  │ 0.317162 │ -0.882739 │ 0.139936  │ 0.174196  │
│ 9   │ 0.53614   │ 0.346916 │ -0.541661  │ -1.94401  │ 0.542825  │ 0.882737  │ 0.240241 │ -1.3405   │ -1.46032  │ -0.883309 │
│ 10  │ -0.315214 │ -1.39484 │ -1.02137   │ 1.91367   │ 0.965089  │ 1.52959   │ -1.46762 │ 0.435068  │ 1.80926   │ -0.502492 │
Juan
November 4, 2020, 11:19am
4
What if I want to initialize each column with something different?
What format should the "data" in your example have? A dictionary of vectors, a tuple, a vector, or what?
In the example above, data is a matrix, and names is a vector of Strings or a vector of Symbols.
You can also just add the columns one at a time in a loop. This will be fast.
using DataFrames, Distributions

names = map(Iterators.product('a':'d', 'a':'d', string.(2001:2003))) do (x, y, z)
    x * '_' * y * '_' * z
end |> vec
df = DataFrame()
for n in names
    df[!, n] = rand(Normal(0, 1), 10)  # a new draw for each column
end
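The loop also covers the case where each column should be initialized with something different. Below is a minimal sketch of one way to do that; the column names, the generators vector, and the make_column functions are all hypothetical and only serve as illustration, and the column-assignment syntax assumes a recent DataFrames.jl.

```julia
using DataFrames

nrows = 10

# Hypothetical example: pair each column name with a zero-argument
# function that produces that column's initial contents.
generators = [
    "normal"  => () -> randn(nrows),        # standard normal draws
    "uniform" => () -> rand(nrows),         # uniform draws on [0, 1)
    "index"   => () -> collect(1:nrows),    # integer row index
    "label"   => () -> fill("x", nrows),    # constant string column
]

df = DataFrame()
for (name, make_column) in generators
    # Each column gets its own vector, and its own element type.
    df[!, name] = make_column()
end
```

Because every column is assigned separately, the columns can have different element types, which the matrix-based constructor cannot give you directly.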