Creating columns in DataFrame via loops

To walk through your code…

Let’s check that against your first iteration (that’s not enough debugging generally but it’s enough here). The first i in category is "cat1", and the first x in cat_values is 0.203842327, so the first iteration does:

julia> df[:, "cat1"] .= cat_values[0.203842327]
ERROR: ArgumentError: invalid index: 0.203842327 of type Float64

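That is the same error, and it shows the problem: x is already an element of cat_values, so there is no reason to index cat_values with it a second time. Assigning the value directly works: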

julia> df[:, "cat1"] .= 0.203842327
10-element Vector{Float64}:
 0.203842327
 0.203842327
 0.203842327
...

So let’s make that change in your loop and check the resulting df:

julia> for i in category, x in cat_values
         df[:,i] .= x
       end

julia> df
10×6 DataFrame
 Row │ A      cat1       cat2       cat3       cat4       cat5
     │ Int64  Float64    Float64    Float64    Float64    Float64
─────┼──────────────────────────────────────────────────────────────
   1 │     1  0.0349219  0.0349219  0.0349219  0.0349219  0.0349219
   2 │     2  0.0349219  0.0349219  0.0349219  0.0349219  0.0349219
   3 │     3  0.0349219  0.0349219  0.0349219  0.0349219  0.0349219
...

Well, that’s not what we want either: every column holds the same value. We’re iterating through the category columns and the cat_values values, so what’s the problem? Let’s reference the loop docs:

Multiple nested for loops can be combined into a single outer loop, forming the cartesian product of its iterables:

julia> for i = 1:2, j = 3:4
           println((i, j))
       end
(1, 3)
(1, 4)
(2, 3)
(2, 4)
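The same rule can be checked with names and values shaped like the ones in your loop, without needing a DataFrame. This is only a minimal sketch: `cols`, `vals`, and `fill_product` are made-up stand-ins (not your `category`, `cat_values`, or anything from DataFrames), and a plain Dict plays the role of the table.

```julia
# Hypothetical stand-ins for `category` and `cat_values` (3 items each, not your data).
cols = ["a", "b", "c"]
vals = [0.1, 0.2, 0.3]

# Hypothetical helper: "fills" each column with a value and counts the iterations.
function fill_product(cols, vals)
    filled = Dict{String,Float64}()   # column name => value it currently holds
    steps = 0
    for c in cols, v in vals          # equivalent to nesting `for v in vals` inside `for c in cols`
        filled[c] = v                 # each column is overwritten once per value
        steps += 1
    end
    return filled, steps
end

filled, steps = fill_product(cols, vals)
println(steps)    # 9 iterations, i.e. 3 × 3
println(filled)   # every column ends up holding the last value, 0.3
```

The counter lives inside a function so the sketch behaves the same in a script as in the REPL.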

So instead of 5 iterations over category and cat_values in parallel, we got 5×5 = 25 iterations over the Cartesian product of their elements. The last loop variable changes fastest, so each column was overwritten with every value of cat_values in turn and ended up holding the last one, 0.0349219. To iterate over 2 or more sequences in parallel (stopping when the shortest one is exhausted), we can use zip:

julia> for (i,x) in zip(category, cat_values)
           df[:,i] .= x
       end

julia> df
10×6 DataFrame
 Row │ A      cat1      cat2      cat3     cat4       cat5
     │ Int64  Float64   Float64   Float64  Float64    Float64
─────┼──────────────────────────────────────────────────────────
   1 │     1  0.203842  0.210149      0.0  0.0702434  0.0349219
   2 │     2  0.203842  0.210149      0.0  0.0702434  0.0349219
   3 │     3  0.203842  0.210149      0.0  0.0702434  0.0349219
...

Seems right.
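One thing to keep in mind is the "until the shortest one is exhausted" part: if the two collections ever differ in length, zip drops the extra items without complaint. A small sketch of that behavior, again with made-up stand-ins (`cols` and `vals` are hypothetical, and `vals` is deliberately one element short), plus a stricter variant using eachindex, which I believe throws a DimensionMismatch when the arrays' indices differ:

```julia
# Hypothetical stand-ins, with `vals` deliberately one element short.
cols = ["a", "b", "c"]
vals = [0.1, 0.2]

# zip pairs elements positionally and stops at the shorter input.
paired = collect(zip(cols, vals))
println(paired)          # two pairs; "c" never appears

# Stricter variant: eachindex over several arrays raises an error
# when their indices differ, instead of silently truncating.
strict_ok = try
    for k in eachindex(cols, vals)
        # cols[k] and vals[k] would be used together here
    end
    true
catch err
    err isa DimensionMismatch || rethrow()
    false
end
println(strict_ok)       # false for these mismatched inputs
```

With 5 names and 5 values, as in your case, both approaches visit the same 5 pairs, so zip is fine here.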
