Create dataframe with n columns of strings

I’m sure there is a simple way to do this, but I cannot figure this out.

Is there a simple way to create a DataFrame with n columns, all of type String?

I know I can do this

DataFrame([String,String,String],[:x1, :x2, :x3],0)

but if I have a varying number of columns, I would like to be able to do this dynamically, based on the size of the input data source.

To answer your question directly, you can do

df = DataFrame([String[] for i in 1:n])
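A note for readers on newer DataFrames.jl releases (1.0 and later, as far as I know): the constructor that takes a vector of vectors expects column names or the `:auto` marker as a second argument. Below is a minimal sketch under that assumption; `n = 3` and the `col$i` names are made up for illustration.

```julia
using DataFrames

n = 3  # assumed number of columns, for illustration only

# One empty String column per position; :auto generates the names x1, x2, ...
df = DataFrame([String[] for _ in 1:n], :auto)

# Alternatively, pass name => column pairs to choose the names yourself.
# The names "col1", "col2", ... are hypothetical.
col_names = ["col$i" for i in 1:n]
df_named = DataFrame([name => String[] for name in col_names])
```

Both data frames have zero rows and n columns with element type String.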

More generally, remember that you can initialize an empty data frame and add columns to it in a loop.

df = DataFrame()
for col in container_of_cols
    df[:, name_of_col] = col
end
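Here is a concrete version of that loop, as a sketch. The contents of `container_of_cols` are an assumption (a vector of name => data pairs), since the snippet above leaves both the container and `name_of_col` undefined.

```julia
using DataFrames

# Hypothetical input: each entry pairs a column name with its data.
container_of_cols = ["a" => ["x", "y"], "b" => ["u", "v"]]

df = DataFrame()
for (name_of_col, col) in container_of_cols
    # Assigning with `:` to a column that does not exist yet
    # creates it and stores a copy of `col`.
    df[:, name_of_col] = col
end
```

All columns added this way must have the same length, otherwise the assignment throws an error.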

You can also initialize an empty DataFrame and push! named tuples to it. The types will work out as needed.

julia> df = DataFrame()
0×0 DataFrame

julia> push!(df, (a = "hey", b = "there"))
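The same idea with several rows, as a small sketch. The `rows` vector is made up for illustration; the first `push!` creates the columns and later rows have to match them.

```julia
using DataFrames

df = DataFrame()

# Hypothetical rows; every named tuple uses the same field names.
rows = [(a = "hey", b = "there"), (a = "hi", b = "again")]

for row in rows
    # The first push! defines columns a and b, the rest append rows.
    push!(df, row)
end
```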

Thanks for your help, I knew it should have been something simple.

Hi! I’m using Julia v1.3 and DataFrames v0.22.5 and I found out that this solution is not working anymore.

Before I was using the following code:

df = DataFrame()
for c in col_names_el
    df[Symbol(c)] = 0.0
end

But now this is not working either. Any tips on how to solve this?

P.S. I'm not sure whether I should open a new topic; if so, let me know and I will do that!

Thanks a lot!

Are you sure this was on DataFrames 0.22? I feel like this syntax was deprecated earlier than that. Anyway: you need to index a DataFrame with both a row and a column index, so just df[:colname] is not allowed anymore; you need df[!, :colname]. You also want to broadcast the assignment of the scalar 0.0 to the column, so use:

for c in col_names_el
    df[!, c] .= 0.0
end

Note that it’s also no longer necessary to call Symbol(c), as DataFrames can now be indexed with strings (symbols continue to work as well, so there is no need to rewrite code that uses symbols).
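To make the broadcasting and the string/symbol indexing concrete, here is a small sketch. The `id` column and the names in `col_names_el` are assumptions added so that the data frame has some rows to broadcast over.

```julia
using DataFrames

col_names_el = ["a", "b"]   # hypothetical column names

# Start from a data frame that already has three rows.
df = DataFrame(id = 1:3)

for c in col_names_el
    # Broadcasting creates the column and fills every existing row with 0.0.
    df[!, c] .= 0.0
end

# Strings and symbols refer to the same column.
same_column = df[!, "a"] === df[!, :a]
```

After the loop each new column holds three Float64 zeros, and `same_column` is `true`.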


The following works

julia> df = DataFrame();

julia> for c in ["a", "b"]
           df[:, c] = [0.0]
       end

Two things.

  1. As mentioned by @nilshg, you need to use either ! or : as the row index when indexing into a column.
  2. 0.0 is a scalar. DataFrames columns are vectors. So you need to assign a vector to the column of the data frame.
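The two approaches in this thread give different results on an empty data frame, which the sketch below illustrates. The names `df_vec` and `df_bc` are just labels for this example.

```julia
using DataFrames

# Assigning a one-element vector creates a column with one row.
df_vec = DataFrame()
df_vec[:, "a"] = [0.0]

# Broadcasting a scalar fills the existing rows. An empty data frame
# has none, so the new column is created with zero rows.
df_bc = DataFrame()
df_bc[!, "a"] .= 0.0
```

Which one you want depends on whether the data frame should start with a row of zeros or with typed, empty columns.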

Thanks @pdeffebach and @nilshg! Before this I was using v0.21, and the code I provided was working. I got a little confused by the error and actually forgot the . in front of the =

Thank you again :)