Create dataframe with n columns of strings

I’m sure there is a simple way to do this, but I cannot figure this out.

Is there a simple way to create a DataFrame with n columns, all of String type?

I know I can do this

DataFrame([String,String,String],[:x1, :x2, :x3],0)

but if I have a varying number of columns, I would like to be able to do this dynamically, based on the size of the input data source.

To answer your question directly, you can do

df = DataFrame([String[] for i in 1:n])
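
On recent DataFrames releases (1.0 and later, as far as I know), the constructor wants the column names spelled out, or `:auto` to generate them. A minimal sketch, where `n` and `col_names` are made-up placeholders for whatever your input source provides:

```julia
using DataFrames

n = 3  # hypothetical number of columns

# Let DataFrames generate the names x1, x2, ..., xn
df_auto = DataFrame([String[] for _ in 1:n], :auto)

# Or take the names from the input source (placeholder values here)
col_names = ["id", "name", "city"]
df_named = DataFrame([String[] for _ in col_names], Symbol.(col_names))
```

Both data frames have zero rows and String-typed columns, so rows can be added later with push!.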

More generally, remember that you can initialize an empty data frame and add columns to it in a loop.

df = DataFrame()
for col in container_of_cols
    df[:, name_of_col] = col
end

You can also initialize an empty DataFrame and push! named tuples to it. The types will work out as needed.

julia> df = DataFrame()
0×0 DataFrame

julia> push!(df, (a = "hey", b = "there"))
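
As a script rather than a REPL session, the same idea might look like the sketch below. The second row is made up for illustration, and this assumes a reasonably recent DataFrames version in which push! on a data frame with no columns creates them from the named tuple:

```julia
using DataFrames

df = DataFrame()

# The first push! creates the columns a and b and infers their types
push!(df, (a = "hey", b = "there"))

# Later rows must match the columns that now exist
push!(df, (a = "hi", b = "again"))
```

After this, both columns hold String values, since that is what the first row contained.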

Thanks for your help, I knew it should have been something simple.

Hi! I’m using Julia v1.3 and DataFrames v0.22.5 and I found out that this solution is not working anymore.

Before I was using the following code:

df = DataFrame()
for c in col_names_el
    df[Symbol(c)] = 0.0
end

But now this is not working either. Any tips on how to solve this?

I'm not sure whether I should open a new topic; if so, let me know and I will do that!

Thanks a lot!

Are you sure this was on DataFrames 0.22? I feel like this syntax was deprecated a while before that… Anyway: you need to index a DataFrame with both a row and a column index, so plain df[:colname] is not allowed anymore; you need df[!, :colname]. You also want to broadcast the assignment of the scalar 0.0 to the column, so use:

for c in col_names_el
    df[!, c] .= 0.0
end

Note that it’s also no longer necessary to call Symbol(c), as DataFrames can now be indexed with strings (symbols continue to work as well, so there is no need to rewrite code that uses them!)


The following works

julia> df = DataFrame();

julia> for c in ["a", "b"]
           df[:, c] = [0.0]
       end

Two things.

  1. As mentioned by @nilshg, you need to use either ! or : as the row index when indexing into a column.
  2. 0.0 is a scalar, while DataFrames columns are vectors. So you need to assign a vector to the column of the data frame (or broadcast the scalar with .=).
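
To make the two points concrete, here is a small sketch on a data frame that already has rows. The column names and values are invented for the example, and it assumes a DataFrames version that accepts string column names:

```julia
using DataFrames

df = DataFrame(id = 1:3)           # hypothetical existing data

# Broadcast a scalar into a new column, one value per existing row
df[!, "zeros"] .= 0.0

# Assign a whole vector; with ! the vector is stored as-is
df[!, "vals"] = [1.0, 2.0, 3.0]

# Assign a whole vector; with : a copy of the vector is stored
df[:, "copied"] = [4.0, 5.0, 6.0]
```

Writing `df[!, "zeros"] = 0.0` without the dot would throw an error, because a scalar is not a valid column.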

Thanks @pdeffebach and @nilshg! Before, I was using v0.21, and the code I provided was working. I got a little confused by the error and actually forgot the . in front of the =

Thank you again :slight_smile: