Creating an empty dataframe from a vector of strings fails?

Hi all
I've got a string vector:

col_names = ["start","interval","goal","num_hedgeHogs"]

I want to create an empty dataframe using col_names as the column names and a data type of Any.

I thought this might do the trick

 df_test = DataFrame( Symbol.(col_names))

but it gives me

ERROR: ArgumentError: 'Vector{Symbol}' iterates 'Symbol' values, which doesn't satisfy the Tables.jl `AbstractRow` interface
 [1] invalidtable(#unused#::Vector{Symbol}, #unused#::Symbol)
   @ Tables ~/.julia/packages/Tables/PxO1m/src/tofromdatavalues.jl:42
 [2] iterate
   @ ~/.julia/packages/Tables/PxO1m/src/tofromdatavalues.jl:48 [inlined]
 [3] buildcolumns
   @ ~/.julia/packages/Tables/PxO1m/src/fallbacks.jl:197 [inlined]
 [4] columns
   @ ~/.julia/packages/Tables/PxO1m/src/fallbacks.jl:260 [inlined]
 [5] DataFrame(x::Vector{Symbol}; copycols::Nothing)
   @ DataFrames ~/.julia/packages/DataFrames/MA4YO/src/other/tables.jl:58
 [6] DataFrame(x::Vector{Symbol})
   @ DataFrames ~/.julia/packages/DataFrames/MA4YO/src/other/tables.jl:49
 [7] top-level scope
   @ REPL[43]:1


I also tried passing an empty Any[] as the columns:


julia> df_test = DataFrame(Any[],Symbol.(col_names))
ERROR: DimensionMismatch("Number of columns (0) and number of column names (9) are not equal")
 [1] DataFrame(columns::Vector{AbstractVector}, colindex::DataFrames.Index; copycols::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/MA4YO/src/dataframe/dataframe.jl:178
 [2] DataFrame(columns::Vector{Any}, cnames::Vector{Symbol}; makeunique::Bool, copycols::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/MA4YO/src/dataframe/dataframe.jl:322
 [3] DataFrame(columns::Vector{Any}, cnames::Vector{Symbol})
   @ DataFrames ~/.julia/packages/DataFrames/MA4YO/src/dataframe/dataframe.jl:319
 [4] top-level scope
   @ REPL[5]:1


Any help appreciated.

You would need

DataFrame([Any[] for i in 1:length(col_names)], col_names)

This has the unfortunate consequence of making your columns of type Any, which is not ideal.


First of all, thank you for the speedy reply. I want to give myself a decent foundation in Julia, so if there is a more sensible approach I am more than happy to listen.

Secondly, I understand that defining all the columns as Any is a bad move, but how do I specify the type of each column? In my case, for example:

"start" is a timestamp
"interval" is an Int64
"goal" is a string
"num_hedgeHogs" is an integer

You can declare empty arrays of a type like String[], or Int[] or DateTime[].

But in my case I am creating the dataframe from a string vector of column names. How would I define the column types when using

DataFrame([Any[] for i in 1:length(col_names)], col_names)

Is there a better approach?

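First, you can take advantage of the fact that the DataFrame constructor copies columns by default. Ref([]) broadcasts one and the same empty vector to every name, and the copy gives each column its own vector, so you can write: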

DataFrame(col_names .=> Ref([]))
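To make the role of Ref and of the default copy a bit more concrete, here is a minimal sketch. It assumes DataFrames.jl 1.x and the col_names vector from the question; the variable name name_pairs is only for illustration.

```julia
using DataFrames

col_names = ["start", "interval", "goal", "num_hedgeHogs"]

# Ref([]) is treated as a scalar by broadcasting, so every pair
# points at the very same empty Vector{Any}.
name_pairs = col_names .=> Ref([])
@assert name_pairs[1].second === name_pairs[2].second

# The constructor copies columns by default (copycols=true),
# so each column of the result is its own independent vector.
df = DataFrame(name_pairs)
@assert df.start !== df.interval
@assert names(df) == col_names
@assert eltype(df.start) == Any

# Columns of element type Any accept any value.
push!(df, ("2021-05-01", 15, "find bubble", 3))
```

If the copy were switched off with copycols=false, all four columns would alias one vector, which is why the default matters here.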

Now if you want different element types for columns you can write e.g.:

DataFrame(col_names .=> [T[] for T in [Int, String, Bool, Char]])

or (via a small hack):

DataFrame(col_names .=> rand.([Int, String, Bool, Char], 0))
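Applied to the columns described in the question, the comprehension form might look like the sketch below. The choice of DateTime for "start" and Int for "num_hedgeHogs" is an assumption based on the description (timestamp, integer), and col_types is a name introduced here for illustration.

```julia
using DataFrames, Dates

col_names = ["start", "interval", "goal", "num_hedgeHogs"]

# Assumed element types, one per column name, in the same order.
col_types = [DateTime, Int64, String, Int]

# One empty, typed vector per column, paired with its name.
df = DataFrame(col_names .=> [T[] for T in col_types])

# Rows can then be appended as tuples whose values match the column types.
push!(df, (DateTime(2021, 5, 1), 15, "find bubble", 3))
```

Keeping names and types in two parallel vectors means a mismatch in length shows up as a broadcasting error rather than as silently mistyped columns.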

thanks for the reply. As always lashings to build on. I didn’t know anything about Ref() so that’s something to look into.

DataFrame(col_names .=> rand.([Int, String, Bool, Char], 0))

Is this a trap? :slight_smile: To my noob eye this seems to randomize which type gets assigned to which column. Admittedly this would add a bit of fun to the process but…

No, it is not a trap, and nothing about it is random (that is why I call it a hack). We create 0-length vectors whose element types are given by the first argument:

julia> rand.([Int, String, Bool, Char], 0)
4-element Vector{Vector}:
 Int64[]
 String[]
 Bool[]
 Char[]
The trick is that since the length of every vector is 0, no random value is ever generated, so it works even for data types that the random number generator does not support.
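A quick way to convince yourself, sketched under the assumption of a recent Julia 1.x session (the names types and empties are just for illustration):

```julia
types = [Int, String, Bool, Char]

# Zero-length "random" vectors: only the element type is used.
empties = rand.(types, 0)

@assert all(isempty, empties)
@assert eltype.(empties) == types

# The same result written without the hack.
plain = [T[] for T in types]
@assert eltype.(plain) == eltype.(empties)
```

Both forms produce the same empty typed vectors, so the comprehension is probably the clearer choice in code that others will read.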


just checking after the “get us a bubble for the spirit level” incident :slight_smile:

thanks again for the considered reply and the feast of food for thought.