DataFrame with "any" vector

I want to create a DataFrame dynamically, meaning that I don’t know the column names and values beforehand. My data (new_data) looks like this:

Any[1, "\"I am foo\"", 2, "\"You are bar\"", 3, "\"We are foobar\""]

And to create the DataFrame, I do:

df = DataFrame(new_data, column_names)

Where column_names is:

["id", "foo"].

However, I’m getting the error:

ArgumentError: columns argument must be a vector of AbstractVector objects

How can I do this better?

Just a first question for my understanding: is your first line of code, i.e.

A = Any[1, "\"I am foo\"", 2, "\"You are bar\"", 3, "\"We are foobar\""]

meant to be your “foo” column? Then I think you are missing an Id column.

If you take

B = [1,2,3,4,5,6]

then

julia> df = DataFrame([A,B], ["id", "foo"])
6×2 DataFrame
 Row │ id               foo
     │ Any              Any
─────┼──────────────────────
   1 │ 1                1
   2 │ "I am foo"       2
   3 │ 2                3
   4 │ "You are bar"    4
   5 │ 3                5
   6 │ "We are foobar"  6

Ah, or maybe you have 3 data sets in there? Then they need to be separate vectors, i.e.

df2 = DataFrame([ [1,2,3], ["Foo", "Bar", "Baz"] ], ["id","foo"])

also works perfectly fine – the problem might be that you have your data as one vector instead of 2 here.

My first line is the complete dataset: the integers are the IDs and the strings are the ‘foo’ column. If the complete dataset is all strings, it does work.

From the error message (which says you need a vector of vectors), I think your data has to be in the form I wrote in the second example.

Or if your dataset from above is stored in A, then you could also do

DataFrame([A[1:2:end], A[2:2:end]], ["Id", "Foo"])

as long as you know how many columns you have. If you do not know that, it gets complicated or even impossible, since the flat vector itself does not tell you where one row ends and the next begins.
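If the number of columns is only known at runtime through the list of column names, the strided slicing above can be generalized. This is a minimal sketch, assuming the flat vector is stored row by row (id, foo, id, foo, ...) and that its length is a multiple of the number of columns; the names ncols and columns are only chosen for illustration.

```julia
using DataFrames

# Flat, row-by-row data as in the question: id, foo, id, foo, ...
A = Any[1, "\"I am foo\"", 2, "\"You are bar\"", 3, "\"We are foobar\""]
column_names = ["id", "foo"]
ncols = length(column_names)

# The flat vector can only be split into rows if its length fits.
length(A) % ncols == 0 ||
    throw(ArgumentError("data length is not a multiple of the number of columns"))

# Column j consists of every ncols-th element, starting at position j.
columns = [A[j:ncols:end] for j in 1:ncols]

df = DataFrame(columns, column_names)
```

Since columns is a vector of vectors, it has the shape the constructor asks for in the error message.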

For a slightly more general case, you could use permutedims(reshape()):

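using DataFrames
vdata = Any[1, "\"I am foo\"", 2, "\"You are bar\"", 3, "\"We are foobar\""]
colnames = ["id", "foo"]
# reshape fills column by column, so each column of the reshaped matrix holds one record;
# permutedims then flips it so that each record becomes a row
mdata = permutedims(reshape(vdata, length(colnames), :))
df = DataFrame(mdata, colnames)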
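One thing to be aware of: because the matrix has element type Any, the resulting columns are also of type Any. If concrete column types are wanted, one possible follow-up (a sketch, not the only way to do it) is to broadcast identity over each column with mapcols, which lets Julia pick the narrowest element type that fits the values. The name df_typed is just illustrative.

```julia
using DataFrames

vdata = Any[1, "\"I am foo\"", 2, "\"You are bar\"", 3, "\"We are foobar\""]
colnames = ["id", "foo"]
mdata = permutedims(reshape(vdata, length(colnames), :))
df = DataFrame(mdata, colnames)

# Every column of df has eltype Any here. Broadcasting identity
# rebuilds each column with an element type inferred from its values.
df_typed = mapcols(col -> identity.(col), df)

eltype(df_typed.id)   # Int
eltype(df_typed.foo)  # String
```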

Thanks, that works!