I came across this problem when writing tests for a package, for a function that produces dataframes using the
DataFrame(columns::AbstractArray{T<:Any,1}, cnames::AbstractArray{Symbol,1})
constructor. Consider this MWE:
using DataFrames
a = collect(1:5)
b = string.(a)
df1 = DataFrame([a,b], [:a,:b])
df2 = DataFrame(a=a,b=b)
First, df1 prints funny:
julia> df1
5×2 DataFrames.DataFrame
│ Row │ a │ b │
├─────┼───┼─────┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
│ 4 │ 4 │ 4 │
│ 5 │ 5 │ 5 │
Second,
julia> df1 == df2
false
I think the issue is that one has NullableArrays, the other plain vanilla Arrays. So should I not be using the constructor above? Or always make the arguments NullableArray? Or use a different function for comparison if I want equality?
If you use the first form of the constructor, you’re responsible for choosing the appropriate column type. This is likely to change at some point though, in which case you’ll get a standard Array in both cases (see this issue).
The printing issue was just fixed.
Thank you. I read the issues, but I am still not sure what the “appropriate column type” is as long as #1119 is unresolved. Would using the constructor as
DataFrame(map(NullableArray, [a,b]), [:a,:b])
be the recommended solution for now?
The most appropriate/default column type is DataArray for DataFrames 0.8.x and NullableArray for git master. If you just want to use this type, then use the keyword argument constructor. The other one is only useful when you really want to preserve the original type, which IIUC isn’t your case at all.
We also need to solve the question of whether == and isequal should consider NullableArray and Array equal when they have the same contents. See this issue and this one.
Thanks! So if the column names and values are the result of some other computation, and the names and their number are not known in advance, is the recommended constructor something like
DataFrame(; [Pair(key_column...) for key_column in zip(keys,columns)]...)
?
Or even just DataFrame(; zip(keys, columns)...).