Various constructors and equality for DataFrame

Tamas_Papp · January 17, 2017, 8:44pm

I came across this problem while writing tests for a package, for a function that produces DataFrames using the

DataFrame(columns::AbstractArray{T<:Any,1}, cnames::AbstractArray{Symbol,1})

constructor. Consider this MWE:

using DataFrames
a = collect(1:5)
b = string.(a)
df1 = DataFrame([a,b], [:a,:b])
df2 = DataFrame(a=a,b=b)

First, df1 prints funny:

julia>  df1
5×2 DataFrames.DataFrame
│ Row │ a │ b   │
├─────┼───┼─────┤
│ 1   │ 1 │ 1 │
│ 2   │ 2 │ 2 │
│ 3   │ 3 │ 3 │
│ 4   │ 4 │ 4 │
│ 5   │ 5 │ 5 │

Second,

julia>  df1 == df2
false

I think the issue is that one has NullableArray columns and the other plain vanilla Arrays. So should I not be using the constructor above? Should I always make the arguments NullableArrays? Or should I use a different function for comparison if I want equality?

nalimilan · January 17, 2017, 9:24pm

If you use the first form of the constructor, you’re responsible for choosing the appropriate column type. This is likely to change at some point though, in which case you’ll get a standard Array in both cases (see this issue).

The printing issue was just fixed.

Tamas_Papp · January 18, 2017, 10:15am

Thank you. I read the issues, but I am still not sure what the “appropriate column type” is until #1119 is resolved. Would using the constructor as

DataFrame(map(NullableArray, [a,b]), [:a,:b])

be the recommended solution for now?

nalimilan · January 18, 2017, 1:24pm

The most appropriate/default column type is DataArray for DataFrames 0.8.x and NullableArray for git master. If you just want to use this type, then use the keyword argument constructor. The other one is only useful when you really want to preserve the original type, which IIUC isn’t your case at all.

We also need to solve the question of whether == and isequal should consider NullableArray and Array equal when they have the same contents. See this issue and this one.

Tamas_Papp · January 18, 2017, 1:44pm

Thanks! So if the column names and values are the result of some other computation, and their number is not known in advance, is the recommended constructor something like

DataFrame(; [Pair(key_column...) for key_column in zip(keys,columns)]...)

?

nalimilan · January 18, 2017, 2:20pm

Or even just DataFrame(; zip(keys, columns)...).
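A minimal plain-Julia sketch of why both forms are interchangeable, assuming Julia 1.x keyword-splatting semantics (not necessarily the Julia version this thread was written against): after the `;`, splatting an iterable of `(name, value)` tuples or `name => value` pairs turns each element into one keyword argument. `show_kwargs`, `cnames` and `cols` are hypothetical stand-ins; `show_kwargs` takes the place of the keyword constructor `DataFrame(; kwargs...)` so the example runs without DataFrames installed.

```julia
# Hypothetical stand-in for a keyword constructor like DataFrame(; kwargs...):
# it simply collects the keyword arguments it receives as (name, value) tuples.
show_kwargs(; kwargs...) = [(k, v) for (k, v) in kwargs]

# Column names and columns produced by some earlier computation (hypothetical data).
cnames = [:a, :b]
cols   = [collect(1:5), string.(1:5)]

# Form from the question: build Pair objects explicitly, then splat them as keywords.
via_pairs = show_kwargs(; [Pair(kc...) for kc in zip(cnames, cols)]...)

# Shorter form from the reply: splat the (name, column) tuples produced by zip directly.
via_zip = show_kwargs(; zip(cnames, cols)...)

println(via_pairs == via_zip)          # true: same keywords, same order
println([k for (k, _) in via_zip])     # [:a, :b]
```

With DataFrames loaded, the same splat would be passed straight to `DataFrame`; which column types the keyword constructor then produces depends on the DataFrames version, as discussed earlier in the thread.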
