To sum up the discussion: the design of DataFrames.jl is about consistency. To understand it, you first need to understand how functions in Julia Base work.
julia> x = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> last(x)
3
julia> last(x, 1)
1-element Vector{Int64}:
3
So, as you can see, writing last(x) drops a dimension, while last(x, 1) does not.
The same holds in DataFrames.jl. If you write last(df), a dimension is dropped (from a 2-dimensional DataFrame to a 1-dimensional DataFrameRow). If you write last(df, 1), the dimension is not dropped and you get a 2-dimensional DataFrame with one row.
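A minimal sketch of this with a data frame, assuming DataFrames.jl is installed. The data frame df and its columns a and b are made up for illustration:

```julia
using DataFrames

# Hypothetical example data; the column names are illustrative only.
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

row = last(df)       # drops a dimension: a single row (DataFrameRow)
tail = last(df, 1)   # keeps both dimensions: a DataFrame with one row

@show row isa DataFrameRow
@show tail isa DataFrame
@show ndims(row) ndims(tail)
@show size(tail)
```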
Now, regarding push!, append!, and vcat.
See what happens in Julia Base:
julia> x = [(a="1",), (a="2",), (a="3",)]
3-element Vector{NamedTuple{(:a,), Tuple{String}}}:
(a = "1",)
(a = "2",)
(a = "3",)
julia> push!(x, last(x))
4-element Vector{NamedTuple{(:a,), Tuple{String}}}:
(a = "1",)
(a = "2",)
(a = "3",)
(a = "3",)
julia> append!(x, last(x))
ERROR: MethodError: Cannot `convert` an object of type String to an object of type NamedTuple{(:a,), Tuple{String}}
So you can push! the value of last(x) to x, but in general you cannot append! it.
Now the reverse:
julia> x = [(a="1",), (a="2",), (a="3",)]
3-element Vector{NamedTuple{(:a,), Tuple{String}}}:
(a = "1",)
(a = "2",)
(a = "3",)
julia> append!(x, last(x, 1))
4-element Vector{NamedTuple{(:a,), Tuple{String}}}:
(a = "1",)
(a = "2",)
(a = "3",)
(a = "3",)
julia> push!(x, last(x, 1))
ERROR: MethodError: Cannot `convert` an object of type Vector{NamedTuple{(:a,), Tuple{String}}} to an object of type NamedTuple{(:a,), Tuple{String}}
so you can append! the value of last(x, 1), but in general you cannot push! it.
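The same pairing carries over to data frames. Here is a small sketch, assuming DataFrames.jl is installed and using a made-up one-column data frame: the single row returned by last(df) goes with push!, and the one-row data frame returned by last(df, 1) goes with append!.

```julia
using DataFrames

# Hypothetical data mirroring the vector of named tuples above.
df = DataFrame(a = ["1", "2", "3"])

push!(df, last(df))        # last(df) is one row, so it is pushed
append!(df, last(df, 1))   # last(df, 1) is a one-row table, so it is appended

@show nrow(df)
@show df.a
```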
As for vcat, consider the following:
julia> a = [1 2; 3 4]
2×2 Matrix{Int64}:
1 2
3 4
julia> b = [1, 2]
2-element Vector{Int64}:
1
2
julia> vcat(a, b)
ERROR: ArgumentError: number of columns of each array must match (got (2, 1))
so you are not allowed to vcat a 2-dimensional object with a 1-dimensional one when their column counts differ (the vector is treated as a single column).
What is allowed in Julia Base is:
julia> a = [1, 2][:, 1:1]
2×1 Matrix{Int64}:
1
2
julia> b = [3, 4]
2-element Vector{Int64}:
3
4
julia> vcat(a, b)
4×1 Matrix{Int64}:
1
2
3
4
but I would say that no one would want to allow vcat of a 1-column data frame with a multi-column DataFrameRow like this, so this is not allowed.
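Going back to the failing call with the 2×2 matrix and the 2-element vector: if the intent is to add the vector as a new row, one option in Julia Base is to make it 2-dimensional explicitly before concatenating. A small sketch using permutedims (the names b_row and c are only for illustration):

```julia
a = [1 2; 3 4]
b = [1, 2]

# Turn the 2-element vector into a 1×2 matrix, so both arguments
# are 2-dimensional and have the same number of columns.
b_row = permutedims(b)

c = vcat(a, b_row)

@show size(b_row)
@show c
```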
In summary, Julia Base and DataFrames.jl work in exactly the same way (except for the last case, where the behavior of Julia Base is clearly not desirable). Additionally, this design is logically consistent with the notion of dimensionality of the different objects.
Indeed, I know that R and Python are much more flexible in allowing combinations of objects of different dimensions, but I personally do not like it, as most of the time it leads to hard-to-catch logical bugs in the user's code. On the other hand, Julia gives you all the tools you might need to explicitly control the dimension of the objects you produce, e.g. last(df) drops a dimension and produces a DataFrameRow, while last(df, 1) does not drop a dimension and produces a DataFrame.
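As one more illustration of explicit dimension control, here is a sketch (assuming DataFrames.jl is installed, with made-up data) that promotes a DataFrameRow to a one-row DataFrame with the DataFrame constructor, after which vcat accepts it:

```julia
using DataFrames

# Hypothetical example data; the column names are illustrative only.
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

row = df[end, :]           # a DataFrameRow, same as last(df)
one_row = DataFrame(row)   # explicitly promote it to a one-row DataFrame

stacked = vcat(df, one_row)

@show size(one_row)
@show nrow(stacked)
```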