Hi, guys,
I’ve got a basic question about subsetting DataFrames. As shown below, what is the difference between dd.y[1:2] and dd[1:2, :y]? My guess was that the former works in place and the latter returns a copy of the selection. But both of them modify the data frame in place in the code below, so I am confused.
Maybe I missed something in the docs; if so, please point me to the right place. Thanks!
dd = DataFrame(x = [1, 2, 3, 4], y = rand(1:10, 4))
dd.y
dd[!, :y]
dd.y === dd[!, :y] #true
dd.y[1:2] === dd[1:2, :y] #false
# they all do in-place modifications, but they are not the same?
dd.y[1:2] = [0, 0]
dd
dd[1:2, :y] = [1, 1]
dd
You may or may not have seen this page in the docs: Indexing · DataFrames.jl
As you observe, dd.y === dd[!, :y], and in fact, that is precisely the way that dd.y is defined.
It is also true that dd.y[1:2] == dd[1:2, :y] (here with == not ===).
But the observation
dd.y[1:2] === dd[1:2, :y] #false
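comes from the Julia language itself, not DataFrames in particular. Indexing a Vector with a range allocates a new Vector each time, and === on mutable arrays tests whether both sides are the very same object: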
x = [1,2,3,4];
x[1:2] === x[1:2] # false
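To make the copy semantics concrete, here is a minimal sketch that uses only Base Julia. The variable names a, b and v are just for illustration. It shows that two range-indexed reads compare equal by value but are different objects, and that a view (unlike a copy) shares memory with the original vector.

```julia
x = [1, 2, 3, 4]

a = x[1:2]          # getindex with a range allocates a new Vector
b = x[1:2]          # a second, separate allocation

a == b              # true: same element values
a === b             # false: two distinct mutable objects

a[1] = 100          # mutates only the copy
x[1]                # still 1

v = @view x[1:2]    # a view shares memory with x
v[1] = 100          # writes through to x
x[1]                # now 100
```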
There is a tiny difference: dd.y returns a reference to the Vector that holds the column :y inside the data frame dd, and indexing that Vector with 1:2 then goes through the getindex method for Vector in Base, while dd[1:2, :y] does all of that within an internal DataFrames function.
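As for why both assignments in the question change the data frame: an indexing expression on the left of = is not a read followed by a write. Julia lowers it to a setindex! call on the container, so no copy is involved. A small sketch, assuming DataFrames is installed and using fixed values for :y instead of rand so the results are predictable (c1 and c2 are illustrative names):

```julia
using DataFrames

dd = DataFrame(x = [1, 2, 3, 4], y = [5, 6, 7, 8])

# Reading: both forms return a fresh copy of rows 1:2 of column :y.
c1 = dd.y[1:2]
c2 = dd[1:2, :y]
c1[1] = -1              # changes only the copy
dd.y[1]                 # still 5

# Writing: indexing on the left of `=` is a setindex! call, no copy is made.
dd.y[1:2] = [0, 0]      # same as setindex!(dd.y, [0, 0], 1:2)
dd[1:2, :y] = [1, 1]    # same as setindex!(dd, [1, 1], 1:2, :y)
dd.y                    # [1, 1, 7, 8]
```

So the two forms differ only in which setindex! method handles the write (Base's for Vector, or DataFrames' for DataFrame); in both cases the column stored in dd is updated.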
Many thanks! That’s pretty clear. I will go through the indexing part of the docs.