Difference between subsetting methods

Hi, guys,

I’ve got a basic question about subsetting DataFrames. As shown below, what is the difference between dd.y[1:2] and dd[1:2, :y]? My guess was that the former works in place and the latter returns a copy of the selection. But both of them modify dd in place in the code, so I am confused.

Maybe I missed something in the docs; if so, please point me to the right place. Thanks!

```julia
using DataFrames

dd = DataFrame(x = [1, 2, 3, 4], y = rand(1:10, 4))
dd.y
dd[!, :y]
dd.y === dd[!, :y] #true
dd.y[1:2] === dd[1:2, :y] #false

# both of these modify dd in place, yet the two selections above are not ===?
dd.y[1:2] = [0, 0]
dd
dd[1:2, :y] = [1, 1]
dd
```

You may or may not have seen this page in the docs: Indexing · DataFrames.jl

As you observe, dd.y === dd[!, :y], and in fact, that is precisely the way that dd.y is defined.
It is also true that dd.y[1:2] == dd[1:2, :y] (here with == not ===).
But the observation

```julia
dd.y[1:2] === dd[1:2, :y] #false
```

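comes from Julia itself, not from DataFrames in particular. Indexing an array with a range returns a new array each time, and === on two distinct mutable arrays is false even when their contents are equal: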

```julia
x = [1, 2, 3, 4];
x[1:2] === x[1:2]  # false
```
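
To see that this is plain Base Julia behaviour, here is a small sketch that uses only a Vector (no DataFrames involved); the names s1, s2 and v are just for illustration. It contrasts getindex with a range, which copies, against view, which shares memory, and shows that indexed assignment writes into the original array:

```julia
x = [1, 2, 3, 4]

# Each x[1:2] on the right-hand side calls getindex, which allocates a fresh Vector.
s1 = x[1:2]
s2 = x[1:2]
@show s1 == s2     # true: same contents
@show s1 === s2    # false: two distinct mutable arrays

# Mutating a copy does not touch x ...
s1[1] = 100
@show x            # x = [1, 2, 3, 4]

# ... whereas a view shares memory with x.
v = view(x, 1:2)
v[1] = 100
@show x            # x = [100, 2, 3, 4]

# On the left-hand side of `=`, indexing calls setindex!, which writes into x.
x[3:4] = [0, 0]
@show x            # x = [100, 2, 0, 0]
```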

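There is a subtle difference in how the two expressions get there. dd.y returns the Vector that holds column :y of dd (a reference, not a copy), and dd.y[1:2] then calls Base's getindex method for Vector, which copies elements 1:2 into a new Vector. dd[1:2, :y] does the equivalent work inside a DataFrames getindex method and likewise returns a new Vector. Assignment is a different matter: on the left-hand side of =, both dd.y[1:2] = ... and dd[1:2, :y] = ... call setindex! rather than getindex, and both write into the existing column in place. That is why both of your assignments modify dd.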
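
A sketch of the same distinction on a DataFrame, assuming a recent DataFrames.jl release; the column values are fixed instead of random so the results are predictable, and the names col, a, b and v are only for illustration. Reading with either syntax returns a copy, writing with either syntax mutates the stored column, and @view gives a selection that does not copy:

```julia
using DataFrames

dd = DataFrame(x = [1, 2, 3, 4], y = [5, 6, 7, 8])
col = dd.y                     # the stored column itself, same object as dd[!, :y]
@show dd[!, :y] === col        # true: `!` returns the stored column
@show dd[:, :y] === col        # false: `:` returns a copy

# Reading: both expressions allocate a new Vector holding elements 1:2.
a = dd.y[1:2]                  # getindex(getproperty(dd, :y), 1:2)
b = dd[1:2, :y]                # getindex(dd, 1:2, :y)
a[1] = -1                      # mutating the copy ...
@show dd.y[1]                  # ... leaves dd untouched, still 5

# Writing: on the left of `=`, both forms call setindex! and mutate in place.
dd.y[1:2] = [0, 0]             # setindex!(getproperty(dd, :y), [0, 0], 1:2)
dd[3:4, :y] = [1, 1]           # setindex!(dd, [1, 1], 3:4, :y)
@show col                      # col = [0, 0, 1, 1]; still the same object as dd.y

# For a non-copying selection, ask for a view explicitly.
v = @view dd[1:2, :y]          # equivalent to view(dd, 1:2, :y)
v[1] = 99
@show dd.y[1]                  # 99: the write went through to dd
```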


Many thanks! That’s pretty clear. I will go through the indexing part of the docs.
