Difference between subsetting methods

aachener · January 3, 2022, 1:29am

Hi, guys,

I’ve got a basic question about subsetting DataFrames. As shown below, what is the difference between dd.y[1:2] and dd[1:2, :y]? My guess was that the former works in place and the latter is a copy of the selection. But both of them modify dd in place in the code below, so I am confused.

Maybe I missed something in the docs; if so, please point me to the right place. Thanks!

using DataFrames

dd = DataFrame(x = [1, 2, 3, 4], y = rand(1:10, 4))
dd.y
dd[!, :y]
dd.y === dd[!, :y] #true
dd.y[1:2] === dd[1:2, :y] #false

# both do in-place modifications, but they are not the same?
dd.y[1:2] = [0, 0]
dd
dd[1:2, :y] = [1, 1]
dd

jd-foster · January 3, 2022, 8:59am

You may or may not have seen this page in the docs: Indexing · DataFrames.jl

As you observe, dd.y === dd[!, :y], and in fact, that is precisely the way that dd.y is defined.
It is also true that dd.y[1:2] == dd[1:2, :y] (here with == not ===).
But the observation

dd.y[1:2] === dd[1:2, :y] #false

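comes from the Julia language itself, not from DataFrames in particular. Each indexing call with a range allocates a new Vector, and === is false for two distinct mutable objects: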

 x = [1,2,3,4];
 x[1:2] === x[1:2]  # false
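
To make the == versus === point concrete, here is a small sketch using only Base (no DataFrames). The variable names a, b and v are just for illustration, and the view part is an aside that was not in the original question.

```julia
x = [1, 2, 3, 4]

a = x[1:2]        # getindex with a range allocates a new Vector
b = x[1:2]        # a second, separate allocation

a == b            # true: same element values
a === b           # false: two distinct mutable objects

a[1] = 100        # mutating the copy leaves x untouched
x                 # still [1, 2, 3, 4]

v = view(x, 1:2)  # a view shares memory with x instead of copying
v[1] = 100        # this writes through to x
x                 # now [100, 2, 3, 4]
```

So === answers "is this the very same object?", while == compares contents; a range-indexed read always hands back a fresh object unless you ask for a view.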

There is a small difference in how the two get there: dd.y returns a reference to the Vector that holds column :y inside the data frame dd, and [1:2] then indexes that vector using the getindex method for Vector from Base, while dd[1:2, :y] does the equivalent work inside a DataFrames getindex method. Either way, the result is a new vector holding copies of the selected elements.
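
As for why both assignments in the original post change dd: a rough sketch, assuming a recent DataFrames.jl, of the distinction between reading and writing. Indexing on the right-hand side is a getindex call and returns a new vector, whereas indexing on the left of = is a setindex! call that writes into the existing column. The names r1, r2 and col are made up for this example, and y uses fixed values instead of rand so the results are predictable.

```julia
using DataFrames

dd = DataFrame(x = [1, 2, 3, 4], y = [5, 6, 7, 8])
col = dd.y            # reference to the stored column, no copy

# Reading: both forms return a freshly allocated Vector
r1 = dd.y[1:2]        # Base getindex on the column vector
r2 = dd[1:2, :y]      # DataFrames getindex
r1 == r2              # true
r1 === r2             # false

r1[1] = 99            # changes only the copy
dd.y[1]               # still 5

# Writing: both forms are setindex! calls that update the stored column
dd.y[1:2] = [0, 0]    # setindex!(dd.y, [0, 0], 1:2)
dd[1:2, :y] = [1, 1]  # setindex!(dd, [1, 1], 1:2, :y)
dd.y                  # [1, 1, 7, 8]
dd.y === col          # true: the column object was modified, not replaced
```

In other words, "copy versus in place" depends on whether the indexing expression is being read or assigned to, not on which of the two spellings is used.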

aachener · January 3, 2022, 9:09am

Many thanks! That’s pretty clear. I will go through the indexing part of doc.
