Difference between df[:, :a] and df[!, :a]

I recently updated my DataFrames.jl package and noticed the following deprecation warning when doing df[:a]:

Warning: `getindex(df::DataFrame, col_ind::ColumnIndex)` is deprecated, use `df[!, col_ind]` instead.

I understand that I now need to write this as df[:, :a] or df[!, :a], but is there a difference between the two? If not, what’s the purpose of having two different syntaxes?

The first makes a copy, the second returns the stored column itself without copying (so it behaves like a view), no?

That sounds like it’s probably right, but I don’t know how to check :slightly_smiling_face:

I think this might show that you are right:

julia> using DataFrames

julia> df = DataFrame([collect(1:10), collect(11:20)], [:a, :b]);

julia> df[:, :a] === df[:, :a]
false

julia> df[!, :a] === df[!, :a]
true

In the first case it returns false because a separate copy is made on each call, while in the second case it returns true because both calls simply refer to the same column stored in the data frame?
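Another way to see the same thing is to mutate the result and check whether the data frame changes. This is only a minimal sketch, assuming a DataFrames.jl version that supports `!` indexing; the small `df` and the variable names are made up for illustration:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# df[:, :a] returns a fresh copy, so editing it leaves df untouched
copied = df[:, :a]
copied[1] = 100
first_after_copy_edit = df[!, :a][1]

# df[!, :a] returns the vector stored in df, so editing it changes df
shared = df[!, :a]
shared[1] = 100
first_after_shared_edit = df[!, :a][1]

println((first_after_copy_edit, first_after_shared_edit))
```

So use `:` when you want a column you can modify safely, and `!` when you want to avoid the copy or change the data frame in place.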

Indeed. You can also use @time and look for allocations.

julia> @time df[:, :x1];
  0.000004 seconds (5 allocations: 256 bytes)

julia> @time df[!, :x1];
  0.000003 seconds (4 allocations: 160 bytes)

or use BenchmarkTools:

julia> using BenchmarkTools

julia> @btime df[:, :x1];
  57.682 ns (1 allocation: 96 bytes)

julia> @btime df[!, :x1];
  28.213 ns (0 allocations: 0 bytes)

It is also explained here

I didn’t even realize that you can just do df.a :laughing: That’s so much nicer than df[!, :a]
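For what it's worth, df.a appears to be equivalent to df[!, :a] (no copy), not to df[:, :a]. A quick sketch to check this, again assuming a DataFrames.jl version with `!` indexing and a made-up `df`:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# df.a hands back the stored column, just like df[!, :a]
same_as_bang = df.a === df[!, :a]

# df[:, :a] is a new copy, so it is never identical to df.a
same_as_colon = df.a === df[:, :a]

println((same_as_bang, same_as_colon))
```

That means modifying df.a in place modifies the data frame, the same as with df[!, :a].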

Note that the df.a syntax works for everything except broadcasting a single value into a column that does not exist yet. That is:

julia> df = DataFrame(a=rand(10));

julia> df.b = rand(10);

julia> df.c = ["blah" for _ in 1:10];

julia> df.c .= "foo";

julia> df.d .= "bar";
ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
 [1] getindex(::DataFrame, ::typeof(!), ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/other/index.jl:241
 [2] getproperty(::DataFrame, ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/abstractdataframe/abstractdataframe.jl:219
 [3] top-level scope at REPL[6]:1
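If you do need to fill a brand-new column with a single value, two options that should work are broadcasting through `!` indexing, or assigning a full vector with the dot syntax. This is a sketch rather than the only way to do it; the column names :d and :e are just examples:

```julia
using DataFrames

df = DataFrame(a = rand(10))

# Broadcasting into df[!, :d] creates the column if it is missing
df[!, :d] .= "bar"

# Plain assignment of a full-length vector also creates a new column
df.e = fill("bar", nrow(df))

println(names(df))
```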