I recently updated my DataFrames.jl package and noticed the following deprecation warning when doing df[:a]:
Warning: `getindex(df::DataFrame, col_ind::ColumnIndex)` is deprecated, use `df[!, col_ind]` instead.
I understand that I need to start writing the above as df[:, :a] or df[!, :a], but is there a difference between these? If not, what’s the purpose of having the two different syntaxes?
The first makes a copy; the second returns the column itself without copying (much like a view), no?
That sounds like it’s probably right, but I don’t know how to check 
I think this might show that you are right:
julia> using DataFrames
julia> df = DataFrame([collect(1:10), collect(11:20)], [:a, :b]);
julia> df[:, :a] === df[:, :a]
false
julia> df[!, :a] === df[!, :a]
true
In the first case it returns false because each call makes a separate copy, while in the second case it returns true because both calls refer to the same column stored in the data frame. Is that right?
Indeed. You can also use @time and look for allocations.
julia> @time df[:, :x1];
0.000004 seconds (5 allocations: 256 bytes)
julia> @time df[!, :x1];
0.000003 seconds (4 allocations: 160 bytes)
or use BenchmarkTools:
julia> using BenchmarkTools
julia> @btime df[:, :x1];
57.682 ns (1 allocation: 96 bytes)
julia> @btime df[!, :x1];
28.213 ns (0 allocations: 0 bytes)
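The practical consequence of the copy versus no-copy behavior shows up when the result is mutated. A minimal sketch, assuming a small throwaway data frame (the names `copied` and `direct` are only illustrative):

```julia
using DataFrames

df = DataFrame(a = collect(1:3), b = collect(4:6))

copied = df[:, :a]    # `:` row selector returns a freshly allocated copy
copied[1] = 100       # mutating the copy leaves the data frame untouched
@show df[1, :a]

direct = df[!, :a]    # `!` row selector returns the stored column vector itself
direct[1] = 100       # mutating it changes the data frame
@show df[1, :a]
```

So `df[:, :a]` is the safer choice when you plan to modify the result independently, and `df[!, :a]` is the cheaper one when you only read it or want the changes to land in the data frame.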
It is also explained here
I didn’t even realize that you can just do df.a. That’s so much nicer than df[!, :a].
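As far as I can tell, `df.a` behaves like `df[!, :a]`, so it also hands back the stored column without copying. A quick check, assuming a small example data frame (the variable names `same_object` and `is_copy` are only illustrative):

```julia
using DataFrames

df = DataFrame(a = collect(1:3))

same_object = df.a === df[!, :a]   # both refer to the stored column
is_copy = df.a !== df[:, :a]       # `:` allocates a new vector each time
@show same_object is_copy
```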
Note, this works for everything except assigning a single value to a non-existent column. That is:
julia> df = DataFrame(a=rand(10));
julia> df.b = rand(10);
julia> df.c = ["blah" for _ in 1:10];
julia> df.c .= "foo";
julia> df.d .= "bar";
ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
Stacktrace:
[1] getindex(::DataFrame, ::typeof(!), ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/other/index.jl:241
[2] getproperty(::DataFrame, ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/abstractdataframe/abstractdataframe.jl:219
[3] top-level scope at REPL[6]:1
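The stack trace shows `df.d` going through `getindex(df, !, :d)`, which fails because the column does not exist yet. Two workarounds that should create a column from a single value are broadcasting through the indexing syntax and assigning a full-length vector. A minimal sketch (the column names `:d` and `:e` are just for illustration):

```julia
using DataFrames

df = DataFrame(a = rand(10))

df[!, :d] .= "bar"              # broadcast assignment via `!` creates the column
df.e = fill("baz", nrow(df))    # or assign an explicit full-length vector
```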