I recently updated my DataFrames.jl package and noticed the following deprecation warning when selecting a column with a single index:
Warning: `getindex(df::DataFrame, col_ind::ColumnIndex)` is deprecated, use `df[!, col_ind]` instead.
I understand that I need to start writing the above as df[:, :a] or df[!, :a], but is there a difference between these? If not, what’s the purpose of having two different syntaxes?
The first makes a copy; the second returns the stored column itself without copying (so it behaves like a view), no?
That sounds like it’s probably right, but I don’t know how to check.
I think this might show that you are right:
julia> df = DataFrame([collect(1:10), collect(11:20)], [:a, :b]);
julia> df[:, :a] === df[:, :a]
false
julia> df[!, :a] === df[!, :a]
true
In the first case it returns false because each call makes a separate copy, while in the second case it returns true because both calls refer to the same column vector stored in the data frame. Is that right?
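Another way to see the difference is to mutate the result and check whether the data frame changes. The following is a minimal sketch; the small data frame and the variable names `copied` and `direct` are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = collect(1:3), b = collect(4:6))

# df[:, :a] allocates a new vector, so writing to it leaves df untouched.
copied = df[:, :a]
copied[1] = 100
unchanged = df[1, :a] == 1

# df[!, :a] hands back the vector stored in df, so writing to it changes df.
direct = df[!, :a]
direct[1] = 100
changed = df[1, :a] == 100
```

Both `unchanged` and `changed` should come out true, which is why the copying form is the safer default when you plan to modify the result.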
Indeed. You can also use @time and look for allocations:
julia> @time df[:, :x1];
0.000004 seconds (5 allocations: 256 bytes)
julia> @time df[!, :x1];
0.000003 seconds (4 allocations: 160 bytes)
or use BenchmarkTools:
julia> using BenchmarkTools
julia> @btime df[:, :x1];
57.682 ns (1 allocation: 96 bytes)
julia> @btime df[!, :x1];
28.213 ns (0 allocations: 0 bytes)
It is also explained here
I didn’t even realize that you can just do df.a. That’s so much nicer than the bracket syntax.
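As the stack trace further down in this thread suggests, property access goes through getindex with `!`, so df.a should give the stored column rather than a copy. A quick check, using a throwaway data frame and hypothetical variable names:

```julia
using DataFrames

df = DataFrame(a = collect(1:3))

# Property access returns the very same vector object as df[!, :a] ...
same_as_bang = df.a === df[!, :a]

# ... whereas df[:, :a] produces a distinct copy.
same_as_colon = df.a === df[:, :a]
```

Here `same_as_bang` is true and `same_as_colon` is false, so anything said above about df[!, :a] also applies to df.a.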
Note that this works for everything except broadcasting a single value into a column that does not exist yet. That is:
julia> df = DataFrame(a=rand(10));
julia> df.b = rand(10);
julia> df.c = ["blah" for _ in 1:10];
julia> df.c .= "foo";
julia> df.d .= "bar";
ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
 getindex(::DataFrame, ::typeof(!), ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/other/index.jl:241
 getproperty(::DataFrame, ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/abstractdataframe/abstractdataframe.jl:219
 top-level scope at REPL:1
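One way around this, sketched here under the assumption that a column filled with a constant is what you want, is to broadcast through df[!, :d] instead of df.d, or to build the full vector first and assign it. The column names :d and :e are just examples. Newer versions may behave differently, so treat the error above as tied to the version shown in the trace.

```julia
using DataFrames

df = DataFrame(a = rand(10))

# Broadcasting into df[!, :d] creates the column when it does not exist yet.
df[!, :d] .= "bar"

# Alternatively, build a vector of the right length and assign it as a new column.
df.e = fill("baz", nrow(df))
```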