I recently updated my DataFrames.jl package and noticed the following deprecation warning when doing df[:a]:
Warning: `getindex(df::DataFrame, col_ind::ColumnIndex)` is deprecated, use `df[!, col_ind]` instead.
I understand that I need to write this as df[:, :a] or df[!, :a] instead, but is there a difference between the two? If not, what’s the purpose of having two different syntaxes?
So in the first case (df[:, :a]) it returns false because separate copies are made, while in the second case (df[!, :a]) it returns true because it simply refers to the same column in the same data frame?
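To make the copy-versus-same-column distinction concrete, here is a minimal sketch. The data frame and the variable names (`copies_identical`, `same_column_identical`, `col_copy`, `col_ref`) are made up for illustration; it assumes a DataFrames.jl version that supports the `df[!, col]` syntax.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# `:` copies the column, so two lookups give two distinct vectors
copies_identical = df[:, :a] === df[:, :a]        # false

# `!` returns the stored column itself, without copying
same_column_identical = df[!, :a] === df[!, :a]   # true

# Mutating a copy leaves the data frame untouched
col_copy = df[:, :a]
col_copy[1] = 100
first_after_copy_edit = df.a[1]                   # still 1

# Mutating the `!` result changes the data frame in place
col_ref = df[!, :a]
col_ref[1] = 100
first_after_ref_edit = df.a[1]                    # now 100
```

So `df[:, :a]` is the safe choice when you want to modify the result without touching `df`, and `df[!, :a]` is the cheap choice when you want direct access to the stored column.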
Note that the df.col syntax works for everything except assigning a single value to a non-existent column. That is:
julia> df = DataFrame(a=rand(10));
julia> df.b = rand(10);
julia> df.c = ["blah" for _ in 1:10];
julia> df.c .= "foo";
julia> df.d .= "bar";
ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
Stacktrace:
[1] getindex(::DataFrame, ::typeof(!), ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/other/index.jl:241
[2] getproperty(::DataFrame, ::Symbol) at /Users/ksb/.julia/packages/DataFrames/XuYBH/src/abstractdataframe/abstractdataframe.jl:219
[3] top-level scope at REPL[6]:1
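For the failing case above, two possible workarounds are sketched below. This is a sketch rather than the definitive answer: it assumes a reasonably recent DataFrames.jl release in which broadcasting into `df[!, :newcol]` creates the column, and the column name `:e` and the value "baz" are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = rand(10))

# Assumption: broadcasting into `df[!, :d]` creates the column if it is missing
df[!, :d] .= "bar"

# Alternative that does not rely on broadcasting:
# build the full-length vector first, then assign it as a new column
df.e = fill("baz", nrow(df))
```

The second form is more verbose, but it only uses plain column assignment, which already works in the session shown above (`df.b = rand(10)`).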