DataFrames Package: `getindex(df::DataFrame, col_inds::Union{AbstractVector, Regex, Not})` is deprecated

I just started reading Julia for Machine Learning, and I am getting a series of deprecation warnings from the DataFrames package.

The dataset the book uses for this example has to do with the strength of Wi-Fi signals from various devices to four rooms of a house and the CSV file can be obtained here:

The first warning shows up after creating a new variable (RegressionTarget).

using CSV, StatsBase
df = CSV.read("localization.csv", header=false);
w = [5, 10, 15, 20]
df[:RegressionTarget] = Matrix(df[:, [1, 4, 6, 7]]) * w + randn(2000)

Coming from Pandas, I was at first confused about why the column name is not inside quotes (" "), but in any case I am getting this warning:

┌ Warning: `setindex!(df::DataFrame, v::AbstractVector, col_ind::ColumnIndex)` is deprecated, use `begin
│     df[!, col_ind] = v
│     df
│ end` instead.
│   caller = top-level scope at In[10]:6
└ @ Core In[10]:6

I get a similar warning after running this:

X = StatsBase.standardize(ZScoreTransform, map(Float64, Matrix(df[1:7])), dims=2)

┌ Warning: `getindex(df::DataFrame, col_inds::Union{AbstractVector, Regex, Not})` is deprecated, use `df[:, col_inds]` instead.
│   caller = top-level scope at In[11]:1
└ @ Core In[11]:1

Can you explain how I should modify my code to get rid of these warnings?

I am running Julia 1.4 on macOS 10.15

Thanks,

The issue is how you are indexing df. You now have to supply both a row index and a column index when you index with brackets. So instead of df[:RegressionTarget] you need df[!, :RegressionTarget], or more simply the shorthand df.RegressionTarget.

The same goes for the second warning: df[1:7] should be df[!, 1:7], assuming you want columns 1:7.
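
To make the fix concrete, here is a minimal sketch of the two statements from the question with explicit row selectors. It assumes DataFrames 1.x and uses a small synthetic table (columns x1 to x7, created with `:auto`) as a stand-in for localization.csv, so the sizes and values are illustrative only.

```julia
using DataFrames

# Synthetic stand-in for the Wi-Fi data: 7 numeric columns named x1..x7.
n = 20
df = DataFrame(rand(n, 7), :auto)
w = [5, 10, 15, 20]

# Create the new column; `!` stores the right-hand side vector as the column.
df[!, :RegressionTarget] = Matrix(df[:, [1, 4, 6, 7]]) * w + randn(n)

# Select columns 1 through 7 with an explicit row selector.
X = Matrix(df[:, 1:7])
```

Neither line should trigger the deprecation warnings, because both the row and the column selector are spelled out.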

Read the warning; it gives you the solution:

use `df[:, col_inds]` instead.

Thanks, that fixed the warning. Can you tell me what the : before RegressionTarget means?
Also, what is the difference between indexing with : and with !, i.e. df[!, 1:7] vs df[:, 1:7]?

:RegressionTarget is a Symbol, i.e. it is equivalent to Symbol("RegressionTarget").
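
A quick sketch of that equivalence in plain Julia (no packages needed); the variable names are just for illustration:

```julia
s = :RegressionTarget

# The literal syntax and the constructor produce the same Symbol.
same = s == Symbol("RegressionTarget")

# A Symbol is its own type, distinct from String; convert explicitly when needed.
T = typeof(s)
as_string = String(s)
```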

DataFrames.jl docs have more info on ! vs : than I can give you.

See the documentation, here.

The ! means that you get the column vector itself, without any copying, whereas : gives you a copy. See the following:

julia> using DataFrames

julia> df = DataFrame(a = [1, 2, 3, 4], b = rand(4));

julia> x = df[!, :a];

julia> x[1] = 100
100

julia> df
4×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 100   │ 0.40758  │
│ 2   │ 2     │ 0.943078 │
│ 3   │ 3     │ 0.189304 │
│ 4   │ 4     │ 0.434546 │

julia> y = df[:, :b];

julia> y[1] = 100
100

julia> df
4×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 100   │ 0.40758  │
│ 2   │ 2     │ 0.943078 │
│ 3   │ 3     │ 0.189304 │
│ 4   │ 4     │ 0.434546 │

As for the number of rows: with : you are selecting all the rows.

As for memory usage, you should read https://github.com/bkamins/Julia-DataFrames-Tutorial/; it has all the information you could need.

Let me give you a brief overview of some of your questions.

Symbols in Julia, e.g. :symbol, are interned strings and are a way for the language to represent itself. So when you type x = 0, internally a symbol :x is created and bound to the value 0. If you type eval(:x) you'll see it evaluate to zero. As for why column indexing in data frames is done with symbols and not strings, that is probably for internal performance reasons.

Next is the ! notation. In Julia it is customary to append ! to the names of functions that modify their arguments. For example, fill(1.0, 10) allocates memory for a new 10-element array. On the other hand, fill!(A, x) fills a preallocated array A with the value x. Note that it's not a magic keyword, i.e. you can't just add ! to the end of any function name, because the method may not exist.
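
The naming convention can be sketched with a couple of Base functions. The pairing of sort with sort! is added here only as a second illustration; it is not from the post above.

```julia
# fill allocates and returns a new array; nothing is modified.
A = fill(1.0, 10)

# fill! overwrites the existing array A in place and returns it.
B = fill!(A, 2.0)

# sort returns a sorted copy, while sort! reorders its argument.
v = [3, 1, 2]
sorted_copy = sort(v)   # v is still [3, 1, 2] here
sort!(v)                # v is now [1, 2, 3]
```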

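The ! row selector in DataFrames is loosely in tune with this custom, as a hint that you are working with the data frame's own storage. df[:, :column] gets you a copy of the column, while df[!, :column] gets you the stored vector itself without copying. In assignment, df[!, :column] = values stores values as the column, replacing any existing one, whereas df[:, :column] = values writes the values into the existing column in place.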
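
Here is a small sketch of how the two row selectors behave on both reading and assignment, assuming DataFrames 1.x (older releases may differ in details). The data frame and the helper names (same_vector, is_copy, old_b) are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [1.0, 2.0, 3.0])

# Reading: `!` returns the stored vector itself, `:` returns a fresh copy.
same_vector = df[!, :a] === df.a
is_copy     = df[:, :a] !== df.a

# Assignment with `!` replaces the column, so the element type may change.
df[!, :a] = ["x", "y", "z"]

# Assignment with `:` writes into the existing column in place,
# so a previously obtained reference sees the new values.
old_b = df.b
df[:, :b] = [10.0, 20.0, 30.0]
```

The practical upshot is that `!` avoids a copy but exposes the data frame's internals to mutation, while `:` is the safer default when you are not sure.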

Thanks, this really helped me understand this concept.