Difference between df.column and df[!, :column]

Hi, I'm seeing these two ways to access a column, and from testing them in code I can't see any difference. Is one better than the other?

thanks.

Basically no difference, see https://juliadata.github.io/DataFrames.jl/stable/lib/indexing/

Both of them provide direct access to the column (without copying).
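
A quick way to check this, as a sketch: `===` tests object identity, so it shows whether the stored vector itself is returned. The small `df` here is made up for illustration, and `df[:, :a]` is included only for contrast, since it returns a copy.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Both forms return the very vector stored in the data frame.
same_object = df.a === df[!, :a]

# For contrast, df[:, :a] returns a copy of the column.
is_copy = df[:, :a] !== df.a

# Mutating through a non-copying form changes the data frame itself.
df.a[1] = 100
first_value = df[!, :a][1]
```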


In addition to this, the 2nd syntax can be used for the creation of new columns and for accessing columns whose names are stored in a variable, whereas the 1st syntax can only be used to access an existing column whose name is known in advance.
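
For example, with the column name held in a variable (the names `df`, `col`, and `newcol` are illustrative, and this assumes a DataFrames version that accepts strings as column names, as used elsewhere in this thread):

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

col = :a                    # column name known only at run time
from_variable = df[!, col]  # same vector as df.a

newcol = "b"                # strings work as well as Symbols
df[!, newcol] = [4, 5, 6]   # creates column :b
```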


I think the first syntax may not work (as far as I know) if the column name has a space (or similar)

It works if you quote the name:

df[!, "my col"] = [1,2]
df."my col"

This isn't quite right:

julia> using DataFrames

julia> df = DataFrame(a = rand(3))
3×1 DataFrame
│ Row │ a        │
│     │ Float64  │
├─────┼──────────┤
│ 1   │ 0.689493 │
│ 2   │ 0.339562 │
│ 3   │ 0.707628 │

julia> df.b = rand(3); df
3×2 DataFrame
│ Row │ a        │ b        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.689493 │ 0.25931  │
│ 2   │ 0.339562 │ 0.76706  │
│ 3   │ 0.707628 │ 0.288513 │

julia> df.c .= 1
ERROR: ArgumentError: column name :c not found in the data frame; existing most similar names are: :a and :b
Stacktrace:
 [1] lookupname at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:289 [inlined]
 [2] getindex at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:295 [inlined]
 [3] getindex(::DataFrame, ::typeof(!), ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\dataframe\dataframe.jl:435
 [4] getproperty(::DataFrame, ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\abstractdataframe\abstractdataframe.jl:346
 [5] top-level scope at REPL[13]:100:

As you see, df.b = rand(3) works despite :b not being present in the DataFrame. This only fails when doing dot-broadcasting assignment .=, because as per the link above:

df.col works like df[!, col] […] in all cases except that df.col .= v and sdf.col .= v perform in-place broadcasting if col is present in df / sdf and is a valid identifier.
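
As a sketch of a workaround for the error shown above: broadcasting into `df[!, :c]` allocates the column when it is missing. Whether `df.c .= 1` itself succeeds on a missing column depends on the Julia and DataFrames versions in use, so that is not assumed here. The data frame is made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [1.0, 2.0, 3.0])

# Plain assignment creates a new column with either syntax.
df.b = [4.0, 5.0, 6.0]

# Broadcasting assignment to a missing column through df[!, ...]
# allocates a new column filled with the value.
df[!, :c] .= 1

# On an existing column, df.b .= v sets every element of :b to the value.
df.b .= 0.0
```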


See https://github.com/JuliaLang/julia/issues/36741 for a discussion about allowing this.


Thanks. I did not know that. Very helpful. (much shorter syntax!)
I actually was not aware that there is some implicit promotion from string to symbol here.

Interesting. (I just read the other interesting thread about symbols here, "When should a function accept a symbol as an argument?")

@nilshg @bkamins
Thanks for the information!
I was always wondering why the df.column syntax has often not worked for me when new columns should be created…

I was using Symbol("my column") to access columns with spaces. I didn't know you can also do that.