Difference between df.column and df[!, :column]

Hi, I'm seeing these two ways to access a column, and in my testing I can't see any difference. Is one better than the other?

Thanks.

Basically no difference; see Indexing · DataFrames.jl

Both of them provide direct access to the column (without copying).
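
To make the "without copying" point concrete, here is a minimal sketch on a toy data frame (the column name and values are made up for illustration). It also contrasts both forms with `df[:, :a]`, which returns a copy.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Both forms return the stored vector itself, not a copy.
same_object = df.a === df[!, :a]

# By contrast, `:` as the row selector returns a fresh copy.
is_copy = df[:, :a] !== df.a

# Mutating through the non-copying accessor changes the data frame.
df.a[1] = 100
first_value = df[1, :a]
```

Because `df.a` and `df[!, :a]` are the same object, writing into either one is visible through the other.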

In addition to this, the 2nd syntax can be used to create new columns and to access columns whose names are stored in a variable, whereas the 1st syntax can only be used to access an existing column whose name is known in advance.
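
A small sketch of the "name stored in a variable" case, assuming a toy data frame; the variable names `col` and `new_name` are hypothetical. `getproperty` is shown only as the programmatic counterpart of dot syntax.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Column name held in a variable.
col = :a
from_variable = df[!, col]

# `df.col` would look for a column literally named `col`;
# `getproperty` is the programmatic equivalent of dot syntax.
from_getproperty = getproperty(df, col)

# Creating a column whose name is only known at run time.
new_name = :doubled
df[!, new_name] = 2 .* df[!, col]
```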

I think the first syntax may not work (as far as I know) if the column name has a space (or similar)

It works if you quote the name:

df[!, "my col"] = [1,2]
df."my col"
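
For anyone wanting to try this, a self-contained version of the snippet above might look like the following. It assumes a DataFrames.jl version that accepts strings as column names; the column name is just an example.

```julia
using DataFrames

df = DataFrame()
df[!, "my col"] = [1, 2]          # string name containing a space

via_property = df."my col"        # quoted property access
via_string   = df[!, "my col"]    # string indexing
via_symbol   = df[!, Symbol("my col")]  # explicit Symbol construction
```

All three accessors refer to the same underlying vector.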

This isn't quite right:

julia> using DataFrames

julia> df = DataFrame(a = rand(3))
3×1 DataFrame
│ Row │ a        │
│     │ Float64  │
├─────┼──────────┤
│ 1   │ 0.689493 │
│ 2   │ 0.339562 │
│ 3   │ 0.707628 │

julia> df.b = rand(3); df
3×2 DataFrame
│ Row │ a        │ b        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.689493 │ 0.25931  │
│ 2   │ 0.339562 │ 0.76706  │
│ 3   │ 0.707628 │ 0.288513 │

julia> df.c .= 1
ERROR: ArgumentError: column name :c not found in the data frame; existing most similar names are: :a and :b
Stacktrace:
 [1] lookupname at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:289 [inlined]
 [2] getindex at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:295 [inlined]
 [3] getindex(::DataFrame, ::typeof(!), ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\dataframe\dataframe.jl:435
 [4] getproperty(::DataFrame, ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\abstractdataframe\abstractdataframe.jl:346
 [5] top-level scope at REPL[13]:100:

As you see, df.b = rand(3) works despite :b not being present in the DataFrame. This only fails when doing dot-broadcasting assignment .=, because as per the link above:

df.col works like df[!, col] […] in all cases except that df.col .= v and sdf.col .= v perform in-place broadcasting if col is present in df / sdf and is a valid identifier.
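
Since the behaviour of `df.col .= v` has depended on the DataFrames.jl and Julia versions in use, a cautious sketch is to spell out the intent with explicit indexing. The example below assumes a toy data frame and uses only `df[!, col] .= v` (allocates a column, creating it if needed) and `df[:, col] .= v` (writes into the existing vector in place).

```julia
using DataFrames

df = DataFrame(a = [1.0, 2.0, 3.0])
original = df.a                   # keep a reference to the stored vector

# `!` with broadcasting assignment allocates a fresh column,
# so it also works when :c does not exist yet.
df[!, :c] .= 1

# `:` with broadcasting assignment writes into the existing vector.
df[:, :a] .= 0.0
still_same_vector = df.a === original
```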

See https://github.com/JuliaLang/julia/issues/36741 for a discussion about allowing this.

Thanks. I did not know that. Very helpful. (much shorter syntax!)
I actually was not aware that there is some implicit promotion from string to symbol here.

Interesting. (I just read the other interesting thread about symbols here: "When should a function accept a symbol as an argument?")

@nilshg @bkamins
Thanks for the information!
I was always wondering why the df.column syntax has often not worked for me when new columns should be created…

I was using Symbol("my column") to access columns with spaces. I didn't know you could also do that.