Hi, I'm seeing these two ways to access a column, and when testing them in code I can't see any difference. Is one better than the other?
Thanks.
Basically no difference; see Indexing · DataFrames.jl.
Both of them provide direct access to the column (without copying).
In addition to this, the 2nd syntax can be used to create new columns and to access columns whose names are stored in a variable, whereas the 1st syntax can only be used to access an existing column whose name is known in advance.
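A minimal sketch of what the bracket syntax allows, assuming a reasonably recent DataFrames.jl. The data frame contents and the variables `col` and `newcol` are made up for illustration:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Both forms hand back the stored vector itself, not a copy.
same_object = df.a === df[!, :a]

# Bracket indexing accepts a column name held in a variable.
col = :a                      # hypothetical variable holding the name
values_of_col = df[!, col]

# Bracket indexing with `!` can also create a new column.
newcol = "b"                  # hypothetical new column name
df[!, newcol] = [10, 20, 30]
```

After running this, `df` should have two columns, `a` and `b`, and `same_object` should be `true`.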
As far as I know, the first syntax may not work if the column name contains a space (or similar).
It works if you quote the name:
df[!, "my col"] = [1,2]
df."my col"
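A slightly fuller sketch of the same idea, assuming a DataFrames.jl version that accepts strings as column names. The starting column `a` is only there so the example is self-contained:

```julia
using DataFrames

df = DataFrame(a = [1, 2])

# Create a column whose name is not a valid identifier.
df[!, "my col"] = [1, 2]

# Three equivalent ways to read it back without copying.
via_dot_string = df."my col"
via_string     = df[!, "my col"]
via_symbol     = df[!, Symbol("my col")]
```

All three variables should refer to the same underlying vector.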
This isn't quite right:
julia> using DataFrames
julia> df = DataFrame(a = rand(3))
3×1 DataFrame
│ Row │ a        │
│     │ Float64  │
├─────┼──────────┤
│ 1   │ 0.689493 │
│ 2   │ 0.339562 │
│ 3   │ 0.707628 │
julia> df.b = rand(3); df
3×2 DataFrame
│ Row │ a        │ b        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.689493 │ 0.25931  │
│ 2   │ 0.339562 │ 0.76706  │
│ 3   │ 0.707628 │ 0.288513 │
julia> df.c .= 1
ERROR: ArgumentError: column name :c not found in the data frame; existing most similar names are: :a and :b
Stacktrace:
[1] lookupname at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:289 [inlined]
[2] getindex at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\other\index.jl:295 [inlined]
[3] getindex(::DataFrame, ::typeof(!), ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\dataframe\dataframe.jl:435
[4] getproperty(::DataFrame, ::Symbol) at C:\Users\ngudat\.julia\packages\DataFrames\htZzm\src\abstractdataframe\abstractdataframe.jl:346
[5] top-level scope at REPL[13]:100:
As you see, df.b = rand(3) works despite :b not being present in the DataFrame. This only fails when doing dot-broadcasting assignment (.=), because as per the link above:
"df.col works like df[!, col] […] in all cases except that df.col .= v and sdf.col .= v perform in-place broadcasting if col is present in df/sdf and is a valid identifier."
See https://github.com/JuliaLang/julia/issues/36741 for a discussion about allowing this.
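For reference, a small sketch of the forms that avoid the error above. It assumes a DataFrames.jl version in which `df[!, :c] .= 1` is allowed to create a new column; whether `df.c .= 1` itself works depends on the Julia and DataFrames.jl versions installed, so it is left out here. The values are made up:

```julia
using DataFrames

df = DataFrame(a = [0.1, 0.2, 0.3])

# Plain assignment through the dot syntax creates a new column.
df.b = [1.0, 2.0, 3.0]

# Broadcasting assignment through bracket indexing creates a new column.
df[!, :c] .= 1

# Broadcasting into an existing column through the dot syntax works in place.
df.a .= 0.0
```

So for a constant-valued new column, the bracket form is the safe choice across versions.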
Thanks. I did not know that. Very helpful. (much shorter syntax!)
I actually was not aware that there is some implicit promotion from string to symbol here.
Interesting (I just read the other interesting thread about symbols here When should a function accept a symbol as an argument?)
@nilshg @bkamins
Thanks for the information!
I was always wondering why the df.column syntax often did not work for me when new columns should be created…
I was using Symbol("my column") to access columns with spaces. I didn't know you could also do that.