df["col"] syntax

Is it possible to use the df["col"] syntax in DataFrames.jl? Currently it throws an error.

df[:, "col"]

Also df.col, but you can't use a variable for that: with x = "mycolumn", writing df.x looks for a column named x, not mycolumn. Use df[:, x] in that case.
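A small sketch of the options mentioned so far. The data frame and the names `df`, `col`, `other`, and `x` are made up for illustration.

```julia
using DataFrames

# Hypothetical example data
df = DataFrame(col = [1, 2, 3], other = ["a", "b", "c"])

a = df[:, "col"]   # a copy of the column
b = df[!, "col"]   # the stored column itself, no copy
c = df.col         # property access, same vector as df[!, "col"]

# When the column name lives in a variable, index with it instead of using the dot syntax
x = "col"
d = df[!, x]
```

Here `a` holds the same values as `b` but is a separate vector, while `b`, `c`, and `d` all refer to the vector stored in `df`.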

I recommend working through the DataFrames documentation for this. See a tutorial here.


I have code that uses the df["col"] syntax, and I find it easier to read. I just wanted to know whether this is possible. Maybe not!

df[mask1, mask2] is the general form: the first index selects rows and the second selects columns. The : in my example means "take all the rows".
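As a rough illustration of the two-index form, assuming a made-up data frame (the names `rowmask` and `colmask` are hypothetical):

```julia
using DataFrames

# Hypothetical example data
df = DataFrame(col = [1, 2, 3], other = ["a", "b", "c"])

rowmask = df.col .> 1        # Boolean mask over the rows
colmask = [true, false]      # Boolean mask over the columns (one entry per column)

sub = df[rowmask, colmask]   # rows where col > 1, only the first column
everyrow = df[:, colmask]    # `:` keeps all rows
```

Both results are data frames because the column selector picks a collection of columns. A single column name such as "col" in the second position would return a vector instead.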

No. This was allowed in earlier versions of DataFrames.jl, but it confused people: many users see a data frame as a collection of rows, in which case df[1] should return the first row, not the first column.

As a result, DataFrames.jl enforces an indexing syntax which makes the dimensions clear.

Many things to read. I should do that :thinking:

 Base.getindex(df::DataFrame, col::AbstractString) = df[!, col]

or more generally:

Base.getindex(df::AbstractDataFrame, col::DataFrames.ColumnIndex) = df[!, col]

and then the same for setindex!, broadcasting, and broadcasting assignment.

However, note that, as @pdeffebach commented, this is unsupported on purpose. In Julia Base you have the following:

julia> x = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> x[3]
2
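The result 2 comes from column-major linear indexing. A minimal Base-only sketch of why:

```julia
x = [1 2; 3 4]

# Julia stores matrices column by column, so a single index walks down
# the first column before moving on to the second
flat = vec(x)       # elements in storage order
third = x[3]        # the third element in that order
same = x[1, 2]      # row 1, column 2 is the same element
```

So `x[3]` picks the first element of the second column, not the third row or the third column.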

DataFrames.jl is designed to:

  • be consistent with Julia Base (so that once you learn Julia Base you mostly know how to use DataFrames.jl)
  • throw an error when DataFrames.jl decides that what Julia Base does would confuse users. The case you are asking about is exactly one of these scenarios, as I assume that when writing:
julia> df = DataFrame(x, :auto)
2×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     3      4

julia> df[3]
ERROR: ArgumentError: syntax df[column] is not supported use df[!, column] instead

you would prefer to get an error rather than 2 (which is what you would get for a matrix), as getting 2 would probably be super confusing.
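A short sketch of the same behaviour in script form, assuming a DataFrames.jl version that behaves as in the output above. The `try`/`catch` wrapper is only there so the snippet runs to the end.

```julia
using DataFrames

x = [1 2; 3 4]
df = DataFrame(x, :auto)

# Single-index access is rejected on purpose
err = try
    df[3]
catch e
    e
end

rejected = err isa ArgumentError   # the error type shown in the output above

# With both dimensions spelled out, the intent is unambiguous
second_column = df[!, 2]   # the whole second column
cell = df[1, 2]            # row 1 of column 2, the element that x[1, 2] returns
```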
