Is it possible to use the df["col"] syntax in DataFrames.jl? Currently it throws an error.
Use df[:, "col"]. Also df.col, but you can't use a variable for that, i.e. x = "mycolumn"; df.'x' does not work.
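To make the options above concrete, here is a minimal sketch of the supported ways to pull out a column, including the case where the name is stored in a variable. The data frame contents and the names v1 to v5 are made up for illustration.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(mycolumn = [1, 2, 3], other = ["a", "b", "c"])

# Literal column name: property access or explicit two-dimensional indexing.
v1 = df.mycolumn          # the stored column itself (no copy)
v2 = df[!, "mycolumn"]    # the same stored vector (no copy)
v3 = df[:, "mycolumn"]    # a fresh copy of the column

# Column name held in a variable: index with the variable...
x = "mycolumn"
v4 = df[!, x]
# ...or call getproperty, which is what the df.mycolumn syntax lowers to.
v5 = getproperty(df, Symbol(x))
```

The difference between ! and : matters when you mutate the result: changes to v2 show up in df, changes to v3 do not.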
I recommend working through the DataFrames documentation for this. See a tutorial here.
I have code in which the df["col"] syntax is used, and for me it is more understandable. I just wanted to know if this is possible; maybe not!
df[mask1, mask2] is the general form; the : in my example means take all the rows.
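As a rough illustration of the general two-dimensional form, the sketch below uses a row selector and a column selector together. The names row_mask and col_mask stand in for mask1 and mask2, and the data is invented.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3, 4], b = [10, 20, 30, 40], c = ["w", "x", "y", "z"])

row_mask = df.a .> 2              # Bool vector with one entry per row
col_mask = [true, false, true]    # Bool vector with one entry per column

sub = df[row_mask, col_mask]      # rows 3 and 4, columns a and c
all_rows = df[:, ["a", "b"]]      # : keeps every row
one_col = df[row_mask, "b"]       # a single column name returns a vector
```

Both positions are always given, so it is never ambiguous whether a selector refers to rows or to columns.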
No. This was allowed in earlier versions of DataFrames, but people were confused because lots of users saw DataFrames as a collection of rows, in which case df[1] should return the first row, not the first column.
As a result, DataFrames.jl enforces an indexing syntax which makes the dimensions clear.
Many things to read; I should do that.
If you really want this syntax, you can define it yourself:
Base.getindex(df::DataFrame, col::AbstractString) = df[!, col]
or more generally:
Base.getindex(df::AbstractDataFrame, col::DataFrames.ColumnIndex) = df[!, col]
and then the same for setindex!, broadcasting, and broadcasting assignment.
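A minimal sketch of what the getindex and setindex! pair could look like for string column names is shown here. Note that this is type piracy (it adds methods to Base functions for a type owned by DataFrames.jl), so it is an illustration under that caveat rather than a recommended pattern. The setindex! signature is my assumption of the analogous definition, and broadcasting (df["b"] .= ...) is not covered.

```julia
using DataFrames

# Type piracy: extends Base functions for a type defined in DataFrames.jl.
# Forward single-argument string indexing to the supported df[!, col] form.
Base.getindex(df::DataFrame, col::AbstractString) = df[!, col]

# Assumed analogue for assignment: store the vector as column `col`.
function Base.setindex!(df::DataFrame, v::AbstractVector, col::AbstractString)
    df[!, col] = v
    return df
end

df = DataFrame(a = [1, 2, 3])
df["b"] = [4.0, 5.0, 6.0]   # adds a new column through the method above
col_a = df["a"]             # returns the stored column without copying
```

Because these methods change how every DataFrame in the session behaves, they are best kept to scripts you control rather than put in a package.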
However, note that, as @pdeffebach commented, this is not supported on purpose. In Julia Base you have the following:
julia> x = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4
julia> x[3]
2
DataFrames.jl is designed to:
- be consistent with Julia Base (so that once you learn Julia Base you mostly know how to use DataFrames.jl)
- throw an error when DataFrames.jl decides that what Julia Base does would confuse users. The case you are asking about is exactly one of these scenarios, as I assume that when writing:
julia> df = DataFrame(x, :auto)
2×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     3      4
julia> df[3]
ERROR: ArgumentError: syntax df[column] is not supported use df[!, column] instead
you would prefer to get an error rather than 2 (which is what you would get for a matrix), as getting 2 would probably be super confusing.