I understand that I need to use the : operator to represent a variable in an unevaluated expression in Julia. But I am confused as to why I am using : to get a column name here, since the column name does not appear to be an unevaluated expression.
using RDatasets, MLJ
iris = dataset("datasets", "iris")
iris2 = coerce(iris, :Species => OrderedFactor) # This line
Is it possible to refer to a column name here without using an unevaluated expression?
Here, : doesn’t create an unevaluated expression; it creates a Symbol, a type that is used much more widely than just inside expressions. You can learn more about it here:
help?> Symbol
search: Symbol
Symbol
The type of object used to represent identifiers in parsed julia code (ASTs). Also often used as a name or label to
identify an entity (e.g. as a dictionary key). Symbols can be entered using the : quote operator:
julia> :name
:name
julia> typeof(:name)
Symbol
julia> x = 42
42
julia> eval(:x)
42
Symbols can also be constructed from strings or other values by calling the constructor Symbol(x...).
Symbols are immutable and should be compared using ===. The implementation re-uses the same object for all Symbols
with the same name, so comparison tends to be efficient (it can just compare pointers).
Unlike strings, Symbols are "atomic" or "scalar" entities that do not support iteration over characters.
julia> typeof(:(2+2))
Expr
julia> typeof(:(x))
Symbol
julia> dump(:(2+2))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 2
    3: Int64 2
From the dump output above, it appears that a Symbol can be part of an expression.
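To make that observation concrete, here is a minimal sketch in plain Base Julia (the variable names are illustrative, not from the thread). It suggests that a Symbol stored inside an Expr is ordinary data, and that a value is only looked up when something explicitly calls eval:

```julia
# An Expr whose arguments include the Symbol :a
ex = :(a + 2)
parts = ex.args               # Any[:+, :a, 2]

sym = parts[2]                # the Symbol :a, stored as plain data
is_symbol = sym isa Symbol    # true; nothing has been looked up yet

# The same expression built by hand from Symbols and a literal
ex2 = Expr(:call, :+, :a, 2)
same = ex == ex2

# Only eval replaces the Symbol with the value of the global it names.
# `global` keeps the sketch working whether it runs at top level or not.
global a = 40
result = eval(ex)
```

Until the final line, :a never needs to refer to anything; it is just a name sitting in an array.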
Symbols are replaced with the value they refer to when evaluated. But why does the column name need to go through parsing and evaluation? Why can’t we use a string or iris.Species to denote the column name?
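It doesn’t. Symbols are similar to strings; they’re just frequently used as the names of things, even outside of expressions. This is mentioned in the docs quoted above: a Symbol is "also often used as a name or label to identify an entity (e.g. as a dictionary key)".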
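As a rough illustration of Symbols acting purely as labels, the sketch below uses only Base Julia; the dictionary contents and the NamedTuple are made-up examples. No expression is parsed or evaluated anywhere in it:

```julia
# Symbols as dictionary keys
ages = Dict(:alice => 31, :bob => 27)
alice_age = ages[:alice]

# NamedTuple field names are Symbols too
point = (x = 1.0, y = 2.0)
field_names = keys(point)                 # a tuple of Symbols
y_value = getproperty(point, :y)          # same as point.y

# Converting between strings and Symbols
from_string = Symbol("Species")
back_to_string = String(:Species)
same_object = from_string === :Species    # Symbols with the same name are identical
```

In each case the Symbol is just a lightweight name, much like a string key would be.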
I’m not familiar with MLJ, but the reason passing iris.Species doesn’t work comes down to the types involved and the way Julia evaluates expressions.
iris.Species
is just a vector. There is no name attached to it; by the time coerce is called, it has been completely forgotten that the vector ever came from a data frame. So the function coerce is given a data frame and then a vector with no name. That’s not a lot to work with.
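A small sketch of the same point, using a NamedTuple of vectors as a stand-in for a data frame so that it runs without any packages. The data and the helper uppercase_column are hypothetical and are not part of MLJ or DataFrames:

```julia
# Stand-in table: a NamedTuple of column vectors (illustrative data)
table = (Species = ["setosa", "virginica"], SepalLength = [5.1, 6.3])

# Property access is evaluated before any function sees it,
# so all that remains is a plain vector with no name attached.
col = table.Species
col_type = typeof(col)

# Hypothetical helper: it needs the column *name* to know what to replace.
function uppercase_column(tbl::NamedTuple, name::Symbol)
    new_col = uppercase.(getproperty(tbl, name))
    # merge keeps the original column order and swaps in the new column
    return merge(tbl, NamedTuple{(name,)}((new_col,)))
end

updated = uppercase_column(table, :Species)
```

Given only col, the helper would have no reliable way to tell which column of table it was meant to replace, which is why a name (here a Symbol) is passed instead.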