Why use an unevaluated expression to refer to a column name?

I understand that I need to use the : operator to represent a variable in an unevaluated expression in Julia. But I am confused as to why I am using : to get a column name here. The column name does not appear to be an unevaluated expression.

using RDatasets, MLJ
iris = dataset("datasets", "iris")
iris2 = coerce(iris, :Species => OrderedFactor) # This line

Is it possible to refer to a column name here without using an unevaluated expression?

Thanks!

: doesn’t create an unevaluated expression, it creates a Symbol, which is a more widely-used type. You can learn more about it here:

help?> Symbol
search: Symbol

  Symbol

  The type of object used to represent identifiers in parsed julia code (ASTs). Also often used as a name or label to
  identify an entity (e.g. as a dictionary key). Symbols can be entered using the : quote operator:

  julia> :name
  :name
  
  julia> typeof(:name)
  Symbol
  
  julia> x = 42
  42
  
  julia> eval(:x)
  42

  Symbols can also be constructed from strings or other values by calling the constructor Symbol(x...).

  Symbols are immutable and should be compared using ===. The implementation re-uses the same object for all Symbols
  with the same name, so comparison tends to be efficient (it can just compare pointers).

  Unlike strings, Symbols are "atomic" or "scalar" entities that do not support iteration over characters.
julia> typeof(:(2+2))
Expr

julia> typeof(:(x))
Symbol

julia> dump(:(2+2))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 2
    3: Int64 2

From the last output it appears that a Symbol can be part of an expression.

Symbols are replaced with the value they refer to when evaluated. But why does the column name need to go through parsing and evaluation? Why can’t we use a string or iris.Species to denote the column name?

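It doesn’t. Symbols are similar to strings; they’re just frequently used as the names of things, even outside of expressions. This is mentioned in the docs quoted above, which say a Symbol is "also often used as a name or label to identify an entity (e.g. as a dictionary key)".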
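
As a rough illustration of a Symbol being used purely as a label, here is a minimal sketch using only Base Julia. The dictionary and its contents are made up for the example and have nothing to do with MLJ:

```julia
# A Symbol is an ordinary value; nothing is evaluated unless you call eval yourself.
label = :Species

# Symbols work as dictionary keys, the same way strings would.
# (The dictionary below is a hypothetical example.)
descriptions = Dict(:SepalLength => "numeric", :Species => "categorical")
species_description = descriptions[label]

# No variable called Species is defined in this module, yet the Symbol is usable.
species_defined = isdefined(@__MODULE__, :Species)

# Converting between Symbol and String is explicit.
from_string = Symbol("Species")
to_string = String(:Species)
```

Nothing here is parsed or evaluated as code; the Symbol is simply looked up as a key, just as a string would be.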

The way I think of it, Strings are for when it matters which characters are in them.
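
A small sketch of that distinction, assuming nothing beyond Base Julia: character-level operations are natural on a String, while a Symbol is treated as one indivisible label.

```julia
name_text = "Species"
name_label = :Species

# Strings are sequences of characters, so character-level work is natural.
first_char = first(name_text)
upper = uppercase(name_text)
chars = collect(name_text)

# Symbols are atomic: Base defines no iteration over their characters,
# so applicable reports that iterate has no method for a Symbol.
symbol_iterable = applicable(iterate, name_label)
string_iterable = applicable(iterate, name_text)
```

If the characters of a name ever do matter, converting with String(name_label) first makes that intent explicit.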

What’s the advantage of using a Symbol to refer to a column name?

Strings are often data, while Symbols are not. So using Symbols makes it easy to see that someone is talking about the name of a piece of data rather than the data itself.

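There are also some slight performance benefits to using Symbols in code in some places. For example, as the docstring above notes, all Symbols with the same name share one object, so comparing them can be a pointer comparison.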
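
To make the comparison point concrete, here is a minimal sketch in Base Julia. It only shows the semantics and does not measure anything, so treat any speed difference as something to benchmark in your own code:

```julia
a = :Species
b = Symbol("Spe", "cies")     # built at run time from two pieces

# Both names refer to the same interned object, so identity comparison succeeds.
same_symbol = a === b

s1 = "Species"
s2 = string("Spe", "cies")

# Strings with equal contents compare equal, which requires looking at the contents.
equal_strings = s1 == s2
```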

Isn’t the name of the data known as a variable? I am wondering why it doesn’t use something like the following:

coerce(iris, iris.Species => OrderedFactor)

I’m not familiar with MLJ, but the reason this doesn’t work comes down to the types involved and the way Julia evaluates expressions: arguments are evaluated before the function is called.

iris.Species

is just a vector. There is no name attached to it. It’s totally forgotten that it ever came from a data frame. So the function coerce is given a data frame and then a vector with no name. That’s not a lot to work with.
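
Here is a sketch of the same idea without any packages. A NamedTuple of vectors stands in for the data frame, and convert_column is a hypothetical helper written for this post, not how MLJ's coerce is actually implemented:

```julia
# A NamedTuple of vectors stands in for the data frame (assumption for illustration).
table = (SepalLength = [5.1, 4.9, 6.3], Species = ["setosa", "setosa", "virginica"])

# Accessing the column returns only the values; the name is not carried along.
col = table.Species
col_is_plain_vector = col isa Vector{String}

# Hypothetical helper: given a table and a `name => f` pair, it can look the
# column up by name and return a new table with that column transformed by f.
function convert_column(tbl::NamedTuple, spec::Pair{Symbol})
    name, f = spec
    converted = f.(getproperty(tbl, name))
    return merge(tbl, NamedTuple{(name,)}((converted,)))
end

new_table = convert_column(table, :Species => uppercase)
```

With the Symbol, the helper knows which column to replace. Given only col, it would have the values but no way of telling which column of table they came from.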
