When should a function accept a symbol as an argument?

I stumbled across two situations where functions accept symbols as arguments, and wonder why they are designed that way. What advantages are there to using symbols? When should I write functions that accept symbols instead of strings or booleans? Here are the examples:

  1. In the Debugger, to break on error:
Debugger.break_on(:error)

Why not write break_on to accept strings, for example, Debugger.break_on("error")?

  2. To get the labels for the MNIST test data set:
Flux.Data.MNIST.labels(:test)

The labels can be either test or train, so why not use something like Flux.Data.MNIST.labels(test = true)?

@StefanKarpinski’s answer to “what is a Symbol?” answers this question for DataFrames, but I don’t see how the reasoning applies in the cases above. The docs on metaprogramming explain what is happening, and some of the cool things you can do with it, but not why you would want to for these use cases.

It might primarily be an aesthetic preference. I find :error to be slightly less visually noisy than "error". Text editors also often highlight symbols and strings in different colors.

Symbols are not the same as Enums, of course, but I tend to think of them as better than strings when you want a lightweight enum, like :yes, :no, :maybe.

Symbols are similar to a global enumeration. If creating an enumeration just for that case is overkill, and the symbols are few and short, then using Symbol is often a sensible choice. Using a string would work too, but comparison of Strings is costlier, I think?

It seems so (although the numbers are so small that I may be measuring something else):

julia> @btime "abcd" == "abcd"
  5.632 ns (0 allocations: 0 bytes)
true

julia> @btime :abcd == :abcd
  0.025 ns (0 allocations: 0 bytes)
true

julia> @btime "abcd" == "fghi"
  7.237 ns (0 allocations: 0 bytes)
false

julia> @btime :abcd == :fghi
  0.025 ns (0 allocations: 0 bytes)
false

Regarding binary inputs (true/false), I don’t think Symbols can compete with Booleans on speed, so in that case I suppose the reason to choose them is usability (no need to remember whether true refers to train or test).
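
To make the usability point concrete, here is a minimal sketch. The names load_labels and load_labels_bool are hypothetical and are not Flux's actual API; the point is only how the two call sites read.

```julia
# Hypothetical loader, loosely modeled on the MNIST example above.
function load_labels(subset::Symbol)
    # Reject typos early instead of silently falling into a default branch.
    subset in (:train, :test) ||
        throw(ArgumentError("subset must be :train or :test, got :$subset"))
    return subset === :train ? "training labels" : "test labels"
end

# Boolean variant: the call site does not say which subset `true` selects.
load_labels_bool(test::Bool) = test ? "test labels" : "training labels"

load_labels(:test)        # self-describing at the call site
load_labels_bool(true)    # requires looking up the signature
```

A Symbol argument also leaves room for a third option later (say, a hypothetical :validation), which a Bool cannot express.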

I think for most code that uses symbols or strings for mode selection, whatever difference in performance there might be between strings and symbols is negligible. For example, in this code

function foo(x; mode=:a)
    if mode == :a
        foo_a(x)
    else
        foo_b(x)
    end
end

the cost of the mode == :a comparison is most likely negligible compared to the cost of running foo_a or foo_b.

Yes, you are correct (except, perhaps, when code that would allocate nothing with symbols now has to allocate a string for each comparison, for example inside a loop).

However, there is often no good reason to use a String in place of a Symbol in such cases, and the balance ends up tipping to the Symbol side, even if it is just because experienced programmers expect a Symbol.

I get considerably different values if I do not interpolate, but in any case, it seems that comparing symbols is indeed cheaper.

julia> a = "0123456789"
"0123456789"

julia> b = "0123456789"
"0123456789"

julia> using BenchmarkTools

julia> s_a = :0123456789
123456789

julia> s_b = :0123456789
123456789

julia> @btime a == b
  15.003 ns (0 allocations: 0 bytes)
true

julia> @btime s_a == s_b
  11.712 ns (0 allocations: 0 bytes)
true

julia> @btime $a == $b
  2.658 ns (0 allocations: 0 bytes)
true

julia> @btime $s_a == $s_b
  0.016 ns (0 allocations: 0 bytes)
true

:0123456789 returns an integer, not a symbol. 🙂

Argh, betrayal.

Apparently, I cannot interpolate Symbols with @btime?

julia> s_a = Symbol("A0123456789")
:A0123456789

julia> s_b = Symbol("A0123456789")
:A0123456789

julia> using BenchmarkTools

julia> @btime $s_a == $s_b
ERROR: UndefVarError: A0123456789 not defined
[...]
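
A likely explanation, sketched with plain expression interpolation rather than BenchmarkTools itself (so treat the connection to @btime as an assumption): splicing a Symbol into an expression with $ inserts it as an identifier, that is, as a variable name. Wrapping it in a QuoteNode inserts the Symbol value instead.

```julia
s = Symbol("A0123456789")

# Interpolating the Symbol splices it in as an identifier (a variable name).
# ex_var is the expression `A0123456789 == A0123456789`.
ex_var = :($s == $s)

# Wrapping it in a QuoteNode splices in the Symbol value itself.
ex_val = :($(QuoteNode(s)) == $(QuoteNode(s)))

# Evaluating ex_val compares the two Symbol values.
result = eval(ex_val)

# eval(ex_var) would throw an UndefVarError unless a variable
# named A0123456789 happens to be defined.
```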

“It might primarily be an aesthetic preference.”

This is what I suspected.

Using expressions does put more cognitive load on new users, as it’s another bit of syntax and concept to understand. It’s also a bit intimidating, because it leads one down the path of metaprogramming.

If using a symbol only gets us a slightly cleaner style, I don’t feel it’s worth the additional complexity over a string.

julia> s_a = Symbol("0123456789")
Symbol("0123456789")

julia> s_b = Symbol("0123456789")
Symbol("0123456789")

julia> @benchmark $(QuoteNode(s_a)) == $(QuoteNode(s_b))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.016 ns (0.00% GC)
  median time:      0.019 ns (0.00% GC)
  mean time:        0.019 ns (0.00% GC)
  maximum time:     0.028 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

FWIW, sub-ns times generally mean the compiler defeated the benchmark.

I believe you need the Ref trick:

julia> @benchmark $(Ref(QuoteNode(s_a)))[] == $(Ref(QuoteNode(s_b)))[]
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.906 ns (0.00% GC)
  median time:      15.678 ns (0.00% GC)
  mean time:        16.885 ns (0.00% GC)
  maximum time:     66.014 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

Strangely (to me) strings seem to compare faster:

julia> s1 = "hello"
"hello"

julia> s2 = "hello"
"hello"

julia> @benchmark $(Ref(s1))[] == $(Ref(s2))[]
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.783 ns (0.00% GC)
  median time:      8.304 ns (0.00% GC)
  mean time:        8.162 ns (0.00% GC)
  maximum time:     38.303 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

FWIW, in my code I always use === for symbols. This is faster than Strings for me:

julia> s_a = Symbol("0123456789")
Symbol("0123456789")

julia> s_b = Symbol("0123456789")
Symbol("0123456789")

julia> @benchmark $(Ref(QuoteNode(s_a)))[] == $(Ref(QuoteNode(s_b)))[]
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.634 ns (0.00% GC)
  median time:      12.863 ns (0.00% GC)
  mean time:        12.908 ns (0.00% GC)
  maximum time:     36.339 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark $(Ref(QuoteNode(s_a)))[] === $(Ref(QuoteNode(s_b)))[]
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.840 ns (0.00% GC)
  median time:      3.059 ns (0.00% GC)
  mean time:        3.070 ns (0.00% GC)
  maximum time:     15.963 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @btime $(Ref(QuoteNode(s_a)))[] === $(Ref(QuoteNode(s_b)))[]
  2.626 ns (0 allocations: 0 bytes)
true

julia> s1 = "hello"
"hello"

julia> s2 = "hello"
"hello"

julia> @benchmark $(Ref(s1))[] == $(Ref(s2))[]
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.363 ns (0.00% GC)
  median time:      4.369 ns (0.00% GC)
  mean time:        4.386 ns (0.00% GC)
  maximum time:     26.813 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

According to the documentation a symbol is an “interned string”, meaning only one copy of it is stored. I think this means that when code containing a symbol is compiled, the symbol is effectively replaced by its unique id (i.e. a pointer). This makes storing, sharing, and comparing symbols efficient because they are essentially just addresses, whereas a string is a sequence of an arbitrary number of bytes.
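
A small sketch of what interning means in practice, using only Base: however a Symbol with a given name is constructed, the result is the same object, so an identity check with === is enough.

```julia
a = Symbol("abcd")        # built from a string at run time
b = Symbol("ab", "cd")    # built by concatenating two pieces
c = :abcd                 # literal syntax

# All three refer to the one interned symbol named abcd.
same = (a === b) && (b === c)

# Converting back gives an ordinary String with the same characters.
name = String(a)
```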

In my experience, symbols are often used (as @CameronBieganek suggested) as quick-and-easy categorical values.

Some comments (that are the consequences of what was already said):

  1. Symbols are never deallocated during a Julia session, so they make sense only when there is a limited number of them (which is the case for mode selection in if branches)
  2. A Symbol is always treated as a whole (string functions do not work on Symbols, so you have to convert them to strings first), which is sometimes a limitation
  3. Symbols that are not valid identifiers are relatively cumbersome to spell out (which matters in interactive use)

E.g. consideration of points 2 and 3 made us start accepting strings as column names in DataFrames.jl.
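
Points 2 and 3 can be sketched with Base functions only; the column name below is made up for illustration.

```julia
mode = :error

# String functions have no methods for Symbol, so this would throw a MethodError:
# uppercase(mode)

# Converting to a String first works.
shouted = uppercase(String(mode))    # "ERROR"

# A name that is not a valid identifier cannot use the :name literal
# and has to go through the constructor instead.
col = Symbol("my column")            # hypothetical column name
col_name = String(col)               # "my column"
```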

Accessing struct and NamedTuple fields is possible and fast (inlined) by symbols, which may open up optimization possibilities that are unforeseeable at API design time. Also, Dict{Symbol, Any} is faster than Dict{String, Any}, while converting a String to a Symbol is relatively slow.

As those optimization possibilities cannot always be seen at API design time, I generally find it better to use symbols for mode selection.
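
As a rough illustration of how much of the language is already keyed by Symbol (the option names here are hypothetical):

```julia
opts = (mode = :fast, tol = 1e-6)    # hypothetical options as a NamedTuple

tol = getfield(opts, :tol)           # field lookup by Symbol
mode = opts[:mode]                   # NamedTuples can also be indexed by Symbol
names = keys(opts)                   # (:mode, :tol)

# Keyword arguments arrive keyed by Symbol as well.
kwnames(; kwargs...) = collect(keys(kwargs))
kws = kwnames(x = 1, y = 2)          # [:x, :y]

# A Dict keyed by Symbol, as mentioned above.
settings = Dict{Symbol, Any}(:mode => :fast, :tol => 1e-6)
chosen = settings[:mode]
```

So an API that takes a Symbol can pass it straight through to field access or keyword handling without a String-to-Symbol conversion.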

I have to say that, for me, the benefit of DataFrames.jl accepting strings as column names is that I can use Symbols with the special meaning given by the macros of DataFramesMeta.jl (i.e., the column itself), but also have a simple way to refer to column names inside the same expression without escaping the symbols.

Thanks all for the in-depth responses. It sounds like using symbols versus strings may have a performance impact on the order of 10 ns, which may matter for some applications. This impact comes from implementation details such as inlining, allocations, and string interning.

As a naive new user, symbols appear unusual and noteworthy, which is why I asked the original question. There are certainly plenty of examples that use strings over symbols, for example:

open("somefile.txt", "w+")

Opening a file like this is a familiar pattern across many languages, and it would be noteworthy indeed if Julia did something like:

open("somefile.txt", :w+)

The above doesn’t work, because :w+ is not a symbol literal. It parses as the symbol :w followed by a binary operator, which isn’t a complete expression. The symbol itself can still be created, but only through the more verbose Symbol("w+").

Hence symbol literals have syntactic restrictions that strings do not, which is another disadvantage.
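
For what it's worth, a quick sketch of that restriction using Meta.parse and the Symbol constructor from Base:

```julia
# Parsing the text ":w+" does not produce a Symbol; the parser reads :w,
# then a binary operator with nothing after it.
parsed = Meta.parse(":w+"; raise = false)

# The constructor accepts any string, so the name can still be built,
# just not with the short :name syntax.
mode = Symbol("w+")
mode_str = String(mode)    # "w+"
```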
