When should a function accept a symbol as an argument?

clarkfitzg · July 23, 2020, 7:05pm

Well I take that back, w+ can indeed be a valid symbol:

julia> Symbol("w+")
Symbol("w+")

But nobody wants to write open("somefile.txt", Symbol("w+"))

Henrique_Becker · July 24, 2020, 1:14pm

I have to say that I always assumed that open took a String because that modifier is the same used in the C API, so: (1) taking a string avoids needing to convert before calling the internal C function; (2) someone coming from C will expect it to be a String, so it would be confusing for it to be a Symbol.

Tamas_Papp · July 24, 2020, 2:39pm

Possibly, but I would love to write open("somefile.txt", :read), and similarly something like

string mode	symbol
r	`:read`
r+	`:readwrite`
w	`:truncate`
w+	`:create`
a	`:append`
a+	`:readappend`

Frankly, I always have to look up the modes I do not use commonly. Symbols allow very nice APIs.
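A minimal sketch of how such a symbol-based interface could be layered on top of the existing string modes. The names `MODE_STRINGS` and `open_with_mode` are hypothetical and not part of Base; the mapping simply follows the table above.

```julia
# Hypothetical mapping from descriptive symbols to the mode strings Base.open accepts.
const MODE_STRINGS = Dict(
    :read       => "r",
    :readwrite  => "r+",
    :truncate   => "w",
    :create     => "w+",
    :append     => "a",
    :readappend => "a+",
)

# Hypothetical wrapper (not part of Base): translate the symbol, then call Base.open.
function open_with_mode(path::AbstractString, mode::Symbol)
    haskey(MODE_STRINGS, mode) ||
        throw(ArgumentError("unsupported mode :$mode"))
    return open(path, MODE_STRINGS[mode])
end

# Usage: write a temporary file, then read it back.
path = tempname()
io = open_with_mode(path, :truncate)
write(io, "hello")
close(io)

io = open_with_mode(path, :read)
content = read(io, String)
close(io)
```

An unknown symbol fails early with an ArgumentError instead of being passed on to the underlying call.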

mike_k · July 24, 2020, 3:31pm

This probably does not answer your question, but for me it is convenient to use such functions when working with DataFrames especially when I want to plot data from different columns, e.g.,

using DataFrames

function myFunction(df::DataFrame, s::Symbol)
    x = df[!, s]  # select the column named by the symbol, without copying
    # do something with x
end

clarkfitzg · July 24, 2020, 5:08pm

Agreed, much like SQL and R.

clarkfitzg · July 24, 2020, 6:06pm

The longer versions are certainly more clear, because they’re more descriptive. Strings can be used the same way, for example, redefine open to accept open("somefile.txt", "read"), which would be equivalent to open("somefile.txt", "r").

What do you mean by nice? Do you mean :read is more aesthetically pleasing than "read"? I agree the symbols are cleaner, as @CameronBieganek originally said above. If, however, this is the only concrete benefit, then I don’t feel it’s worth the cost in additional cognitive load to new users.

A new user can easily understand foo("bar"). It calls function foo with one string argument "bar". Understanding foo(:bar) requires more knowledge specific to Julia, which isn’t nice for the new user.

Passing symbol arguments is a more advanced concept compared to string literals, which are universal across languages. Why use something that’s more complex when something that’s simpler is not only sufficient, but more general?

I’m all for complexity if there are clear benefits. For example, macros open up a whole world of fun, so they’re well worth the complexity.

StefanKarpinski · July 24, 2020, 6:11pm

Are you aware that you can write open("file.txt", read=true), etc.?
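For reference, a short sketch of the keyword form of open, which spells out the mode with named flags instead of a mode string. The file path here is just a temporary file chosen for illustration.

```julia
path = tempname()

# write=true behaves like mode "w": create the file if needed and truncate it.
open(path; write=true) do io
    write(io, "first line\n")
end

# append=true behaves like mode "a": create if needed and write at the end.
open(path; append=true) do io
    write(io, "second line\n")
end

# read=true behaves like mode "r".
lines = open(path; read=true) do io
    readlines(io)
end
```

The keywords (read, write, create, truncate, append) can be combined, so the rarely used modes do not need to be memorized as strings.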

CameronBieganek · July 24, 2020, 6:19pm

One possible advantage of passing symbols instead of strings is that it somewhat reduces the space of possible values, at least if you stick to the quote notation with :. For example, you can pass the string "hello world", but :hello world won’t work. Of course you can do Symbol("hello world"), but users are less likely to try that.

So, I think using symbols might help to conceptually clarify that a one word (or one identifier) input is expected.

clarkfitzg · July 24, 2020, 6:43pm

Backwards compatibility may be another use case for symbols. For example, you want to change the behavior of foo("bar"), so you define a method that does something different for foo(:bar).
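A tiny sketch of what that would look like with multiple dispatch. The function foo and its return values are purely illustrative.

```julia
# The existing String method keeps its old behavior.
foo(flag::String) = "legacy behavior for $flag"

# A new Symbol method is free to behave differently.
foo(flag::Symbol) = "new behavior for $flag"

old_result = foo("bar")
new_result = foo(:bar)
```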

Evey · July 24, 2020, 9:34pm

I don’t know about the two cases you listed, but in my experience the performance considerations are not at all negligible, and they are not simply a matter of whether string or symbol comparisons are faster. The issue is related to constant propagation, which matters most when you have branches that involve type instabilities. Such branches can force the rest of your code to allocate a lot and slow down significantly. However, the compiler is often clever enough to propagate a symbol and eliminate the instability. Here’s a contrived example where such an issue pops up:

f_symbol(x) = f(:real, x)
f_string(x) = f("real", x)

f(s::Symbol, x) = s == :real ? x + 1 : x + 1im
f(s::String, x) = s == "real" ? x + 1 : x + 1im

If we ask the compiler what it thinks these two functions will return, we get the difference between something “good” and something “bad”:

julia> Base.return_types(f_symbol, Tuple{Int64})[]
Int64

julia> Base.return_types(f_string, Tuple{Int64})[]
Union{Complex{Int64}, Int64}

Henrique_Becker · July 24, 2020, 9:40pm

This seems terribly confusing to new and old users alike. If some method takes either Strings or Symbols just as a flag indicator, then I expect them to have the exact same effect.

Maybe it is because some time has passed since I first learned to program, but I fail to see how Symbols would have a significant impact on the cognitive load of new users. In most cases where a function takes a Symbol, it will be a Symbol from a predefined list that is described in the function documentation. The user will see the Symbol syntax at the same time they learn which values can be passed as an argument to that parameter.
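As a sketch of that pattern, here is a function whose docstring lists the accepted symbols and which rejects anything else. The function `scale` and its options are hypothetical, made up for illustration.

```julia
"""
    scale(x, kind::Symbol)

Hypothetical example. `kind` must be one of `:linear`, `:log`, or `:sqrt`.
"""
function scale(x::Real, kind::Symbol)
    if kind === :linear
        return float(x)
    elseif kind === :log
        return log(x)
    elseif kind === :sqrt
        return sqrt(x)
    else
        # Anything outside the documented list is rejected with a clear message.
        throw(ArgumentError("unsupported kind :$kind; expected :linear, :log, or :sqrt"))
    end
end

root = scale(4, :sqrt)
same = scale(3, :linear)
```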

clarkfitzg · July 24, 2020, 11:26pm

Interesting! Let’s take your example a little further:

f_int(x) = f(0, x)
f_bool(x) = f(true, x)

f(s::Int, x) = s == 0 ? x + 1 : x + 1im
f(s::Bool, x) = s == true ? x + 1 : x + 1im

The compiler propagates constants for literal booleans and integers, as it did for literal symbols:

julia> Base.return_types(f_bool, Tuple{Int64})[]
Int64

julia> Base.return_types(f_int, Tuple{Int64})[]
Int64

Why does the compiler handle literal strings differently?

clarkfitzg · July 24, 2020, 11:47pm

Absolutely! I’m just trying to imagine use cases for symbols.

True, most users probably just take the syntax at face value and do not worry about it.

To really understand the call foo(:bar), one needs to understand what symbols and language objects are, and that functions can operate on them. The large number of votes for the Stack Overflow question about symbols that I originally linked to shows that many people needed that thorough, lengthy explanation.

OTOH, maybe symbols are a good entry point for starting to learn about metaprogramming, so it’s good if people see them often and are comfortable with them.

Tamas_Papp · July 25, 2020, 5:15am

No, I missed that. Thanks!

Evey · July 25, 2020, 5:29pm

The fact that it is a literal is a red herring. This optimisation will also happen if you use a variable:

const my_real_symbol = :real
f_symbol(x) = f(my_real_symbol, x)

In fact, in trying to construct this counter-example, I found that it will also happen with the string, if we modify the code slightly, and instead check if the strings compare equal with the “triple equality” operator, ===:

f(s::String, x) = s === "real" ? x + 1 : x + 1im

In this case, we now get:

julia> Base.return_types(f_string, Tuple{Int64})[]
Int64

I am actually not entirely sure if there is a way to construct two strings that are equal, but don’t compare equal with ===, so this example might be too simple.

Something symbols can do that strings can’t, though, is appear in type parameters. In practice, you sometimes really need to use this fact for expressibility in a type system. For instance:

struct MyType{name} end
is_it_a(::MyType{name}, s) where {name} = s == name

x = MyType{:value}()
is_it_a(x, :number)  # returns false
is_it_a(x, :value)  # returns true
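To see the restriction on type parameters directly, here is a small sketch with a hypothetical type `Tagged`. It only records whether construction succeeds, since the exact error raised for the string case may vary between Julia versions.

```julia
struct Tagged{name} end  # hypothetical type, analogous to MyType above

# A symbol is a valid type parameter.
symbol_works = Tagged{:value}() isa Tagged{:value}

# A String is not: applying it as a type parameter throws.
string_works = try
    Tagged{"value"}()
    true
catch
    false
end
```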

Another example is using Val from Base. You sometimes have situations where the best way to structure your code is something like:

@inline f(s::Symbol, x) = f(Val(s), x)
f(::Val{:real}, x) = x + 1
f(::Val, x) = x + 1im

The reason for writing code like that can be readability, flexibility, or just that the compiler needs that bit of extra help.
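A self-contained version of the Val pattern with a couple of calls, using the hypothetical name `shift` so it does not clash with the earlier definitions of f:

```julia
# Entry point: lift the runtime symbol into the type domain with Val.
shift(s::Symbol, x) = shift(Val(s), x)

# Specialized method for :real, plus a fallback for every other symbol.
shift(::Val{:real}, x) = x + 1
shift(::Val, x) = x + 1im

real_result = shift(:real, 2)   # dispatches to the Val{:real} method
other_result = shift(:imag, 2)  # falls through to the generic Val method
```

Adding support for another symbol then only requires one more method, without touching an if/else chain.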

So when you encounter an interface where you are asked to give it a symbol and not a string, these kinds of considerations may have been the underlying reason for choosing such an API. In other cases, as has been pointed out by others in this thread, it may just be that people have become so comfortable with symbols in the community that they don’t see them as something that presents an extra level of mental overhead. But it is the case that there are things you can do with symbols that you can’t do with strings. So I do not think you are ever going to convince people that you should never use them as function input. And I think that once you accept that they might sometimes be used, it is better to use them early and often. That way, users will encounter them, wonder what they are, and hopefully learn it one way or the other. Otherwise, it might end up being something esoteric in the language that feels like magic, when it really isn’t.

Evey · July 25, 2020, 6:02pm

Also, in the specific case of Debugger, the relevant lines of code are:

function break_on(states::Vararg{Symbol})
    for state in states
        if state === :error
            break_on_error[] = true
        elseif state === :throw
            break_on_throw[] = true
        else
            throw(ArgumentError(string("unsupported state :", state)))
        end
    end
end

This looks like a case where a design choice was made to use symbols out of convention rather than necessity.

clarkfitzg · July 26, 2020, 2:28pm

So constant propagation works the same for strings as for other types if we use ===. The compiler’s behavior surprises me here. I would expect that two strings comparing equal with === (the stronger condition) implies that they compare equal with ==, and that the constant propagation would use this property.

Substrings have this property:

a = "helloworld"
b = "hello"

julia> SubString(a, 1, 5)
"hello"

julia> SubString(a, 1, 5) == b
true

julia> SubString(a, 1, 5) === b
false

YES! As a new user with some background in metaprogramming, when I saw a function that accepts a language object, I expected that the function actually uses the special properties of symbols. This thread has caused me to drop that expectation.

Thanks, these examples illustrate that, even if I don’t yet understand them.

Hmmm, I don’t know, applying this reasoning to every advanced concept seems like it would lead to a complicated language.

Evey · July 26, 2020, 2:54pm

Constant propagation works in both cases. I think the difference is related to the fact that the implementation of == for two strings is more complicated than === and doesn’t inline. I asked on Slack, and it seems that == and === for Strings are supposed to always return the same result.
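A quick check that is consistent with this, as far as I can tell: two String values built separately still compare equal with ===, while a SubString differs from a String under === because the types differ.

```julia
a = "hello"
b = string("hel", "lo")  # assembled at run time rather than written as a literal

strings_equal = (a == b)
strings_identical = (a === b)   # Strings are compared by content

sub = SubString("helloworld", 1, 5)
sub_equal = (sub == a)
sub_identical = (sub === a)     # SubString and String are different types
```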

I think a heuristic that says that symbols aren’t an advanced concept would get you out of that slippery slope argument.

clarkfitzg · July 26, 2020, 2:56pm

Here is a summary of this thread that answers the original question “When should I write functions that accept symbols instead of strings or booleans?”

If the function actually uses a symbol in a way that it cannot use a string, for example, appearing in a type parameter, then use a symbol. Thanks @Evey

If the function uses the argument to choose one of a handful of behaviors, then either strings or symbols are acceptable. Consider the following when choosing:

Symbols effectively need to be a single identifier, with syntactic restrictions. For example, open("file.txt", "w+") is possible with strings, but the symbol equivalent cannot be written with the : quote syntax and would need Symbol("w+").
There can be performance differences on the order of 10 ns.
Whether :arg or "arg" is more visually appealing.
Symbols require more knowledge from new users to understand.

Please let me know if I missed any important points.

clarkfitzg · July 26, 2020, 2:59pm

Ha, yes you got me on the slippery slope