Should functions that take AbstractStrings also work for Symbols?

I often use Symbols for better efficiency in large DataFrames, but I would still sometimes like to filter on them by calling functions such as contains, startswith, endswith, or even length on their values.

Is there a reason why they shouldn’t also be defined for Symbols? Or should AbstractString be extended to include Symbol? After all, isn’t a Symbol just an interned String?

I think the two are pretty separate concepts, see here for an explanation: Strings vs symbols in DataFrames.jl column indexing | Blog by Bogumił Kamiński

I’m wondering if InlineStrings.jl or StaticStrings.jl would be of use here.

I agree that they often have distinct use cases, but I don’t see any downside to extending the functionality of Symbols to allow operations on the individual characters of the label. Or am I missing something?

They are very different indeed, I think an even more helpful discussion is Stefan’s answer on SO from ages ago:

Although I guess one might say your point still stands - who cares whether Symbols are also used to represent expressions, couldn’t we still just treat them as Strings?

I’m not a computer scientist and don’t know much about the internals of Symbol and String implementations, but I do know this much:

julia> x = :Something
:Something

julia> x[1]
ERROR: MethodError: no method matching getindex(::Symbol, ::Int64)
Stacktrace:
 [1] top-level scope

julia> y = "Something"
"Something"

julia> y[1]
'S': ASCII/Unicode U+0053 (category Lu: Letter, uppercase)

Fundamentally, a String is a collection of characters, while a Symbol is not. That is why Symbols don’t even have a getindex method, which presumably would be the minimum requirement for things like startswith. My gut says that enabling this sort of thing for Symbols would be a pretty massive change to the language and isn’t going to happen.
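For what it’s worth, the usual workaround is to convert explicitly at the point of use. Here is a minimal sketch, assuming the cost of building a String on each call is acceptable; the labels vector and the variable names are made up for illustration, and I use occursin in place of contains because it is available on older Julia versions too:

```julia
# Hypothetical labels, made up for illustration
labels = [:Something, :Someone, :Nothing]

# String functions are not defined for Symbols, so convert each one first
starts_with_some = filter(s -> startswith(String(s), "Some"), labels)
ends_with_thing  = filter(s -> endswith(String(s), "thing"), labels)
contains_one     = filter(s -> occursin("one", String(s)), labels)

# length and indexing work the same way once converted
label_lengths = [length(String(s)) for s in labels]
first_char    = String(:Something)[1]
```

This keeps the Symbols themselves untouched and only creates Strings temporarily for the comparison, so whether it is fast enough depends on how often the filter runs.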

I also think Mark is right, you are probably looking for InlineStrings (or even something like CategoricalArrays if you have lots of repeated strings).
