Why is converting from a type to a symbol so slow?

I was experimenting with the sort function where I gave to the by keyword a type related to each element of the array, but types can’t be compared with each others. So I tried to convert the type to a symbol, but it was very slow so I tried to benchmark the conversion and I got

julia> using BenchmarkTools

julia> Symbol(Int64)
:Int64

julia> @benchmark Symbol($Int64)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.120 μs …   7.030 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.160 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.230 μs ± 225.560 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂█▆▆▄▂▁     ▂▄▂▂        ▁                          ▁        ▂
  ████████▆▅▃▁█████████▆▆████▇▆▅▃▅▄▅██▇▇▇▅▅▄▄▄▁▄▄▄▁▇██▇▆▅▄▆██ █
  1.12 μs      Histogram: log(frequency) by time      2.09 μs <

 Memory estimate: 224 bytes, allocs estimate: 6.

I would have expected some nanoseconds, but instead it takes more than a microsecond: is there a way to make the conversion faster?

Symbol(x) for generic x falls back to printing x to a string, then converting the string to a symbol. This is going to be slow (allocating a new string on each call, among other things).

If you just want an arbitrary ordering, you can compare by the objectid of the type. For example:

julia> sort(Any[3, 4.2, 6, 5.6, "x", 7, "y"], by=objectid∘typeof)
7-element Vector{Any}:
 3
 6
 7
 4.2
 5.6
  "x"
  "y"

Of course, if you have a mixed-type container, such as an array holding elements of different types, you are inherently looking at sub-par Julia performance to start with.

4 Likes

Note that you can also get the name this way:

julia> T = Int64
Int64

julia> @btime $T.name.name
  101.645 ns (0 allocations: 0 bytes)
:Int64

though with parametric types, it may not be what you want. Also, needless to say, this is not part of the public Julia API and may break…

1 Like

@cstjean What about changing Base with what you propose?

julia> using BenchmarkTools

julia> a = Int64
Int64

julia> @benchmark Symbol($a)
BenchmarkTools.Trial: 10000 samples with 153 evaluations.
 Range (min … max):  681.046 ns …   8.088 μs  ┊ GC (min … max): 0.00% … 89.31%
 Time  (median):     700.654 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   721.068 ns ± 191.343 ns  ┊ GC (mean ± σ):  0.66% ±  2.35%

     ▅█▇▅▃▁▂▃▅▄▃▁      ▁▁▂▁▂▁▁▁                                 ▂
  ▃▂▆████████████▇▇▆███████████▇█▇▆▆▆▆▅▄▅▄▄▄▆▄▄▅▅▄▅▄▃▅▄▅▅▄▅▄▂▅▄ █
  681 ns        Histogram: log(frequency) by time        897 ns <

 Memory estimate: 224 bytes, allocs estimate: 6.

julia> Symbol(x::DataType) = isempty(x.parameters) ? x.name.name : Base.Symbol(x)
Symbol

julia> @benchmark Symbol($a)
BenchmarkTools.Trial: 10000 samples with 979 evaluations.
 Range (min … max):  65.169 ns … 120.225 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     68.233 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   68.597 ns ±   2.725 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         █▁  ▁
  ▁▁▁▁▂▁▁▂▁▂▁▂▂▃▂▂▃▃▇▅▆▇████▇█▄▃▃▂▃▂▂▁▂▂▂▂▂▄▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  65.2 ns         Histogram: frequency by time         72.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Is the idea sound?

One the one hand, it’s pretty painless to add optimized methods to Base, and of course internal methods can depend on internal implementation details.

On the other hand, converting types to symbols is ordinarily something that would only occur during metaprogramming, in which context performance is usually not so much of a concern. If it is performance-sensitive for you, this might be an indication that you should re-think how you are using Julia — the cornerstone of making Julia fast is to make sure that types are inferred at compile-time in critical code, and it seems like you are violating that maxim.

Why would you convert to symbol for sorting. In my understanding, a symbol is an just opaque name and should merely be compared for equality. Was surprised that an ordering relation is defined on symbols at all, but it probably makes sense as nameof returns a symbol (which arguably is not its name).
In any case, if you want to order types by name, just use nameof – which by the way uses t.name.name on data types internally already.

1 Like

nameof may not do what you want for parameterized types, since nameof returns the same symbol for distinct concrete types:

julia> nameof(Complex{Int})
:Complex

julia> nameof(Complex{Float64})
:Complex

julia> Symbol(Complex{Int})
Symbol("Complex{Int64}")

julia> Symbol(Complex{Float64})
:ComplexF64

Notice that Symbol(x), because it falls back to print and therefore show, uses abbreviated aliases like ComplexF64, which may not be what you want for sorting. To omit the type aliases, you need to call show directly with :compact=>false in the IOContext:

julia> sprint(show, ComplexF64; context=:compact=>false)
"Complex{Float64}"

julia> sprint(show, ComplexF64; context=:compact=>true)
"ComplexF64"

i.e. if you wanted to sort by the full type names, you could use by=x -> sprint(show, typeof(x); context=:compact=>false).

Of course, since this involves allocating a string for every element to be sorted (same as Symbol(T) under the hood), it won’t be blazingly fast. But, as I commented above, if you need sorting by type name to be blazingly fast then you might want to re-think your whole approach.

yes, I know that heterogeneous containers are not very performant and indeed I think that the current approach is not perfect because it is too type unstable but I thought that this conversion could have been useful in some other context, but I agree that it could be used in metaprogramming for the most part where performance is not very important, nonetheless if it possible to have an optimized version even for parametric types than I would consider improving the current Base method, though it’s maybe impossible to keep the current behaviour if Symbol(Complex{Float64}) returns an alias