How is `Symbol` special?

Symbol is a mutable struct, just like Expr:

julia> dump(Expr)
mutable struct Expr <: Any
  head::Symbol
  args::Vector{Any}

julia> dump(Symbol)
mutable struct Symbol <: Any

Although it’s not possible to mutate it, as it doesn’t have any fields.

2 Likes

Thanks for this correction, too. I also corrected it in my post above. But why do we have

the field equality in MySymbol when Symbol is a mutable type? Is Symbol the same mutable/immutable hybrid as String? Both types have a lot of similarity, so I guess this would not be too surprising.

Because :a === :a.

They’re certainly both mutable struct, but with mutation not being allowed.

I’ve heard it said that === is special-cased for String. Symbol values are interned, so presumably === doesn’t need special-casing for Symbol. Those are some differences.

2 Likes

Symbol is pretty special. It’s not a type you could implement in pure julia, and it’s not really accurate to think of it as a mutable type (at least from a semantic point of view). It’d be more accurate to say it’s an immutable, pointer-backed type with some other special properties.

You can basically think of it as a String that is never garbage collected, and each Symbol is unique.

That is, there should only ever be one Symbol in the entire julia session of the form :a. Once it’s created, it sits in the julia session forever, and all future Symbols of that form will be a pointer to the same internal string:

julia> pointer_from_objref(:a)
Ptr{Nothing}(0x00007bf28168baf0)

julia> pointer_from_objref(Symbol("a"))
Ptr{Nothing}(0x00007bf28168baf0)

This property is what allows Symbol to be e.g. put into type parameters:

julia> Val(:a)
Val{:a}()

whereas usually a pointer-backed type is not allowed in a type parameter:

julia> Val(Foo(1))
ERROR: TypeError: in Type, in parameter, expected Type, got a value of type Foo
Stacktrace:
 [1] Val(x::Foo)
   @ Base ./essentials.jl:1040
 [2] top-level scope
   @ REPL[13]:1

If you did access the string stored at the pointer for a Symbol and mutated it, you’d probably segfault julia.

3 Likes

That’s just a description of string interning, no?

I think if you define a fieldless mutable struct, and use a separate data structure in the constructor to implement the interning that associates the object identity with a string, you basically reimplemented Symbol.

1 Like

Behold! The impossible type!

julia> let intern_cache = Dict{UInt, Memory{UInt8}}()
            struct MySymbol
                key::UInt
                function MySymbol(s::Memory{UInt8})
                    k = hash(s)
                    if !haskey(intern_cache, k)
                        intern_cache[k] = s
                    end
                    new(k)
                end
            end
            MySymbol(s::String) = MySymbol(Memory{UInt8}(transcode(UInt8, s)))
            
            function Base.show(io::IO, ms::MySymbol)
               print(io, "MySymbol(\"")
               write(io, intern_cache[ms.key])
               print(io, "\")")
           end
       end

julia> ms = MySymbol("foo")
MySymbol("foo")

julia> ms2 = MySymbol("foo")
MySymbol("foo")

julia> ms === ms2
true

julia> Val(ms)
Val{MySymbol("foo")}()

There are a few things that this can’t do that Symbol can, since the interning for Symbol is already done when parsing - that’s of course not possible here, since the intern_cache does not exist at that point! For one thing, the conversion from a String is probably performing an unnecessary copy. As far as the visible semantics are concerned, this should be pretty much identical though :wink:



Of course, it’s important that nothing EVER gets deleted from intern_cache, which is why this thing only lives in the let block:

julia> intern_cache
ERROR: UndefVarError: `intern_cache` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

For Symbol, the julia runtime ensures this. Conceptually it’s the same thing (just with raw pointers for Symbol instead of an integer index); you can in theory mess with the runtime-internal state and “delete” a symbol, but that is likely UB :slight_smile: It’s the same for this implementation, if you delete! an already inserted element you get errors due to the key no longer being found.

4 Likes

I wasn’t talking about interning, that’s of course doable and already exists in some packages, I was talking about the compiler support for how the Compiler treats it an immutable value.

That said,

You’re right, I didn’t consider this strategy where the MySymbol is made to be isbits and then referenced by the global intern dict, very nice!

Actually, I think you probably could emulate pretty much all the other special sauce stuff that Symbol has nowadays using your strategy here, and a macro to make sure the conversions happen at parse time:

julia> macro s_str(s::AbstractString)
           MySymbol(s)
       end;

julia> code_typed() do
           s"boo!"
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     return $(QuoteNode(MySymbol("boo!")))
) => MySymbol

There’s a few other tricks that at Symbol does to be friendly to constant propagation and compile time ops, but I think that now-a-days you could probably implement those using @assume_effects.

3 Likes

Note also that the backing structure does not have to be a Dict! You could also use a Trie combined with some pool for (rare) large values, and using a custom mapping function instead of the default hash. Dropping the requirement of having a stable pointer opens up a world of possibilities!

Yeah, that’s a nice trick to move the allocations out of runtime, good catch!

That’s the one bit I’d be very careful about - I’m not sure it’s valid in general to mark the access in show as @inbounds, for example, since it’s technically possible to delete entries. At the moment, deleting a key merely causes an error somewhere down the road, but with @inbounds (or other more permissive @assume_effects to propagate the data in this cache as a constant) this might turn into proper UB.

Sure, but that’s the same as the UB you’d run into by doing something like

function cause_UB(s::Symbol)
     error("Why on earth did you try and run this function?")
     ptr = pointer_from_objref(s)
     unsafe_strore!(ptr, rand(UInt))
end