Thanks for this correction, too. I also corrected it in my post above. But why do we have the field equality in MySymbol when Symbol is a mutable type? Is Symbol the same mutable/immutable hybrid as String? Both types have a lot of similarity, so I guess this would not be too surprising.
They’re certainly both mutable structs, but mutating them isn’t allowed.
I’ve heard it said that === is special-cased for String. Symbol values are interned, so presumably === doesn’t need special-casing for Symbol. Those are some differences.
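A quick REPL check of the String point (a sketch; the variable names are arbitrary): two Strings with equal contents can sit at different addresses and still be ===, while an ordinary mutable container such as Vector{UInt8} is compared by identity.

```julia
# One String literal and one assembled at runtime: equal contents, separate objects.
s1 = "ab"
s2 = join(["a", "b"])

println(pointer(s1) == pointer(s2))   # false: the bytes live at different addresses
println(s1 === s2)                     # true: === compares String contents

# Ordinary mutable containers are compared by identity instead.
v1 = UInt8[0x61, 0x62]
v2 = UInt8[0x61, 0x62]
println(v1 === v2)                     # false
println(v1 == v2)                      # true
```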
Symbol is pretty special. It’s not a type you could implement in pure julia, and it’s not really accurate to think of it as a mutable type (at least from a semantic point of view). It’d be more accurate to say it’s an immutable, pointer-backed type with some other special properties.
You can basically think of it as a String that is never garbage collected, and each Symbol is unique.
That is, there should only ever be one Symbol in the entire julia session of the form :a. Once it’s created, it sits in the julia session forever, and all future Symbols of that form will be a pointer to the same internal string.
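For example, a Symbol written as a literal, one built with Symbol("a"), and one built from a string computed at runtime all turn out to be the same object, so === and objectid agree across all three (a minimal sketch; variable names are arbitrary):

```julia
a1 = :a                       # literal, interned when the code is parsed
a2 = Symbol("a")              # constructed from a String
a3 = Symbol(lowercase("A"))   # name computed at runtime

println(a1 === a2 && a2 === a3)          # true
println(objectid(a1) == objectid(a3))    # true: one object, one identity
```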
This property is what allows Symbol to be e.g. put into type parameters:
julia> Val(:a)
Val{:a}()
whereas usually a pointer-backed type is not allowed in a type parameter:
julia> Val(Foo(1))
ERROR: TypeError: in Type, in parameter, expected Type, got a value of type Foo
Stacktrace:
[1] Val(x::Foo)
@ Base ./essentials.jl:1040
[2] top-level scope
@ REPL[13]:1
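A small sketch of why this matters. It assumes Foo is some mutable struct; the post doesn’t show its definition, so the one below is a hypothetical stand-in. A method can dispatch on Val{:a}, and a Symbol built at runtime selects that method because it is the very same object as :a, whereas the mutable Foo is rejected as a type parameter.

```julia
# Hypothetical stand-in for the Foo above; any mutable struct behaves the same.
mutable struct Foo
    x::Int
end

# Symbols (and isbits values such as Int) are valid type parameters,
# so methods can dispatch on them.
describe(::Val{:a}) = "got :a"
describe(::Val) = "something else"

# Symbol("a") builds the name at runtime but yields the same object as :a,
# so it selects the same method.
println(describe(Val(Symbol("a"))))   # got :a
println(describe(Val(42)))            # something else

# A mutable, pointer-backed object cannot be a type parameter.
err = try
    Val(Foo(1))
catch e
    e
end
println(err isa TypeError)            # true
```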
If you did access the string stored at the pointer for a Symbol and mutated it, you’d probably segfault julia.
That’s just a description of string interning, no?
I think if you define a struct that holds nothing but a key, and use a separate data structure in the constructor to implement the interning that associates that key with a string, you’ve basically reimplemented Symbol.
julia> let intern_cache = Dict{UInt, Memory{UInt8}}()
           struct MySymbol
               key::UInt
               function MySymbol(s::Memory{UInt8})
                   k = hash(s)
                   # store s the first time this key is seen; otherwise check the stored
                   # bytes match, since two different strings could share a hash
                   existing = get!(intern_cache, k, s)
                   existing == s || error("hash collision in intern_cache")
                   new(k)
               end
           end
           MySymbol(s::String) = MySymbol(Memory{UInt8}(transcode(UInt8, s)))
           function Base.show(io::IO, ms::MySymbol)
               print(io, "MySymbol(\"")
               write(io, intern_cache[ms.key])
               print(io, "\")")
           end
       end
julia> ms = MySymbol("foo")
MySymbol("foo")
julia> ms2 = MySymbol("foo")
MySymbol("foo")
julia> ms === ms2
true
julia> Val(ms)
Val{MySymbol("foo")}()
There are a few things this can’t do that Symbol can, because Symbol interning already happens during parsing, which is of course not possible here since intern_cache doesn’t exist at that point! For one thing, the conversion from a String probably performs an unnecessary copy. As far as the visible semantics are concerned, though, this should be pretty much identical.
Of course, it’s important that nothing EVER gets deleted from intern_cache, which is why this thing only lives in the let block:
julia> intern_cache
ERROR: UndefVarError: `intern_cache` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
For Symbol, the julia runtime ensures this. Conceptually it’s the same thing (just with raw pointers for Symbol instead of an integer key). In theory you can mess with the runtime-internal state and “delete” a symbol, but that is likely UB. The same goes for this implementation: if you delete! an already inserted element, you get errors because the key is no longer found.
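A minimal sketch of the same idea with an append-only table indexed by insertion order instead of a hash, so two different strings can never end up sharing a key. All names here (Interned, intern, symname, INTERN_TABLE, INTERN_INDEX) are hypothetical, and the “never delete” rule is only a convention in this sketch, not something it enforces.

```julia
# Hypothetical names; a sketch of interning with a dense integer index.
const INTERN_TABLE = String[]               # append-only: entry i is the name with id i
const INTERN_INDEX = Dict{String,Int}()     # content -> id, consulted on construction

struct Interned
    id::Int                                  # isbits, so usable as a type parameter
end

function intern(s::AbstractString)
    str = String(s)
    # look the name up by content; only append when it is new
    id = get!(INTERN_INDEX, str) do
        push!(INTERN_TABLE, str)
        length(INTERN_TABLE)
    end
    return Interned(id)
end

symname(x::Interned) = INTERN_TABLE[x.id]

Base.show(io::IO, x::Interned) = print(io, "Interned(", repr(symname(x)), ")")

a = intern("foo")
b = intern(join(["f", "o", "o"]))   # built at runtime
println(a === b)                     # true: same id, so identical isbits values
```

Compared with hashing, every construction pays a Dict lookup keyed by the full string, but the ids are dense and collisions are impossible.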
I wasn’t talking about interning; that’s of course doable and already exists in some packages. I was talking about the compiler support, i.e. how the compiler treats it as an immutable value.
That said, you’re right: I didn’t consider this strategy, where MySymbol is made isbits and the string is looked up in the global intern dict. Very nice!
Actually, I think you probably could emulate pretty much all the other special sauce stuff that Symbol has nowadays using your strategy here, and a macro to make sure the conversions happen at parse time:
julia> macro s_str(s::AbstractString)
           MySymbol(s)
       end;
julia> code_typed() do
           s"boo!"
       end
1-element Vector{Any}:
CodeInfo(
1 ─ return $(QuoteNode(MySymbol("boo!")))
) => MySymbol
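To check that the interning really happens only once, at macro-expansion time rather than on every call, here is a self-contained sketch with a counter. It uses its own hypothetical interner (LitSym, literal_intern, LITERAL_TABLE, LITERAL_INDEX and INTERN_CALLS are made-up names) and the prefix i"..." instead of s"...", because Base already defines s"..." for SubstitutionString.

```julia
# Hypothetical, self-contained interner for this sketch.
const LITERAL_TABLE = String[]            # append-only: entries are never removed
const LITERAL_INDEX = Dict{String,Int}()
const INTERN_CALLS = Ref(0)               # how many times interning actually ran

struct LitSym
    id::Int
end

function literal_intern(s::String)
    INTERN_CALLS[] += 1
    id = get!(LITERAL_INDEX, s) do
        push!(LITERAL_TABLE, s)
        length(LITERAL_TABLE)
    end
    return LitSym(id)
end

# The macro body runs when the code containing the literal is expanded;
# the resulting isbits LitSym is spliced into that code as a constant.
macro i_str(s)
    literal_intern(s)
end

greet() = i"hello"

for _ in 1:3
    greet()     # returns the embedded constant; no interning happens here
end
```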
There are a few other tricks that Symbol uses to be friendly to constant propagation and compile-time operations, but I think nowadays you could probably implement those using @assume_effects.
Note also that the backing structure doesn’t have to be a Dict! You could also use a Trie combined with a pool for (rare) large values, and use a custom mapping function instead of the default hash. Dropping the requirement of a stable pointer opens up a world of possibilities!
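As one hedged illustration of a custom mapping (this swaps the Trie for something simpler and is not a design anyone in the thread proposed): names of up to 15 bytes are packed directly into a UInt128 key, so they need no table at all, and only the rarer longer names go into an append-only pool. PackedSym, packed_name, LONG_POOL, LONG_INDEX and LONG_TAG are hypothetical names.

```julia
# Hypothetical sketch: short names are encoded in the key itself.
struct PackedSym
    bits::UInt128
end

const LONG_POOL = String[]               # append-only pool for long names
const LONG_INDEX = Dict{String,Int}()
const LONG_TAG = UInt128(0xff)           # low byte 0xff marks a pooled name

function PackedSym(s::AbstractString)
    str = String(s)
    n = ncodeunits(str)
    if n <= 15
        # low byte holds the length (0..15); bytes 1..n hold the UTF-8 code units
        bits = UInt128(n)
        for (i, b) in enumerate(codeunits(str))
            bits |= UInt128(b) << (8 * i)
        end
        return PackedSym(bits)
    end
    # long names: intern them in the pool and store the tagged index
    idx = get!(LONG_INDEX, str) do
        push!(LONG_POOL, str)
        length(LONG_POOL)
    end
    return PackedSym(LONG_TAG | (UInt128(idx) << 8))
end

function packed_name(p::PackedSym)
    tag = p.bits % UInt8                 # truncate to the low byte
    if tag == 0xff
        return LONG_POOL[Int(p.bits >> 8)]
    end
    n = Int(tag)
    return String(UInt8[(p.bits >> (8 * i)) % UInt8 for i in 1:n])
end
```

Short keys are computed from the bytes alone, so nothing about them depends on global state; long names still depend on the pool, which must never lose entries.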
Yeah, that’s a nice trick to move the allocations out of runtime, good catch!
That’s the one bit I’d be very careful about: I’m not sure it’s valid in general to mark the access in show as @inbounds, for example, since it’s technically possible to delete entries. At the moment, deleting a key merely causes an error somewhere down the road, but with @inbounds (or a more permissive @assume_effects that lets the data in this cache propagate as a constant) this might turn into proper UB.
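A sketch of the conservative alternative, assuming a hash-keyed cache shaped like intern_cache above (CACHE, HSym and hsym_name are hypothetical names): keep the lookup checked, so a deleted entry surfaces as a descriptive error instead of relying on the key being present. It also verifies the stored string on construction, since two different strings could share a hash.

```julia
const CACHE = Dict{UInt,String}()   # in real use, nothing should ever be deleted from this

struct HSym
    key::UInt
end

function HSym(s::String)
    k = hash(s)
    existing = get!(CACHE, k, s)
    # a different string already owns this hash: refuse rather than alias it
    existing == s || error("hash collision: $(repr(existing)) vs $(repr(s))")
    return HSym(k)
end

# Checked lookup: a missing key raises an error instead of assuming it exists.
hsym_name(x::HSym) = get(CACHE, x.key) do
    error("entry $(x.key) was deleted from CACHE")
end
```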