Internals of types and how they hash

Yes, that makes a lot more sense; it occurred to me after I posted that one could just do that. I presume that Base.hash(T::DataType, h) uses the immutable parts of the DataType struct? It would kind of have to. That would let the compiler perform the hash once at compile time, and nothing good could come of the hash of a type changing every time a subtype was added…

Edit: checked the layout of DataType out of curiosity, and it contains a precomputed hash field. So that answers that.

Are there non-immutable parts of the type? (I don’t think a <: Type object is modified when a subtype is added?)

I don’t think this currently happens, since hashing a type does a ccall to jl_type_hash (which uses a cached value for DataType).

It looks like this was finally fixed to be done at compile time just two months ago, whereas in prior Julia versions I think type hashing was still doing a ccall at runtime.

I have a helper function in the REPL for inspecting structs; this is what it returned:

julia> inform(DataType)
mutable struct DataType <: Type{T}
    0 name Core.TypeName
    8 super DataType
    16 parameters Core.SimpleVector
    24 types Core.SimpleVector
    32 instance Any
    40 layout Ptr{Nothing}
    48 hash Int32
    52 flags UInt16
end

So I think types get added to the types vector when they’re created, and it is a mutable struct.

The types vector is not a vector of subtypes, it is a vector of the types of the fields of a struct. For example:

julia> Complex{Int}.types
svec(Int64, Int64)

julia> Complex{Float64}.types
svec(Float64, Float64)

Since abstract types don’t have fields, their types vector is empty:

julia> Real.types
svec()

(Of course, beware that these are all undocumented implementation internals.)
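For everyday code, the same information is available through documented reflection functions instead of the internal field. A minimal sketch, using only `fieldnames`, `fieldtypes`, and `fieldcount` from Base (the variable names are just for illustration):

```julia
# Documented counterparts to reading the internal `.types` field.
T = Complex{Int}

fnames = fieldnames(T)   # names of the struct's fields, as a tuple of Symbols
ftypes = fieldtypes(T)   # declared types of those fields, as a tuple
n      = fieldcount(T)   # number of fields

# Print each field next to its declared type.
for (fname, ftype) in zip(fnames, ftypes)
    println(fname, " :: ", ftype)
end
```

Note that these functions are meant for concrete struct types; asking an abstract type such as Real for its field count is not meaningful in the same way.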

Good to know, that’s what I get for guessing. I wonder why they’re mutable at all then?

In any case, they have a precomputed hash, so mixing it into an instance hash function would be pretty cheap, and whatever mutation might happen presumably wouldn’t change it. I’m sure nothing good could come of the hash of a type changing.
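The idea of folding a type's hash into an instance hash can be sketched roughly like this. The `Meters` and `Seconds` structs are hypothetical, made up for illustration; the only Base API assumed is the two-argument `hash(x, h::UInt)`, which also accepts a type as `x`.

```julia
# Hypothetical wrapper types with identical payloads (not from the thread).
struct Meters
    value::Float64
end

struct Seconds
    value::Float64
end

# Seed the hash with the type itself, then fold in the field.
# Hashing the type is expected to be cheap, per the discussion above.
Base.hash(m::Meters, h::UInt) = hash(m.value, hash(Meters, h))
Base.hash(s::Seconds, h::UInt) = hash(s.value, hash(Seconds, h))

# Equal instances hash equally, so they work as Dict keys.
d = Dict(Meters(1.0) => "one meter")
println(d[Meters(1.0)])
println(hash(Meters(1.0)) == hash(Meters(1.0)))
```

Seeding with the type means that `Meters(1.0)` and `Seconds(1.0)` are very unlikely to collide even though their fields are identical.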

Probably so that DataType instances can live on the heap and be identified uniquely by a pointer/address.

Instances of an immutable type are identified only by their contents, e.g. if I have an array [3,3,3,3] then there are 4 copies of 3::Int being stored, rather than 4 pointers to a single object. This makes a lot of sense for something like a number object, where there are lots of instances (often small), but it makes less sense for types where the number of instances is relatively small (and typically doesn’t grow much during program execution), while the amount of data in each instance is much larger than a pointer. If I have an array [Int, Int, Int, Int], I don’t want to have 4 copies of the Int::DataType data structure, I want to have 4 pointers to the same unique Int instance.
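A small sketch of that distinction, using only `isbitstype`, `eltype`, and `===` from Base. It shows that the elements are the same object; it does not inspect the memory layout directly.

```julia
# Numbers are plain-bits values and can be stored inline in an array.
nums = [3, 3, 3, 3]
println(isbitstype(eltype(nums)))   # is Int a plain-bits type?

# Types are heap objects; an array of types holds references to them.
tys = [Int, Int, Int, Int]
println(eltype(tys))                # element type of the array of types
println(isbitstype(eltype(tys)))    # is DataType a plain-bits type?

# Every slot is identical (===) to the one global Int object.
println(all(t -> t === Int, tys))
```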

Ok, that makes sufficient sense.

Out of curiosity I tried defining a struct and changing the value of the .hash field, which threw an error. So clearly “mutable” is not synonymous with “can or should be mutated” here.

You can also mark individual fields of a struct as const to ensure that they can no longer be mutated once the object has been constructed. E.g. in this struct:

mutable struct Foo
    const a::Int
    b::Int
end

only b can be reassigned once you have an instance of Foo:

julia> f = Foo(1,2)
Foo(1, 2)

julia> f.a = 3
ERROR: setfield!: const field .a of type Foo cannot be changed
Stacktrace:
 [1] setproperty!(x::Foo, f::Symbol, v::Int64)
   @ Base ./Base.jl:53
 [2] top-level scope
   @ REPL[3]:1

julia> f.b = 3
3

julia> f
Foo(1, 3)

For Type, I’d think the reason its cached hash is a constant is that changing it would really break assumptions about the identity of a type.

I’m glad that got added for sure. DataType is implemented in C, and has probably had a custom setfield! method which throws an error for much longer than const fields in mutable structs have been available (1.8 iirc). There may be some mechanism to inform the compiler about the const-ness of a struct field when the struct is defined in C, but no compelling reason to change DataType code to use that.

Close — if you look at the stack trace, it is a custom setproperty! method:

julia> Int.hash = 7
ERROR: setfield! fields of Types should not be changed
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] setproperty!(x::Type, f::Symbol, v::Int64)
   @ Base ./Base.jl:33
 [3] top-level scope
   @ REPL[62]:1

The setproperty! function is overloadable, whereas setfield! is a low-level “builtin function” that cannot be overloaded. This overload was actually added in Julia 1.7 … in very old versions of Julia you could silently overwrite internal fields of DataType, which presumably would cause bad things to happen. For example, in Julia 1.0, you could do:

julia> Int.types
svec()

julia> Int.types = Core.svec(Int)
svec(Int64)

julia> Int.types
svec(Int64)
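The difference between the overloadable setproperty! and the builtin setfield! can be sketched on a user-defined type. The `Locked` struct and its error message are hypothetical, and this is not how Base implements the guard for Type; it only mimics the mechanism.

```julia
# Hypothetical type for illustration; not part of Base or the thread.
mutable struct Locked
    value::Int
end

# Overload the property-assignment hook so that `x.value = v` is rejected.
function Base.setproperty!(x::Locked, f::Symbol, v)
    error("fields of Locked should not be changed")
end

l = Locked(1)

# `l.value = 2` lowers to setproperty!(l, :value, 2), which now throws.
blocked = try
    l.value = 2
    false
catch
    true
end

# The builtin setfield! bypasses the overload and still mutates the field,
# because `value` is not declared const.
setfield!(l, :value, 2)
println((blocked, l.value))
```

So a setproperty! overload only guards the dot syntax; a non-const field remains writable through setfield!.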

Arguably, the custom setproperty! method can now be removed, since the fields of DataType are declared as const. If you now call setfield! directly, you still get an error:

julia> setfield!(Int, :hash, 0)
ERROR: setfield!: const field .hash of type DataType cannot be changed
Stacktrace:
 [1] top-level scope
   @ REPL[66]:1

(But if you try hard enough, you can always break Julia’s internals. Heck, you can write to a random pointer address if you insist.)

Makes sense, the error message has setfield! in it and it didn’t occur to me to check the stacktrace. ¯\_(ツ)_/¯

Good, imho. The goal should be to prevent accidentally ruining the runtime, not to protect the user from their own hubris.
