Why are self-referential struct parameters impossible?

Neither of these compiles:

struct Tree{V<:Any,T<:Union{Tree,Nothing}}
    value::V
    left::T
    right::T 
end 

struct Tree{V,T} where {V<:Any,T<:Union{Tree,Nothing}}
    value::V 
    left::T 
    right::T
end

It might look like I’m wandering around looking for ways to confuse the compiler, but no, I just tried to write something like this, it was more complex but this illustrates the idea.

I’ve moved on with a different signature, easy enough to preserve the invariant that left/right are either both nothing or both Tree{V} using constructors.

It would be pretty neat to write methods which dispatch on whether it’s a node or a leaf, though. ¯\_(ツ)_/¯

But this is just pure curiosity why this can’t be expressed in the type system. Not worth it? Not useful? Problems for other parts of the type system? NYI? Other?

You should be wary of this design anyway, because then your tree would consist of nodes of different types and your tree-traversal code will require dynamic dispatch. Probably it’s better to put the Union in the field type directly, as in this definition of red–black trees in DataStructures.jl, so that all nodes in a given tree have the same type.

(Worse, with your sort of definition, the root node would encode the entire structure of the tree into its type. Not only would every node have a different type, but every tree structure would have a different type — you’d be trying to get the compiler to re-specialize your code for every distinct tree structure.)

7 Likes

The second struct expression is just invalid syntax, as the {} in the first expression is the implicit where clause. The first expression errors because Tree hasn’t been defined by the point the where clause is evaluated.

EDIT: The good news is that it does exist by the point of evaluating its own fields, and an implicit reference is inserted to get around the unknown fields and to prevent impossible recursive inlining, even with immutable struct. The bad(?) news is that completing a cycle requires mutation.

julia> struct Node
         x::Int
         next::Node
         Node(x::Int) = new(x) # incomplete initialization, tad limited to trailing fields
         Node(x::Int, next::Node) = new(x, next)
       end

julia> n1 = Node(1); n2 = Node(2, n1); n3 = Node(3, n2); sizeof.((n1, n2, n3))
(16, 16, 16)

julia> n3
Node(3, Node(3, Node(2, Node(1, #undef))))

A ::Union{Node, Nothing} and complete initialization new(x, nothing) would be preferable, this is just to demonstrate the #undef reference.

As for composite type definitions dealing with undefined variable issues, it may involve Issue #269, which does have hypothetical resolutions like explicitly incomplete forward declarations or automatic multiple passes within reason.

That’s what I did, but the other one (if it compiled) should mean that, right?

So I ran a quick experiment:

julia> struct TryIt{T} where {T<:Union{Int,Nothing}}
           a::T
           b::T
       end
syntax: invalid type signature around REPL[121]:1

julia> struct TryIt{T<:Union{Int,Nothing}}
           a::T
           b::T
       end

julia> TryIt(1, 1)
TryIt{Int64}(1, 1)

julia> TryIt(nothing,nothing)
TryIt{Nothing}(nothing, nothing)

julia> TryIt(1, nothing)
MethodError: no method matching TryIt(::Int64, ::Nothing)

Closest candidates are:
  TryIt(::T, ::T) where T<:Union{Nothing, Int64}

So at least I understood correctly that it does constrain T to one arm of the Union. This looks type stable? ← actual question, still learning here.

For example, I don’t get why the first line is an invalid type signature, when the error message from the last line spells it that way. I’m just getting the hang of “in the signature” and “in the where clause” for methods, but I clearly don’t get it for structs at all.

But I was going for two things: Assure that the payload {V} type is the same for all nodes, and that the left and right are either nothing, or Tree. I was able to get one of those out of the type system but not both.

Oh, I think I answered at cross purposes to what you were saying. I left out the “payload” field and its parameter, the actual type is like Tree{V} with a value field value::V and left and right are Union{Tree{V},Nothing}. So the entire tree ends up one concrete type, out of a family V, itself a subtype.

Type stability is more about the methods you use on the types, which is why stevengj mentioned “tree-traversal code”. You do have the right intuition that concrete field types, like the several parametric Tryit types you made, could help node-wise code be type-stable. stevengj’s concern is that code traversing a whole tree would still have to contend with those several types, hence type instability. Making several Union{Tree, Nothing} fields in 1 type that makes up a single tree could make for more type-stable tree traversal code, if you’re careful. This sort of code inherently deals with type branches; parameters and type unions let you tweak where those branches go.

Like I said, the {} clause in struct definitions are an implicit where clause. Simple demo:

julia> struct X{as;lhg} end
ERROR: syntax: invalid variable expression in "where" around REPL[10]:1
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1

julia> Vector{T} where {as;lhg}
ERROR: syntax: invalid variable expression in "where" around REPL[11]:1
Stacktrace:
 [1] top-level scope
   @ REPL[11]:1
1 Like

Note that TryIt(::T, ::T) where T<:Union{Nothing, Int64} is not a type signature but a method signature.

You can provide a constructor method that matches the missing signature if you like.

julia> TryIt(::Int, ::Nothing) = TryIt(nothing, nothing)
TryIt

julia> TryIt(1, nothing)
TryIt{Nothing}(nothing, nothing)
1 Like

specifically of the type definition’s default inner constructor, where the method’s argument annotations mirror the type’s field annotations and the method’s explicit where clause mirrors the type’s implicit where clause.

TryIt(1, nothing) was probably intended to throw the MethodError to print the signature, not actually be implemented.

Well yes, that much is clear, but it doesn’t explain the why of any of it. To quote the manual: " UnionAll types are usually written using the keyword where. For example Ptr could be more accurately written as Ptr{T} where T, meaning all values whose type is Ptr{T} for some value of T.". But this syntax is not available for structs. I’m curious why that’s the case.

This… does have the uh, signature. Yes. But this is of course useless with a constructor which doesn’t cheat:

julia> struct TryIt{T<:Union{Int,Nothing}}
           a::T
           b::T
       end

julia> TryIt(a::Int,b::Nothing) = TryIt(a,b)
TryIt

julia> TryIt(1,1)
TryIt{Int64}(1, 1)

julia> TryIt(nothing,nothing)
TryIt{Nothing}(nothing, nothing)

julia> TryIt(1,nothing)
StackOverflowError:

Interestingly, if you provide an internal constructor with the right parameter, the type accepts it:

julia> struct TryIt{T<:Union{Int,Nothing}}
          a::T
          b::T
          TryIt(a::Int, b::Int) = new{Int}(a, b)
          TryIt(a::Nothing, b::Nothing) = new{Nothing}(a, b)
          TryIt(a::Int, b::Nothing) = new{Union{Int,Nothing}}(a, b)
       end

julia> TryIt(1, nothing)
TryIt{Union{Nothing, Int64}}(1, nothing)

Which is fine, because I don’t care about a type where it’s illegal to make a Union variant, but only one where the arms of the Union are properly discriminated. It is somewhat interesting that without explicitly providing the Union variant of T as a constructor, Julia doesn’t create it, and I wonder why that is as well. It’s clear that the subtyping relationship allows it, after all.

Anyway, none of this is germane to the topic, or tags, which is about why a self-referential struct parameter is (as far as I’ve been able to determine) something one can’t represent in the type system.

What the syntax for it would look like is itself an interesting question, but let’s just grant that it’s possible to come up with one.

The full panoply is available thus, which is somewhat more satisfying.

julia> struct TryIt{T<:Union{Int,Nothing}}
          a::T
          b::T
          TryIt(a::U, b::U) where {U} = new{U}(a, b)
          TryIt(a::V, b::W) where {V,W} = new{Union{V,W}}(a, b)
       end

julia> TryIt(1,1)
TryIt{Int64}(1, 1)

julia> TryIt(nothing, 1)
TryIt{Union{Nothing, Int64}}(nothing, 1)

julia> TryIt("a", "b")
TypeError: in TryIt, in T, expected T<:Union{Nothing, Int64}, got Type{String}

Sure it does. It’s just a first-order why. You can always go deeper, but in general the answers often become even less satisfying — and you’ll get answers like “it’s just not implemented as a valid syntax” or “it’s not really a meaningful operation” or “nobody has thought it’d be useful” or maybe even “others have thought it’s probably also confusing that way, too.”

That was also answered to a first-order by stevengj. More less satisfying answers here include things like “the type isn’t defined yet” and “you can with an abstract type” and “it’s still gonna lead to poor performance.”

4 Likes

Respectfully, the first response (which I was responding to) was not an answer to the question at all. Let’s look at the question:

And the response:

This is not answer to why it it’s an invalid type signature, which the independent clause of the sentence. I phrased the dependent clause in terms of the error, yes, so let’s look at that error:

julia> TryIt(1, nothing)
MethodError: no method matching TryIt(::Int64, ::Nothing)

Closest candidates are:
  TryIt(::T, ::T) where T<:Union{Nothing, Int64}

MethodError, clear as day. So yes, I did know, referring to the error message, that it was a method signature. Not to beat up on @uniment here, it’s not like I resent his contribution, but no, it does nothing to address the why of it.

On the other hand, all of yours in quotes, though presented as hypotheticals, are answers to the question posed. It’s difficult for me to read them generously, under the circumstances.

But I didn’t start a topic on why structs don’t take where clauses, although I remain curious. Julia’s type system is worth understanding in detail, at least I think so, but if it arouses the strange hostility I sense every time I ask this sort of question, I’d rather figure it out for myself.

It was a fair reply, and “You should be wary of this design anyway” is a topical reply, sure. It also isn’t an answer to the question, which, not everything has to be. It was informative and responsive to the question, rather than a direct answer, this is ok. I replied to his response as best I could.

Whereas you’ve given three interesting answers! But with a disclaimer that they aren’t satisfying, for some reason, and a general tone of “why are you even asking this” which I don’t understand at all. This is in the Julia internals and design part of the Discourse, it’s a question about the Julia internals and design! If I’d posted it in General asking why my code doesn’t work, the responses I’m getting would make more sense.

But let’s carry on, one at a time:

Agreed, not a satisfying answer. Self-referential constructs are a solved problem. Julia’s functions have no issue with that.

Fascinating! I didn’t know that, is it documented somewhere? How does it work? I’ve never seen this in the wild. Parameterized abstract types, yes, self-referential, no.

This is an interesting and useful claim, yes. I’m sure that’s part of why @stevengj responded as he did, and I feel a need to stress that I have no beef with his answer, for some reason, which is somewhat exhausting.

But now we’re in the counterfactual realm, and my preliminary response here is “not necessarily”. I’m tense at this point that what I’m about to say will be read as criticism, for some reason, but other languages are able to represent recursive types with good performance.

I’m a bit more than halfway through the project with the struct which prompted this line of inquiry, so let me take a pause here and see if I can come up with something which more clearly demonstrates my goal, and why self-referential types appear to be the best way to get good performance and elegant code out of the problem domain.

Maybe that will come with some suggestions on how to improve it in actual, rather than counterfactual, Julia. Maybe that will involve self-referential abstract types! Who knows. Let’s find out.

More later. Tragically, I do have work to do, and posting on the Discourse doesn’t quite count.

1 Like

No, my tone is just “I’m not gonna write a whole essay here.” I figured a quick response would be more useful than nothing at all. Brevity is a virtue, both for the writer and the reader.

So fundamentally, the constraint is what @Benny already mentioned and linked to above — that we don’t have forward declarations for types. But because you’re dealing with subtyping within the parameters, you just need to have an abstract type to reference. This works (but only for perfectly balanced trees):

julia> abstract type AbstractTree; end;

julia> struct Tree{V,T<:Union{AbstractTree,Nothing}} <: AbstractTree
           value::V
           left::T
           right::T
       end

julia> Tree(1, Tree(2, nothing, nothing), Tree(3, nothing, nothing))
Tree{Int64, Tree{Int64, Nothing}}(1, Tree{Int64, Nothing}(2, nothing, nothing), Tree{Int64, Nothing}(3, nothing, nothing))

You can fix that, of course, either by using two different parameters for the two leafs or with a custom constructor as you’ve already discovered. The default constructors try to get a concrete type because that’s generally what you want and is best for performance.

But note that the structure of the tree is reflected both in its type and in its data. We’re not talking about trouble with recursive data, but rather you’re seeking to put the exact structure of that information into the type domain. And that’s where you need to consider the tradeoffs on values vs. types.

3 Likes

It is there, just rearranged to eliminate redundancy. I thought demonstrating the {} clause was an implicit where clause was enough, but maybe a plainer visual is needed to show the connection:

struct Blah{T<:Real, S<:T} # (1) where clause
end
# (2) list all the TypeVar in the where clause: T, S
#   (2)          (1)
Blah{T, S} where {T<:Real, S<:T}

So far in this thread, I’ve only seen people expressing their perspectives on Julia and patiently explaining their reasoning, albeit not to everyone’s satisfactions. Those are well within community standards, so I’d suggest not mistaking that for hostility, which isn’t.

1 Like

It certainly can be, yes. It can also come across as gnomic, dismissive, or worse, not contain a useful amount of information.

Hmm. Well I wrote a whole long thing, but this might solve the motivating issue, lemme check real quick:

julia> Tree(2, nothing, nothing)
Tree{Int64, Nothing}(2, nothing, nothing)

julia>

julia> Tree(2, nothing, nothing)
Tree{Int64, Nothing}(2, nothing, nothing)

julia> sizeof(ans)
8

Great, this will improve things! So as far as pragmatics go, I’m coming out ahead here, thanks, starting to be glad I asked.

Because things like this:

To me this comes across as circular, no pun intended. If I’m asking why there aren’t self-referencing types, this is an answer which restates “we don’t have self-referencing types”. I didn’t have strong expectations in starting this thread as to what answers I’d get, but I figured something like a link to an issue on the subject, because I can’t be the first person to think writing a type parameter containing the type being constructed is a good idea.

What I’m getting from you is that it isn’t necessary because an abstract type can stand in for the concrete type. Ok. I could come around to that point of view. It happens to matter that the child references (and yes, always two or none in this case) be of the same payload parameter (which I can do with an abstract type parameter) and of the same concrete type (which is enforceable with constructors) so that starts to resemble an answer to the question.

The goal was more practical than that, that’s what the whole long thing I’m don’t need to post was about: I’m trying to get rid of the unused pointers by moving the Union into the parameter, while keeping the property of a single UnionAll type, type-stable traversal for the concrete datatype, and dispatch on the leaf/branch node distinction. Which the abstract type does, so hey, thanks again, sincerely.

But I’m still guessing why I can’t just write this:

struct Tree{N<:Number,T<:Union{Tree{N},Nothing}}
   payload::N 
   left::T
   right::T
end

I think it’s ok to bind type parameters one at time, right?

julia> Array{String}
Array{String}

julia> ans{3}
Array{String, 3}

Great, that’s a yes to that part. This would encode the invariant that only concrete Tree types of the same bound parameter, or nothing, could be child types. That’s no bad property to have. This isn’t a dependent type or even anything fancy, it’s an ordinary type constraint which happens to refer to itself. I could stick any Julia type with a free parameter in Union{T{N},Nothing}, except for Tree.

Is it just not worth it, because we can use an abstract as a stand-in? Are there implications of making it possible which I’m not privy to? Was it proposed and rejected for reasons X,Y,Z? Does it, like, break the lattice?

I continue to find these questions interesting. I’m sure there are one or more answers deeper than “you can’t because you can’t”, and more direct then “well, you can’t, but you can do this”. And it’s probably not “no idea, we never thought about this once before you showed up”, either.

I want to stress that I do appreciate what you’ve said here, which is a substantive contribution to the thread, and is going to help me on a practical level as well.

Hah, to me it’s not circular at all! It’s probably the most tangible reason why you can’t construct the thing. At a fundamental level, in order to define a self-referential struct, you’d need to reference the type before it’s fully defined — effectively a forward declaration. And the internals issue link on it was given by Benny in the third post here.

4 Likes

It’s the angle we’re working the question, I suspect. For me, it started with trying to put the type I was constructing in the parameter, and it doesn’t work. This makes it clear enough to me that I can’t forward declare a type, but not why.

The issue linked is the generalization, but (and this is just reasoning by analogy) Lua doesn’t have forward declaration for anything, but it does allow recursive function definitions, like Julia does. I’m glad Julia doesn’t have that restriction for functions at all, it’s not that it’s difficult to code around it with forward declarations of the variables, but it’s nicer not to have to.

The self-reference case is entirely local, the error is thrown while constructing the type with the name. That’s what I meant by calling it a solved problem, it involves checking to see if this is a reference to the type being constructed before throwing the “invalid type parameter name” error and feeding the snake its tail.

This being… I don’t want to say “easy” because compilers are never that, maybe “tractable” is the right word, I had assumed a deeper reason why one can’t. If the type constructor were patched to recognize this case, would everything else work? I suppose there must be some algorithms in the type system which presume they wouldn’t hit a cycle.

Hey y’know the solution to that might be the same as the solution to the linear-time type synthesis thing from earlier… :thinking:

The reason that this works for functions is that Julia tracks the back edges of all functions that a given function calls (this was the solution to automatic recompilation of dependent functions · Issue #265 · JuliaLang/julia · GitHub), and allows for addition of methods even if a function already exists (i.e. not just forward references). It would be possible to do a similar thing for structs, but that would require being able to change the type of a struct which is a much harder problem.

5 Likes

I think the wider reason for global functions being recursive is that they don’t need to know anything about their global variables until they are compiled. The backedges help in the case that the global variable wasn’t defined by the time it’s compiled, which needs to be undone and retried after defining the variable. Locally scoped functions are implemented as callable instances instantiated at definition, so while they can be recursive, they can’t know the type of the function in the recursive call, which really hurts performance. There too are ideas about how to rectify that.

Also I think it’s worth reiterating that this is about parameters constraints specifically. The earlier linked mutually recursive structs issue doesn’t address that directly AFAIK, but the forward declarations idea there could also help the undefined variable error here. Maybe this could be resolved without resolving that issue, as self-recursive structs are already possible:

julia> mutable struct Node1 next::Node1; Node1() = new() end

julia> let x = Node1();  x.next = x  end
Node1(Node1(#= circular reference @-1 =#))

julia> mutable struct Node2{T} next::T; Node2{T}() where T = new() end

julia> let x = Node2{Node2}();  x.next = x  end
Node2{Node2}(Node2{Node2}(#= circular reference @-1 =#))

Looking into Meta.@lower mutable struct Node{T<:Node} next::T end, it seems as if the undefined variable error is thrown by Core.TypeVar(:T, Node) before Node exists. I think Node starts existing at a Core._structtype call that takes the TypeVar as an input, and the result seems mutable based on subsequent calls, including Core._typebody! annotating fields. Unfortunately, I couldn’t begin to tinker because trying that call by itself in the REPL just crashed Julia.

4 Likes

Briefly looking at the issue, I suspect we’ve run into the deepest reason of all: insufficient brain clockcycles allocated to the problem. Maybe you’re the type to have the skills and motivation to help solve it?

Since no instances of the struct have been instantiated yet (as errors would be thrown if you try to instantiate an incompletely defined struct), what do you think would be the hardest part(s)?