Julia never stops compiling for new recursive Vector types

This is… unexpected behavior. I feel like inference should give up at some point.

julia> w = [1]; @time for i = 1:100
         global w = [w]
       end
  0.227547 seconds (466.23 k allocations: 31.348 MiB, 1.52% gc time, 99.41% compilation time)

julia> w = [1]; @time for i = 1:1000
         global w = [w]
       end
  3.160641 seconds (4.20 M allocations: 287.063 MiB, 1.38% gc time, 99.63% compilation time)

julia> w = [1]; @time for i = 1:10000
         global w = [w]
       end
182.490335 seconds (42.05 M allocations: 3.183 GiB, 0.19% gc time, 99.91% compilation time)

julia> v = [1]; @time for i = 1:10000
         global v = Any[v]
       end
  0.000801 seconds (10.00 k allocations: 625.000 KiB)
julia> w  # if you dare....

Printing w will produce a rather remarkable cascading failure, but ^C will get you your prompt back at least.

I recognize that having a recursive Vector/Array type is valuable for some N of recursions, but it gets expensive quickly.

I don’t see that any other behavior is possible.
This seems as expected. And not particularly problematic.
What is the problem you are seeing?

I feel like inference should give up at some point.

Inference isn’t involved in this process at all?
I think you used the wrong word here.


If you mean that you think Julia should stop building fully typed types, then maybe:

julia> w = Any[1]; @time for i = 1:100
         global w = Any[w]
       end
  0.000010 seconds (100 allocations: 6.250 KiB)

julia> w = Any[1]; @time for i = 1:1000
         global w = Any[w]
       end
  0.000161 seconds (1000 allocations: 62.500 KiB)

julia> w = Any[1]; @time for i = 1:10000
         global w = Any[w]
       end
  0.000599 seconds (10.00 k allocations: 625.000 KiB)

Would we not call typing a value without a declared type inference? I described it as inferred because the other one is declared. But ok. Works for me, type creation? type synthesis?

It’s hard for me to imagine a circumstance in which this behavior would be helpful. I don’t think it’s a law of nature that Julia has to allocate gigabytes of memory in that loop, I can’t imagine that information being useful to the user or the compiler after some point. What’s that number? At least ten, less than a hundred? Just spitballing here.

A loop which looks O(n) that’s actually O(n²) is unexpected. I posted here rather than filing an issue because it’s not a bug necessarily, but it is the runtime doing an enormous amount of work to no benefit. Would the program in some way be incorrect if it hit, let’s say, 32 deep and switched to Any? That’s a question, mind you. I don’t claim to understand all the implications of Julia’s type system.

I’m more interested in what it’s doing that grows quadratically (it looks quadratic; I haven’t tried to quantify this in any way) than anything else. It’s a contrived example of course, but deeply nested data structures are a real thing, and the 1:100 loop takes 0.228 seconds, which is already really slow.

The fundamental problem is that

Would the program in some way be incorrect if it hit, let’s say, 32 deep and switched to Any

is basically impossible (because the “depth” of a type isn’t really a well defined thing). The semantics of [a] is to make an array of eltype a and making that depend on the complexity of the type of a would be really weird and unpredictable semantics.
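Concretely, the eltype of `[a]` tracks `typeof(a)` exactly, whatever that is:

```julia
# The eltype of [a] is exactly typeof(a); wrapping a typed vector and
# wrapping an Any vector produce different, fully specific array types.
a = [1]
typeof([a])     # Vector{Vector{Int64}}

b = Any[1]
typeof([b])     # Vector{Vector{Any}}
```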


I mean something precise by it: the depth of a type is the number of brace pairs in the printed type. This has to be deducible, by definition, or it couldn’t be printed. I’m not saying that would be cheap, I’m not saying it’s the right thing to do, and I’m not even saying that “type depth” is some sort of correct ontological category. I am saying that “switch to Any when the printing of the type would otherwise wrap a 33rd pair of braces” is a well-defined thing. I’m sure there’s a better way to describe this which doesn’t hinge on the behavior of the show method, but I don’t know what that is.
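For what it’s worth, one way to pin this down without going through `show` is to count nesting of type parameters; `bracedepth` below is a hypothetical name sketching that idea, not anything in Base:

```julia
# Hypothetical sketch: "depth" of a type as the number of nested brace
# pairs in its printed form. Non-type parameters (like the 1 in
# Array{Int64,1}) and type variables contribute no depth of their own.
function bracedepth(@nospecialize T)
    T isa Type || return 0
    ps = Base.unwrap_unionall(T).parameters
    isempty(ps) && return 0              # e.g. Int prints with no braces
    return 1 + maximum(bracedepth(p) for p in ps; init=0)
end

bracedepth(Int)                  # 0
bracedepth(Vector{Int})          # 1
bracedepth(Vector{Vector{Int}})  # 2
```

A cutoff rule like “switch to `Any` at depth 32” would then at least be well defined, whatever its other merits.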

But that doesn’t touch the real issue: what you describe is O(1), pseudocode vectype(vec) = bindparameter(Vector, eltype(vec)). So why is this operation quadratic then? Shouldn’t it be linear by the depth of recursion? The way allocation is?
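In that spirit, the pseudocode above can be written out as a one-binding helper (`vectype` is a made-up name, not a real function):

```julia
# Hypothetical helper: the type of [v], obtained by binding a single
# type parameter from typeof(v), with no recursion into v's structure.
vectype(@nospecialize v) = Vector{typeof(v)}

vectype([1])   # Vector{Vector{Int64}}
```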


Is it… is it recursing all the way to the inner vector? But. It knows the type of the outer vector. Can it not just call typeof?

It doesn’t look quadratic to me, but you have too few data points to make an educated guess. As for what it’s doing, this is down to Base.vect(X::T...) for [w] versus getindex(::Type{Any}, @nospecialize vals...) for Any[v]. The code is actually almost identical, so I don’t think it’s the runtime; the other clue is the massive share compilation takes in the vect case.

The compiler currently specializes on a Vararg’s first element’s type and then whether the rest of the elements match or don’t (2 possible specializations). Each iteration the element w has a different type, so each iteration vect is being compiled for it. That doesn’t happen for the getindex case because of the @nospecialize.
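The difference can be sketched with two made-up wrappers (names invented here): the first compiles a fresh method instance per concrete argument type, while the second is compiled once for all arguments.

```julia
# specialized triggers a new compilation for each distinct typeof(x);
# unspecialized opts out of that with @nospecialize and erases eltype.
specialized(x) = [x]
unspecialized(@nospecialize x) = Any[x]

typeof(specialized([1]))    # Vector{Vector{Int64}} — tracks the input type
typeof(unspecialized([1]))  # Vector{Any}
```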

Not really, you’re making an arbitrary cutoff after only considering the simple case of a single parameter and a single element, and it’s not the reason for your compilation bloat. Adding the runtime overhead to vect(...) to check every input element’s type for a 33rd brace to possibly switch to getindex(Any, ...) is also a breaking change, so it’s not going to happen. If you want, you could do such a check yourself in the loop, or preferably you just set the element type to Any to begin with to sacrifice element type information for performance, or even more preferably refactor to type-stable code.


What you see here is not inference time but compile time. Julia compiles a new method of the Vector constructor for you.
If you repeat the same prompt it is fast again:

julia> w = [1]; @time for i = 1:100
                global w = [w]
              end
  0.676614 seconds (466.23 k allocations: 31.348 MiB, 11.15% gc time, 99.40% compilation time)

julia> w = [1]; @time for i = 1:100
                global w = [w]
              end
  0.000124 seconds (100 allocations: 6.250 KiB)

The observation about “bad” is correct, though it is almost exponential, not quadratic:

julia> struct A{T}
         x::T
       end

julia> function foo(x0, n)
         b = A(x0)
         while n > 0
           b = A(b)
           n -= 1
         end
         return b
       end
foo (generic function with 1 method)

julia> for len in [100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300]
         @show len
         foo(0, len);
         @time foo(0, len+1);
       end
len = 100
  0.040797 seconds (1.57 k allocations: 102.184 KiB, 99.60% compilation time)
len = 120
  0.074160 seconds (1.59 k allocations: 102.496 KiB, 99.73% compilation time)
len = 140
  0.124817 seconds (1.61 k allocations: 102.809 KiB, 99.77% compilation time)
len = 160
  0.202185 seconds (1.63 k allocations: 103.121 KiB, 99.80% compilation time)
len = 180
  0.303983 seconds (1.65 k allocations: 103.434 KiB, 99.82% compilation time)
len = 200
  0.445882 seconds (1.67 k allocations: 103.746 KiB, 99.86% compilation time)
len = 220
  0.625526 seconds (1.69 k allocations: 104.059 KiB, 99.88% compilation time)
len = 240
  0.859999 seconds (1.71 k allocations: 104.371 KiB, 99.90% compilation time)
len = 260
  1.167733 seconds (1.73 k allocations: 104.692 KiB, 99.91% compilation time)
len = 280
  1.539299 seconds (1.75 k allocations: 105.005 KiB, 99.92% compilation time)
len = 300
  2.024749 seconds (1.77 k allocations: 112.567 KiB, 99.93% compilation time)

If I now copy-paste the times into an array,

julia> rs = [0.040797, 0.074160, 0.124817, 0.202185, 0.303983, 0.445882, 0.625526, 0.859999, 1.167733, 1.539299, 2.024749];

julia> [rs[i+1]/rs[i] for i=1:(length(rs)-1)]
10-element Vector{Float64}:
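(The printed ratios were cut off above; they can be recomputed from rs, shown here rounded to two digits:)

```julia
# Successive ratios of the measured compile times for len step 20,
# recomputed from the rs vector quoted above.
rs = [0.040797, 0.074160, 0.124817, 0.202185, 0.303983, 0.445882,
      0.625526, 0.859999, 1.167733, 1.539299, 2.024749]
ratios = [rs[i+1] / rs[i] for i in 1:length(rs)-1]
round.(ratios; digits=2)
# ≈ [1.82, 1.68, 1.62, 1.5, 1.47, 1.4, 1.37, 1.36, 1.32, 1.32]
```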

I think this is well-known here? I.e. compilation time is often exponential in the complexity of the type, especially for the layout of data types. It’s one of the reasons one shouldn’t go overboard with complex types or large / nested SVectors.

If this is not well-known, maybe we should split it off, @mbauman


Critical piece of information, good spot.

It’s still pathological behavior, but at least it isn’t in type lookup!

The “shape” of this problem is linear, and the behavior is strongly superlinear. I withdraw the suggestion that the compiler switch to an Any constructor, but this is a genuine performance issue and it would be lovely to see it optimized back to a constant factor, whatever that might take.

It can be worked around, but I want to point out that I didn’t discover this behavior while stress-testing the Julia compiler, but rather while writing benchmarks for one of my packages. Creating deeply nested data structures is a real-world activity; the real ones are more interesting than a 1 wrapped in a bajillion Vectors, but something in the compiler is forgetting everything it knows each time it compiles a constructor, and that’s the sort of problem which can be solved.


It’s a spitball.

julia> w = [1]; @time for i = 1:1000
                global w = [w]
              end
  3.368913 seconds (4.67 M allocations: 318.405 MiB, 1.40% gc time, 99.62% compilation time)

julia> w = [0x01]; @time for i = 1:2000
                global w = [w]
              end
  9.978096 seconds (9.34 M allocations: 645.377 MiB, 0.71% gc time, 99.70% compilation time)

What’s 3^2?

That would argue for exponential, no? You essentially say t \propto k^N, such that when you plug in 2N you square the time.

But I’d agree that it looks quadratic, because the times are ~3.3 s for N = 1000 and ~13.3 s for N = 2000 (you need to add the two outputs, since it’s mostly compilation time). So N → 2N results in t → 4t = 2²t. So it looks quadratic indeed.


I mean, maybe don’t do this? What’s the use-case here?

Isn’t this just a case of pathological-program-has-pathological-behaviors?


I would describe this as a stress test, not a pathological program per se.

We’re still guessing what’s actually going on here, but it looks like a classic case of “accidentally quadratic” behavior. It is nearly always possible to fix accidental quadratic behavior with no compromises in functionality; the Accidentally Quadratic blog is chock-full of examples of this.

A description of the problem could be “Compiling a new constructor is linear in the type depth”, and a description of the desired end goal might be “Compiling a type constructor is O(1)”. I’m open to more accurate ways of phrasing this, and as I said in the last post, I no longer think the solution lies in just bailing out and using Any. Identifying the work which is being repeated, and memoizing that work, might do the trick.

A loop creating vectors-in-vectors is a worked example, which shows that the current behavior turns something which I’m sure we all agree looks like a linear algorithm, into one which is quadratic. The worked example is not the problem, it simply exposes it.


I’ve found that finding seemingly pathological paths is a good way to find optimization opportunities for the “normal” paths too - if nothing else, it leads to a better understanding of the compilation/working model of the program.


Two data points are not nearly enough to estimate time complexity; for example, 3 to 9 could easily be 3x or x² or infinitely many other functions. You’re also measuring unrepeated compilation for a varying number of types at once, based on different primitive types, so the function is ill-posed. The lack of repetition introduces imprecision, but foobar_lv2 made a benchmark that measures one call signature’s compilation 10 times for increasingly nested vector-containing vectors; that’s something to try to fit some curves to. The time complexity may be fundamental rather than accidental here, so be prepared for that as you better measure and examine what the compiler is taking so long on. O(1) seems like an impossible goal; the nested primitive type is not a unique endpoint you can jump to.

My point is simply that you’re not on the well-trod path to peak-Julia-performance here. Any time you’re generating thousands of distinct types you’re going to hit compilation bottlenecks. That’s just how Julia works. Much better to just tell Julia to Any[] it and move on.

But, sure, if you’re after improving the compiler — by all means don’t let me get in your way.


Really, this applies to any language that compiles and optimizes methods for specific types. In such cases, where arbitrarily many input types are generated, perhaps more often than they are reused, there’s @nospecialize and @nospecializeinfer to opt out.