Any at UnitRange seems unoptimal

Marcell_Havlik · October 13, 2022, 9:05am

Hey Julianners!

UnitRange is one of the most used structs.
I feel something is odd here. Why do we have Any[Core.Const(1), Int64]? This sounds pretty unoptimal.

using InteractiveUtils
unitrange_any(N) = begin
	c=1:N
end

unitrange_any(6)
@code_warntype unitrange_any(6)

MethodInstance for unitrange_any(::Int64)
  from unitrange_any(N) in Main at /home/hm/repo/agei/tests/playground/core.partialstruct.jl:2
Arguments
  #self#::Core.Const(unitrange_any)
  N::Int64
Locals
  c::UnitRange{Int64}
Body::UnitRange{Int64}
1 ─ %1 = (1:N)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│        (c = %1)
└──      return %1

Shouldn’t we use Tuple{Core.Const, DataType}? Or is it that, because of Core.Const(1), the number 1 cannot be represented well enough and we would lose some performance from a missed optimisation in that case?
So why do we use Any[] arrays? (I know the two types differ… but this is exactly the case where we should use a Tuple, since it can keep all of that information.)

Specifically for UnitRange we know that there will be only 1 or 2 parameters, so a Tuple seems pretty straightforward to me.

Sukera · October 13, 2022, 9:36am

Core.PartialStruct is not a special construct for UnitRange - it’s used in lots of different cases. If it used a Tuple internally, the compiler code that handles it would have to be recompiled for each different instance. So using a Vector{Any} here is a hidden optimization in terms of compile time for the general case.

It’s also not related to anything happening at runtime here.

Marcell_Havlik · October 13, 2022, 9:39am

I think Vector{Any} won’t recompile because there is actually nothing to optimise when the internals are Any.

But taking it seriously, I have to modify the question and throw away the noise of PartialStruct; it indeed isn’t the point here. (I will modify the question, so the function name is unitrange_any from now on.)

Sukera · October 13, 2022, 9:50am

The reason you get a Core.PartialStruct in your output in the first place is that @code_warntype does not propagate the 6, only that it’s an Int, which is sufficient to infer the return type of UnitRange{Int64}. This is not sufficient to create a full UnitRange object though, and thus Core.PartialStruct is created to at least be able to propagate the hardcoded 1 part. So there’s nothing unoptimal about this here.
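
A minimal sketch of that point, assuming the unitrange_any definition from the first post (shortened to a one-liner here): when inference is asked for the return type, it is given only the argument type, never the value 6.

```julia
# Same function as in the first post, written as a one-liner.
unitrange_any(N) = 1:N

# Ask inference for the return type given only the argument *type*.
# No concrete value (such as 6) is involved at this stage.
inferred = only(Base.return_types(unitrange_any, (Int,)))

println(inferred)
```

The inferred type is the concrete UnitRange{Int}, which is why the Body line in the @code_warntype output is not flagged as a problem.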

Marcell_Havlik · October 13, 2022, 9:58am

So, I totally understand why there is a PartialStruct there. I am not talking about whether it can generate the function knowing the 6.
The question is: “Why do we have an Array (Any[Core.Const(1), Int64]) instead of a Tuple ((Core.Const(1), Int64)), which looks much better from the compiler side?”

Sukera · October 13, 2022, 10:09am

Because it’s not better from the compiler side. Since a tuple has the types of its elements as part of its own type, any code accessing this would then need to be specialized for that tuple type, leading to lots and lots of recompilation of functions internal to the compiler. You could of course only compile one version for Tuple without specializing on the element types, but that is no better than Vector{Any}, which is much more convenient to work with in terms of adding & removing elements.

Again, using Vector{Any} here is an optimization. Core.PartialStruct may be used for lots of other things, which can contain a number of different types. Avoiding multiple code paths inside the compiler is advantageous in terms of the amount of compile time happening.
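
A small illustration of the difference in ordinary user code (the values are arbitrary and only serve as an example; this is not the compiler's internal code): every distinct combination of element types produces a distinct tuple type, while all the Any[...] vectors share one type.

```julia
# Three containers with different element-type combinations.
tuples  = ((1, 2), (1, 2.0), ("a", 2))
vectors = (Any[1, 2], Any[1, 2.0], Any["a", 2])

# Count how many distinct container types appear in each group.
n_tuple_types  = length(Set(map(typeof, tuples)))
n_vector_types = length(Set(map(typeof, vectors)))

println("distinct tuple types:  ", n_tuple_types)
println("distinct vector types: ", n_vector_types)
```

A function taking such a tuple would by default be specialized once per tuple type, whereas a function taking a Vector{Any} needs only a single compiled version.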

Marcell_Havlik · October 13, 2022, 11:58am

You are basically stating that it is better to drop the type information and compile it with Any, so that we can save compilation time.

LLVM can build code that is something like 250 times faster with type information… This sounds like a really bad idea in my opinion.

Also, in something like 99.9% of the cases in my code I use Core.Const(1), Int64, and it would be the same for others in general. So I wouldn’t even bother with that 0.1% if I could make sure I get the optimisation perfectly.

Elrod · October 13, 2022, 12:38pm

The Vector{Any} only exists at compile time, for use by the compiler. LLVM knows all types at compile time here. You can take a look at the @code_llvm if it helps.
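
A minimal sketch of how to look at it, assuming the unitrange_any definition from the first post. The function form code_llvm from InteractiveUtils is used here instead of the macro so that the IR can also be captured as a string.

```julia
using InteractiveUtils

# Same function as in the first post, written as a one-liner.
unitrange_any(N) = 1:N

# Print the LLVM IR generated for the Int specialization.
code_llvm(stdout, unitrange_any, (Int,))

# The same IR captured as a string, e.g. for searching through it.
ir = sprint(code_llvm, unitrange_any, (Int,))
```

Reading the printed IR is the most direct way to check that the compiled function works on plain integers.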

Sukera · October 13, 2022, 1:14pm

I’m saying that we compile Core.PartialStruct and the functions inside the compiler with Vector{Any}. I’m NOT saying that your user code will compile with Vector{Any}. Those are two very different things. That’s also why I pointed out that seeing the Core.PartialStruct here in the first place is misleading, because it’s not something that’s actually relevant for your unitrange_any, either in terms of runtime performance or in terms of compile time (and in fact helps make compile time faster, due to not having to recompile the compiler itself, which would be required if that were a Tuple).

Marcell_Havlik · October 13, 2022, 1:21pm

Then the question remains.
Doesn’t that Any[Core.Const(1), Int64] cause runtime slowdowns in the long run? (In most cases I don’t want to save compilation time on my performance-critical code.)

Marcell_Havlik · October 13, 2022, 1:24pm

Actually, it is impossible for LLVM to know. So I am not sure. [Core.Const(1), Int64] is an ambiguous type; that is why it is Any.

Sukera · October 13, 2022, 1:33pm

No. That array does not exist at runtime, only during compilation, inside the compiler. It does not exist in your user code.

Using Tuple here instead would NOT increase performance at run time, it would only increase the duration taken to compile your code. As such, Vector{Any} here is an optimization to have faster compilation in general, WITHOUT impacting runtime performance.

What Chris was trying to say is that this array does not exist when your unitrange_any function ultimately runs. That array only exists during compilation, not when your function is executing.

Marcell_Havlik · October 13, 2022, 2:26pm

So basically the compiler knows the type of array[1] and array[2] even if we “deleted” the type by indexing into a heterogeneous array?
I was always using tuples in my code in these scenarios, as I saw that they keep the type and the type is known at compile time.

Marcell_Havlik · October 13, 2022, 2:28pm

Ok… I am already too much, I know! But I just had to test what you said.

It says it just doesn’t guess the type if the array is heterogeneous.
Please help me out: what am I doing wrong, and how does it know the type later on?

using InteractiveUtils
heterogen_arr_type_stability(x) = begin
  x[2]
end

@code_warntype heterogen_arr_type_stability([Core.Const(2), 6])

Res:

MethodInstance for heterogen_arr_type_stability(::Vector{Any})
  from heterogen_arr_type_stability(x) in Main at /home/hm/repo/agei/tests/playground/core.partialstruct.jl:13
Arguments
  #self#::Core.Const(heterogen_arr_type_stability)
  x::Vector{Any}
Body::Any
1 ─ %1 = Base.getindex(x, 2)::Any
└──      return %1

Marcell_Havlik · October 13, 2022, 2:38pm

Sidenote: I just wanted to check how it works:
@edit Core.PartialStruct(UnitRange{Int64}(1,3), Any[Core.Const(1), Int64])
res:
PartialStruct(@nospecialize(typ), fields::Array{Any, 1}) = $(Expr(:new, :PartialStruct, :typ, :fields))
Can’t we use @nospecialize to avoid specializing on that type and use a tuple? That way we could remove billions of Any from everyone’s code without any serious change. I know I am a beginner at Julia core… so sorry for the beginner question.

albheim · October 13, 2022, 2:40pm

I don’t think the claim ever was that it could infer the type from the Any array (though I should say I don’t know what I’m talking about, just reading the posts here), but that array just contains information for the compiler to generate code. More precisely the array seems to contain the types that the UnitRange should be compiled for (Core.Const(1) and Int64) which can be used in the compiler to generate code for this case. Here the case is that you have a constant 1 as the start and an Int64 as the end, which are then used by the compiler to generate fast code based on those types.

Sukera · October 13, 2022, 2:40pm

Every element in a Vector{Any} has its own type bundled at runtime. No type is ever “deleted”.

That’s a different scenario. The Vector{Any} you’re seeing as part of the Core.PartialStruct is an object internal to the compiler, where this is intended. When you swap that object to a Tuple instead, the compiler has to recompile its own internal functions it uses for compilation. This is what using Vector{Any} saves us from here. I’m not talking about your user function.

That is perfectly fine and correct. As I said, in a Vector{Any}, every object has its type bundled with its data, a so-called type tag. But again, what you’re observing in the Core.PartialStruct has nothing to do with your code. It’s a measure done so that compiler internals don’t have to recompile themselves while compiling your function.
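
A short user-level sketch of the type tag idea (the vector contents and the helper double are made up for illustration and have nothing to do with compiler internals):

```julia
# A heterogeneous vector: the container's element type is Any.
v = Any[1, 2.0]

# At runtime every element still carries its own concrete type tag.
for x in v
    println(typeof(x))
end

# Inference only sees the element type of the container, hence Any,
# as in the @code_warntype output for heterogen_arr_type_stability.
inferred = only(Base.return_types(getindex, (Vector{Any}, Int)))
println(inferred)

# Hypothetical helper acting as a function barrier: the call dispatches
# on the runtime type tag and runs a method specialized for that type.
double(x) = x + x
println(double(v[1]))
println(double(v[2]))
```

So the type is unknown to inference at the getindex call, but it is never lost: dynamic dispatch recovers it from the tag when the value is used.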

Sukera · October 13, 2022, 2:55pm

No - again, there is no place in your unitrange_any where only Any could be inferred. The Vector{Any} and the Core.PartialStruct are a compiler artifact. It is not related to your specific user code (or any type stable user code for that matter). No Core.PartialStruct object will be created for your unitrange_any at runtime, because when you actually call unitrange_any with a value, everything exists so that the UnitRange object can be created like any other object. Core.PartialStruct is just a compiler optimization to allow some additional type propagation during inference. It is not related to any runtime object coming into existence or being allocated.
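
A minimal sketch of what actually exists at runtime, assuming the unitrange_any definition from the first post. The two entries of the Any[...] seen in the @code_warntype output plausibly line up with the two fields of the struct, but that mapping is an assumption made here for illustration.

```julia
# Same function as in the first post, written as a one-liner.
unitrange_any(N) = 1:N

r = unitrange_any(6)

# The runtime object is an ordinary, concretely typed UnitRange.
println(typeof(r))

# Its two fields hold plain integers.
println(fieldnames(typeof(r)))
println(first(r), " ", last(r))

# It is a plain bits type: no Vector{Any} or boxed element is stored in it.
println(isbitstype(typeof(r)))
```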

Marcell_Havlik · October 13, 2022, 3:09pm

Ok, I was checking my @code_warntype output, where there were something like 50-60 Core.PartialStruct entries in one function, and that Any[...] was really disturbing, as I wondered why it wasn’t a Tuple… but ok. Of course I understand this now.

After all this discussion, I understand there is no performance improvement. To me, a Tuple with @nospecialize is just a nicer way and could avoid the Any[] in the process while keeping everything else the same. But you can demolish this idea too if you think we would lose something with that Tuple thing.

Sukera · October 13, 2022, 3:21pm

Tuple is not a magic “go-fast” switch - else it would of course be used everywhere. It can, for example, be very bad for performance when there are lots of different tuples that need to be handled, as is the case in the compiler. You have to remember that each combination of tuple elements is its own type, and (oversimplifying) having to compile a new compiler for each kind of Core.PartialStruct whenever a new one is encountered is really bad for the performance of the compiler. Keep in mind that the compile time of your function is the run time of the compiler. So having to compile in the compiler is bad for the compile time of your function.

If you share the code you encountered PartialStruct in, I can try to explain why you see it there.
