Dot-macro, in-place ops for immutables


#1

I just got a speed-up of roughly a factor of 17 on some BigInt code by using in-place operations; see https://discourse.julialang.org/t/a-plea-for-int-overflow-checking-as-the-default/3338/75?u=foobar_lv2.

Consider the line n = 3*n + 1 appearing in an inner loop. If n::BigInt, this line is terrible: each arithmetic operation allocates a fresh BigInt, and to do the update in place and avoid the allocations one must ccall into libgmp.
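A minimal sketch of the difference. It uses the unexported Base.GMP.MPZ wrappers around libgmp (the same module as the Base.GMP.MPZ.mul_si! call mentioned later in this thread). They are internal and may change between Julia versions, and the function names step_alloc and step_inplace! are hypothetical.

```julia
# Base.GMP.MPZ is internal and unexported; its wrappers may change between versions.
const MPZ = Base.GMP.MPZ

# Allocating version: `3*n` creates one new BigInt and `+ 1` creates another.
step_alloc(n::BigInt) = 3*n + 1

# In-place version: overwrites `n` through libgmp.
# GMP allows the output operand to alias the inputs, so `n` can be both.
function step_inplace!(n::BigInt)
    MPZ.mul_si!(n, n, 3)   # n <- 3n
    MPZ.add_ui!(n, n, 1)   # n <- n + 1
    return n
end

n = big(7)
step_inplace!(n)   # n now holds 22; no new BigInt object was created for the result
```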

The best way of writing this would be @. n = 3*n + 1, letting the macro figure out what to do. The macro (or the lowering) would then also need to be updated to work for plain integers.

Is this doable? That is, (1) make @. a no-op for immutables, so that people can write generic code that avoids allocations for both StaticArray and Array, and (2) teach @. to work on BigInt.

Point (1) in particular might need breaking changes if we also want X .= Y to be equivalent to X = convert(typeof(X), Y) for immutable X (or, alternatively, to X = Y).


#2

See https://github.com/JuliaLang/julia/issues/19992 for some prior discussion of this.


#3

Hah, I missed that discussion, thanks for linking. For what it’s worth, I really like https://github.com/JuliaLang/julia/issues/26612.


#4

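Even if .= were changed to assign in place like this, that wouldn’t be enough to eliminate the BigInt temporaries from n .= 3*n+1, because the undotted right-hand side 3*n+1 is still evaluated first and allocates its own temporaries.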

There has been a fair amount of discussion of how to get faster bignum support by eliding allocations.
See e.g. https://github.com/JuliaLang/julia/issues/4176 and https://github.com/JuliaLang/julia/pull/10084 and https://github.com/JuliaLang/julia/pull/17015 on in-place BigInt and BigFloat operations.


#5

I know that n .= 3*n+1 will never work; it doesn’t even work for arrays today, since adding a scalar to an array requires .+. However, @. n = 3*n + 1 has access to the entire expression. If it also had access to the types, possibly by placing a @generated function somewhere (how?), I think something could be done with the libgmp API: walk the expression tree and figure out whether we can do without temporaries. If we do need temporaries, can we hook into inference/optimization (before LLVM touches the code)? Then we could allocate the slot for the temporary outside the loop.
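What such a transformation would ideally produce can be written by hand. A minimal sketch, assuming a hypothetical update n = n^2 + 3n, chosen because, written this way, it needs one temporary to hold 3n while n^2 overwrites n. It again relies on the internal Base.GMP.MPZ wrappers, and the function names are hypothetical.

```julia
# Base.GMP.MPZ is internal and unexported; its wrappers may change between versions.
const MPZ = Base.GMP.MPZ

# Iterates n <- n^2 + 3n `steps` times, mutating `n`.
# The temporary for 3n is allocated once, outside the loop, and reused.
function iterate_hoisted!(n::BigInt, steps::Integer)
    tmp = BigInt()               # hoisted temporary slot
    for _ in 1:steps
        MPZ.mul_si!(tmp, n, 3)   # tmp <- 3n
        MPZ.mul!(n, n, n)        # n <- n^2 (output may alias the inputs in GMP)
        MPZ.add!(n, n, tmp)      # n <- n^2 + 3n
    end
    return n
end

# Reference version: allocates fresh BigInts on every iteration.
function iterate_alloc(n::BigInt, steps::Integer)
    for _ in 1:steps
        n = n^2 + 3*n
    end
    return n
end
```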

More realistically, though, we would expect the user to write @. n = n*3; @. n += 1, that is, to write the computation in a way that avoids as many temporaries as possible, and possibly to hoist necessary temporaries out of the loop by hand. With the added bonus that the code now fails for plain Int for no good reason, forcing people either to duplicate code (bad) or to have all their code generated by macros that emit separate versions for plain and big integers (worse).

Having to remember the GMP API is just cruel (and even worse, you need to read the source of Julia’s MPZ wrapper and the libgmp manual side by side).

Of course, compiler optimizations that eventually make the dots unnecessary would be awesome. In the meantime, a less painful explicit way would be pretty cool.


#6

It doesn’t. Macros never have access to types.

In fact, the whole point of the dot syntax in Julia is that it is purely syntactic sugar that doesn’t need access to the types. This is what allows it to be generic over arbitrary functions and array-like containers.
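A small illustration of that point: @macroexpand shows that @. only rewrites syntax (calls become dotted calls and = becomes .=), and the same dotted expression then runs unchanged on a Vector and on plain scalars. In the all-scalar case broadcasting simply computes a fresh value, so for a BigInt the dots by themselves do not make anything in-place.

```julia
# @. is a purely syntactic rewrite; `n` does not even need to exist here.
ex = @macroexpand @. n = 3*n + 1
println(ex)   # prints the dotted, fused form of the assignment

# Array: one fused loop that updates the existing array in place.
v = [1, 2, 3]
@. v = 3*v + 1

# All-scalar arguments: broadcasting just returns an ordinary value.
x = 3 .* 7 .+ 1        # an Int
y = 3 .* big(7) .+ 1   # a freshly allocated BigInt
```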

Eliminating temporaries in bignum operations requires something that happens much later in compilation than a macro, and probably requires compiler support or something like Cassette.jl.


#7

If m .= n.*3 could lower to m = broadcast!(*, m, n, 3), then I could implement broadcast!(::typeof(*), dest::Integer, a::Integer, b::Integer) = a*b and broadcast!(::typeof(*), dest::BigInt, a::BigInt, b::Int64) = Base.GMP.MPZ.mul_si!(dest, a, b), which would give significantly nicer generic code using @. m = n*3 (obviously one would need a few more methods as well).
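A minimal sketch of that dispatch idea, together with a toy macro that performs the proposed lowering. To avoid type piracy on Base.broadcast! for scalar types, it uses a hypothetical function inplace_op! and a hypothetical macro @inplace; neither exists in Base. Base.GMP.MPZ is internal and unexported. Callers always rebind the result (m = ...), because a plain Int cannot be mutated.

```julia
# Base.GMP.MPZ is internal and unexported; its wrappers may change between versions.
const MPZ = Base.GMP.MPZ

# Hypothetical stand-in for the proposed `m = broadcast!(f, m, args...)` lowering.
# Generic fallback: ignore `dest` and return a freshly computed value (works for Int).
inplace_op!(f, dest, args...) = f(args...)

# BigInt methods: write the result into `dest` via libgmp and return `dest`.
inplace_op!(::typeof(*), dest::BigInt, a::BigInt, b::Int) = MPZ.mul_si!(dest, a, b)
# add_ui!/sub_ui! take an unsigned operand, so branch on the sign of `b`
# (b == typemin(Int) would throw an InexactError here).
inplace_op!(::typeof(+), dest::BigInt, a::BigInt, b::Int) =
    b >= 0 ? MPZ.add_ui!(dest, a, b) : MPZ.sub_ui!(dest, a, -b)

# Hypothetical macro: rewrites `dest = f(args...)` into
# `dest = inplace_op!(f, dest, args...)`. Like @., it is purely syntactic; which
# method runs is decided later by dispatch on the runtime types. Only the
# outermost call is rewritten, so in `n = 3*n + 1` the inner 3*n still allocates.
macro inplace(ex)
    if Meta.isexpr(ex, :(=)) && Meta.isexpr(ex.args[2], :call)
        dest = ex.args[1]
        call = ex.args[2]
        f, args = call.args[1], call.args[2:end]
        return esc(:($dest = inplace_op!($f, $dest, $(args...))))
    end
    return esc(ex)  # anything else is left unchanged
end

# Plain Int: falls back to ordinary arithmetic and rebinds m.
m = 0; n = 7
@inplace m = n * 3

# BigInt: the product is written into M's existing storage.
M = BigInt(); N = big(7)
M_before = M
@inplace M = N * 3
```

The same generic source line works for both Int and BigInt; only the method selected by dispatch differs.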

That’s still not the whole way, but it’s better than calling MPZ by hand. As I said, I am not sure how to eliminate the temporary from @. n = 3*n + 1 without requiring the user to split it by hand into @. n = 3*n; @. n = n + 1. If @. could somehow force a call into a @generated version of broadcast!, then maybe we could go the rest of the way.

Edit: Maybe @. could do this already; I’d have to look at what it actually does, and whether modifying the macro would be enough to support more convenient non-allocating MPZ calls.


#8

For this to work, you’d need a way to disable broadcast fusion for certain types, which may happen but doesn’t exist yet (https://github.com/JuliaLang/julia/issues/22060).