Continuing the discussion from zero(a::Real)?:
I’m getting out of my depth now, but it appears that a big limitation in avoiding checks on integer division is that division is handled via LLVM’s `sdiv`/`udiv` intrinsics. `sdiv` carries the following warning (and `udiv` has something similar):
> Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.
Undefined behavior is unacceptable, so it must be avoided. Hence the extra logic.
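For concreteness, both of the problematic inputs from that warning are easy to reproduce; the checked `div` turns each into a `DivideError`:

julia> div(Int32(1), Int32(0))          # division by zero
ERROR: DivideError: integer division error

julia> div(typemin(Int32), Int32(-1))   # overflow: -2147483648 ÷ -1 does not fit in Int32
ERROR: DivideError: integer division error

The `code_llvm` output below shows where those checks live: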
julia> code_llvm(div, (Int32,Int32))
; @ int.jl:295 within `div`
define i32 @julia_div_1435(i32 signext %0, i32 signext %1) #0 {
top:
%2 = icmp ne i32 %0, -2147483648
%3 = icmp ne i32 %1, -1
%4 = or i1 %2, %3
%5 = icmp ne i32 %1, 0
%6 = and i1 %5, %4
br i1 %6, label %pass, label %fail
fail: ; preds = %top
call void @ijl_throw({}* inttoptr (i64 140609760840144 to {}*))
unreachable
pass: ; preds = %top
%7 = sdiv i32 %0, %1
ret i32 %7
}
This is identical to the LLVM IR for `code_llvm((x,y)->Core.Intrinsics.checked_sdiv_int(x,y), (Int32,Int32))` (yes, wrapping the intrinsic in an anonymous function is necessary here – go ahead and try without), which is what `div` is implemented with.
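If you’d rather not retype the anonymous function, a named wrapper reflects the same way; a minimal sketch (the name `checked_div32` is mine, and I’m omitting the `code_llvm` output since it matches the listing above):

julia> checked_div32(x::Int32, y::Int32) = Core.Intrinsics.checked_sdiv_int(x, y)  # thin wrapper; reflection won't take the bare intrinsic
checked_div32 (generic function with 1 method)

julia> code_llvm(checked_div32, (Int32, Int32))  # same check-then-sdiv pattern as above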
Allegedly, `Core.Intrinsics.sdiv_int` is the unsafe version, except that somehow an error is still thrown:
julia> code_llvm((x,y)->Core.Intrinsics.sdiv_int(x,y), (Int32,Int32); debuginfo=:none)
define i32 @"julia_#21_1443"(i32 signext %0, i32 signext %1) #0 {
top:
%2 = sdiv i32 %0, %1
ret i32 %2
}
julia> Core.Intrinsics.sdiv_int(1,0)
ERROR: DivideError: integer division error
I don’t understand how the error gets thrown unless an exception handler is already configured. I’ll admit I don’t really understand how one manipulates or controls exception handling at this level (in any language, much less Julia). But it does suggest that the explicit software checks on division may be unnecessary.
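Whatever the mechanism, the resulting `DivideError` behaves like an ordinary Julia exception; at least on my machine it can be caught with `try`/`catch`:

julia> try
           Core.Intrinsics.sdiv_int(1, 0)   # raw intrinsic, no visible checks
       catch err
           err
       end
DivideError()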
julia> using BenchmarkTools
julia> x = rand(1:100,2^12); y = rand(1:100,size(x)); z = similar(x);
julia> @btime broadcast!(div,$z,$x,$y);
11.200 μs (0 allocations: 0 bytes)
julia> @btime broadcast!((x,y)->Core.Intrinsics.sdiv_int(x,y),$z,$x,$y);
8.900 μs (0 allocations: 0 bytes)
julia> y[10] = 0; # insert invalid input
julia> broadcast!((x,y)->Core.Intrinsics.sdiv_int(x,y),z,x,y); # error still thrown properly
ERROR: DivideError: integer division error
Checking the assembly again, the `sdiv_int` version definitely does skip the error checks. I would have expected a big SIMD boost to follow with the branches eliminated, but it appears that my machine (or x86 in general?) doesn’t have SIMD integer division.
I do see a modest ~25% increase in throughput when using the unchecked intrinsic directly. It’s better with `broadcast!` than with `map!`, for reasons I haven’t investigated, although I’ve noticed that trend before.
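For anyone who wants to reproduce the assembly comparison, this is roughly what I looked at (output omitted; on my x86-64 machine the unchecked version is essentially a sign-extend followed by a lone `idiv`, while the checked `div` has the compare-and-branch to the throw block in front of it):

julia> code_native((x, y) -> Core.Intrinsics.sdiv_int(x, y), (Int32, Int32); debuginfo=:none)

julia> code_native(div, (Int32, Int32); debuginfo=:none)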
Since the error appears to be thrown properly even without the checks, it may be possible to do away with them for a small performance boost. Someone much more knowledgeable would need to sign off on something like this, however, as it seemed very dangerous before this investigation.
Can someone provide some commentary? How is the error thrown, rather than me suffering the wrath of undefined behavior? Throwing an error is a legal outcome of undefined behavior, but there’s no visible code that throws it.