Is software-checked integer division actually necessary?

Continuing the discussion from zero(a::Real)?:

I’m getting out of my depth now, but it appears that a big limitation in avoiding checks on integer division is that division is handled via LLVM’s sdiv/udiv intrinsics. sdiv carries the following warning (and udiv has something similar):

Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.

Undefined behavior is unacceptable, so it must be avoided. Hence the extra logic.

julia> code_llvm(div, (Int32,Int32))
;  @ int.jl:295 within `div`
define i32 @julia_div_1435(i32 signext %0, i32 signext %1) #0 {
top:
  %2 = icmp ne i32 %0, -2147483648
  %3 = icmp ne i32 %1, -1
  %4 = or i1 %2, %3
  %5 = icmp ne i32 %1, 0
  %6 = and i1 %5, %4
  br i1 %6, label %pass, label %fail

fail:                                             ; preds = %top
  call void @ijl_throw({}* inttoptr (i64 140609760840144 to {}*))
  unreachable

pass:                                             ; preds = %top
  %7 = sdiv i32 %0, %1
  ret i32 %7
}

This is identical to the LLVM for code_llvm((x,y)->Core.Intrinsics.checked_sdiv_int(x,y), (Int32,Int32)) (yes, wrapping the intrinsic in an anonymous function is necessary here – go ahead and try without), which it is implemented by.

Allegedly, Core.Intrinsics.sdiv_int is the unsafe version except that somehow an error is still thrown:

julia> code_llvm((x,y)->Core.Intrinsics.sdiv_int(x,y), (Int32,Int32); debuginfo=:none)
define i32 @"julia_#21_1443"(i32 signext %0, i32 signext %1) #0 {
top:
  %2 = sdiv i32 %0, %1
  ret i32 %2
}

julia> Core.Intrinsics.sdiv_int(1,0)
ERROR: DivideError: integer division error

I don’t understand how the error is thrown unless an exception handler is already configured. I’ll admit I don’t really understand how one actually manipulates or controls exception handling (in any language, much less Julia). But it does suggest that software-checked divisions are unnecessary.

julia> using BenchmarkTools

julia> x = rand(1:100,2^12); y = rand(1:100,size(x)); z = similar(x);

julia> @btime broadcast!(div,$z,$x,$y);
  11.200 μs (0 allocations: 0 bytes)

julia> @btime broadcast!((x,y)->Core.Intrinsics.sdiv_int(x,y),$z,$x,$y);
  8.900 μs (0 allocations: 0 bytes)

julia> y[10] = 0; # insert invalid input

julia> broadcast!((x,y)->Core.Intrinsics.sdiv_int(x,y),z,x,y); # error still thrown properly
ERROR: DivideError: integer division error

Checking the assembly again, the sdiv_int version definitely does skip the error checks. I would have expected a big SIMD boost to follow with the branches eliminated, but it appears that my machine (x86 entirely?) doesn’t have SIMD integer division.

I do see a modest ~25% increase in throughput when using the unchecked intrinsic directly. It’s better with with broadcast! than map!, for reasons I haven’t investigated, although I’ve noticed that trend before.

Since the error appears to be thrown properly even without the checks, it appears it may be possible to do away with them for a small performance boost. Someone much more knowledgeable would need to sign off on something like this, however, as it seemed very dangerous before this investigation.

Can someone provide some commentary? How is the error thrown rather than me suffering the wrath of undefined behavior? An error is legal undefined behavior, but there’s no visible code that throws it.

4 Likes

You’re not testing the same thing that you’re introspecting — always a challenge.

julia> f = (x,y) -> Core.Intrinsics.sdiv_int(x,y)
#3 (generic function with 1 method)

julia> f(1,0)
0

I suspect what you’re seeing here is that the interpreted version of the intrinsic simply doesn’t have the UB “optimization” implemented. In compiled contexts you’ll get what you’re seeing from LLVM. Edit: this was wrong, see below.

2 Likes

Good catch! But I still see the same behavior:

julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

julia> uncheckeddiv(x,y) = Core.Intrinsics.sdiv_int(x,y)
uncheckeddiv (generic function with 1 method)

julia> uncheckeddiv(1,0)
ERROR: DivideError: integer division error
Stacktrace:
 [1] uncheckeddiv(x::Int64, y::Int64)
   @ Main ./REPL[156]:1
 [2] top-level scope
   @ REPL[157]:1

julia> uncheckeddiv_anonymous = (x,y) -> Core.Intrinsics.sdiv_int(x,y)
#49 (generic function with 1 method)

julia> uncheckeddiv_anonymous(1,0) # just checking that the anonymous function isn't special somehow
ERROR: DivideError: integer division error

julia> code_llvm(uncheckeddiv,(Int,Int))
;  @ REPL[156]:1 within `uncheckeddiv`
define i64 @julia_uncheckeddiv_1734(i64 signext %0, i64 signext %1) #0 {
top:
  %2 = sdiv i64 %0, %1
  ret i64 %2
}

julia> x = rand(1:100,2^12); y = rand(1:100,size(x)); z = similar(x);

julia> broadcast!(uncheckeddiv, z, x, y);

julia> y[10] = 0
0

julia> broadcast!(uncheckeddiv, z, x, y);
ERROR: DivideError: integer division error

so my question stands.

But it’s interesting that your machine returns undefined behavior while mine throws an exception.

And just verifying that there is really no error handling happening here:

julia> map!(uncheckeddiv, z, x, y);
ERROR: DivideError: integer division error
Stacktrace:
 [1] uncheckeddiv
   @ ./REPL[156]:1 [inlined]
 [2] map!(f::typeof(uncheckeddiv), dest::Vector{Int64}, A::Vector{Int64}, B::Vector{Int64})
   @ Base ./abstractarray.jl:3300
 [3] top-level scope
   @ REPL[168]:1

julia> @code_llvm debuginfo=:none map!(uncheckeddiv, z, x, y);
define nonnull {}* @"japi1_map!_1920"({}* %0, {}** noalias nocapture noundef readonly %1, i32 %2) #0 {
top:
  %3 = alloca {}**, align 8
  store volatile {}** %1, {}*** %3, align 8
  %4 = getelementptr inbounds {}*, {}** %1, i64 1
  %5 = load {}*, {}** %4, align 8
  %6 = getelementptr inbounds {}*, {}** %1, i64 2
  %7 = load {}*, {}** %6, align 8
  %8 = getelementptr inbounds {}*, {}** %1, i64 3
  %9 = load {}*, {}** %8, align 8
  %10 = bitcast {}* %5 to { i8*, i64, i16, i16, i32 }*
  %11 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %10, i64 0, i32 1
  %12 = load i64, i64* %11, align 8
  %13 = bitcast {}* %7 to { i8*, i64, i16, i16, i32 }*
  %14 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %13, i64 0, i32 1
  %15 = load i64, i64* %14, align 8
  %16 = bitcast {}* %9 to { i8*, i64, i16, i16, i32 }*
  %17 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %16, i64 0, i32 1
  %18 = load i64, i64* %17, align 8
  %.not.not = icmp eq i64 %12, 0
  br i1 %.not.not, label %L136, label %L19

L19:                                              ; preds = %top
  %.not.not71 = icmp ne i64 %15, 0
  %.not.not73 = icmp ne i64 %18, 0
  %spec.select = select i1 %.not.not71, i1 %.not.not73, i1 false
  br i1 %spec.select, label %L66.preheader, label %L136

L66.preheader:                                    ; preds = %L19
  %19 = bitcast {}* %7 to i64**
  %20 = load i64*, i64** %19, align 8
  %21 = bitcast {}* %9 to i64**
  %22 = load i64*, i64** %21, align 8
  %23 = bitcast {}* %5 to i64**
  %24 = load i64*, i64** %23, align 8
  br label %L66

L66:                                              ; preds = %L117, %L66.preheader
  %value_phi15 = phi i64 [ %32, %L117 ], [ 1, %L66.preheader ]
  %25 = add nsw i64 %value_phi15, -1
  %26 = getelementptr inbounds i64, i64* %20, i64 %25
  %27 = load i64, i64* %26, align 8
  %28 = getelementptr inbounds i64, i64* %22, i64 %25
  %29 = load i64, i64* %28, align 8
  %30 = sdiv i64 %27, %29
  %31 = getelementptr inbounds i64, i64* %24, i64 %25
  store i64 %30, i64* %31, align 8
  %.not.not68 = icmp eq i64 %value_phi15, %12
  br i1 %.not.not68, label %L136, label %L117

L117:                                             ; preds = %L66
  %32 = add nuw nsw i64 %value_phi15, 1
  %.not.not69 = icmp eq i64 %value_phi15, %15
  %.not.not70 = icmp eq i64 %value_phi15, %18
  %value_phi39 = select i1 %.not.not69, i1 true, i1 %.not.not70
  br i1 %value_phi39, label %L136, label %L66

L136:                                             ; preds = %L117, %L66, %L19, %top
  ret {}* %5
}

julia> @code_native debuginfo=:none map!(uncheckeddiv, z, x, y);
        .text
        .file   "map!"
        .globl  "japi1_map!_1842"               # -- Begin function japi1_map!_1842
        .p2align        4, 0x90
        .type   "japi1_map!_1842",@function
"japi1_map!_1842":                      # @"japi1_map!_1842"
        .cfi_startproc
# %bb.0:                                # %top
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        pushq   %r14
        pushq   %rbx
        .cfi_offset %rbx, -32
        .cfi_offset %r14, -24
        movq    %rsi, -24(%rbp)
        movq    8(%rsi), %r8
        movq    8(%r8), %rdi
        testq   %rdi, %rdi
        je      .LBB0_10
# %bb.1:                                # %L19
        movq    16(%rsi), %rax
        movq    8(%rax), %r14
        testq   %r14, %r14
        je      .LBB0_10
# %bb.2:                                # %L19
        movq    24(%rsi), %rcx
        movq    8(%rcx), %rsi
        testq   %rsi, %rsi
        je      .LBB0_10
# %bb.3:                                # %L66.preheader
        movq    (%rax), %r9
        movq    (%rcx), %r10
        movq    (%r8), %r11
        decq    %rsi
        decq    %r14
        decq    %rdi
        xorl    %ebx, %ebx
        .p2align        4, 0x90
.LBB0_4:                                # %L66
                                        # =>This Inner Loop Header: Depth=1
        movq    (%r9,%rbx,8), %rax
        movq    (%r10,%rbx,8), %rcx
        movq    %rax, %rdx
        orq     %rcx, %rdx
        shrq    $32, %rdx
        je      .LBB0_5
# %bb.6:                                #   in Loop: Header=BB0_4 Depth=1
        cqto
        idivq   %rcx
        jmp     .LBB0_7
        .p2align        4, 0x90
.LBB0_5:                                #   in Loop: Header=BB0_4 Depth=1
                                        # kill: def $eax killed $eax killed $rax
        xorl    %edx, %edx
        divl    %ecx
                                        # kill: def $eax killed $eax def $rax
.LBB0_7:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rax, (%r11,%rbx,8)
        cmpq    %rbx, %rdi
        je      .LBB0_10
# %bb.8:                                # %L117
                                        #   in Loop: Header=BB0_4 Depth=1
        cmpq    %rbx, %r14
        je      .LBB0_10
# %bb.9:                                # %L117
                                        #   in Loop: Header=BB0_4 Depth=1
        leaq    1(%rbx), %rax
        cmpq    %rbx, %rsi
        movq    %rax, %rbx
        jne     .LBB0_4
.LBB0_10:                               # %L136
        movq    %r8, %rax
        popq    %rbx
        popq    %r14
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq
.Lfunc_end0:
        .size   "japi1_map!_1842", .Lfunc_end0-"japi1_map!_1842"
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

observe no exception-y stuff or calls.

Exceptions are totally a valid thing for LLVM to do with an undefined behavior. In this case, I think the Intel divl operation itself throws the exception and Julia’s signal handler catches it:

But other architectures — like my Apple M1 — may not have such signals or they may not be as efficient.

It’s also worth noting that if LLVM happens to prove that you are dividing by zero, then it can aggressively exploit the fact that you’re in UB land and do all sorts of other strange things.

6 Likes

Just to show that this does indeed happen — now I’m on an Intel system:

julia> f(x,y) = Core.Intrinsics.sdiv_int(x, y)
f (generic function with 2 methods)

julia> f(1, 0)
ERROR: DivideError: integer division error
Stacktrace:
 [1] f(x::Int64, y::Int64)
   @ Main ./REPL[3]:1
 [2] top-level scope
   @ REPL[4]:1

julia> g(x) = f(x, 0)
g (generic function with 1 method)

julia> g(1)
140515063745504

julia> @code_llvm g(1)
;  @ REPL[5]:1 within `g`
define i64 @julia_g_937(i64 signext %0) #0 {
top:
  ret i64 poison
}

The way Julia will aggressively inline and constant-propagate through user code makes this a very real problem on all platforms, even if you didn’t write the literal 0 there!

3 Likes

Indeed. Exceptions are legal, although LLVM was content to just leave the behavior at the hardware-native result. I just didn’t know we (sometimes on some architectures) have exception handlers registered for some of these events. And I definitely didn’t suspect it would be set to return the proper Julia exception rather than something more primitive.

It’s a very good point that LLVM could have chosen to do something unsavory if it proved UB. So that leaves anything like this as a nonstarter even if a handler can do the right thing in many situations.

4 Likes