Performance of findmax vs. raw loop

Note that starting Julia with --math-mode=fast is playing with fire:

> julia -O3 --math-mode=ieee -E "sinpi(0.15)"
0.45399049973954675
> julia -O3 --math-mode=fast -E "sinpi(0.15)"
0.0
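If you need relaxed floating-point semantics somewhere, a less dangerous option than the global flag is the `@fastmath` macro, which only rewrites the expression it annotates (a minimal sketch; `f_ieee`/`f_fast` are names I'm making up here):

```julia
# The global --math-mode=fast flag relaxes IEEE semantics for *all* code,
# including library internals like sinpi that rely on exact branches.
# @fastmath confines the rewrite to one expression.
f_ieee(x) = sin(pi * x)
f_fast(x) = @fastmath sin(pi * x)   # fast-math applies only inside this body

f_ieee(0.15)   # ≈ 0.45399049973954675
f_fast(0.15)   # still close to the IEEE value, unlike the global-flag sinpi above
```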

From the definition of f6:

  for i in eachindex(p)
    v = abs(det([p[i] d0]) + det0)

    if v > vmax
      vmax = abs(det([p[i] d0]) + det0)
      imax = i
    end
  end

You didn’t actually reduce how often you’re calling det: you compute v, but then call det all over again inside the if instead of just reusing v.
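A minimal sketch of the hoisted version (assuming, as in the original f6, that p is a vector of 2-element columns and d0 a 2-element column):

```julia
using LinearAlgebra

function f6_reuse(p, d0, det0)
    vmax, imax = -Inf, 0
    for i in eachindex(p)
        v = abs(det([p[i] d0]) + det0)
        if v > vmax
            vmax = v      # reuse v — no second det evaluation here
            imax = i
        end
    end
    return vmax, imax
end
```

This only removes the duplicated det call on improving iterations; det is still evaluated once per element.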

If you want to use LLVM.jl, you can. But it helps in situations where, say, you’re writing to one array and loading from another: it can tell the compiler that the write isn’t changing the load. That could allow it to reorder those reads and writes, and (for example) take advantage of SIMD.
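You can get a similar no-aliasing promise without hand-written LLVM: `@simd ivdep` asserts to the compiler that loop iterations are independent, which can unlock vectorization (a sketch, not something I expect to matter for f6):

```julia
# ivdep promises the compiler that dst and src do not alias, so iterations
# can be reordered and vectorized. That promise is on you: passing
# overlapping arrays here would be undefined behavior.
function copy_scaled!(dst, src, a)
    @inbounds @simd ivdep for i in eachindex(dst, src)
        dst[i] = a * src[i]
    end
    return dst
end
```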

I would be very surprised if LLVM.jl helps in your example, however.
The array allocations are not inlined:

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3)
        .text
        push    rax
        movabs  rdi, offset jl_system_image_data
        movabs  rax, offset jl_alloc_array_1d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3, 4)
        .text
        push    rax
        movabs  rdi, offset jl_system_image_data
        movabs  rax, offset jl_alloc_array_2d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3, 4, 5)
        .text
        push    rax
        movabs  rdi, 139781825231120
        movabs  rax, offset jl_alloc_array_3d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

So what is actually going on is opaque to the compiler.
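Since the compiler can’t see through (or eliminate) the allocation, one workaround in the f6 case is to not allocate at all: for the 2×2 case the determinant can be expanded by hand (`det2` is a name I’m making up for this sketch):

```julia
# Hand-expanded determinant of the 2x2 matrix hcat(a, b).
# No array is ever built, so there is no opaque jl_alloc_array_* call
# for the compiler to fail to elide.
det2(a, b) = a[1] * b[2] - a[2] * b[1]

# Equivalent to det([a b]) for 2-element columns a and b,
# but allocation-free.
```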

This may change some day. Keno said: