Performance of findmax vs. raw loop

Note that starting Julia with --math-mode=fast is playing with fire:

> julia -O3 --math-mode=ieee -E "sinpi(0.15)"
0.45399049973954675
> julia -O3 --math-mode=fast -E "sinpi(0.15)"
0.0
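If you need relaxed floating-point semantics somewhere, a less dangerous option than the global flag is the `@fastmath` macro, which only rewrites the expression it annotates (a minimal sketch; `f_ieee`/`f_fast` are names I'm making up here):

```julia
# The global --math-mode=fast flag relaxes IEEE semantics for *all* code,
# including library internals like sinpi that rely on exact branches.
# @fastmath confines the rewrite to one expression.
f_ieee(x) = sin(pi * x)
f_fast(x) = @fastmath sin(pi * x)   # fast-math applies only inside this body

f_ieee(0.15)   # ≈ 0.45399049973954675
f_fast(0.15)   # still close to the IEEE value, unlike the global-flag sinpi above
```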

From the definition of f6:

  for i in eachindex(p)
    v = abs(det([p[i] d0]) + det0)

    if v > vmax
      vmax = abs(det([p[i] d0]) + det0)
      imax = i
    end
  end

You didn’t actually reduce how often you’re calling det: you compute v, but then call det all over again inside the if instead of just reusing v.
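A minimal sketch of the hoisted version (assuming, as in the original f6, that p is a vector of 2-element columns and d0 a 2-element column):

```julia
using LinearAlgebra

function f6_reuse(p, d0, det0)
    vmax, imax = -Inf, 0
    for i in eachindex(p)
        v = abs(det([p[i] d0]) + det0)
        if v > vmax
            vmax = v      # reuse v — no second det evaluation here
            imax = i
        end
    end
    return vmax, imax
end
```

This only removes the duplicated det call on improving iterations; det is still evaluated once per element.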

If you want to use LLVM.jl, you can. But it helps in situations where, say, you’re writing to one array and loading from another: it can tell the compiler that the write isn’t changing the load. That could allow it to reorder those reads and writes, and (for example) take advantage of SIMD.
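You can get a similar no-aliasing promise without hand-written LLVM: `@simd ivdep` asserts to the compiler that loop iterations are independent, which can unlock vectorization (a sketch, not something I expect to matter for f6):

```julia
# ivdep promises the compiler that dst and src do not alias, so iterations
# can be reordered and vectorized. That promise is on you: passing
# overlapping arrays here would be undefined behavior.
function copy_scaled!(dst, src, a)
    @inbounds @simd ivdep for i in eachindex(dst, src)
        dst[i] = a * src[i]
    end
    return dst
end
```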

I would be very surprised if LLVM.jl helps in your example, however.
The array allocations are not inlined:

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3)
        .text
        push    rax
        movabs  rdi, offset jl_system_image_data
        movabs  rax, offset jl_alloc_array_1d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3, 4)
        .text
        push    rax
        movabs  rdi, offset jl_system_image_data
        movabs  rax, offset jl_alloc_array_2d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

julia> @code_native debuginfo=:none syntax=:intel Array{Float64}(undef, 3, 4, 5)
        .text
        push    rax
        movabs  rdi, 139781825231120
        movabs  rax, offset jl_alloc_array_3d
        call    rax
        pop     rcx
        ret
        nop     dword ptr [rax]

So what is actually going on is opaque to the compiler.
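Since the compiler can’t see through (or eliminate) the allocation, one workaround in the f6 case is to not allocate at all: for the 2×2 case the determinant can be expanded by hand (`det2` is a name I’m making up for this sketch):

```julia
# Hand-expanded determinant of the 2x2 matrix hcat(a, b).
# No array is ever built, so there is no opaque jl_alloc_array_* call
# for the compiler to fail to elide.
det2(a, b) = a[1] * b[2] - a[2] * b[1]

# Equivalent to det([a b]) for 2-element columns a and b,
# but allocation-free.
```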

This may change some day. Keno said: