Any suggestions for speeding this simple scalar function up?

You seem to have tried some avx variant, but is it actually in the code you benchmarked? Here @turbo makes a huge difference:

julia> using LoopVectorization

julia> function EOS1(rho;c0=100,gamma=7,rho0=1000)
           b=(c0^2*rho0)/gamma
           P = b*((rho/rho0).^gamma .- 1)
       end
EOS1 (generic function with 1 method)

julia> function EOS1_turbo(rho;c0=100,gamma=7,rho0=1000)
           b=(c0^2*rho0)/gamma
           P = @turbo b*((rho/rho0).^gamma .- 1)
       end
EOS1_turbo (generic function with 1 method)

julia> EOS1(rho) ≈ EOS1_turbo(rho)
true

julia> @btime EOS1($rho);
  68.043 μs (3 allocations: 23.81 KiB)

julia> @btime EOS1_turbo($rho);
  1.809 μs (3 allocations: 23.81 KiB)

If this will be called from within a hot loop, you probably want to preallocate the output array P and fill it in place (b is just a scalar, so it costs nothing).
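A minimal sketch of what such an in-place version could look like. The name EOS1! and the sample rho are made up for illustration, and it uses plain Base broadcasting so that it runs without any package; as far as I know, @turbo can be put in front of the @. line in the same way.

```julia
# Hypothetical in-place variant: writes the result into a caller-supplied buffer P.
function EOS1!(P, rho; c0=100, gamma=7, rho0=1000)
    b = (c0^2 * rho0) / gamma            # scalar prefactor
    @. P = b * ((rho / rho0)^gamma - 1)  # fused broadcast, written straight into P
    return P
end

rho = 1000 .+ rand(1000)   # made-up sample densities
P = similar(rho)           # allocate the output once, outside the hot loop

for _ in 1:10              # stand-in for the hot loop
    EOS1!(P, rho)          # reuses P on every call
end
```

The buffer is created once with similar(rho) and then reused, so the loop itself should not need a new output vector on every iteration.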

edit:

If you use @. so that no dot gets forgotten and the whole expression fuses into a single loop, it gets even faster:

julia> function EOS1_turbo(rho;c0=100,gamma=7,rho0=1000)
           b=(c0^2*rho0)/gamma
           P = @turbo @. b*((rho/rho0)^gamma - 1)
       end
EOS1_turbo (generic function with 1 method)

julia> @btime EOS1_turbo($rho);
  701.667 ns (1 allocation: 7.94 KiB)

(The reason one would notice that this was needed is that the previous version made 3 allocations, where only P should be a new allocation. Thus some . was missing and intermediate vectors were being generated; the previous version should have been: P = @turbo b .* ((rho ./ rho0).^gamma .- 1).)
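To see the effect of the missing dots without any package, here is a small sketch using only Base. The helper names unfused, fused and compare_allocations, and the sample rho, are made up for illustration; the exact byte counts will depend on the Julia version and on the length of rho.

```julia
# Two dots missing: rho / rho0 and b * (...) each build a temporary vector.
unfused(rho, b, rho0, gamma) = b * ((rho / rho0) .^ gamma .- 1)

# Every operation dotted via @.: one fused loop, only the result is allocated.
fused(rho, b, rho0, gamma) = @. b * ((rho / rho0)^gamma - 1)

function compare_allocations(rho; c0=100, gamma=7, rho0=1000)
    b = (c0^2 * rho0) / gamma
    # Warm up first so compilation is not counted.
    unfused(rho, b, rho0, gamma)
    fused(rho, b, rho0, gamma)
    bytes_unfused = @allocated unfused(rho, b, rho0, gamma)
    bytes_fused   = @allocated fused(rho, b, rho0, gamma)
    return bytes_unfused, bytes_fused
end

rho = 1000 .+ rand(1000)   # made-up sample densities
bytes_unfused, bytes_fused = compare_allocations(rho)
println("unfused: ", bytes_unfused, " bytes, fused: ", bytes_fused, " bytes")
```

The unfused version should report roughly three times the memory of the fused one, which is the same pattern as the 3 allocations versus 1 allocation in the @btime output.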
