You seem to have tried some @avx variant, but is it actually in there? Here @turbo
makes a huge difference:
julia> using LoopVectorization, BenchmarkTools
julia> function EOS1(rho; c0=100, gamma=7, rho0=1000)
           b = (c0^2*rho0)/gamma
           P = b*((rho/rho0).^gamma .- 1)
       end
EOS1 (generic function with 1 method)
julia> function EOS1_turbo(rho; c0=100, gamma=7, rho0=1000)
           b = (c0^2*rho0)/gamma
           P = @turbo b*((rho/rho0).^gamma .- 1)
       end
EOS1_turbo (generic function with 1 method)
julia> EOS1(rho) ≈ EOS1_turbo(rho)
true
julia> @btime EOS1($rho);
68.043 μs (3 allocations: 23.81 KiB)
julia> @btime EOS1_turbo($rho);
1.809 μs (3 allocations: 23.81 KiB)
If this will be called from within a hot loop, you probably want to preallocate the output P (b is just a scalar).
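As a rough sketch of what preallocating could look like: the in-place function EOS1! below is a hypothetical name, not something from the original code, and the rho values are made up. It uses plain @. broadcasting from Base so that it runs without extra packages; putting @turbo in front of the @. line, as in the functions in this post, should be possible but is not shown or tested here.

```julia
# Hypothetical in-place variant: writes into a caller-provided buffer P
# instead of allocating a new result vector on every call.
function EOS1!(P, rho; c0=100, gamma=7, rho0=1000)
    b = (c0^2 * rho0) / gamma
    # @. dots every call and the assignment (P .= ...), so the whole
    # expression fuses into one loop that writes straight into P.
    @. P = b * ((rho / rho0)^gamma - 1)
    return P
end

rho = collect(range(1000.0, 1010.0; length=1000))  # made-up test data
P = similar(rho)        # allocate the output once, outside the hot loop

for _ in 1:10           # stand-in for the hot loop
    EOS1!(P, rho)       # reuses P on every iteration
end
```

The buffer is created once with similar(rho), so the loop body itself does not need a fresh output array per call.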
edit:
If you use @. to have better chances of not forgetting some loop fusion, that gets even better:
julia> function EOS1_turbo(rho; c0=100, gamma=7, rho0=1000)
           b = (c0^2*rho0)/gamma
           P = @turbo @. b*((rho/rho0)^gamma - 1)
       end
EOS1_turbo (generic function with 1 method)
julia> @btime EOS1_turbo($rho);
701.667 ns (1 allocation: 7.94 KiB)
(The reason one would notice that this was needed is that there were 3 allocations in the previous version, where only P should be a new allocation. Thus some dot (.) is missing and an intermediate vector is being generated in the previous version, which should be: P = @turbo b .* ((rho ./ rho0).^gamma .- 1).)
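To see the effect of the missing dots without any packages, here is a small sketch using only Base. The function names unfused and fused and the input data are made up for illustration, and @turbo is left out on purpose so that only the broadcasting difference is visible.

```julia
# Partially dotted, as in the first version: rho / rho0, the .^ and .- step,
# and b * (...) each allocate their own vector.
unfused(rho, b, rho0, gamma) = b * ((rho / rho0) .^ gamma .- 1)

# Fully dotted via @.: everything fuses into a single loop with one output array.
fused(rho, b, rho0, gamma) = @. b * ((rho / rho0)^gamma - 1)

rho = collect(range(1000.0, 1010.0; length=1000))  # made-up test data
b, rho0, gamma = 1.0e7 / 7, 1000, 7

# Call each function once first so compilation is not counted,
# then measure the bytes allocated by a single call.
unfused(rho, b, rho0, gamma)
fused(rho, b, rho0, gamma)
bytes_unfused = @allocated unfused(rho, b, rho0, gamma)
bytes_fused   = @allocated fused(rho, b, rho0, gamma)
println((bytes_unfused, bytes_fused))
```

The fused version should report noticeably fewer bytes, in line with the drop from 3 allocations to 1 in the timings above. Exact numbers depend on the Julia version and on where the measurement is run.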