Accelerate non-linear function evaluation

In case it's of interest: I have a package, Tullio.jl, which offers a convenient way to combine LoopVectorization with multithreading:

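julia> using Tullio, LoopVectorization, BenchmarkTools

julia> function ML_tullio(x, k)
           # F[i] = sum over j of x[i]*x[j]/(1+x[i]*x[j]); (i!=j) masks out the j == i term
           @tullio F[i] := x[i]*x[j]/(1+x[i]*x[j]) * (i!=j)
           @tullio F[i] += -k[i]   # then subtract k[i] in place
           return F
       end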

julia> ML_tullio(x,k) ≈ ML_baseline(x,k)
true

julia> @btime ML_tullio($x, $k); # threads + avx
  8.328 ms (1178 allocations: 122.70 KiB)

julia> @btime ML_avx($x, $k); # just @avx, above
  43.627 ms (2 allocations: 78.20 KiB)

julia> @btime ML_threaded_bounds_noif($x, $k); # just threads, above
  15.554 ms (65 allocations: 86.27 KiB)

julia> @btime ML_baseline($x, $k);
  196.572 ms (2 allocations: 78.20 KiB)
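Written out as a formula, the two `@tullio` lines in `ML_tullio` compute the following, reading `(i!=j)` as a 0/1 mask that drops the diagonal term of the sum over `j`:

$$
F_i = \sum_{j \neq i} \frac{x_i x_j}{1 + x_i x_j} - k_i, \qquad i = 1, \dots, n
$$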

(This should work on the GPU too, but may not be quicker at this size.)
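For anyone curious what this amounts to without the macro, here is a rough hand-written equivalent using plain threads. It is only a sketch: `ML_loops` is a hypothetical name, not one of the functions from earlier in the thread, and it splits work only over the outer index `i`, so it does not reproduce Tullio's tiling or the SIMD code LoopVectorization generates.

```julia
using Base.Threads  # provides @threads

# Hypothetical plain-loop sketch of ML_tullio (not Tullio's actual generated code).
# Computes F[i] = sum_{j != i} x[i]*x[j] / (1 + x[i]*x[j]) - k[i].
function ML_loops(x::AbstractVector{T}, k::AbstractVector{T}) where {T<:AbstractFloat}
    n = length(x)
    F = similar(x)
    # Threads split the outer index i; each F[i] is written by exactly one iteration.
    @threads for i in 1:n
        acc = zero(T)
        xi = x[i]
        for j in 1:n
            p = xi * x[j]
            # Multiplying by the Bool (i != j) mirrors the mask in the @tullio line
            # instead of branching; false acts as a strong zero in Julia.
            acc += p / (1 + p) * (i != j)
        end
        F[i] = acc - k[i]
    end
    return F
end
```

The loop only runs in parallel if Julia was started with more than one thread (e.g. `julia -t 4`); otherwise `@threads` falls back to a serial loop.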
