ThreadsX mapreduce performance

Does it? That’s not intentional.

The naive non-@generated version had a problem because LV wants to know what the functions are at macro-expansion time.
So another fix to support the non-@generated version would be to lift that limitation.

But it will have to know what the op is regardless, so that it knows how to re-associate correctly. Is that the problem you were hitting?

julia> using LoopVectorization

julia> function map_tturbo!(f::F, y, x) where {F}
           @tturbo for i in eachindex(y, x)
               y[i] = f(x[i])
           end
       end
map_tturbo! (generic function with 1 method)

julia> x = rand(10_000); y = similar(x);

julia> map_tturbo!(x -> log1p(x)/3, y, x);

julia> y ≈ log1p.(x) ./ 3
true

LV not knowing what the function does means it won’t optimize as well. You should also perhaps make sure the function inlines.
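
For example, defining the kernel as a named function with @inline (a hypothetical g standing in for the anonymous function above) is one way to encourage that:

julia> @inline g(x) = log1p(x)/3
g (generic function with 1 method)

julia> map_tturbo!(g, y, x);

julia> y ≈ log1p.(x) ./ 3
true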

In that case, the .instance should only be required for op. f should be free to be relatively arbitrary.
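
On the reduction side, the shape would be something like the following sketch (mapreduce_tturbo is a hypothetical name, and it assumes the opaque-function support shown above also works inside a reduction, which I haven’t verified):

function mapreduce_tturbo(f::F, x) where {F}
    s = zero(eltype(x))
    @tturbo for i in eachindex(x)
        s += f(x[i])  # the op (+) is visible to the macro, so it can re-associate; f stays opaque
    end
    s
end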