Multithreading on Cartesian products

There’s still some advantage to allocating an uninitialized array instead of calling zeros:

julia> using BenchmarkTools

julia> function f()
           v = zeros(40^5);
           map!(i -> i, v, eachindex(v))
       end
f (generic function with 1 method)

julia> function g()
           v = Array{Float64}(undef, 40^5);
           map!(i -> i, v, eachindex(v))
       end
g (generic function with 1 method)

julia> @btime f();
  206.253 ms (2 allocations: 781.25 MiB)

julia> @btime g();
  146.827 ms (2 allocations: 781.25 MiB)
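The difference comes from zeros filling every element with 0.0 before map! overwrites it, while Array{Float64}(undef, ...) only reserves the memory, so g() makes one write pass over the 781 MiB instead of two. Here is a minimal sketch that folds both variants into one function. The name index_vector and its init_zero keyword are hypothetical and do not appear elsewhere in the thread:

```julia
# Hypothetical helper: fill a Float64 vector with its own indices.
# init_zero = true mimics f() above (extra zero-filling pass);
# init_zero = false mimics g() (uninitialized allocation, single write pass).
function index_vector(n::Integer; init_zero::Bool = false)
    v = init_zero ? zeros(n) : Vector{Float64}(undef, n)
    map!(i -> i, v, eachindex(v))  # every element is overwritten here
    return v
end

index_vector(5)                    # [1.0, 2.0, 3.0, 4.0, 5.0]
index_vector(5; init_zero = true)  # same contents, one extra pass over memory
```

Both calls return the same contents. The undef variant is only safe because map! writes every element before anything reads it.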

Yes, indeed. I was only reacting to the word “considerably”.

On my machine (with Julia started with four threads), I get a slight speedup:

julia> @btime x = testFLoops(n);
  532.133 ms (291 allocations: 781.27 MiB)

julia> @btime y = testFLoops(n,SequentialEx());
  702.121 ms (23 allocations: 781.25 MiB)
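To put a number on “slight”: the speedup is the sequential time divided by the parallel time, and parallel efficiency divides that by the thread count (1.0 would be perfect scaling). A small sketch applying this to the timings above. The helper names speedup and efficiency are my own:

```julia
# Speedup: how many times faster the parallel run is than the sequential one.
speedup(t_seq, t_par) = t_seq / t_par
# Parallel efficiency: speedup per thread (1.0 would be perfect scaling).
efficiency(t_seq, t_par, nthreads) = speedup(t_seq, t_par) / nthreads

# testFLoops timings from above, in ms, with four threads
sp  = speedup(702.121, 532.133)        # ≈ 1.32
eff = efficiency(702.121, 532.133, 4)  # ≈ 0.33
```

Applied to the allocation-free timings that follow (508.605 ms vs 332.903 ms), the same arithmetic gives roughly 1.53x, or about 38% efficiency.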

But yeah, that’s not really a nice speedup for four threads. If I move the allocations out of the benchmarked function, the speedup improves a bit (though it’s still not great):

julia> using BenchmarkTools, FLoops

julia> begin
       function testFLoops!(x, focals, ex=ThreadedEx())
           @floop ex for (i,focal) in enumerate(focals)
               # rand() is in [0, 1), so this stores the id of the thread that wrote x[i]
               x[i] = floor(rand() + Threads.threadid())
           end
           return x
       end

       n = 40
       myarr = [1:n for i in 1:5]
       numElements = prod(length(i) for i in myarr)
       x = zeros(numElements)
       focals = Iterators.product((1:length(j) for j in myarr)...)

       @btime testFLoops!($x, $focals)
       @btime testFLoops!($x, $focals, SequentialEx())
       end;
  332.903 ms (273 allocations: 21.50 KiB)
  508.605 ms (5 allocations: 240 bytes)
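For comparison, the same fill can be written with Base’s Threads.@threads instead of FLoops’ @floop. This is a swapped-in technique, not what was benchmarked above, and no timings are claimed for it. CartesianIndices stands in for Iterators.product here: it visits the tuples in the same order (first coordinate fastest) and supports linear indexing, so @threads can split the index range directly. The name testThreads! is hypothetical:

```julia
# Sketch: the loop from testFLoops!, rewritten with Base's Threads.@threads
# (no FLoops). CartesianIndices(dims) is a lazy Cartesian product of the
# ranges 1:dims[k]; Tuple(cis[i]) equals the i-th tuple of
# Iterators.product((1:d for d in dims)...), since both vary the first
# coordinate fastest.
function testThreads!(x::AbstractVector, dims::NTuple{N,Int}) where {N}
    cis = CartesianIndices(dims)
    length(x) == length(cis) ||
        throw(DimensionMismatch("length(x) must equal prod(dims)"))
    # @threads splits the index range of x among the available threads
    Threads.@threads for i in eachindex(x)
        # rand() is in [0, 1), so this stores the id of the thread that wrote x[i]
        x[i] = floor(rand() + Threads.threadid())
    end
    return x
end
```

The call corresponding to the benchmark above would be testThreads!(x, ntuple(_ -> 40, 5)) with x = zeros(40^5) allocated once beforehand.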