Oh yeah, sure. I simplified my code to the code above. In my actual code I cannot move carrier_signal outside the first loop, because it depends on i.
When I use pmap for the first loop, using all available processors with julia -p7:
```julia
module Testing

function performance_test()
    range = 1:2000000
    steering_vectors = complex(randn(4,11), randn(4,11))
    signals = pmap(1:11) do i
        carrier_signal = map(x -> exp(2im * pi * 1.023e6 * x / 4e6 + 1im * 40 * pi / 180), range)
        steering_vectors[:,i] * carrier_signal.'
    end
    return sum(signals)
end

end
```
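To run it, the module has to be loaded on every worker so pmap can ship the closure over; a minimal sketch, assuming the code above is saved as testing.jl:

```julia
# started as: julia -p7
@everywhere include("testing.jl")   # define module Testing on all workers
Testing.performance_test()          # warm up (JIT compilation)
@time Testing.performance_test()
```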
Devectorizing the code like this:

```julia
function performance_test_devec()
    N = 2000000
    range = 1:N
    steering_vectors = complex(randn(4,11), randn(4,11))
    sum_signal = zeros(Complex{Float64}, 4, length(range))
    carrier_signal = zeros(Complex{Float64}, length(range))
    for i = 1:11
        for k = 1:N
            carrier_signal[k] = exp(2im * pi * 1.023e6 * range[k] / 4e6 + 1im * 40 * pi / 180)
        end
        for j = 1:4
            for k = 1:N
                sum_signal[j,k] += steering_vectors[j,i] * carrier_signal[k]
            end
        end
    end
    return sum_signal
end
```
brings it down on my computer from 8.143687 seconds (207 allocations: 3.397 GB, 8.29% gc time) to 3.646278 seconds (11 allocations: 152.590 MB, 0.22% gc time).
Not really tested code, so hopefully it computes the same thing.
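The numbers above are @time output; to reproduce that kind of measurement, compile first and time the second call, e.g.:

```julia
performance_test_devec()          # first call includes JIT compilation
@time performance_test_devec()    # prints time, allocations, and gc share
```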
Which version of MATLAB are you using?
Did you put it inside a function when testing in MATLAB?
I’d pay attention to the amount of memory consumed.
MATLAB is doing much better in that department.
@ChrisRackauckas,
Your example is exactly what shouldn’t happen.
Very clean and simple code becomes messy just to generate some performance, and it still loses to MATLAB on a loop.
I’d say, based on the memory figures, MATLAB intelligently creates more efficient code (less memory work).
Looking at the better memory allocation, I’d say it is not all because of multithreading (easy to blame, but not the right thing to do).
Yes, though allocating big arrays this way isn’t the problem. It’s cheap compared to the computation of exp(::Complex128).
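To make that split concrete, the allocation and the exp loop can be timed separately; an illustrative sketch (helper names are mine):

```julia
alloc_carrier(N) = zeros(Complex{Float64}, N)

function fill_exp!(carrier)
    for k = 1:length(carrier)
        carrier[k] = exp(2im * pi * 1.023e6 * k / 4e6 + 1im * 40 * pi / 180)
    end
    return carrier
end

c = fill_exp!(alloc_carrier(2000000))   # warm up (compile both)
@time alloc_carrier(2000000)            # the allocation by itself
@time fill_exp!(c)                      # the exp loop dominates
```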
Yes, it is multithreading. As already pointed out, most of the time is spent in the exp (or sin and cos) functions, and as I tested locally on MATLAB 2016b, it is certainly using 4 physical cores for that.
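For comparison, Julia’s (experimental in 0.5) threading support can be applied to the same hot loop; a sketch with a function name of my own, assuming Julia was started with JULIA_NUM_THREADS=4:

```julia
function fill_exp_threaded!(carrier)
    # Threads.@threads splits the index range across the available threads.
    Threads.@threads for k = 1:length(carrier)
        carrier[k] = exp(2im * pi * 1.023e6 * k / 4e6 + 1im * 40 * pi / 180)
    end
    return carrier
end

c = fill_exp_threaded!(zeros(Complex{Float64}, 2000000))
```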
After increasing the count to 101 instead of 11, I get 11s on MATLAB and 44s on Julia. With Julia with 4 threads, I get 26s with openlibm and 9s with system libm. The difference between the two libm versions is a known issue (https://github.com/JuliaLang/julia/issues/17395); it’s unclear what’s triggering it…
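For the curious, the two libm builds can be compared head to head with ccall; a hedged sketch (library names are platform-dependent, shown here for Linux):

```julia
exp_openlibm(x) = ccall((:exp, "libopenlibm"), Float64, (Float64,), x)
exp_syslibm(x)  = ccall((:exp, "libm.so.6"),  Float64, (Float64,), x)

v = randn(10^7)
map(exp_openlibm, v); map(exp_syslibm, v)   # warm up
@time map(exp_openlibm, v)
@time map(exp_syslibm, v)
```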
I wasn’t saying MATLAB doesn’t use multithreading.
I was just saying this is MATLAB vs. Julia from a user’s standpoint; why would he mark this against MATLAB?
The thing is inside a loop, and MATLAB used to be very bad in those cases.
The low memory consumption tells me that, besides the multithreading, MATLAB created (maybe?) more efficient code.
I’m not an expert, but if the memory is low it means less intermediate data is created; isn’t that a symptom of something?
I’d like to see the same memory consumption and performance as in the devectorized code, or at least close to that.
Hopefully this kind of code will be (significantly) faster in Julia 0.6.
Well, you said it wasn’t the issue, but in fact it is.
It used to be bad for cheap loops with many iterations. This is nothing close to that.
Correct, but that’s not the issue here; that’s what the more generic loop fusion syntax in 0.5 and 0.6 is for, as already mentioned earlier in the thread.
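Concretely, for the carrier-signal line that syntax looks like this (a sketch; on 0.5 only the dotted calls fuse, while on 0.6 the whole dotted expression fuses into a single in-place loop):

```julia
range = 1:2000000
carrier_signal = zeros(Complex{Float64}, length(range))
# One pass over range, writing in place into the preallocated buffer:
carrier_signal .= exp.(2im * pi * 1.023e6 .* range ./ 4e6 .+ 1im * 40 * pi / 180)
```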