No, I wasn’t saying this is the issue.
I wrote that you can hold it against MATLAB.
But I doubted it is the whole story (something seemed strange about the large memory allocation, which suggests intermediate [unnecessary?] data was created).
This is MATLAB code and Julia code.
The user here cares about how much time it takes him to run it.
But I may be wrong, and it is worth going to the bottom of it, as I'm curious as well and we're here to learn.
I made this MATLAB function:
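function [ runTime, mB ] = TimeExpLargeMat( matSize, numThreads )
% TimeExpLargeMat Average run time of exp() on a complex matSize x matSize matrix.
%   numThreads sets the maximum number of computational threads for the run.

    % Set the thread count and remember the previous setting
    defaultNumThreads = maxNumCompThreads(numThreads);

    NUM_ITER = 50;

    mA = randn(matSize) + (1i * randn(matSize));
    mB = zeros(matSize); % Becomes complex on the first assignment below

    hRunTimer = tic();
    for ii = 1:NUM_ITER
        mB(:) = exp(mA);
    end
    runTime = toc(hRunTimer) / NUM_ITER; % Average time per iteration [Sec]

    % Restore the previous thread count
    maxNumCompThreads(defaultNumThreads);

end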
I have an i7-6800K with 6 cores and 4 memory channels (which doubles the bandwidth of a regular desktop CPU).
On my system the performance scales by x3.8 when the number of threads is set to 6 (using Hyper-Threading isn't recommended here).
So the exp() operation becomes memory bound on my system.
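To make the scaling measurement easy to repeat, here is a minimal sketch of a sweep over thread counts. It assumes the function above is saved as TimeExpLargeMat.m on the MATLAB path; the matrix size of 2000 and the variable names are my own picks for illustration, not something measured in this thread.

```matlab
% Sweep the thread count and report the speedup relative to 1 thread.
% Assumes TimeExpLargeMat.m (the function above) is on the MATLAB path.
matSize     = 2000;                   % Assumed size, adjust to your memory
vNumThreads = 1:maxNumCompThreads();  % 1 up to the current thread limit

vRunTime = zeros(size(vNumThreads));
for kk = 1:numel(vNumThreads)
    vRunTime(kk) = TimeExpLargeMat(matSize, vNumThreads(kk));
end

vSpeedUp = vRunTime(1) ./ vRunTime;   % Speedup relative to a single thread

for kk = 1:numel(vNumThreads)
    fprintf('Threads: %d, Run Time: %.4f [Sec], Speedup: x%.2f\n', ...
        vNumThreads(kk), vRunTime(kk), vSpeedUp(kk));
end
```

If the speedup flattens well before the number of cores, that would point toward the memory bound explanation.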
Could someone try this function on a regular system (i7 x7xx) and check how it scales with the number of threads?
I'd expect it to become memory bound sooner (as the bandwidth is lower).
If it does not, and it scales well on @zsoerenm's CPU, then it is all about the multithreading (MT) in MATLAB.
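One rough way to check the memory bound idea is to turn the run time into an effective throughput and compare it with the memory bandwidth of the machine. The sketch below is only an estimate under my own assumptions: it counts one read of mA and one write of mB per iteration (16 bytes per complex double element) and ignores any temporary MATLAB creates for exp(mA), so the real traffic is probably higher. The matrix size is also just an assumed value.

```matlab
% Rough effective throughput of exp() on a complex matrix.
% Lower bound on traffic: one read of mA and one write of mB per iteration.
matSize  = 2000;  % Assumed size
NUM_ITER = 20;

mA = complex(randn(matSize), randn(matSize));
mB = complex(zeros(matSize));

hRunTimer = tic();
for ii = 1:NUM_ITER
    mB(:) = exp(mA);
end
runTime = toc(hRunTimer) / NUM_ITER;

bytesPerElement = 16;                             % Complex double
numBytes        = 2 * bytesPerElement * numel(mA); % Read mA + write mB

fprintf('Run Time: %.4f [Sec], Effective Throughput: %.2f [GB/Sec]\n', ...
    runTime, numBytes / runTime / 1e9);
```

Running it with different maxNumCompThreads() settings should show whether the throughput saturates as threads are added.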
@zsoerenm, what is your MATLAB version? And your computer specification?
Your function runs on my system in 1.07 [Sec] (MATLAB code) using 6 threads.
It takes 3.05 [Sec] using 1 thread, which is a scaling of about 3 (again, on a computer with good memory bandwidth).
Using the MATLAB Profiler, it seems the exp() line and the outer product line (steering_vectors(:, ii) * carrier_signal) consume about the same time (~45% each).
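For anyone who wants to cross-check that split without the Profiler, here is a small sketch that times the two lines separately with tic / toc. It uses a single steering vector and a single pass, so the numbers will be noisy; the names steeringVector, expTime and prodTime are mine and only for illustration.

```matlab
% Time the exp() line and the product line of the test separately.
range          = 1:2000000;
steeringVector = complex(randn(4, 1), randn(4, 1));

hTimer = tic();
carrier_signal = exp(2i * pi * 1.023e6 * range / 4e6 + 1i * 40 * pi / 180);
expTime = toc(hTimer);

hTimer = tic();
steered_signal = steeringVector * carrier_signal; % (4 x 1) * (1 x N) product
prodTime = toc(hTimer);

fprintf('exp(): %.4f [Sec], Product: %.4f [Sec]\n', expTime, prodTime);
```

Repeating the measurement a few times and taking the median would give a steadier comparison.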
@zsoerenm, Could you run this:
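function [ vRunTimeStat ] = TimeRun( numThreads )
% TimeRun Run time statistics of TestFun(): [min; median; mean; max] in [Sec].

    NUM_ITER = 20;

    % Set the thread count and remember the previous setting
    defaultNumThreads = maxNumCompThreads(numThreads);

    vRunTime = zeros([NUM_ITER, 1]); % Preallocate the timing vector
    for ii = 1:NUM_ITER
        hRunTimer = tic();
        mA = TestFun();
        vRunTime(ii) = toc(hRunTimer);
    end

    % Restore the previous thread count
    maxNumCompThreads(defaultNumThreads);

    vRunTimeStat = zeros([4, 1]);
    vRunTimeStat(1) = min(vRunTime);
    vRunTimeStat(2) = median(vRunTime);
    vRunTimeStat(3) = mean(vRunTime);
    vRunTimeStat(4) = max(vRunTime);

end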
function [ sum_signal ] = TestFun( )

    range = 1:2000000;
    steering_vectors = complex(randn(4, 11), randn(4, 11));
    sum_signal = zeros(4, length(range));

    for ii = 1:11
        carrier_signal = exp(2i * pi * 1.023e6 * range / 4e6 + 1i * 40 * pi / 180);
        steered_signal = steering_vectors(:, ii) * carrier_signal;
        sum_signal = sum_signal + steered_signal;
    end

end
With numThreads = 1 and with numThreads = numCores, where numCores is the number of cores in your computer?
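Something like the following sketch would do. It assumes TimeRun and TestFun above are saved on the MATLAB path, and it takes numCores from maxNumCompThreads(), which by default usually matches the number of physical cores (an assumption; set it by hand if you changed the thread limit earlier in the session).

```matlab
% Compare single threaded and multi threaded runs of TestFun().
% Assumes TimeRun.m and TestFun.m (the functions above) are on the path.
numCores = maxNumCompThreads(); % Assumed to equal the physical core count

vStatSingle = TimeRun(1);        % [min; median; mean; max] with 1 thread
vStatMulti  = TimeRun(numCores); % [min; median; mean; max] with all cores

fprintf('1 Thread  - Median: %.3f [Sec]\n', vStatSingle(2));
fprintf('%d Threads - Median: %.3f [Sec]\n', numCores, vStatMulti(2));
fprintf('Scaling (Median): x%.2f\n', vStatSingle(2) / vStatMulti(2));
```

I compare the medians since they are less sensitive to a single slow run than the mean.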
On second thought, this should have been my first post in the thread.