Hi @kristoffer.carlsson, thanks for your suggestion!

Here are the complete codes you can run. (The creation of the matrices is ugly, but it works. Also I’m **not** timing the creation part. I only care about the **for loop** part inside the testFun function.)

```
# The creation of sparse matrices. You can ignore this ugly part.
numGrids = 799;
q = rand(numGrids);
rows1 = zeros(Int64,numGrids*numGrids);
cols1 = zeros(Int64,numGrids*numGrids);
vals1 = zeros(numGrids*numGrids);
rows2 = zeros(Int64,numGrids*numGrids);
cols2 = zeros(Int64,numGrids*numGrids);
vals2 = zeros(numGrids*numGrids);
rows3 = zeros(Int64,numGrids*numGrids*numGrids);
cols3 = zeros(Int64,numGrids*numGrids*numGrids);
vals3 = zeros(numGrids*numGrids*numGrids);
count1 = 0;
count2 = 0;
count3 = 0;
for i = 1:numGrids
for j = 1:numGrids
r = rand();
if (r > 0.965)
count1 += 1;
rows1[count1] = i;
cols1[count1] = j;
vals1[count1] = r;
end
r = rand();
if (r > 0.965)
count2 += 1;
rows2[count2] = i;
cols2[count2] = j;
vals2[count2] = r;
end
end
for k = 1:(numGrids*numGrids)
r = rand();
if (r > 0.977)
count3 += 1;
rows3[count3] = i;
cols3[count3] = k;
vals3[count3] = r;
end
end
end
rows1 = rows1[1:count1];
cols1 = cols1[1:count1];
vals1 = vals1[1:count1];
rows2 = rows2[1:count2];
cols2 = cols2[1:count2];
vals2 = vals2[1:count2];
rows3 = rows3[1:count3];
cols3 = cols3[1:count3];
vals3 = vals3[1:count3];
sparseM = sparse(rows1,cols1,vals1,numGrids,numGrids);
sparseR = sparse(rows2,cols2,vals2,numGrids,numGrids);
sparseS = sparse(rows3,cols3,vals3,numGrids,numGrids*numGrids);
# I want to optimize this testFun function
function testFun(sparseM,sparseR,sparseS,q,numIters,numGrids)
for time = 1:numIters
tmp = 0.5 * 0.05 * (sparseR + reshape(q' * sparseS, numGrids, numGrids));
rhs = (sparseM + tmp) * q;
forwardQ = (sparseM - tmp) \ rhs;
q = forwardQ;
end
end
# Here is how I'm timing the codes.
@time testFun(sparseM,sparseR,sparseS,q,100,799)
```

**I find that my original vector ****q** has type 799 X 1 Array{Float64,2} and the speed of Julia is very slow. When I change q to be 799-element Array{Float64,1}, the speed is much much faster. I don’t understand why, but hope it can help future readers and hope someone can explain this.

Here are the updated timings:

For the same codes, Matlab takes **Elapsed time is 4.509614 seconds.**

Julia shows **6.867342 seconds (3.00 k allocations: 4.283 GB, 15.98% gc time)**