Parallel matrix multiplication and inverse, iteratively, in Julia

I’m a new Julia programmer. I want to write a program that computes a matrix product and an inverse repeatedly, in parallel. But I can’t get the result back. Here is my code.

addprocs(4);
@everywhere using Distributed, DistributedArrays
C=dzeros(1000000);
@time @distributed for i in 1:100
    A=rand(10000,200);
    B=rand(10000,1);
    P=localpart(C);
    P[1+(i-1)%Int(100/nworkers())]=Vec((A×A')\B);
end

When I look at C afterwards, it contains only zeros.
How do I fix my code?

Can you describe these two statements in a bit more detail?

Maybe you could also provide a serial code which does what you want, and then we can help you parallelise it.

I want to calculate inv(A)×B, where both are matrices. This has to be done several times on different data, and I want to use 4 processes for it. When I use one process, everything is OK. I want to speed it up with multiple processes, but then the result is zero. That is my problem.

First of all, you should be using A\B rather than inv(A)*B, which will be faster and more accurate.
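
To illustrate the point about the backslash operator, here is a minimal sketch comparing the two ways of computing inv(A)*B. The sizes and the diagonally dominant test matrix are assumptions chosen only to keep the demo small and well conditioned; they are not the data from the question.

```julia
using LinearAlgebra

n = 200
A = rand(n, n) + n * I   # assumed test matrix: diagonally dominant, so well conditioned
B = rand(n, 3)

X_inv = inv(A) * B       # forms the explicit inverse first, then multiplies
X_bs  = A \ B            # solves A * X = B directly via a factorization

# Compare the residuals of both approaches.
println(norm(A * X_inv - B), "  ", norm(A * X_bs - B))
```

Both results agree up to rounding here; the backslash form simply skips the explicit inverse.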

Also, BLAS is multi-threaded already, so you probably won’t get much of a speedup.
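
If you still want to experiment with several processes, one common precaution is to limit the BLAS thread count so that the processes do not compete for the same cores. A small sketch of how to inspect and set it, assuming Julia 1.6 or later for BLAS.get_num_threads:

```julia
using LinearAlgebra

# Restrict BLAS to a single thread in this process.
# With worker processes, the same call could be wrapped in @everywhere.
BLAS.set_num_threads(1)

# Query the current setting (available from Julia 1.6).
println(BLAS.get_num_threads())
```

Whether this helps depends on the machine and the problem size, so it is worth timing both configurations.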


@everywhere using Distributed
C=[];
@time @distributed for i in 1:100
    V=(A×A')\B;
    append!(C,V);
end

This code works correctly, but it runs on one process. I want to do this on 4 processes.

Yes I use it.

Parallelising linear system solves is highly nontrivial. I strongly recommend you do this using a software package rather than writing your own code.

The expression will error. @distributed gives back a Task object, which will collect exceptions from the worker processes. Check this for the error.
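
A minimal sketch of how to check the Task for errors. The loop body here is a deliberate failure for illustration, not the code from the question, and the worker count is an assumption.

```julia
using Distributed
addprocs(2)

# Without a reducer, @distributed returns immediately with a Task.
t = @distributed for i in 1:4
    error("failure in iteration $i")   # deliberate error, for illustration only
end

# Waiting on the task rethrows what went wrong on the workers.
err = try
    wait(t)
    nothing
catch e
    e
end

println(typeof(err))
```

Writing @sync @distributed for ... has a similar effect: it blocks until the loop is finished and surfaces the failure instead of silently dropping it.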

One reason why your expression won’t work, is that you try to assign a vector A*A'\B to a float P[1+(i-1)...]. The latter is a single entry of the array P. Make sure to assign the former to a chunk of P of proper length.

Also, it’s vec, not Vec. But that shouldn’t be needed anyway.
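
As a rough sketch of one way to get every solution vector back, here is a version that uses pmap from the Distributed standard library instead of DistributedArrays, which is a different technique from the one in the question. The helper name solve_one and the small sizes n and m are hypothetical, picked so that A*A' is invertible and the example runs quickly.

```julia
using Distributed
addprocs(4)

# Hypothetical helper: one independent solve per call.
@everywhere function solve_one(i; n = 50, m = 200)
    A = rand(n, m)        # m >= n, so A*A' is generically invertible
    B = rand(n)
    return (A * A') \ B   # solution vector of length n
end

# pmap distributes the calls over the workers and returns results in order.
results = pmap(solve_one, 1:100)

# One column per solve: each result occupies a chunk of the proper length.
X = reduce(hcat, results)
println(size(X))
```

Each iteration returns its whole vector, so nothing has to be squeezed into a single array entry.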

By the way, which Julia version are you on? I tried the code on 1.5.3 and the \times symbol for multiplication is no longer valid. Same on 1.6.

You can get 10X speedup using your GPU.

using CUDA, BenchmarkTools 

A = rand(10000,200)
B = rand(10000,1)

gpuA = cu(A)
gpuB = cu(B)

@btime ($A*$A')\$B 
  5.324 s (8 allocations: 1.49 GiB)                                                

@btime ($gpuA*$gpuA')\$gpuB 
  555.748 ms (76 allocations: 2.80 KiB) 

Thanks.