Palli
September 27, 2022, 2:02pm
It took me a while to find this. I recalled the first package, so I put it at the top of the list (maybe none of these applies to you, but I'm posting what I found while searching, just in case):
If you have an iterative computation whose runtime is dominated by a large sparse matrix-vector multiplication, e.g.
julia> using LinearAlgebra, SparseArrays, BenchmarkTools
julia> n = 2^22; d = 10; A = sprand(n,n,d/n); x = rand(n);
julia> y = @btime $A*$x;
909.738 ms (2 allocations: 32.00 MiB)
julia> yt = @btime $(transpose(A))*$x;
640.637 ms (2 allocations: 32.00 MiB)
then you may want to consider the CompressedSparseBlocks package, a Julia wrapper for the CSB (Compressed Sparse Blocks) library.
julia> using Compressed…
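A minimal usage sketch for the benchmark above. Note that `SparseMatrixCSB` as the name of the converting constructor is an assumption on my part; check the package README before relying on it:

```julia
using SparseArrays, CompressedSparseBlocks

n = 2^22; d = 10
A = sprand(n, n, d / n)
x = rand(n)

# Assumed API: convert the CSC matrix to the blocked CSB format,
# then multiply as usual; the CSB library parallelizes internally.
Acsb = SparseMatrixCSB(A)
y = Acsb * x
```

The point of the format is that CSB makes both `A*x` and `transpose(A)*x` fast, so you don't have to store two copies of the matrix.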
ThreadedSparseCSR.jl provides multithreaded CSR matrix-vector multiplication in Julia.
The CSR matrix format is implemented in the Julia package SparseMatricesCSR.jl, which must be installed for this package to work.
The package exports the functions:
tmul!(y, A, x, [alpha], [beta]): five-argument (y = alpha*A*x + beta*y) and three-argument (y = A*x) in-place multithreaded versions of mul!, using Base.Threads (via @spawn)
tmul(A, x), multithreaded version of A*x, using Ba…
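A short sketch of how the exported functions above might be used. `tmul`/`tmul!` come from the package description quoted above; the `sparsecsr` constructor from `findnz` output is my assumption about SparseMatricesCSR.jl's API:

```julia
using SparseArrays, SparseMatricesCSR, ThreadedSparseCSR

n = 10_000
A = sprand(n, n, 1e-3)
# Assumed constructor: build a CSR copy from the (I, J, V) triplets.
Acsr = sparsecsr(findnz(A)..., n, n)
x = rand(n)
y = similar(x)

tmul!(y, Acsr, x)   # in-place multithreaded y = A*x
z = tmul(Acsr, x)   # allocating multithreaded A*x
```

Remember to start Julia with more than one thread (e.g. `julia -t auto`), or the multithreaded versions won't help.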
I’ve been writing a set of kernels in which I calculate many small (mostly on the order of ~10x10) square matrices column by column, like in this toy example:
using CUDA
# each slice L[:, :, j] is a matrix of interest
Ls = CUDA.rand(3, 3, 100)
f(x) = 3 * x - 2
function matrix_kernel(f, Ls)
    ind = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    sz = size(Ls)
    len = sz[2] * sz[3]
    for i in ind : stride : len
        m, n = Tuple(CartesianInd…
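The kernel above is cut off, but one plausible completion of the grid-stride loop, mapping each linear index to a (column, slice) pair, might look like this (a sketch; the body applying `f` elementwise to each column is my assumption about the truncated code):

```julia
using CUDA

f(x) = 3 * x - 2

function matrix_kernel(f, Ls)
    ind = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    sz = size(Ls)
    len = sz[2] * sz[3]
    for i in ind : stride : len
        # Map the linear index i to (column m, slice n).
        m, n = Tuple(CartesianIndices((sz[2], sz[3]))[i])
        # Process column m of slice n elementwise.
        for k in 1:sz[1]
            @inbounds Ls[k, m, n] = f(Ls[k, m, n])
        end
    end
    return nothing
end

Ls = CUDA.rand(3, 3, 100)
@cuda threads = 256 matrix_kernel(f, Ls)
```

The grid-stride loop lets the same kernel handle any number of columns regardless of how many threads are launched, which is why `len` counts columns across all slices rather than slices alone.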