Hi,

I am trying to multiply every stored element a_ij of a sparse matrix A by x[i]*x[j], where x is a vector:

`A[i, j] *= x[i] * x[j]`
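Writing $D = \operatorname{diag}(x_1, \dots, x_n)$, this entry-wise update is the same as the two-sided scaling $DAD$, because only the diagonal terms of $D$ survive in the sum:

$$
(DAD)_{ij} = \sum_{k,l} D_{ik}\, a_{kl}\, D_{lj} = x_i\, a_{ij}\, x_j .
$$

In particular, $a_{ij} = 0$ implies $(DAD)_{ij} = 0$, so the sparsity pattern of A does not change and only the stored values need to be rescaled.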

In other words, I want to compute `Diagonal(x)*A*Diagonal(x)`. Computing it literally is simple, but it takes about twice as long, since each of the two multiplications traverses A. So I wrote the following function, which updates the values of the matrix in place in a single pass:

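```
using SparseArrays

# Scale every stored entry in place: A[i, j] *= x[i] * x[j]
function _norm!(A, x)
    rows = rowvals(A)   # row index of each stored entry
    vals = nonzeros(A)  # stored values, in column-major (CSC) order
    @inbounds for j = 1:size(A, 2)    # loop over columns
        xj = x[j]
        @simd for k in nzrange(A, j)  # stored entries of column j
            vals[k] *= x[rows[k]] * xj
        end
    end
    nothing
end
```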
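To check that the single-pass loop really matches `Diagonal(x)*A*Diagonal(x)`, here is a small self-contained comparison. The function is the same loop as `_norm!`, restated under the hypothetical name `scale_by_outer!` (returning `A` instead of `nothing`) so the snippet runs on its own; the matrix size and density are arbitrary test values.

```julia
using SparseArrays, LinearAlgebra, Random

# Hypothetical name; same single-pass loop as `_norm!` above.
# Scales every stored entry: A[i, j] *= x[i] * x[j]. Returns A.
function scale_by_outer!(A::SparseMatrixCSC, x::AbstractVector)
    rows = rowvals(A)
    vals = nonzeros(A)
    @inbounds for col in 1:size(A, 2)
        xc = x[col]                      # x[j] for this column
        for k in nzrange(A, col)         # stored entries of column `col`
            vals[k] *= x[rows[k]] * xc   # multiply by x[i] * x[j]
        end
    end
    return A
end

# Compare against the two-multiplication version on a small random matrix.
Random.seed!(1)
A = sprand(200, 200, 0.05)
x = randn(200)
B = scale_by_outer!(copy(A), x)   # copy, so A itself is left untouched
println(B ≈ Diagonal(x) * A * Diagonal(x))  # prints true
```

The comparison uses `≈` rather than `==` because the two versions multiply the same three factors in a different order, so the results can differ in the last bits.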

```
#testing
using SparseArrays, SharedArrays
using Distributed
n = 10^6
A = sprand(n,n,0.0001)
x = randn(Float64, n)
_norm!(A, x)
@time _norm!(A,x)
#0.591912 seconds (4 allocations: 160 bytes)
```

I tried to process the columns in parallel, but the function I wrote is slower than the single-process version:

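```
function _pnorm!(shared, A, x)
    rows = rowvals(A)
    # Split the columns across the workers; `shared` holds a copy of nonzeros(A)
    @sync @distributed for j = 1:size(A, 2)
        @inbounds @simd for k in nzrange(A, j)  # stored entries of column j
            shared[k] *= x[rows[k]] * x[j]
        end
    end
    nothing
end
```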

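```
#testing
addprocs(3)
# the workers need both packages: nzrange comes from SparseArrays,
# and deserializing a SharedArray requires SharedArrays
@everywhere using SparseArrays, SharedArrays
# create the SharedArray after the workers exist so it is mapped on all of them
y = SharedArray(nonzeros(A))
_pnorm!(y, A, x)
@time _pnorm!(y, A, x)
#2.424550 seconds (626 allocations: 34.734 KiB)
```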
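For comparison, a shared-memory alternative to `Distributed` is Julia's multithreading. The sketch below swaps in `Threads.@threads`, which is a different technique from the `@distributed`/`SharedArray` approach above, and reuses the single-pass loop; `norm_threaded!` is a hypothetical name. It assumes Julia was started with more than one thread (for example `julia --threads 4`); with a single thread it still runs, just serially. Whether it beats the serial `_norm!` depends on the machine, since this loop may be limited by memory bandwidth rather than arithmetic.

```julia
using SparseArrays

# Hypothetical multithreaded variant of `_norm!` (threads, not Distributed).
# Each column's stored entries occupy a disjoint slice of nonzeros(A),
# so different threads never write to the same element.
function norm_threaded!(A::SparseMatrixCSC, x::AbstractVector)
    size(A, 1) == size(A, 2) == length(x) ||
        throw(DimensionMismatch("A must be square with size(A, 1) == length(x)"))
    rows = rowvals(A)
    vals = nonzeros(A)
    Threads.@threads for col in 1:size(A, 2)
        xc = x[col]
        @inbounds for k in nzrange(A, col)
            vals[k] *= x[rows[k]] * xc   # multiply by x[i] * x[j]
        end
    end
    return A
end

# Usage, with the same A and x as in the post:
# norm_threaded!(A, x)          # first call compiles
# @time norm_threaded!(A, x)
```

No data is copied to other processes here: all threads work directly on `nonzeros(A)` in the same address space.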

This is the first time I have tried Julia's parallel features, so I would be very grateful for any help.