Today I noticed a bottleneck in my code that was causing a significant performance loss.

Given a vector b and a sparse matrix A, let b \odot A denote the elementwise multiplication of b with each column of A. An example in Julia could look like this:

```
using SparseArrays
n = 3
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
b .* A
```

However, it seems that this is not handled very well natively, as can be seen by cranking n up.

```
@time begin
n = 3
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
b .* A
end
0.000028 seconds (28 allocations: 2.375 KiB)
@time begin
n = 10000
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
b .* A
end
0.247870 seconds (51 allocations: 1.491 GiB, 25.08% gc time)
@time begin
n = 100000
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
b .* A
end
ERROR: OutOfMemoryError()
```

However, if I define the following function, which computes the same thing, I can do much better: it's significantly faster and uses far less memory.

```
function fun(b::Vector, A::SparseMatrixCSC)
    B = copy(A)
    B.nzval .= B.nzval .* b[B.rowval]
    return B
end
@time begin
n = 100000
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
fun(b, A)
end
0.005198 seconds (43 allocations: 11.446 MiB)
@time begin
n = Int(1e6)
b = collect(1:n)
A = sparse(collect(1:n), collect(1:n), rand(1:n))
fun(b, A)
end
0.068245 seconds (43 allocations: 114.443 MiB, 7.60% gc time)
```
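A related formulation that seems to avoid the dense intermediate is multiplying by `Diagonal(b)` from `LinearAlgebra`, which dispatches to a sparse-aware method (a sketch, not what my code currently does):

```julia
using LinearAlgebra, SparseArrays

n = 100000
b = collect(1.0:n)
A = sparse(1:n, 1:n, rand(n))

# Row scaling b[i] * A[i, j] expressed as a diagonal matrix product;
# the result stays a SparseMatrixCSC with the same sparsity pattern.
B = Diagonal(b) * A
```

This computes the same row scaling as `b .* A`, just spelled as a matrix product instead of a broadcast.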

Is there a reason `.*` does not behave this way internally? Can I overload it myself so that it does? I looked a bit at `Base.broadcast`, and I have written custom broadcasting for my own callable struct types before, in places where I could gain significant speed-ups by computing the elementwise operations differently than a naive elementwise function call. But I couldn't figure out how overloading broadcasting for `*` is supposed to be expressed.
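One possible approach, sketched under the assumption that intercepting `Base.Broadcast.broadcasted` before materialization is acceptable (note this is type piracy, since neither `*` nor these argument types are mine):

```julia
using SparseArrays

# Row-scaling helper from above, repeated so this sketch is self-contained.
function fun(b::Vector, A::SparseMatrixCSC)
    B = copy(A)
    B.nzval .= B.nzval .* b[B.rowval]
    return B
end

# Intercept `b .* A` before the generic sparse broadcast machinery runs.
# In a package, a wrapper type or a plain function call would be safer
# than pirating a method on types owned by Base and SparseArrays.
Base.Broadcast.broadcasted(::typeof(*), b::Vector, A::SparseMatrixCSC) = fun(b, A)
```

With this definition, `b .* A` returns the sparse result directly, since `Base.materialize` passes a non-`Broadcasted` value through unchanged.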