(I’m on Julia 0.6.2)

```
julia> A = sprand(100,100,0.01);
julia> B = rand(100,100);
julia> using BenchmarkTools
julia> @btime $C = $A*$B;
40.106 μs (2 allocations: 78.20 KiB)
julia> @btime $C = $B*$A; # other way around
42.666 μs (2 allocations: 78.20 KiB) # similarly fast
julia> @btime A_mul_B!($C, $A, $B); # in-place version
34.560 μs (0 allocations: 0 bytes) # faster (as expected)
julia> @btime A_mul_B!($C, $B, $A); # other way around
2.118 ms (6 allocations: 336 bytes) # much slower! (unexpected)
```

**Questions:**

- Why is the speed of `A_mul_B!` so asymmetric?
- How can I do the multiplication `B*A` in-place?

My guesses:

- I guess this might be related to the CSC format, is that right? I found out (via `@edit`) that Julia dispatches `A*B` to `A_mul_B!`, but `B*A` to `(*)(X::StridedMatrix{TX}, A::SparseMatrixCSC{TvA,TiA}) where {TX,TvA,TiA}`, in which it allocates `Y = zeros(promote_type(TX,TvA), mX, A.n)`.
- Based on my guess for the first question, I assume the second question is basically whether there is a version of that `*` implementation that takes a preallocated `Y`.
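To make the second question concrete, here is a minimal sketch of the kind of function I am looking for. It is only my own illustration, not something from Base: the name `dense_mul_sparse!` is made up, and the loop is simply my reading of how a dense-times-CSC product can walk the stored entries column by column. The version guard is there only so the same snippet also loads on newer Julia versions, where the sparse types moved out of Base.

```julia
# On Julia 0.7 and later the sparse types live in the SparseArrays stdlib;
# on 0.6 they are part of Base, so this branch is skipped there.
@static if VERSION >= v"0.7"
    using SparseArrays
end

# Hypothetical helper (not part of Base): computes Y = X*A in place for a
# dense X and a CSC-sparse A, reusing the preallocated Y.
function dense_mul_sparse!(Y::AbstractMatrix, X::AbstractMatrix, A::SparseMatrixCSC)
    size(X, 2) == size(A, 1) || throw(DimensionMismatch("inner dimensions of X and A differ"))
    size(Y) == (size(X, 1), size(A, 2)) || throw(DimensionMismatch("Y has the wrong size"))
    fill!(Y, zero(eltype(Y)))                  # overwrite whatever Y held before
    colptr, rowval, nzval = A.colptr, A.rowval, A.nzval
    for j in 1:size(A, 2)                      # column j of A and of Y
        for p in colptr[j]:(colptr[j+1] - 1)   # stored entries of column j
            k, v = rowval[p], nzval[p]         # A[k, j] == v
            @inbounds for i in 1:size(X, 1)
                Y[i, j] += X[i, k] * v         # Y[:, j] += X[:, k] * A[k, j]
            end
        end
    end
    return Y
end

A = sprand(100, 100, 0.01)
B = rand(100, 100)
C = zeros(100, 100)
dense_mul_sparse!(C, B, A)   # C now holds B*A, written into the existing array
```

The innermost loop runs down a column of `X` and `Y`, which should suit Julia's column-major layout, but I have not benchmarked this against the built-in methods.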

Thanks for any comments!