Why is sparse * dense slower than dense * dense?

```
x = sprand(1000,1000,.1);
fx = full(x);
y = rand(1000,1000);
@time x*y;
0.925036 seconds (31.94 k allocations: 9.071 MB)
@time x*y;
0.885278 seconds (7 allocations: 7.630 MB)
@time fx*y;
0.548416 seconds (398.70 k allocations: 21.551 MB, 11.83% gc time)
@time fx*y;
0.086072 seconds (7 allocations: 7.630 MB, 59.55% gc time)
```

Reading #1831, I also tried a different sparse type

```
xa = Base.SparseArrays.CHOLMOD.Sparse(x);
@time xa*y;
0.147444 seconds (51.25 k allocations: 17.378 MB)
@time xa*y;
0.097434 seconds (5.47 k allocations: 15.518 MB, 5.12% gc time)
```

which is still not faster than dense * dense above (only for first call) but has vastly more allocations.

So, why is sparse * dense slower than dense * dense? Although I would also like to see an explanation, my primary interest is a convient way to exploit O(N^2) scaling of sparse multiplications over O(N^3) scaling for dense matrix multiplications.

To me this seems to be a highly important issue for many applications.