Hi, my algorithm involves a lot of submatrix multiplications. Inspired by the use of views, I tried them, but the speedup is not very satisfying. Below is some sample code:

If I just slice the matrix, the speedup from `@view` is significant:

```
julia> A = randn(10000,10000);
julia> @time @view A[:1:5000];
0.000004 seconds (8 allocations: 320 bytes)
julia> @time A[:1:5000];
0.013570 seconds (22.51 k allocations: 1.085 MiB)
```
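A caveat about the transcript above: `:1` evaluates to the plain integer `1`, so `A[:1:5000]` appears to be the linear index `A[1:5000]` (the first 5000 elements), not the first 5000 columns `A[:, 1:5000]`. Also, the first `@time` of an expression at the REPL includes compilation, which likely accounts for most of the 22.51 k allocations. Below is a minimal sketch of a fairer comparison using only Base Julia, with reduced sizes; the helper names `copy_cols` and `view_cols` are hypothetical.

```julia
# Hypothetical helpers: one copies the columns, the other only wraps them.
copy_cols(A, k) = A[:, 1:k]        # allocates a new size(A, 1)×k matrix
view_cols(A, k) = view(A, :, 1:k)  # SubArray sharing memory with A, no copy

A = randn(2_000, 2_000)
k = 1_000

# Warm up so that compilation is not counted in the timings below.
copy_cols(A, k); view_cols(A, k)

t_copy = @elapsed copy_cols(A, k)
t_view = @elapsed view_cols(A, k)
println("copy: ", t_copy, " s   view: ", t_view, " s")
```

If the BenchmarkTools package is installed, its `@btime` macro handles the warm-up and repetition automatically.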

However, with the same dimensions, when I do a submatrix multiplication I get this:

```
julia> B = randn(5000,5000);
julia> @time A[:,1:5000] = A[:, 1:5000]*B;
1.834928 seconds (14 allocations: 762.940 MiB)
julia> @time @views A[:,1:5000] = A[:, 1:5000]*B;
1.667780 seconds (16 allocations: 381.470 MiB, 1.64% gc time)
```
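A back-of-the-envelope estimate of why `@views` helps little here: the only work it removes is the copy of `A[:, 1:5000]`, whose cost grows like m·k, while the product costs about 2·m·k·n floating-point operations. The sketch below plugs in the sizes from this benchmark, assuming `Float64` entries (which `randn` produces). The copy size it computes, about 381.47 MiB, matches the difference between the two reported allocation totals (762.940 MiB vs 381.470 MiB).

```julia
# Sizes from the benchmark above: A[:, 1:k] is m×k and B is k×n.
m, k, n = 10_000, 5_000, 5_000

copy_bytes   = m * k * sizeof(Float64)  # extra allocation when A[:, 1:k] is copied
result_bytes = m * n * sizeof(Float64)  # the product, allocated in both versions
flops        = 2 * m * k * n            # one multiply and one add per term

println("copy:   ", copy_bytes / 2^20, " MiB")
println("result: ", result_bytes / 2^20, " MiB")
println("flops per copied byte: ", flops / copy_bytes)
```

With roughly 1250 floating-point operations for every byte copied, the multiply likely dominates the run time, so a gain of around 10% from skipping the copy is plausible rather than surprising.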

which is only about 10% faster. Further, if I increase the size of the matrices,

```
julia> A = randn(50000,50000);
julia> B = randn(10000,10000);
julia> @time A[:,1:10000] = A[:,1:10000]*B;
82.754603 seconds (14 allocations: 7.451 GiB, 1.46% gc time)
julia> @time @views A[:,1:10000] = A[:,1:10000]*B;
56.483068 seconds (16 allocations: 3.725 GiB, 1.44% gc time)
```

the speedup is a bit better (about 1.5×).
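If the same submatrix product runs many times, one option is to remove the remaining allocation (the product itself, 381.470 MiB or 3.725 GiB above) by writing into a preallocated buffer with `mul!` from the `LinearAlgebra` standard library. The result cannot be written directly into `view(A, :, 1:k)`, because `mul!` requires that the output not alias either input. This is a minimal sketch that assumes `B` is square, as in both examples; `mul_cols!` and `buf` are hypothetical names. It saves allocation and garbage-collection time, not floating-point work, so it probably will not beat the `@views` version by much.

```julia
using LinearAlgebra

# Overwrite the first k columns of A with A[:, 1:k] * B, where B is k×k.
# `buf` is a caller-provided size(A, 1)×k workspace, reusable across calls.
function mul_cols!(A::AbstractMatrix, B::AbstractMatrix, buf::AbstractMatrix)
    k = size(B, 1)
    size(B, 2) == k || throw(DimensionMismatch("B must be square"))
    Asub = view(A, :, 1:k)  # no copy of the columns
    mul!(buf, Asub, B)      # buf = A[:, 1:k] * B; buf must not alias A or B
    copyto!(Asub, buf)      # write the product back into A's first k columns
    return A
end

A = randn(300, 400)
B = randn(200, 200)
buf = similar(A, size(A, 1), size(B, 2))

expected  = A[:, 1:200] * B  # allocating reference result for comparison
untouched = A[:, 201:end]    # columns that should not change
mul_cols!(A, B, buf)
```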

Does this speedup look normal? In general, what level of speedup should I expect?

Any help is appreciated!