Here’s a quick example:

```julia
using BenchmarkTools  # provides @btime

foo(X::Matrix{Float64}) = maximum(X)
bar(X::AbstractMatrix{Float64}) = maximum(X)
X = randn(5000, 200);
Xt = transpose(X);
@btime foo($X) # 1
600.898 μs (0 allocations: 0 bytes)
4.697355507091318
@btime bar($X) # 2
601.388 μs (0 allocations: 0 bytes)
4.697355507091318
@btime foo(collect(transpose($X))) # 3
3.335 ms (3 allocations: 7.63 MiB)
4.697355507091318
@btime bar($Xt) # 4
8.347 ms (0 allocations: 0 bytes)
4.697355507091318
```

I think I understand why the first two are much quicker than the third one (column-major layout, plus the copy made by `collect`). However, I did not expect the last one to be that slow; I was expecting it to be faster than the third one (I guess it may be if your data is much larger than what I've tried here).
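My current understanding, sketched below (the function name is my own, not a Base one): `transpose(X)` is a lazy wrapper, so `Xt[i, j]` just reads `X[j, i]`. Since Julia arrays are column-major, a generic column-order reduction over `Xt` ends up striding through `X`'s memory by `size(X, 1)` elements per step instead of reading contiguously.

```julia
X  = randn(1000, 100)
Xt = transpose(X)

# A naive column-major reduction, to make the access pattern explicit.
function colwise_max(A::AbstractMatrix)
    m = typemin(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)  # column-major iteration order
        m = max(m, A[i, j])
    end
    return m
end

colwise_max(X)   # contiguous reads through X's memory
colwise_max(Xt)  # strided reads: each step jumps size(X, 1) floats
```

Both calls return the same value, but the second one touches memory in a cache-unfriendly order, which would explain the gap between cases 2 and 4.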

This can arise a fair bit in ML/Stats packages, where devs can have chosen either the `p x n` or `n x p` convention for design matrices (typically `p x n` in JuliaStats but `n x p` in DataFrames).

At the moment a number of these algorithms (e.g. `kmeans` in `Clustering.jl`) just *error* if you give them a transpose, which is not ideal. That's because they're tied to `Matrix` types instead of `AbstractMatrix`. It's an easy fix and I intend to propose PRs for this, but I was not expecting it to cause a big performance difference.
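For the PRs I have in mind, a minimal sketch of the fix might look like this (hypothetical function, not Clustering.jl's actual internals): relax the signature to `AbstractMatrix`, and if the layout matters for the hot loop, densify once up front rather than paying for strided access on every element.

```julia
# Hypothetical example, illustrating the signature relaxation only.
function colmeans(X::AbstractMatrix{<:Real})
    # One O(n*p) copy for lazy wrappers (Transpose, Adjoint, ...),
    # so the reduction below runs on contiguous column-major data.
    Xd = X isa Matrix ? X : collect(X)
    return vec(sum(Xd; dims=1)) ./ size(Xd, 1)
end

colmeans(randn(4, 3))             # accepts a plain Matrix
colmeans(transpose(randn(3, 4)))  # and a Transpose, after a single copy
```

Whether the copy is worth it presumably depends on how many passes the algorithm makes over the data; for a multi-pass algorithm like k-means it seems likely to pay off.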

Am I missing something here? Is it expected that passing the `transpose` is slower than passing `collect(transpose)`? Thanks!