What is the fastest way to transpose a CSR (or CSC) matrix?

Here are some data points. Let us call them unintuitive.

pkrysl@samadira sparse_transpose_tests.jl % julia --project t.jl
[ Info: Matrix M = 20133 by N = 20133, sparsity = 0.0004966969651815428, nnz = 201362
[ Info: Benchmarking copy+transpose CSC
  695.834 μs (12 allocations: 3.23 MiB)
[ Info: Benchmarking copy+transpose CSR
  1.893 s (3 allocations: 3.02 GiB)
[ Info: Benchmarking gbtranspose CSC
  823.708 μs (23 allocations: 3.69 MiB)
[ Info: Benchmarking gbtranspose CSR
  1.071 s (18 allocations: 3.02 GiB)
[ Info: Benchmarking csr_transpose
  663.541 μs (15 allocations: 3.53 MiB)
[ Info: Benchmarking csr_transpose_2
  631.500 μs (9 allocations: 3.23 MiB)
pkrysl@samadira sparse_transpose_tests.jl %

If you feel so inclined, test on your particular architecture.
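
The implementations of `csr_transpose` and `csr_transpose_2` are not shown here. Purely as a point of reference, here is a minimal sketch of one common approach: a two-pass counting sort over the raw CSR arrays. The name `csr_transpose_sketch`, its signature, and the small test matrix are hypothetical, and this is not claimed to be what the benchmarked functions do. The check relies on the fact that the CSC arrays of a matrix are the CSR arrays of its transpose.

```julia
using SparseArrays  # stdlib; used only to build a test matrix and check the result

# Hypothetical helper, NOT the csr_transpose benchmarked above.
# Input: 1-based CSR arrays (rowptr, colval, nzval) of an m-by-n matrix.
# Output: CSR arrays of the n-by-m transpose.
function csr_transpose_sketch(m::Int, n::Int, rowptr::Vector{Int},
                              colval::Vector{Int}, nzval::Vector{T}) where {T}
    nz = length(nzval)
    t_rowptr = zeros(Int, n + 1)
    t_colval = Vector{Int}(undef, nz)
    t_nzval = Vector{T}(undef, nz)
    # Pass 1: count the entries in each input column (= each row of the transpose).
    for k in 1:nz
        t_rowptr[colval[k] + 1] += 1
    end
    # A prefix sum turns the counts into 1-based row start offsets.
    t_rowptr[1] = 1
    for j in 1:n
        t_rowptr[j + 1] += t_rowptr[j]
    end
    # Pass 2: scatter every entry into its slot in the transposed structure.
    cursor = t_rowptr[1:n]  # next free slot for each transposed row (a copy)
    for i in 1:m
        for k in rowptr[i]:(rowptr[i + 1] - 1)
            j = colval[k]
            p = cursor[j]
            t_colval[p] = i
            t_nzval[p] = nzval[k]
            cursor[j] = p + 1
        end
    end
    return t_rowptr, t_colval, t_nzval
end

# Small 3-by-4 example. Its CSC arrays double as the CSR arrays of the 4-by-3 transpose.
A = sparse([1, 2, 2, 3], [2, 1, 4, 3], [1.0, 2.0, 3.0, 4.0], 3, 4)
tp, tc, tv = csr_transpose_sketch(size(A, 2), size(A, 1), A.colptr, A.rowval, A.nzval)

# Reference result from SparseArrays for comparison.
B = copy(transpose(A))
println((tp == B.colptr, tc == B.rowval, tv == B.nzval))
```

Because the input rows are visited in increasing order, the indices within each row of the result come out sorted without an extra sorting step. To get a rough timing on your own machine, you could wrap the call in `@elapsed` on a larger matrix built with `sprand`.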

Some more data points, this time for rectangular matrices:

[ Info: Matrix M = 20133 by N = 60399, sparsity = 0.0004966969651815428, nnz = 604295
[ Info: Benchmarking copy+transpose CSC
  1.987 ms (12 allocations: 9.37 MiB)
[ Info: Benchmarking copy+transpose CSR
  7.601 s (3 allocations: 9.06 GiB)
[ Info: Benchmarking gbtranspose CSC
  2.768 ms (29 allocations: 10.76 MiB)
[ Info: Benchmarking gbtranspose CSR
  3.505 s (18 allocations: 9.06 GiB)
[ Info: Benchmarking csr_transpose
  2.645 ms (15 allocations: 10.60 MiB)
[ Info: Benchmarking csr_transpose_2
  2.305 ms (9 allocations: 9.68 MiB)
pkrysl@samadira sparse_transpose_tests.jl % julia --project t2.jl
[ Info: Matrix M = 60133 by N = 20133, sparsity = 0.0001662980393461161, nnz = 201839
[ Info: Benchmarking copy+transpose CSC
  716.666 μs (12 allocations: 3.54 MiB)
[ Info: Benchmarking copy+transpose CSR
  3.836 s (3 allocations: 9.02 GiB)
[ Info: Benchmarking gbtranspose CSC
  1.047 ms (23 allocations: 4.92 MiB)
[ Info: Benchmarking gbtranspose CSR
  3.233 s (18 allocations: 9.02 GiB)
[ Info: Benchmarking csr_transpose
  784.334 μs (15 allocations: 3.54 MiB)
[ Info: Benchmarking csr_transpose_2
  768.125 μs (9 allocations: 3.23 MiB)

All of the timings were obtained on:

julia> versioninfo()
Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 24 × Apple M2 Ultra
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)