Reordering for memory contiguity/memory access

Thanks, that could improve things (also, I did not realize that the proper term is data locality). I found this topic which mentions some packages implementing Cuthill-McKee, but I am unsure which of them would make it easiest to extract the permutation itself.
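
To make clear what I mean by "the permutation itself", here is a minimal pure-Python sketch of reverse Cuthill-McKee. It is not taken from any of those packages; the adjacency-list input `adj` and the helper names `reverse_cuthill_mckee` and `bandwidth` are my own assumptions for illustration, and it assumes a symmetric sparsity pattern.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return a permutation `perm` where perm[new_index] = old_index.

    adj[i] lists the neighbours of node i (assumed symmetric pattern).
    """
    n = len(adj)
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(range(n), key=lambda i: degree[i]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda i: degree[i]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    # Reversing the breadth-first order gives the "reverse" variant.
    return order[::-1]


def bandwidth(adj, perm):
    """Largest |new_i - new_j| over all stored entries (i, j) after relabelling."""
    pos = {old: new for new, old in enumerate(perm)}
    return max(
        (abs(pos[i] - pos[j]) for i in range(len(adj)) for j in adj[i]),
        default=0,
    )


# A path graph 0-3-1-4-2 with scrambled labels.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
perm = reverse_cuthill_mckee(adj)
```

The returned list is what I would need from a package: with it I can relabel the indices once up front, so that entries that interact end up close together in memory.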

Also, I am wondering whether, instead of specifying targets in patterns, I should specify “sources” (i.e. effectively the sparsity pattern of the transpose). Each iteration of the outer loop would then write to only one output element, which would allow me to parallelize that loop without worrying about concurrent writes.
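
A small sketch of the difference, assuming the pattern is stored as a list of index lists (`targets[j]` holds the outputs that input `j` contributes to). That layout and the names `transpose_pattern`, `scatter` and `gather` are hypothetical, not my actual code:

```python
def transpose_pattern(targets, n_rows):
    """Turn targets[j] = [i, ...] into sources[i] = [j, ...]."""
    sources = [[] for _ in range(n_rows)]
    for j, rows in enumerate(targets):
        for i in rows:
            sources[i].append(j)
    return sources


def scatter(targets, x, n_rows):
    """Target form: iteration j writes to several y[i].

    Two different j can hit the same y[i], so running the outer loop
    in parallel would need atomics or locks.
    """
    y = [0.0] * n_rows
    for j, rows in enumerate(targets):
        for i in rows:
            y[i] += x[j]
    return y


def gather(sources, x):
    """Source form: iteration i only reads x and writes y[i].

    The outer iterations are independent, so they could be split
    across threads without concurrent writes.
    """
    return [sum(x[j] for j in cols) for cols in sources]


targets = [[0, 2], [1], [0, 1, 2]]
x = [1.0, 2.0, 3.0]
sources = transpose_pattern(targets, n_rows=3)
y_scatter = scatter(targets, x, n_rows=3)
y_gather = gather(sources, x)
```

Both forms compute the same result; the trade-off is that the source form reads `x` at scattered indices instead of writing `y` at scattered indices.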
