Dear all,
Yesterday I came across the SymRCM.jl package, which seems to work really neatly and is easy to use, but it doesn’t seem to bring me any performance benefit. Allow me to explain. I wrote a small computational fluid dynamics program in Julia, and this is what the system matrix looks like after discretization:
julia> A
26419×26419 SparseArrays.SparseMatrixCSC{Float64, Int64} with 176659 stored entries:
⣿⣿⢮⣷⣴⣵⣰⣷⣖⣎⣮⡨⣶⣰⣁⣭⣗⣖⡾⣁⣄⡐⡀⣤⣶⣀⣀⣄⢄⣈⡀⡂⡐⠂⢀⠐⣁⡀⢀⣦⢐⣁⠀⠀⠀⡄⣰⣌⢰⣠⣹⣿⣷⣧⡀
⢮⣷⣿⢟⣿⣿⣿⣷⣿⣿⣿⣿⢿⣿⣿⣿⣿⣻⣿⣟⣿⣿⣿⣿⣻⣷⣿⣻⣿⡿⡿⣛⢍⣷⣯⣿⣮⢿⠿⢿⣷⢫⡳⢏⣻⣿⡷⣣⣽⣿⣿⣿⣿⣿⡇
⢔⣿⣿⣿⣿⣿⣿⣿⣿⡿⠭⣿⣷⢿⣽⣽⢽⡿⢻⣎⣿⡻⣟⣽⣷⡿⣯⡟⢿⣷⡯⡯⣻⣯⠿⢟⡏⣠⡿⢵⡑⣼⡻⣽⣗⣮⣥⣳⣸⣾⡿⢷⣽⣟⠇
⢴⣾⢿⣿⣿⣿⣿⣿⣟⣻⢷⣿⡟⣿⡾⣝⢍⣿⣟⣙⣺⢿⣟⢯⠹⣵⣴⣟⢯⠍⡚⢻⠼⢶⣗⣏⣻⣺⡳⢴⢧⣏⣐⡥⢿⢃⢿⡯⣹⡷⡻⣶⣺⢻⠇
⡸⢽⣿⣿⣿⡿⣿⣹⢿⣷⣻⣿⡍⡞⣰⠳⣫⡳⣟⣔⠥⡽⠭⣏⣿⡚⢖⣿⣯⠪⡪⣮⣵⢽⡭⢟⠹⣟⠷⠂⡦⣲⠇⢾⣣⣳⢆⡂⠼⢭⣵⡘⣿⣿⡂
⡊⡻⣿⣿⣧⣧⣽⣷⣿⣾⣿⣿⡶⢼⣭⣧⡠⠹⠴⣇⣵⡭⢯⢞⣥⣟⠶⡗⣬⡳⣦⠫⣴⠵⠜⠏⢣⣚⢰⣄⢿⢣⢦⡵⣱⡜⠹⢳⢌⡙⡟⢟⢻⣏⠁
⢘⣻⣿⣷⣽⣟⣿⣭⣣⠭⣘⣏⣿⣿⡛⣬⠵⠼⠓⡏⣺⠜⡌⠳⠥⢻⡒⢹⠺⣴⢹⣾⢡⠶⢯⠠⢔⢿⢲⠣⠟⡨⢶⡽⡢⣠⠇⢬⡛⢳⢜⣵⣎⣸⡅
⡅⣼⣿⣿⣗⣿⣞⢯⢴⡚⠧⣿⡛⣬⢿⣷⡽⠓⢫⣙⣵⢸⠎⢵⠘⢽⠏⠺⢳⣢⡞⠣⣳⠶⣂⣞⡵⡢⣼⣃⣈⠉⠔⢒⠱⢆⡮⢊⡗⢾⢇⡫⢫⣽⡆
⢹⢽⣿⣻⣷⡷⣧⣵⢯⡺⣄⡊⣑⡇⢷⠋⡿⣯⡕⠞⡵⣴⠢⠉⢁⡑⢔⢒⣊⡿⣮⡠⠻⢴⢚⢹⡠⡅⢁⡆⢘⣇⢒⣮⣆⢛⢼⡐⣫⡋⣽⡏⠭⢿⡇
⠞⢫⣿⢿⡻⢶⣟⢹⢛⢽⠴⢧⡽⠤⣏⢲⣱⠍⣿⣿⡈⡧⠖⢑⡎⠠⢉⢷⡝⠀⢒⣘⡟⣓⣚⢦⢓⢋⡂⣭⠾⠒⢕⢀⠚⢌⣑⠱⠒⣬⣝⡽⡷⢽⡆
⢀⠹⣿⣿⣿⡻⣾⣞⣅⡧⡕⡿⣚⠞⣑⣛⢑⣯⠦⡬⢿⣷⣒⠌⢛⢓⠰⠀⠓⡺⢙⠅⢐⠘⡉⠖⣋⠙⠔⡇⡛⡂⠳⣆⢒⡾⠪⢌⠧⡹⠧⣎⢹⣕⠀
⠀⣬⣿⣿⣟⣽⡿⣝⡧⢧⣫⢗⢦⡉⢎⣅⡌⠂⢜⢁⡘⠜⢿⣷⡢⠽⢬⡙⠁⢝⡑⢠⣍⢎⠉⠇⡜⣐⢑⢖⡙⢩⣐⢕⢠⠭⡉⠄⠦⣶⣆⡇⠡⠤⠄
⠘⢻⢿⣾⣽⡿⢗⣦⣻⠻⣥⢿⣥⣃⣖⣄⢅⠰⠊⡉⢿⢐⣌⡎⢿⣷⡖⢸⠧⡠⠐⢲⡅⢌⠠⠉⠠⠾⠈⢴⢲⠁⡈⡠⠰⢠⣎⠇⠘⠻⠆⣒⠺⣿⠃
⠀⢼⣿⣻⣯⠿⣴⢿⣼⣵⢼⠧⣜⣈⣫⡁⢰⢑⢧⣔⠐⠂⣆⠳⣘⣉⠻⣦⡘⣁⡀⠈⠸⡱⠸⣡⠓⡕⠘⠙⠗⣴⣀⡒⢚⡮⠶⢦⠁⠶⡥⢇⡊⡀⠀
⡀⢱⣿⡿⢿⣷⡏⠗⡫⡛⢦⡻⢚⣦⠹⣲⣮⡼⠓⠉⣹⡠⣅⢄⠉⡣⠖⢨⣻⣾⡕⡋⠨⡎⠈⣐⡘⣾⠰⢓⠠⠊⡉⡨⠃⡡⢮⠁⠚⢓⠃⣀⣨⣨⡅
⠠⠨⣿⢫⡯⡯⣾⣈⡪⣮⡬⡛⣳⣶⠾⡉⠊⡻⣘⢰⠗⠔⠑⣈⢰⣀⡀⠈⡵⠩⣿⣿⣀⠙⢈⢡⠁⠛⠀⠠⠆⡄⡠⠠⣬⡆⡁⢡⢀⠓⠦⡬⠄⢵⠆
⠰⠈⢧⣵⡿⣾⢲⣇⣕⣟⢔⡟⢡⡖⢹⡞⢛⣆⢿⢩⣐⠐⡣⢝⡁⢍⢖⡢⡢⠦⣄⠘⡻⣮⣢⠦⠉⠰⠱⡴⡂⢡⣄⡘⠓⠃⠐⣒⡶⠨⡗⡖⠪⠉⠁
⢀⠐⣯⣿⣿⢇⡽⢽⣧⢏⡶⠅⠋⡓⣨⢼⣞⣐⠺⣜⢣⠌⠧⠄⡄⠂⠖⣢⢂⢠⠆⣐⠨⡞⡻⣮⡨⠝⡇⠅⡪⡈⠾⠒⣠⣙⠐⠃⣌⡇⠈⢺⠇⠀⡁
⠁⠸⣮⣟⠋⣩⣻⣺⣷⢦⣩⢲⣴⣕⠱⡫⠄⠮⡽⢐⣏⠘⢒⢩⣠⡆⢝⠤⣲⣬⣥⠀⢃⡀⣆⠎⠻⣦⡧⠐⡣⠤⡀⣆⠏⢄⡀⢈⡈⡗⠘⢊⡩⢝⠂
⠠⣴⣿⣇⢟⣏⢙⣎⠹⠃⠐⢶⠼⡒⠶⢻⠡⠴⡌⣬⠴⠥⢱⢔⢂⣄⣖⠀⢴⢂⠀⡀⢑⡦⠍⠍⢉⠋⣻⣾⡲⠰⠃⠐⠈⠡⠂⣲⡤⢅⠈⢯⡵⠄⠀
⠔⢰⡽⣛⣑⣬⡭⢷⢨⣫⠿⣓⡛⡡⡆⠘⠶⢴⢺⠃⠻⠨⡗⣈⠜⠒⢙⣥⡠⠂⠈⠥⠌⣈⡊⠪⠉⡎⢘⡊⢻⣶⡊⠊⣀⢀⠐⠒⠁⠀⣼⡓⠦⠶⠂
⠀⠀⡽⢎⣟⣮⠔⡼⣩⣅⢌⡷⣜⡷⢰⢁⡸⣴⠑⢑⠹⢦⢔⢜⠂⡨⢠⠸⡃⡨⠀⡊⣀⠹⢺⠃⠠⢬⢉⠀⡪⠈⡿⣯⡉⠀⢉⣉⡇⠂⠒⠅⢡⠤⠀
⠀⠤⣿⣾⡹⣽⠿⢓⢭⣺⣑⠾⠈⣪⠱⢆⣬⢙⡚⢄⣸⡴⡄⡖⠐⣂⡺⡴⠍⡠⠢⠿⠽⠀⣄⢺⠋⢅⠆⡀⠀⢘⠃⠈⠻⣦⣀⠊⠂⠤⡂⡀⠩⢵⡂
⡐⢾⠽⣫⢥⣻⡿⡷⠨⠱⢷⣂⡉⣅⡪⢋⢒⠳⢕⡘⡊⢆⠃⠌⠮⠝⠸⣇⠎⠓⠅⣈⢰⢠⠴⠀⡀⢈⢨⣠⢰⠀⡇⢰⡠⠘⠻⣦⡶⠦⢰⢂⡁⡚⠃
⠐⣲⣷⣿⣲⣾⢷⡾⡖⣇⣆⠱⢿⣈⣹⣍⡯⠺⡘⣤⣍⡣⢨⣧⣶⡀⢡⡄⢾⢀⢤⠐⡘⡋⠦⠽⢦⠬⠄⢏⠁⠀⠩⠉⠈⡄⠸⡏⠟⢅⣐⣒⣒⣂⠀
⣷⣾⣿⣿⢿⣏⢻⣮⣑⠻⣿⢍⢖⣵⡭⡱⡷⠿⣗⡽⡩⢧⠬⠽⢨⢡⠥⢏⠉⢠⡈⡧⢹⠭⣢⣀⡲⢀⡦⣄⢶⠻⠜⠄⠈⠨⠰⢒⢰⢸⠑⢄⠇⡆⠀
⠽⣿⣿⣿⣷⢿⣾⣚⣿⣿⡿⢶⣊⣹⣏⣶⣧⣇⣝⣏⢗⢶⠁⡆⣾⣦⠊⠨⡂⣺⢄⣅⡎⠂⠉⠁⣇⢎⠑⠏⢨⡇⠁⡖⢇⣆⣡⠨⠸⢸⠩⠥⠑⢄⠀
⠀⠈⠉⠉⠉⠁⠉⠁⠈⠈⠁⠀⠁⠉⠈⠉⠉⠉⠈⠉⠀⠀⠀⠁⠉⠀⠀⠀⠁⠉⠈⠁⠁⠀⠁⠈⠈⠀⠀⠀⠈⠀⠀⠀⠈⠈⠉⠀⠀⠀⠀⠀⠀⠀⠁
I made no attempt to renumber the unknowns, and the structure is … well … ugly. But here comes SymRCM, which I use in the following way:
julia> p=symrcm(A);
julia> B=A[p,p]
26419×26419 SparseArrays.SparseMatrixCSC{Float64, Int64} with 176659 stored entries:
⠻⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠙⢿⣷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠹⣿⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠈⠻⣿⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⣶⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠘⢿⡿⣯⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣷⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⣿⣿⣿⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⣿⡿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣯⢿⣷⡿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣯⣿⣿⣝⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣽⢿⣷⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⣿⣿⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣟⣿⣿⣿⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣿⣿⣧⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠻⣿⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⣦⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢿⣷⣄⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣷⣄⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣷⡀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠁
julia> b=a[p];
The banded matrix looks really neat; SymRCM did a good job. However, when I run a few numerical tests on the solution of both linear systems, this is what I get:
julia> @time A\a;
0.214344 seconds (76 allocations: 87.890 MiB, 1.04% gc time)
julia> @time A\a;
0.212042 seconds (76 allocations: 87.890 MiB, 1.05% gc time)
julia> @time B\b;
0.222544 seconds (76 allocations: 85.415 MiB, 1.02% gc time)
julia> @time B\b;
0.226223 seconds (76 allocations: 85.415 MiB, 1.00% gc time)
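(An aside for anyone reproducing this: `@time` measures a single run, while `@btime` from BenchmarkTools averages over many samples and interpolates globals with `$`, which gives steadier numbers. A minimal sketch with a stand-in matrix, since the CFD system isn’t reproduced here:)

```julia
using BenchmarkTools, LinearAlgebra, SparseArrays

# Stand-in sparse system (not the actual CFD matrix)
A = sprand(5_000, 5_000, 1e-3) + 10I
a = rand(5_000)

# `$` interpolates the globals so their access cost isn't measured
@btime $A \ $a;
```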
I tried with a bigger system too (a 499,000×499,000 matrix) and observed a similar thing:
julia> @time A\a;
10.209412 seconds (74 allocations: 2.419 GiB, 0.29% gc time)
julia> @time A\a;
10.375933 seconds (78 allocations: 2.419 GiB, 0.87% gc time)
julia> @time A\a;
10.203141 seconds (74 allocations: 2.419 GiB, 0.03% gc time)
julia> @time B\b;
11.448977 seconds (78 allocations: 2.585 GiB, 0.75% gc time)
julia> @time B\b;
11.374048 seconds (74 allocations: 2.585 GiB, 0.02% gc time)
julia> @time B\b;
11.415554 seconds (76 allocations: 2.585 GiB, 0.24% gc time)
It seems that the banded system doesn’t offer any advantage in terms of CPU time; it even appears fractionally slower. Do you have any idea what is going on here? Could it be that the linear solver invoked by `A\a` already applies its own algorithms to reduce the matrix bandwidth?
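My current guess (a sketch, not a verified claim): for a square sparse matrix, `\` performs a direct factorization via SuiteSparse (UMFPACK LU, or CHOLMOD Cholesky when the matrix is detected as Hermitian positive definite), and those libraries apply their own fill-reducing ordering (AMD/COLAMD) internally, which would explain why a prior RCM permutation changes little. One way to probe this is to compare the fill-in of the factors with and without RCM, here on a small stand-in problem rather than the actual CFD matrix:

```julia
using LinearAlgebra, SparseArrays, SymRCM

# Stand-in problem: tridiagonal core plus a random symmetric perturbation,
# so the sparsity pattern is symmetric (as symrcm expects)
n = 2_000
T = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
R = sprand(n, n, 1e-3)
A = T + R + R'
a = rand(n)

p = symrcm(A)
B = A[p, p]
b = a[p]

# lu() on a SparseMatrixCSC calls UMFPACK, which chooses its own
# row/column permutations regardless of how we pre-ordered the matrix
FA = lu(A)
FB = lu(B)

# If the internal ordering dominates, the fill-in should be comparable:
println("nnz(L)+nnz(U) without RCM: ", nnz(FA.L) + nnz(FA.U))
println("nnz(L)+nnz(U) with RCM:    ", nnz(FB.L) + nnz(FB.U))
```

If the two fill-in counts come out close, that would suggest the RCM pre-permutation is simply being superseded by the solver’s internal ordering.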
P.S. The matrices are stored in `SparseMatrixCSC` format, and I use `LinearAlgebra` and `SparseArrays` in addition to the `SymRCM` mentioned above. Quite frankly, I don’t know which solver the `\` operator invokes. I would assume one from the Krylov subspace family. In that case the bandwidth might not play such a big role, but I would still expect fewer cache misses during the solution and therefore better performance.
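(If anyone wants to check the dispatch themselves, `@which` from InteractiveUtils shows the method that `\` lands on; for a square `SparseMatrixCSC` it should be a direct-factorization path rather than a Krylov method. A sketch with a hypothetical small system:)

```julia
using InteractiveUtils, LinearAlgebra, SparseArrays

# Hypothetical small sparse system, just to inspect dispatch
A = sprand(100, 100, 0.05) + 10I
a = rand(100)

# Prints which `\` method is selected for a sparse left-hand side
println(@which A \ a)
```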
Cheers