Creating and Solving Block Sparse Matrices with PETSc

Hello! I am using Julia to write a finite volume CFD code, which requires solving large (40,000x40,000) block sparse matrices. Currently, I am using the PETSc interface provided by PETSc.jl, as its GMRES routine notably outperforms the IterativeSolvers.jl GMRES routine (at least with my implementation of it). However, there appears to be an issue with how I am constructing the sparse block matrices. Sometimes, but not always, I encounter errors when adjusting the values of my matrix and creating a PETSc solver for it.

Trimmed error message (the full one is many many lines):

[0]PETSC ERROR: #5998 PetscCommDuplicate() at /workspace/srcdir/petsc-3.16.6/src/sys/objects/tagm.c:117
[0]PETSC ERROR: #5999 PetscHeaderCreate_Private() at /workspace/srcdir/petsc-3.16.6/src/sys/objects/inherit.c:63
[0]PETSC ERROR: #6000 KSPCreate() at /workspace/srcdir/petsc-3.16.6/src/ksp/ksp/interface/itcreate.c:697
ERROR: PETSc.PetscError(1)
Stacktrace:
 [1] #174
   @ C:\Users\caleb\.julia\packages\PETSc\LSdRz\src\ksp.jl:78 [inlined]
 [2] with(f::PETSc.var"#174#183"{MPI.Comm, PETSc.KSP{Float64, PETSc.PetscLibType{Float64, Int64, String}}}, opts::PETSc.Options{PETSc.PetscLibType{Float64, Int64, String}})   
   @ PETSc C:\Users\caleb\.julia\packages\PETSc\LSdRz\src\options.jl:191
 [3] ([0]PETSC ERROR: #6001 PetscCommDestroy() at /workspace/srcdir/petsc-3.16.6/src/sys/objects/tagm.c:193
[0]PETSC ERROR: #6002 PetscHeaderDestroy_Private() at /workspace/srcdir/petsc-3.16.6/src/sys/objects/inherit.c:121
[0]PETSC ERROR: #6003 MatDestroy() at /workspace/srcdir/petsc-3.16.6/src/mat/interface/matrix.c:1307
PETSc.KSP{Float64})(comm::MPI.Comm; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:pc_type, :ksp_type, :ksp_monitor), Tuple{String, String, Bool}}})
   @ PETSc C:\Users\caleb\.julia\packages\PETSc\LSdRz\src\ksp.jl:77
 [4] PETSc.KSP(A::PETSc.MatSeqAIJ{Float64}, P::PETSc.MatSeqAIJ{Float64}; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:pc_type, :ksp_type, :ksp_monitor), Tuple{String, String, Bool}}})
   @ PETSc C:\Users\caleb\.julia\packages\PETSc\LSdRz\src\ksp.jl:268
 [5] SolveArray(A::PETSc.MatSeqAIJ{Float64}, nEq::Int64, nBlocks::Int64)
   @ Main c:\Users\caleb\Documents\Classes_Grad.5\FVM\gitlab\FVM_US2D\temp_tests\petsc_parallel.jl:31
 [6] Test()
   @ Main c:\Users\caleb\Documents\Classes_Grad.5\FVM\gitlab\FVM_US2D\temp_tests\petsc_parallel.jl:61
 [7] top-level scope
   @ c:\Users\caleb\Documents\Classes_Grad.5\FVM\gitlab\FVM_US2D\temp_tests\petsc_parallel.jl:73

I believe these errors occur because PETSc is expecting a row-sequential adjustment of the values, whereas I am treating the matrix in a block-sparse fashion. I could reformulate the routines to operate in a row-sequential manner and see if the issue resolves, but I hope to parallelize the adjustment of the matrix values as the systems I am solving get larger, which is an inherently non-sequential operation. Since the failures seem to be random, I am struggling to properly debug the issue, and I believe it comes down to a lack of understanding on my part of how PETSc handles matrix memory on the backend. A toy example following the same procedure as my CFD code is provided at the bottom of this post. I am hoping that somebody can help me understand what is causing this issue, or whether there is a better approach I could be taking for creating and solving the system. This is my first larger Julia project and my first venture into PETSc, so I appreciate any assistance people can provide! Please let me know if there is any more information that would be helpful.

Here is the code. I usually have to run it multiple times before it errors. I was initially adjusting A in parallel (hence the FLoops dependency) and originally thought that the threading was causing the issues I am seeing. However, the problems still occur in serial.

using FLoops
using PETSc

function PopulateArray!(A, nEq, nBlocks)
# Adjusts A entries along block diagonal

    for i in 1:nBlocks
        
        # Get indices of block in array
        block_ind = ((i-1)*nEq + 1):(i*nEq)

        # Assign random values to sub-matrix
        A[block_ind, block_ind] = randn(Float64, nEq, nEq)

    end

end

function SolveArray(A, nEq, nBlocks)

    # Adjust values of A matrix
    PopulateArray!(A, nEq, nBlocks)

    # Assemble A matrix
    PETSc.assemble(A)

    # Create right-hand-side vector (Ax = rhs)
    rhs = randn(Float64, nEq*nBlocks)

    # Create a solver
    ksp = PETSc.KSP(A; pc_type="ilu", ksp_type="gmres", ksp_monitor=false)

    # Solve system
    x = ksp\rhs

    # Destroy solver to free up memory
    PETSc.destroy(ksp)

end

function Test()
# Creates a block-diagonal system of equations and solves with PETSc
# -------------------------------------------------------------------

    # Number of solver iterations to run
    nit = 1000

    # System of equation parameters
    nEq = 4 # Size of sub-matrices
    nBlocks = 100 # Number of sub-matrices

    # Initialize PETSc block-diagonal sparse array
    petsclib = PETSc.petsclibs[1]
    PETSc.initialize(petsclib)
    A = PETSc.MatSeqAIJ{Float64}(nEq*nBlocks, nEq*nBlocks, ones(Int64, nEq*nBlocks).*nEq)

    println("Starting Iterations")
    for i in 1:nit

        # Pass array to solver
        sol = SolveArray(A, nEq, nBlocks)

    end
    println("Iterations Concluded")

    # Finalize PETSc
    PETSc.destroy(A)
    PETSc.finalize(petsclib)

end


Test()

Just curious whether you might be satisfied with the performance of GMRES as implemented in Krylov.jl, and whether the error also occurs there (on the off chance that the error is due to PETSc.jl).


Thank you for the suggestion! I looked into KrylovKit.jl a bit, but haven’t implemented it in my full code. However, I did just implement it in the toy problem included in my post, and while there is no error, the performance difference is so large that I feel I am likely implementing something incorrectly. I can’t properly benchmark PETSc due to the mentioned error, but using @time on successful runs I get the following results for solving the 400x400 block-diagonal matrix 1000 times:

PETSc: ~0.25 s
KrylovKit: ~40 s

I would expect KrylovKit to be slower since it doesn’t support preconditioning, but not 100x slower. At scale I also find that manipulating the sparse array I feed to KrylovKit and IterativeSolvers is more expensive than changing the values of the PETSc matrix. The KrylovKit version of the toy code is below if you are interested:

using BenchmarkTools
using FLoops
using PETSc
using SparseArrays
using KrylovKit
using IncompleteLU

function SparseAssignment!(As, rowRange, colRange, M)
# Modifies sparse array using dense matrix ranges
# ASSUMES PROVIDED RANGES ARE FILLED ENTRIES IN PROVIDED SPARSE MATRIX
# Assigns each element per row per column

    # Loop over column range
    for j in eachindex(colRange)

        # Get indices of values in the current column
        ind_col = As.colptr[colRange[j]]:(As.colptr[colRange[j]+1]-1)

        # Get row number of each value in column
        rows = As.rowval[ind_col]

        # Loop through each row and assign values
        for i in eachindex(rowRange)

            nz_ind = ind_col[findfirst(rows.==rowRange[i])]

            As.nzval[nz_ind] = M[i,j]
        end

    end

end # Function SparseAssignment!

function PopulateArray!(A, nEq, nBlocks)
# Adjusts A entries along diagonal blocks of sparse array

    for i in 1:nBlocks
        
        # Get indices of block in array
        block_ind = ((i-1)*nEq + 1):(i*nEq)

        # Assign random values to sub-matrix
        SparseAssignment!(A, block_ind, block_ind, randn(Float64, nEq, nEq))

    end

end

function SolveArray(A, nEq, nBlocks)
    
    # Adjust values of A matrix
    PopulateArray!(A, nEq, nBlocks)

    # Precondition system
    #LU = ilu(A; τ=0.1)

    # Create right-hand-side vector (Ax = rhs)
    rhs = randn(Float64, nEq*nBlocks)

    # Solve system
    x = linsolve(A, rhs)
    
end

function Test()
# Creates a block-diagonal system of equations and solves with KrylovKit
# -------------------------------------------------------------------

    # Number of solver iterations to run
    nit = 1000

    # System of equation parameters
    nEq = 4 # Size of sub-matrices
    nBlocks = 100 # Number of sub-matrices

    # Initialize block-diagonal sparse array
    AI = Int64[] # I indices
    AJ = Int64[] # J Indices
    # get I,J indices of each block along the diagonal
    for i in 1:nBlocks
        rowRange = (nEq*(i-1)+1):(nEq*i)

        for ii in rowRange
            for jj in rowRange
                push!(AI, ii)
                push!(AJ, jj)
            end
        end
    end
    # Create sparse matrix initialized with zeros
    A = sparse(AI, AJ, zeros(Float64, size(AI)))

    println("Starting Iterations")
    for i in 1:nit

        # Pass array to solver
        sol = SolveArray(A, nEq, nBlocks)

    end
    println("Iterations Concluded")

end

Test()

Upon further inspection, it appears that the heart of the issue is the call to PETSc.destroy(ksp). If I remove this call then I never see an error; however, without it the memory allocated for the ksp is never released, eventually consuming all of my RAM. I would usually expect the memory to be released automatically when the program leaves the scope of the function where the ksp is created, but I suppose the memory persists because the PETSc calls are really wrappers around the underlying C library, which has its own memory management. I’ll keep poking around and see if I can make any more progress.
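To illustrate what I mean (this is just a generic sketch of the wrap-a-C-resource pattern with made-up names, not PETSc.jl's actual implementation): Julia's GC only accounts for the small wrapper object, not the memory the C library holds behind the pointer, so even if a finalizer eventually frees it, that can happen far too late, while an explicit destroy releases the C-side memory immediately.

# Generic sketch only, not PETSc.jl's implementation. Shows why an explicit
# destroy call matters when Julia wraps memory owned by a C library.
mutable struct CResource
    ptr::Ptr{Cvoid}
end

function make_resource(nbytes::Integer)
    obj = CResource(Libc.malloc(nbytes))   # stand-in for a C-library allocation
    # The finalizer frees the C memory, but only whenever the GC decides to
    # collect `obj`; Julia never "sees" the nbytes held on the C side.
    finalizer(obj) do o
        o.ptr == C_NULL || Libc.free(o.ptr)
        o.ptr = C_NULL
    end
    return obj
end

# Explicit, immediate release: the analogue of PETSc.destroy(ksp).
function destroy!(o::CResource)
    o.ptr == C_NULL || Libc.free(o.ptr)
    o.ptr = C_NULL
    return nothing
end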

I was not talking about KrylovKit. I was talking about Krylov.jl. It supports preconditioning. Please let us know if you try it and feel free to open a discussion on the repo.

My apologies! I somehow managed to click the link but then install the wrong package :sweat_smile: I’ll try Krylov.jl and let you know how it goes!

You just need to replace

# Solve system
x = linsolve(A, rhs)

by

# Solve system
x, stats = gmres(A, rhs, N=LU, ldiv=true)

The documentation of GMRES is available here.
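For the toy code above, that amounts to swapping the solve call and un-commenting the preconditioner line, roughly like this (a sketch; PopulateArray! is the helper from your earlier listing, and the τ value is just the one from the commented line):

using SparseArrays, Krylov, IncompleteLU   # instead of KrylovKit

function SolveArray(A, nEq, nBlocks)

    # Adjust values of A matrix (same helper as in the earlier listing)
    PopulateArray!(A, nEq, nBlocks)

    # Precondition system with an incomplete LU factorization
    LU = ilu(A; τ=0.1)

    # Create right-hand-side vector (Ax = rhs)
    rhs = randn(Float64, nEq*nBlocks)

    # Solve system with Krylov.jl's GMRES, preconditioned by LU (applied via ldiv!)
    x, stats = gmres(A, rhs, N=LU, ldiv=true)

    return x

end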

Three comments:

  • If you want multithreaded sparse matrix-vector products, just add:
using MKLSparse
  • If you have an Intel CPU, you can easily use Intel MKL BLAS with:
using MKL
  • If you reuse gmres multiple times to solve linear systems of the same size, I recommend the in-place methods (a short sketch follows after this list):
memory = 50
solver = GmresSolver(A, rhs, memory)
gmres!(solver, A, rhs)
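
A rough sketch of that in-place workflow over repeated solves (assuming this version of the Krylov.jl API; the solution lives in solver.x and the convergence history in solver.stats):

using LinearAlgebra, SparseArrays, Krylov, IncompleteLU

A   = sprand(400, 400, 0.05) + 100I    # placeholder system, just for illustration
rhs = randn(400)

memory = 50
solver = GmresSolver(A, rhs, memory)   # allocate the Krylov workspace once

for k in 1:10
    # ... update the nonzeros of A and the entries of rhs here ...
    LU = ilu(A; τ=0.1)                 # rebuild the preconditioner for the new values
    gmres!(solver, A, rhs, N=LU, ldiv=true)
    x = solver.x                       # solution (storage is reused between solves)
end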

I’m curious to see the benchmarks between PETSc and Krylov.jl!


Using Krylov.jl preconditioned with ILUZero.jl, I get very similar performance to PETSc! Again, I can’t properly benchmark PETSc due to the memory issue, but I get approximately the following results for a system of 1000 blocks solved 1000 times. I was using MKLSparse and running with 4 threads.

PETSc: 1.48 s (mean)
Krylov.jl: 1.67 s (mean)

Krylov.jl may actually be faster, as in my test 50% of the time was spent adjusting the values in the sparse array used for it, as opposed to 35% in the PETSc simulation. I also saw virtually no difference between the regular and in-place versions performance-wise, but naturally saw a reduction in allocations (3.14 GiB → 2.45 GiB)

If I could figure out a better way to handle the sparse matrices, I would very much like to try Krylov.jl in my full code. My current approach is to find the indices in the sparse matrix’s nzval field that correspond to a block and modify those values (code below). However, this operation is very expensive over many iterations, especially when the system involves multiple blocks per row, and it is less intuitive than the interface for the PETSc matrix (which can be modified like a regular matrix).

function SparseAssignment!(As, rowRange, colRange, M)
# Modifies sparse array using dense matrix ranges
# ASSUMES PROVIDED RANGES ARE FILLED ENTRIES IN PROVIDED SPARSE MATRIX
# Assigns each element per row per column
# INPUTS:
#   As::SparseMatrixCSC - Block-sparse linear System LHS
#   rowRange::UnitRange - Row indices of block in the full As Matrix
#   colRange::UnitRange - Column indices of block in the full As Matrix
#   M::Matrix - Matrix to be assigned to the block specified by rowRange and colRange

    # Loop over column range
    for j in eachindex(colRange)

        # Get indices of values in the current column
        ind_col = As.colptr[colRange[j]]:(As.colptr[colRange[j]+1]-1)

        # Get row number of each value in column
        rows = As.rowval[ind_col]

        # Loop through each row and assign values
        for i in eachindex(rowRange)

           # Get index of value in current row in current column
            nz_ind = ind_col[findfirst(rows.==rowRange[i])]

            As.nzval[nz_ind] = M[i,j] # Assign value from input matrix M

        end

    end

end # Function SparseAssignment!

Is there a better way to handle the reassignment of values in a sparse matrix? I could store the indices for each block ahead of time to avoid the findfirst lookup in the current function, but I’d like to find a more elegant solution if possible. I know there is a block sparse matrix package, BlockSparseMatrices.jl, but it doesn’t appear to be maintained, nor does it solve the issue of modifying the blocks. I know this is getting outside the scope of the initial question, but I appreciate any insights!
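
For what it is worth, the "store the indices ahead of time" idea I mentioned could look something like this (a minimal sketch; block_nzinds and update_block! are just illustrative names, and it still assumes the block entries already exist in the sparsity pattern):

using SparseArrays

# Do the search once per block: record where each (row, col) entry of the block
# lives in As.nzval, in the same column-major order as vec(M).
function block_nzinds(As::SparseMatrixCSC, rowRange, colRange)
    inds = Int[]
    for j in colRange
        ind_col = As.colptr[j]:(As.colptr[j+1]-1)
        rows = As.rowval[ind_col]
        for r in rowRange
            push!(inds, ind_col[findfirst(==(r), rows)])
        end
    end
    return inds
end

# The per-iteration update is then a plain indexed assignment, with no searching.
update_block!(As, inds, M) = (As.nzval[inds] .= vec(M); As)

The indices would be computed once at setup, e.g. inds = block_nzinds(A, block_ind, block_ind), and each iteration would only call update_block!(A, inds, randn(nEq, nEq)).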


If I understand your problem correctly, you have a sparse matrix with dense blocks inside.
I think what you need is a linear operator that models the operator-vector products.
It will be easier to update the dense blocks, and it will also use dense matrix-vector products for each block, which is more efficient.

You can do something like this for instance:

using LinearOperators
using Krylov

As = [(rows_1, cols_1, block_1), ..., (rows_N, cols_N, block_N)]  # use a structure that is relevant to update the blocks in your case 
m = ...  # Number of rows of As
n = ...  # Number of columns of As

function azurezaber_mul!(y::Vector{T}, As, x::Vector{T}) where T <: AbstractFloat
  y .= zero(T)
  alpha = one(T)
  beta = one(T)
  for (rows, cols, block) in As
    mul!(view(y,rows), block, view(x, cols), alpha, beta)  # dense matrix-vector product
  end
end

symmetric = hermitian = false
opA = LinearOperator(T, m, n, symmetric, hermitian, (y, x) -> azurezaber_mul!(y, As, x))

gmres(opA, b)  # You just need to replace the sparse matrix A by opA because Krylov.jl only requires operator-vector products
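
Updating the operator between solves then just means overwriting the dense blocks in place; opA closes over As, so it sees the new values automatically. For example (illustrative, with a concrete block-diagonal As):

nEq = 4
As = [((i-1)*nEq+1:i*nEq, (i-1)*nEq+1:i*nEq, zeros(nEq, nEq)) for i in 1:100]

for (rows, cols, block) in As
    block .= randn(nEq, nEq)   # in-place update; no operator rebuild needed
end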

Which branch of the PETSc.jl package are you using?

If it is the (default) main branch, you can try using the jek/gen or jek/dmstag branch instead.

pkg> add PETSc#jek/dmstag

Those are the branches where we use autogenerated wrappers for the PETSc library, which may solve quite a few of these issues.
Unfortunately, these branches have been pending for quite some time now, mostly due to lack of time from my side (and Jeremy moving to industry).


This fixed it! Loading in the #jek/dmstag branch and making a few syntax changes remedied the memory issue when destroying and recreating the solver. Thank you very much!
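
For reference, the syntax changes were roughly the following (visible by comparing the earlier toy code with the benchmark code below):

# main branch (earlier toy code)
A = PETSc.MatSeqAIJ{Float64}(nEq*nBlocks, nEq*nBlocks, ones(Int64, nEq*nBlocks).*nEq)
PETSc.assemble(A)

# jek/dmstag branch: the petsclib is passed to the constructor and
# assembly uses the mutating name
A = PETSc.MatSeqAIJ(petsclib, nEq*nBlocks, nEq*nBlocks, ones(Int64, nEq*nBlocks).*nEq)
PETSc.assemble!(A)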

With the issue resolved I was also able to benchmark krylov.jl against PETSc in the block-diagonal toy problem I’ve been using throughout this thread (for those interested). Running increasing numbers of 5x5 blocks, I get the following results:

The benchmarks were run on 4 threads, using MKL and MKLSparse for the Krylov.jl arrays. For smaller systems Krylov.jl with ILU0 preconditioning outperforms PETSc, but as the systems grow larger PETSc begins to perform better. For the largest system, PETSc averaged 10.24 seconds while Krylov.jl averaged 14.95 seconds.

@amontoison, while I really liked the clever linear operator approach you provided, I struggled to get it to scale to large systems, since the linear operator is not compatible with the ILUZero preconditioner. Perhaps this is still something that can be looked at down the line! What I ended up doing was creating a view into each block’s indices in the sparse matrix’s nzval field. It was the best approach I could find, and I ended up applying it to the PETSc matrix as well.

I’m going to consider this problem resolved but am happy to discuss anything further. Thank you so much for the help everybody! I learned a lot through this thread :slight_smile:

My benchmarking code:

using BenchmarkTools
using FLoops
using PETSc # jek/dmstag branch
using SparseArrays
using Printf
using Plots
using MKL, MKLSparse
using Krylov, ILUZero
using LinearAlgebra, LinearOperators
using Statistics # for mean() in plot_benchmarks

function CreateBlockSparse(nRows, nCols, blockSize, blockData)
# Creates square-block sparse matrix as well as views for adjusting it by block
# blockData = [(block1_i, block1_j, block1_data), ..., (blockn_i, blockn_j, blockn_data)]
# ---------------------------------------------------------------

    # Allocations
    AI = Int64[] # I Indices
    AJ = Int64[] # J Indices
    BViews = [] # For storing block views

    vals = typeof(blockData[1][3][1])[] # Type of first element in first block

    # Loop over provided blocks to create sparse matrix
    for (BI, BJ, BM) in blockData
        
        # Get row/col ranges on current block
        rowRange = ((BI-1)*blockSize+1):(BI*blockSize)
        colRange = ((BJ-1)*blockSize+1):(BJ*blockSize)

        # Push indices and values to sparse constructor
        for i in eachindex(colRange)
            append!(AI, rowRange)
            append!(AJ, ones(Int64, blockSize)*colRange[i])
            append!(vals, BM[:,i])
        end

    end

    # Create sparse matrix
    As = sparse(AI, AJ, vals)

    # Now go back through and find the view for each block
    for (BI, BJ, BM) in blockData

        rowRange = ((BI-1)*blockSize+1):(BI*blockSize)
        colRange = ((BJ-1)*blockSize+1):(BJ*blockSize)

        # Create array for indices of block in As.nzval
        inds = Int64[]
            
        for j  in eachindex(colRange)

            # Get indices of values in the current column
            ind_col=As.colptr[colRange[j]]:(As.colptr[colRange[j]+1]-1)

            # Get row number of each value in column
            rows = As.rowval[ind_col]

            # Loop through each row and assign value
            for i in eachindex(rowRange)

                nz_ind = ind_col[findfirst(rows.==rowRange[i])]
                push!(inds, nz_ind)

            end

        end

        # push indices to view vector
        push!(BViews, @view(As.nzval[inds]))

    end


    # Returns 
    As, BViews

end

function PopulateArray!(A, BViews, nEq)
# Adjusts A entries along diagonal blocks of sparse array
# ----------------------------------------------------------

    @floop for i in eachindex(BViews)
        
        # Assign random values to sub-matrix
        BViews[i] .= vec(rand(Float64, nEq, nEq))
    end

end


function PopulateArray_PETSc!(A, BViews::Vector, nEq, nBlocks)
# Adjusts entries along block diagonal of petsc matrix
# --------------------------------------------------------------

    @floop for i in 1:nBlocks
        
        # Get indices of block in array
        block_ind = ((i-1)*nEq + 1):(i*nEq)

        # Assign random values to sub-matrix
        BViews[i] .= randn(Float64, nEq, nEq)

    end

end

function PopulateArray_PETSc!(A, BViews::Integer, nEq, nBlocks)
# For initial population of petsc matrix
# -------------------------------------------------------------

    @floop for i in 1:nBlocks
        
        # Get indices of block in array
        block_ind = ((i-1)*nEq + 1):(i*nEq)

        # Assign random values to sub-matrix
        A[block_ind, block_ind] .= randn(Float64, nEq, nEq)

    end

end

function SolveArray(A, BViews, nEq, nBlocks)
# Solves sparse array via krylov.jl
# -------------------------------------------
    
    # Adjust values of A matrix
    PopulateArray!(A, BViews, nEq)

    # Precondition system
    P = ilu0(A)

    # Create right-hand-side vector (Ax = rhs)
    rhs = randn(Float64, nEq*nBlocks)

    # Solve system
    x = gmres(A, rhs, N=P, ldiv=true)
    
end

function SolveArray_PETSc(A, BViews, nEq, nBlocks)
# Solves sparse system via PETSc gmres
# ------------------------------------------------

    # Adjust values of A matrix
    PopulateArray_PETSc!(A, BViews, nEq, nBlocks)

    # Create right-hand-side vector (Ax = rhs)
    rhs = randn(Float64, nEq*nBlocks)

    # Assemble PETSc Matrix
    PETSc.assemble!(A)

    # Create a solver
    ksp = PETSc.KSP(A; pc_type="ilu", ksp_type="gmres", ksp_monitor=false)

    # Solve system
    x = ksp\rhs

    # Destroy solver to free up memory
    PETSc.destroy(ksp)

    # Returns 
    x

end

function Test(nEq, nBlocks, nit)
# Creates a block-diagonal system of equations and solves with Krylov.jl
# -------------------------------------------------------------------

    # # Number of solver iterations to run
    # nit = 1000

    # # System of equation parameters
    # nEq = 4 # Size of sub-matrices
    # nBlocks = 10000 # Number of sub-matrices

    # Initialize block-diagonal sparse array
    As_info = []
    for i in 1:nBlocks
        push!(As_info, (i, i, zeros(Float64, nEq, nEq)))
    end
    A, BViews = CreateBlockSparse(nEq*nBlocks, nEq*nBlocks, nEq, As_info)

    # # Create GMRES solver object
    # memory = 50
    # rhs = zeros(Float64, size(AI))
    # solver = GmresSolver(A, rhs, memory)

    #println("Starting Iterations")
    for i in 1:nit

        # Pass array to solver
        sol = SolveArray(A, BViews, nEq, nBlocks)

    end
    #println("Iterations Concluded")

end

function Test_PETSc(nEq::Integer, nBlocks::Integer, nit::Integer)
# Creates a block-diagonal system of equations and solves with PETSc
# -------------------------------------------------------------------

    # # Number of solver iterations to run
    # nit = 1000

    # # System of equation parameters
    # nEq = 4 # Size of sub-matrices
    # nBlocks = 10000 # Number of sub-matrices

    # Initialize PETSc block-diagonal sparse array
    petsclib = PETSc.petsclibs[1]
    PETSc.initialize(petsclib)
    #A = PETSc.MatSeqAIJ{Float64}(nEq*nBlocks, nEq*nBlocks, ones(Int64, nEq*nBlocks).*nEq)
    A = PETSc.MatSeqAIJ(petsclib, nEq*nBlocks, nEq*nBlocks, ones(Int64, nEq*nBlocks).*nEq)
   
    PopulateArray_PETSc!(A, 0, nEq, nBlocks)
    PETSc.assemble!(A)

    # Create vector of views into PETSc array
    BViews = []
    for i in 1:nBlocks
        block_ind = ((i-1)*nEq + 1):(i*nEq)
        push!(BViews, A[block_ind, block_ind])
    end

    #println("Starting Iterations")
    for i in 1:nit

        # Pass array to solver
        sol = SolveArray_PETSc(A, BViews, nEq, nBlocks)

    end
    #println("Iterations Concluded")

    # Finalize PETSc
    PETSc.destroy(A)
    PETSc.finalize(petsclib)

end

function run_benchmarks(nEq, nBlocks, nit)
# Run benchmarks of petsc and krylov.jl

    # Allocate vectors for benchmark data
    krylov_trials = []
    petsc_trials = []

    # run benchmarks for each block number
    for nb in nBlocks

        # Run Benchmarks
        trial_krylov = @benchmark Test($nEq, $nb, $nit)
        trial_petsc = @benchmark Test_PETSc($nEq, $nb, $nit)

        # Push trial data to vectors
        push!(krylov_trials, trial_krylov)
        push!(petsc_trials, trial_petsc)

        @printf "nb=%i Benchmark Completed\n" nb

    end
    # Returns 
    krylov_trials, petsc_trials

end

function plot_benchmarks(trials, nBlocks, test_names)

    # Plot attributes
    plot_size = (1000, 900)
    marker_size = 6

    # Create plots
    run_plot = plot(xaxis=:log, yaxis=:log, legend=:topleft, size=plot_size)
    xlabel!(run_plot, "Number of Blocks")
    ylabel!(run_plot, "Runtime (seconds)")
    title!(run_plot, "Time to 100 Solves")

    mem_plot = plot(xaxis=:log, legend=:topleft, size=plot_size)
    xlabel!(mem_plot, "Number of Blocks")
    ylabel!(mem_plot, "Benchmark Allocations (MiB)")
    xticks!(run_plot, nBlocks)
    title!(mem_plot, "Memory Allocated Throughout Benchmark")

    # plot data for each trial
    for j in eachindex(trials)

        # Allocate runtime and memory allocation arrays
        runtimes = zeros(Float64, size(nBlocks,1))
        runtime_errorm = copy(runtimes)
        runtime_errorp = copy(runtimes)
        mem_allocation = copy(runtimes)

        # Get data from each benchmark for current solver
        for i in eachindex(nBlocks)

            # Get time data
            trial_times = trials[j][i].times/1.0e9 # time in seconds
            runtimes[i] = mean(trial_times)
            runtime_errorm[i] = runtimes[i] - minimum(trial_times)
            runtime_errorp[i] = maximum(trial_times) - runtimes[i]

            mem_allocation[i] = trials[j][i].memory/(1024^2)

        end

        println(runtimes[end])
        plot!(run_plot, nBlocks, runtimes, yerror=(runtime_errorm, runtime_errorp), seriestype=:scatter, 
              label=test_names[j], markersize=marker_size)
        plot!(mem_plot, nBlocks, mem_allocation, seriestype=:scatter, label=test_names[j], markersize=marker_size)

    end

    # Set tick marks
    xticks!(run_plot, 10 .^ (1:5))
    yticks!(run_plot, [0.01, 0.1, 1.0, 10.0])
    xticks!(mem_plot, 10 .^ (1:5))
    run_plot, mem_plot

end

nEq = 5
nBlocks = [10, 100, 500, 1000, 5000, 10000, 20000, 50000, 100000]
nit = 100
tests = ["Krylov.jl", "PETSc.jl"]

trials_krylov, trials_petsc = run_benchmarks(nEq, nBlocks, nit)
run_plot, mem_plot = plot_benchmarks((trials_krylov, trials_petsc), nBlocks, tests)
display(run_plot)
display(mem_plot)

Thanks a lot for reporting. It shows that we should really merge the jek/dmstag branch to main.
If only I had more time…

If someone reading this has some more time on their hands: there was some additional progress in this forked branch: https://github.com/nicoberlie/PETSc.jl/tree/jek/dmstag


@boriskaus
For information, the forked branch https://github.com/nicoberlie/PETSc.jl/tree/jek/dmstag is not working.

(@v1.8) pkg> add https://github.com/nicoberlie/PETSc.jl#jek/dmstag
     Cloning git-repo `https://github.com/nicoberlie/PETSc.jl`
    Updating git-repo `https://github.com/nicoberlie/PETSc.jl`
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
    Updating `~/.julia/environments/v1.8/Project.toml`
  [ace2c81b] + PETSc v0.1.3 `https://github.com/nicoberlie/PETSc.jl#jek/dmstag`
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [ace2c81b] + PETSc v0.1.3 `https://github.com/nicoberlie/PETSc.jl#jek/dmstag`
  [8fa3689e] + PETSc_jll v3.16.8+0
  [9a1356b0] + SuperLU_DIST_jll v8.0.2+0
Precompiling project...
  ✗ PETSc
  1 dependency successfully precompiled in 475 seconds. 655 already precompiled. 1 skipped during auto due to previous errors.
  1 dependency errored. To see a full report either run `import Pkg; Pkg.precompile()` or load the package

julia> using PETSc
[ Info: Precompiling PETSc [ace2c81b-2b5f-4b1e-a30d-d662738edfe0]
WARNING: method definition for _mul! at /home/alexis/.julia/packages/PETSc/JX8FT/src/matshell.jl:63 declares type variable T but does not use it.
ERROR: LoadError: UndefVarError: DMStagStencilLocation not defined