> If you really do have hundreds of such internal functions that you don’t want to pass around explicitly, how can you remember the (presumably) hundreds of possible parameters for mtd? Wouldn’t a clearer, more descriptive name be easier to recall?
I’m attempting to implement sparse matrix types that are not in SparseArrays.jl. I’m also interested in parallelizing operations like +, *, permuting and slicing columns/rows, etc.
> do these shorthands correspond

Interesting, you guessed almost all of them :). It seems 3 letters is descriptive enough, and just to my liking, since overly long names are tedious to type.
mtd = method of computation, e.g. either precompute the necessary RAM and allocate it, or allocate more RAM than necessary and then resize! to a smaller length.
prl = method of parallelization, e.g. 0 = no parallelization, 1 = Base (@threads, @spawn), 2 = ThreadsX, 3 = Folds, 4 = FLoops, 5 = Polyester.
srt = sorting of entries (w.r.t. positions or weights) in each column/row.
ord = order (permute) rows/cols w.r.t. some criterion.
chk = check correctness (specify how much verification to carry out; 0 means no checking and computation at full speed). This is similar to @assert, which has only 2 levels (on or off).
cnv = when and how to convert the matrix to a different type.
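To make this concrete, below is a minimal sketch of how such shorthands could appear together as kwargs with defaults. The function combine!, its body, and its defaults are hypothetical, purely illustrative:

# Hypothetical sketch (not actual package code): shorthand kwargs with defaults
# on an internal routine that sorts one column's (position, weight) pairs.
function combine!(pp::Vector{Int}, ww::Vector{Float64}; srt::Int=1, chk::Int=0)
    chk > 0 && @assert length(pp) == length(ww)   # only verify when chk > 0
    if srt == 1                                   # sort by position
        perm = sortperm(pp)
        permute!(pp, perm); permute!(ww, perm)
    elseif srt == 2                               # sort by weight
        perm = sortperm(ww)
        permute!(pp, perm); permute!(ww, perm)
    end
    return nothing
end

pp, ww = [3, 1, 2], [0.3, 0.1, 0.2]
combine!(pp, ww; srt=1, chk=1)   # afterwards pp == [1, 2, 3], ww == [0.1, 0.2, 0.3]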
For instance, in a product of sparse matrices there are several ways to approach even just the non-parallel computation. Even in the official implementation (SparseArrays.jl/src/linalg.jl, JuliaSparse/SparseArrays.jl on GitHub, commit 4fd3aad5735e3b80eefe7b068f3407d7dd0c0924), the comment for prefer_sort on line 553 says:
> Determine if sort! shall be used or the whole column be scanned based on empirical data on i7-3610QM CPU. Measuring runtimes of the scanning and sorting loops of the algorithm. The parameters 6 and 3 might be modified for different architectures.
Ideally, these hidden internal settings would be the default values of a kwarg that could nonetheless be easily changed, to observe how they impact performance. Also, I’d want the product to offer mtd=1 (sorting and aggregating the entries), mtd=2 (adding entries into a dense vector and extracting the nonzero ones), or mtd=3 (a smart combination that switches from 1 to 2 at the right threshold, as in the official implementation). With such mtds, I could plot the runtime of 1 and of 2 as the density of the input matrices increases, and so accurately choose the right threshold on my computer.
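As a rough illustration of what I mean (everything here is a hypothetical sketch, not the real kernel): the crossover density becomes a tunable kwarg instead of a hard-coded constant, so it can be measured per machine.

# Hypothetical sketch: mtd=1 sorts positions, mtd=2 scatters into a dense
# vector, mtd=3 switches between them at an exposed, tunable threshold.
function addcol(pp::Vector{Int}, ww::Vector{Float64}, m::Int;
                mtd::Int=3, threshold::Float64=0.1)
    if mtd == 3                                # pick 1 or 2 by column density
        mtd = length(pp) < threshold * m ? 1 : 2
    end
    if mtd == 1                                # sort by position (aggregation of
        perm = sortperm(pp)                    # duplicate positions omitted here)
        return pp[perm], ww[perm]
    else                                       # dense scatter, extract nonzeros
        dense = zeros(m)
        for (p, w) in zip(pp, ww)
            dense[p] += w
        end
        nz = findall(!iszero, dense)
        return nz, dense[nz]
    end
end

addcol([5, 2, 2], [1.0, 2.0, 3.0], 10; mtd=2)  # -> ([2, 5], [5.0, 1.0])

With threshold exposed like this, one can sweep input densities, time mtd=1 against mtd=2, and set the default crossover from measurements rather than from constants tuned on someone else’s CPU.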
Regarding @assert, I never mistook it for some performance enhancement; I was merely using it the way I use assert in Python. Could you share an MWE that demonstrates how it fails to protect me from passing an out-of-bounds index? For instance, how could this fail?
function tmp(k::Int)::Char
    @assert k in (1, 2, 3)
    return ('a', 'b', 'c')[k]
end
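(As an aside, the docstring of @assert itself warns that an assert “might be disabled at various optimization levels”, so if the answer is that the check can be compiled away, a hypothetical variant with an explicit branch would keep the protection unconditionally:)

# Hypothetical variant: an explicit check that no optimization level can drop.
function tmp_checked(k::Int)::Char
    k in (1, 2, 3) || throw(ArgumentError("k must be 1, 2, or 3; got $k"))
    return ('a', 'b', 'c')[k]
end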
Regarding specialization (the main question of this post), what if I refactor like this:
using StructArrays

abstract type Mtd end
struct mtd0 <: Mtd end
struct mtd1 <: Mtd end
abstract type Prl end
struct prl0 <: Prl end
struct prl1 <: Prl end

# Sort one column's (position, weight) pairs by position (mtd0) or weight (mtd1).
function _f!(pp::Vector{tp}, ww::Vector{tw}, ::Type{mtd0})::Nothing where {tp,tw}
    sort!(StructArray((pp, ww)), by=first, alg=QuickSort); return nothing
end
function _f!(pp::Vector{tp}, ww::Vector{tw}, ::Type{mtd1})::Nothing where {tp,tw}
    sort!(StructArray((pp, ww)), by=last, alg=QuickSort); return nothing
end

# Process all columns, serially (prl0) or multithreaded (prl1).
function g!(llp::Vector{Vector{tp}}, llw::Vector{Vector{tw}}, mtd::Type{<:Mtd}, ::Type{prl0})::Nothing where {tp,tw}
    @inbounds for j = 1:length(llp) _f!(llp[j], llw[j], mtd) end
end
function g!(llp::Vector{Vector{tp}}, llw::Vector{Vector{tw}}, mtd::Type{<:Mtd}, ::Type{prl1})::Nothing where {tp,tw}
    @inbounds Threads.@threads for j = 1:length(llp) _f!(llp[j], llw[j], mtd) end
end
m, n, s = 10^4, 10^5, 5;                             # m×n sparse matrix with ≤ s entries per column
llp = [rand(1:m, rand(0:s)) for _ = 1:n];            # list of lists of positions
llw = [rand(-1:0.001:1, length(pp)) for pp in llp];  # list of lists of weights (values)
@time g!(llp, llw, mtd0, prl0)
@time g!(llp, llw, mtd1, prl0)
@time g!(llp, llw, mtd0, prl1)
@time g!(llp, llw, mtd1, prl1)
julia> methods(g!)
# 2 methods for generic function "g!" from Main:
[1] g!(llp::Array{Vector{tp}, 1}, llw::Array{Vector{tw}, 1}, mtd::Type{<:Mtd}, ::Type{prl1}) where {tp, tw}
@ REPL[20]:1
[2] g!(llp::Array{Vector{tp}, 1}, llw::Array{Vector{tw}, 1}, mtd::Type{<:Mtd}, ::Type{prl0}) where {tp, tw}
@ REPL[19]:1
julia> [s for m in methods(g!) for s in m.specializations if !isnothing(s)]
4-element Vector{Core.MethodInstance}:
MethodInstance for g!(::Vector{Vector{Int64}}, ::Vector{Vector{Float64}}, ::Type{mtd0}, ::Type{prl1})
MethodInstance for g!(::Vector{Vector{Int64}}, ::Vector{Vector{Float64}}, ::Type{mtd1}, ::Type{prl1})
MethodInstance for g!(::Vector{Vector{Int64}}, ::Vector{Vector{Float64}}, ::Type{mtd0}, ::Type{prl0})
MethodInstance for g!(::Vector{Vector{Int64}}, ::Vector{Vector{Float64}}, ::Type{mtd1}, ::Type{prl0})
I guess this is just what I wanted: each combination of mtd and prl gives its own compiled function, right?
@Sukera @Imiq If _f! were some simple function that is executed many, many times, does having the argument ::Type{mtd1}, as opposed to not having it and just using one method, give the same performance?
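In case it helps frame the question, this is how I would try to measure it (a sketch assuming BenchmarkTools is installed; h_plain and h_typed are toy stand-ins for a “simple” _f!):

using BenchmarkTools

# Toy stand-ins: identical bodies, with and without the Type argument.
h_plain(x::Int) = x + 1
h_typed(x::Int, ::Type{mtd1}) = x + 1   # mtd1 as defined above

x = 3
@btime h_plain($x)
@btime h_typed($x, mtd1)

My expectation is that, since ::Type{mtd1} is concrete, Julia compiles a dedicated specialization and the two calls perform identically after warm-up, but corrections are welcome.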