Do views interfere with thread synchronization?

I have this code, which attempts to compute a matrix-vector product using submatrices:

tasks = []
for th in 1:length(tbs)
    tb = tbs[th]
    push!(tasks, Threads.@spawn begin
        mul!(tb.result, tb.kcolumns, tb.uv)
    end)
end

This works fine when I materialize the submatrices, i.e. when tb.kcolumns is K[:, colrange], but it does not work when I use views, i.e. when tb.kcolumns is view(K, :, colrange).

Do views change the thread behavior?
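For context, the two variants differ in what tb.kcolumns refers to: K[:, colrange] builds a new SparseMatrixCSC with its own storage, while view(K, :, colrange) is a lazy SubArray that reads K's storage directly. A small illustration (matrix size and density are arbitrary choices):

```julia
using SparseArrays

K = sprand(6, 6, 0.5)
colrange = 2:4

Kcopy = K[:, colrange]        # materialized: a new SparseMatrixCSC with its own storage
Kview = view(K, :, colrange)  # lazy: a SubArray that indexes into K

@show typeof(Kcopy)
@show parent(Kview) === K     # the view still refers to K itself

# Writing to the parent is visible through the view, but not through the copy.
K[1, 3] = 42.0
@show Kview[1, 2]             # column 3 of K is column 2 of the view
@show Kcopy[1, 2]             # unchanged: the copy was taken before the write
```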

If they change the threading behavior, I'd suspect a data race. More information would help to understand that better, of course.
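A data race needs at least one task writing to memory that another task also touches. In parmul! the views (tb.kcolumns and tb.uv) should only be read by mul!(y, A, x); the writes go to tb.result. One quick sanity check is therefore that no result buffer shares memory with another one or with the inputs. A sketch, assuming a hypothetical helper outputs_independent built on Base.mightalias, which is an unexported internal function that Base uses for its own aliasing checks:

```julia
# Hypothetical helper (not part of the original code): returns true if no output
# buffer may share memory with another output buffer or with any input array.
# Base.mightalias is internal and conservative: it compares the arrays' data ids.
function outputs_independent(results, inputs)
    for i in eachindex(results)
        for j in eachindex(results)
            i != j && Base.mightalias(results[i], results[j]) && return false
        end
        for x in inputs
            Base.mightalias(results[i], x) && return false
        end
    end
    return true
end

N = 8
U1 = rand(N)
inputs = [view(U1, 1:4), view(U1, 5:8)]

good = [similar(U1) for _ in 1:2]   # one fresh buffer per task
shared = similar(U1)
bad = [shared, shared]              # both tasks would write to the same vector

@show outputs_independent(good, inputs)   # true
@show outputs_independent(bad, inputs)    # false
```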

Alas, my attempts to construct an MWE have failed to reproduce the behavior (in the real code, the threads get stuck forever).

Which sounds like a problem in itself. Should we have a look?

OK. Here is the code of the M(non)WE:

module example

using LinearAlgebra
using SparseArrays

struct ThreadBuffer{KVT}
    kcolumns::KVT
    uv::SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true} 
    result::Vector{Float64}
end

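function parmul!(R, tbs)
    # Spawn one task per column block; task th computes
    # tbs[th].result = tbs[th].kcolumns * tbs[th].uv
    tasks = Task[]
    for th in 1:length(tbs)
        tb = tbs[th]
        push!(tasks, Threads.@spawn begin
            mul!(tb.result, tb.kcolumns, tb.uv)
        end)
    end
    # Sum the partial products, waiting for each task before reading its buffer
    wait(tasks[1])
    R .= tbs[1].result
    for th in 2:length(tbs)
        wait(tasks[th])
        R .+= tbs[th].result
    end
    R
end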

function parloop!(N)
    K = sprand(N, N, 0.1)
    U1 = rand(N)
    R = fill(0.0, N)
    nth = Base.Threads.nthreads()
    @info "$nth threads used"
    chunk = div(N, nth)
    threadbuffs = ThreadBuffer[]
    for th in 1:nth
        # Contiguous column block for this thread; the last block takes the remainder
        colrange = th < nth ? (chunk*(th-1)+1:chunk*th) : (chunk*(th-1)+1:length(U1))
        push!(threadbuffs, ThreadBuffer(view(K, :, colrange), view(U1, colrange), deepcopy(U1)))
        # push!(threadbuffs, ThreadBuffer(K[:, colrange], view(U1, colrange), deepcopy(U1)))
    end

    R .= parmul!(R, threadbuffs)
    @show norm(R - K*U1)
    true 
end

end # module
nothing

using .example; example.parloop!(10000)
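The per-task buffers and the separate wait loop can also be avoided by letting each task return its own partial product and collecting the results with fetch. This relies on the same column-block identity as the original, K*x = sum over blocks of K[:, cols] * x[cols]. A minimal sketch, assuming a sparse K and a dense x as in the example; parmul_fetch and column_chunks are hypothetical names, not part of the original code:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: split 1:n into `nchunks` contiguous ranges whose lengths
# differ by at most one (the original code gives the whole remainder to the last chunk).
function column_chunks(n, nchunks)
    base, extra = divrem(n, nchunks)
    ranges = UnitRange{Int}[]
    start = 1
    for c in 1:nchunks
        len = base + (c <= extra ? 1 : 0)
        push!(ranges, start:start+len-1)
        start += len
    end
    return ranges
end

# Each task allocates and returns its own partial product, so no output buffer
# is shared between tasks; fetch waits for a task and returns its result.
function parmul_fetch(K, x; nchunks = Threads.nthreads())
    tasks = map(column_chunks(size(K, 2), nchunks)) do cols
        Threads.@spawn(view(K, :, cols) * view(x, cols))
    end
    return sum(fetch, tasks)
end

N = 1000
K = sprand(N, N, 0.1)
U1 = rand(N)
R = parmul_fetch(K, U1)
@show norm(R - K * U1) / norm(K * U1)
```

Allocating a fresh vector per task costs some memory, but it rules out two tasks writing into the same buffer by construction.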

Interesting. Returns immediately for me with

[ Info: 5 threads used
norm(R - K * U1) = 2.0796050396124542e-11
true

on

julia> versioninfo()
Julia Version 1.8.0-DEV.1309
Commit 89f23325aa (2022-01-13 19:48 UTC)        
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.0 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 5
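A nonzero residual of this size is expected: the threaded version adds the per-block partial products in a different order than K*U1, so the two results agree only up to floating-point round-off. A scale-independent check is the relative error, or isapprox. A small sketch with two column blocks (the size is an arbitrary choice, smaller than in the example to keep memory low):

```julia
using LinearAlgebra, SparseArrays

N = 2000
K = sprand(N, N, 0.1)
U1 = rand(N)

# Blocked product over two column halves, summed in a different order than K*U1.
cols1, cols2 = 1:N÷2, N÷2+1:N
R = K[:, cols1] * U1[cols1] + K[:, cols2] * U1[cols2]

relerr = norm(R - K * U1) / norm(K * U1)
@show relerr
@show isapprox(R, K * U1)   # default rtol is sqrt(eps(Float64)) when atol is 0
```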

Correct: That is why it is a non-working example.
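Since only the real code hangs, one option while debugging is to replace the blocking wait calls with a bounded wait, so that a stuck task is reported instead of freezing the session. A sketch with a hypothetical helper report_stuck, built on Base's timedwait, istaskdone and istaskfailed (the timeouts are arbitrary):

```julia
# Hypothetical helper: wait at most `timeout` seconds per task and return the
# indices of tasks that have not finished, instead of blocking forever in wait().
function report_stuck(tasks; timeout = 5.0)
    stuck = Int[]
    for (i, t) in enumerate(tasks)
        # Poll istaskdone(t) until it is true or `timeout` seconds have passed.
        status = timedwait(() -> istaskdone(t), timeout)
        if status === :timed_out
            push!(stuck, i)
        elseif istaskfailed(t)
            # wait rethrows the task's error (wrapped in a TaskFailedException); log it.
            try
                wait(t)
            catch err
                @warn "task $i failed" exception = err
            end
        end
    end
    return stuck
end

tasks = [Threads.@spawn(sum(rand(1000))) for _ in 1:4]
@show report_stuck(tasks; timeout = 2.0)   # empty when every task finished in time

# A task that blocks on an empty Channel is reported as stuck.
c = Channel{Int}(1)
blocked = @async take!(c)
@show report_stuck([blocked]; timeout = 0.5)
put!(c, 1)   # release the blocked task
```

Note that a stuck task keeps running; the helper only stops the caller from blocking on it, which leaves the session usable for inspecting the task's state.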


Code looks fine to me. The only thing I'm wondering: why deepcopy(U1) instead of similar(U1)?
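The suggestion works because the 3-argument mul!(y, A, x) overwrites y completely, so the values that deepcopy(U1) copies into the buffer are never used; similar(U1) only allocates an uninitialized vector of the same element type and length. A small sketch (sizes are arbitrary; the buffer is deliberately filled with NaN to show that its initial contents do not matter):

```julia
using LinearAlgebra, SparseArrays

K = sprand(100, 100, 0.1)
U1 = rand(100)
colrange = 1:50

# Same shape and element type as U1, contents irrelevant (NaN here on purpose).
result = fill!(similar(U1), NaN)

# mul!(y, A, x) stores A*x in y without reading y's old values.
mul!(result, view(K, :, colrange), view(U1, colrange))

@show any(isnan, result)
@show result ≈ K[:, colrange] * U1[colrange]
```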