[pre-ANN] DataFlowTasks.jl: a package for inferring task dependencies based on data annotations

Dear Julia community,

Together with @ffevotte , @LaurentPlagne , and Vincent Sivadon, we have put together a package (DataFlowTasks) for automatically inferring implicit Task dependencies from explicitly annotated data dependencies.

The following example, taken from the documentation, illustrates the API:

using DataFlowTasks
A = Vector{Float64}(undef, 4)
result = let
    @dspawn fill!(@W(A), 0)           # task 1: accesses everything
    @dspawn @RW(view(A, 1:2)) .+= 2   # task 2: modifies the first half
    @dspawn @RW(view(A, 3:4)) .+= 3   # task 3: modifies the second half
    @dspawn @R(A)                     # task 4: get the result
end
fetch(result)

In the example above, the last task (created with @dspawn) will wait for the second and third, which in turn will wait for the first, because of detected data conflicts. The second and third tasks can, however, run in parallel.
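For comparison, here is one way the same ordering could be expressed by hand with plain `Threads.@spawn` and explicit `wait`s (a sketch of what DataFlowTasks infers for you, not part of the package):

```julia
A = Vector{Float64}(undef, 4)

t1 = Threads.@spawn fill!(A, 0)           # writes all of A
t2 = Threads.@spawn begin
    wait(t1)                              # write-after-write conflict with t1
    view(A, 1:2) .+= 2
end
t3 = Threads.@spawn begin
    wait(t1)                              # same conflict; t2 and t3 may still overlap
    view(A, 3:4) .+= 3
end
t4 = Threads.@spawn begin
    wait(t2); wait(t3)                    # read must wait for both writers
    A
end

fetch(t4)
```

With DataFlowTasks, these `wait` calls need not be written out: they are inferred from the `@W`/`@RW`/`@R` annotations.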

More interesting examples, where we believe the constant reuse of data (together with a nontrivial access pattern) makes this data-flow approach more natural than, say, a functional approach, can be found in the examples section of the docs, where we showcase e.g. a pure-Julia threaded Cholesky factorization on par with MKL (for large matrices)!

We are considering registering the package if others may find it useful, and would be happy to take into account feedback from the community.


This looks awesome! The use of macros is really clean. Would it be fair to say that this is a highly fine-grained version of the Base.@atomic macro in Julia 1.8+?

Well, I kind of understand how one could see it that way, but we think of it rather as an extension of Threads.@spawn that gives you a way of specifying constraints on the order in which tasks are scheduled.
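One consequence of inferring dependencies from declared data accesses is that tasks which only read the same data do not conflict, so no ordering constraint is created between them. A minimal sketch using the same `@R` annotation as in the announcement (assuming DataFlowTasks is loaded):

```julia
using DataFlowTasks

A = ones(10)

# Two read-only accesses to A: no conflict is detected,
# so these tasks may run concurrently.
s1 = @dspawn sum(@R(A))
s2 = @dspawn sum(@R(A))

result = fetch(s1) + fetch(s2)
```

A subsequent `@dspawn` that writes to `A` (via `@W` or `@RW`) would, by contrast, have to wait for both readers.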

Consider the tiled Cholesky factorization example presented in the documentation. The core of its parallel implementation with DataFlowTasks looks like this (look at the full example to get more context if needed):

    # T[i,j] is a tile in the matrix to be factorized
    T = [view(A, tilerange(i, ts), tilerange(j, ts)) for i in 1:n, j in 1:n]

    for i in 1:n
        # Serial Cholesky factorization of the diagonal tile
        @dspawn cholesky!(@RW(T[i,i]))

        # Left tiles update
        U = UpperTriangular(T[i,i])
        for j in i+1:n
            @dspawn ldiv!(@R(U)', @RW(T[i,j]))
        end

        # Submatrix update
        for j in i+1:n
            for k in j:n
                @dspawn mul!(@RW(T[j,k]), @R(T[i,j])', @R(T[i,k]), -1, 1)
            end
        end
    end

If we remove the @dspawn macros, we get a perfectly fine sequential implementation. If we replace @dspawn with Threads.@spawn, we get a parallel version full of data races. Atomic operations could prevent the data races, but they would not ensure that tasks are scheduled in the right order.
In our view, DataFlowTasks should be seen as a (hopefully nice) way of specifying which task operates on which data (which is simple here), so that a minimal set of constraints on the scheduling order between tasks can be inferred automatically (which would be tedious to write by hand, as illustrated by the complexity of the DAG of task-task dependencies).

(Figure: DAG of task-task dependencies)

But having this minimal set of task-task dependency constraints allows the tasks to be scheduled in a hopefully near-optimal way, which in turn enables good speedup. To expand on the performance of the parallel tiled Cholesky factorization, here is a comparison with MKL, taken from the “companion project” TiledFactorization.jl (where high sequential performance is attained thanks to LoopVectorization and TriangularSolve, and parallel performance comes from DataFlowTasks):
