Hello people,
I am new to Dagger and I have a beginner question.
I am planning to create a bunch (let's say 10) of instances of a problem. Each instance should be allocated on a separate worker: each instance fits in memory on its own, but perhaps not all of them taken together.
These problems are then simulated, with each worker working on its own problem instance.
Consider the following code snippet:
using Distributed
if nprocs() == 1
    addprocs(4)
end

@everywhere begin
    using Dagger

    struct My_Problem_struct
        data::Vector{Float64}
    end

    function initialize_problem(i)
        v = ones(3)
        v[begin] = i
        return My_Problem_struct(v)
    end

    function single_update!(A::My_Problem_struct)
        A.data[begin+1] += A.data[begin]
        A.data[end] -= A.data[begin]
        sleep(0.001) # sleeping on the job :)
    end
end

function modify_data_distr!(Problems, steps)
    for i in 1:steps
        Dagger.@sync for P in Problems
            Dagger.@spawn single_update!(P)
        end
        # do something to interact between Problems if needed
    end
    return fetch.(Problems)
end

function modify_data_single!(Problems, steps)
    fetch_Probs = fetch.(Problems)
    for i in 1:steps
        for P in fetch_Probs
            single_update!(P)
        end
    end
    return fetch_Probs
end

problem_specs = collect(1:10)
As = [
    Dagger.spawn() do
        Dagger.@mutable initialize_problem(i)
    end for i in problem_specs
]
@time modified_Problems = modify_data_distr!(As, 100)
Does this actually do the thing I want?
My feeling is the following (but I am not sure how to test it locally):
- Each problem is indeed allocated on its own worker individually.
- Dagger.@spawn will try to spawn the task on whatever worker is free. (Or will it automatically wait for a worker which has local access to the data?)
- If the worker is not the one where the memory was allocated, this will either (a) throw an error, or (b) automatically copy the data to the worker where it is needed. Which one is it?
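
As a way to probe this, here is a sketch of what I was thinking of running to see where tasks actually land. It assumes the setup from the snippet above (Distributed loaded and Dagger available on all workers); Distributed.myid() reports the id of the worker executing the task:

```julia
# Sketch: check which worker each spawned task actually runs on.
placements = fetch.([Dagger.@spawn Distributed.myid() for _ in 1:10])
@show placements  # worker ids chosen by the scheduler for each task
```

If the placements line up with where the problems were allocated, that would suggest the scheduler respects data locality; if not, data must be getting copied (or an error would have been thrown).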
I suppose the cleanest approach is to always spawn the tasks wherever the memory is allocated, but if there is a built-in way to copy data when needed, that would be interesting too: the work could then be scheduled more efficiently at the cost of some data transfer.
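
For reference, here is a sketch of how I imagine the "pin everything explicitly" version would look. My understanding (possibly wrong, so please correct me) is that Dagger.@mutable accepts a worker= option to fix the allocation site, and Dagger.@spawn accepts a scope= option built with Dagger.scope to restrict where the task may run:

```julia
ws = workers()

# Allocate each problem on a fixed worker (round-robin over the pool):
As_pinned = [Dagger.@mutable worker=ws[mod1(i, length(ws))] initialize_problem(i)
             for i in problem_specs]

# Restrict each update task to the worker owning its problem:
Dagger.@sync for (i, P) in enumerate(As_pinned)
    w = ws[mod1(i, length(ws))]
    Dagger.@spawn scope=Dagger.scope(worker=w) single_update!(P)
end
```

Is this the intended pattern, or does the scheduler already do something equivalent on its own when it sees a @mutable-wrapped argument?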