CUDA performing scalar indexing when used along with Distributed

Paulo_Refosco · September 23, 2024, 3:28pm

Thanks a lot for the suggestions! I finally arrived at a solution that works for my needs: a callable struct that holds the “fixed params”. Here is an MWE that runs fine:

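using Distributed
# Workers are assumed to have been added already (e.g. with addprocs)
# CUDA provides CuArray; Statistics provides mean
@everywhere using CUDA, Statistics

@everywhere begin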
    struct my_dummy_struct
        mat_one
        mat_two
    end
    
    function initialize_dummy_struct(mat_one, mat_two)
        return my_dummy_struct(mat_one, mat_two)
    end

    # Define function working with mat_one and mat_two as "fixed inputs"
    function (m::my_dummy_struct)(x)
        # Extract inputs from x
        variable1 = x[1]
        variable2 = x[2]

        a = sum(variable1 .* (m.mat_one * m.mat_two))
        b = mean(variable2 .* (m.mat_one))
        return a, b
    end
end

# Define mat_one and mat_two
mat_one = CUDA.ones(2,2)
mat_two = CuArray([2.5 3.0; 2.7 4.5])

# Initialize the struct on the main process
pmap_dummy_struct = initialize_dummy_struct(mat_one, mat_two)

# Global alias used by the closure below; Distributed automatically sends
# globals referenced in a remote call to the workers
global my_pmap_dummy_struct = pmap_dummy_struct

variable1_vec = [1; 2; 2.5]
variable2_vec = [3; 2; 3.5]

pmap(x -> my_pmap_dummy_struct(x), zip(variable1_vec, variable2_vec))
#=
3-element Vector{Tuple{Float64, Float64}}:
 (25.4, 3.0)
 (50.8, 2.0)
 (63.5, 3.5)
=#
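
A possible variation on the MWE above, as a rough sketch rather than a tested GPU solution: the struct fields get type parameters so they are concretely typed, and the callable struct is passed to pmap directly instead of going through a closure over a global. The name FixedParams and the worker count in addprocs(2) are made up for illustration, and plain CPU arrays stand in for the CuArrays so the sketch runs without a GPU. I have not checked that this behaves the same with CuArrays.

```julia
using Distributed
addprocs(2)  # hypothetical worker count, adjust as needed

@everywhere using Statistics

@everywhere begin
    # Type parameters keep the stored matrices concretely typed.
    struct FixedParams{A<:AbstractMatrix,B<:AbstractMatrix}
        mat_one::A
        mat_two::B
    end

    # Same computation as in the MWE, with x unpacked as a 2-tuple.
    function (m::FixedParams)(x)
        variable1, variable2 = x
        a = sum(variable1 .* (m.mat_one * m.mat_two))
        b = mean(variable2 .* m.mat_one)
        return a, b
    end
end

# CPU arrays stand in for CUDA.ones(2,2) and the CuArray of the MWE.
params = FixedParams(ones(2, 2), [2.5 3.0; 2.7 4.5])

# The callable struct is passed as the function argument of pmap
# and is serialized to the workers along with its fields.
results = pmap(params, zip([1, 2, 2.5], [3, 2, 3.5]))
```

Passing the struct itself avoids relying on the global being shipped to the workers, at the cost of serializing the struct as part of the remote call.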

For many of the points you asked about, the answer is simply that I am really bad at coding! (It was great to get the comments, since I learned a few things from them. Thanks for that too!)
