Vectorizing a function over two same-size CuArrays

Hello! I’m new to the Julia programming language.

I’m trying to write a loss function that calculates the cosine similarity for each pair of rows in x and x̂.
A simplified version of the code is below.

using Flux, CUDA, LinearAlgebra
CUDA.allowscalar(false)

function get_model(device)
    # Some process to create model

    return model |> device
end

function cosine_similarity(x, y)
    return dot(x, y) / (norm(x) * norm(y))
end

function predict(x, model)
    return model(x)
end

function eval_loss(x, model)
    x̂ = predict(x, model)

    cos_sim = cosine_similarity.(eachrow(x), eachrow(x̂))
    cos_loss = sum(1 .- maximum(cos_sim, dims = 1))

    return cos_loss
end

function test(x, device)
    model = get_model(device)
    x = x |> device
    
    ps = Flux.params(model)

    return Flux.gradient(() -> eval_loss(x, model), ps)
end

test(x, cpu)
test(x, gpu)

This works fine with cpu, but when I change the device to gpu, it throws the CUDA scalar indexing error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.

After debugging, I found that the problem comes from the broadcast call over eachrow:

cos_sim = cosine_similarity.(eachrow(x), eachrow(x̂))

I have no idea how to solve it. I tried map and broadcast, but the problem persists. Is there any way to solve this?

Many thanks!

===============================
Edited:

Splitting x and x̂ from matrices into vectors and rewriting the cosine similarity solved my problem. Thanks a lot!

Maybe start with documentation or training/tutorial?

eachrow and similar do not work with GPU implementations of array operations (like broadcast).
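
One possible way around this, as a minimal sketch: compute every row-wise similarity with whole-array operations (an elementwise product followed by a sum along dims = 2) instead of iterating over rows. The helper name rowwise_cosine is made up for illustration, and the sketch assumes x and y are matrices of the same size with no all-zero rows. It is shown on plain CPU arrays; the same broadcast and reduction calls should also apply to CuArray inputs without scalar indexing.

```julia
# Row-wise cosine similarity using only broadcasts and reductions.
# `rowwise_cosine` is a hypothetical helper, not part of Flux or CUDA.jl.
function rowwise_cosine(x::AbstractMatrix, y::AbstractMatrix)
    dots = sum(x .* y, dims = 2)          # dot product of each row pair
    nx = sqrt.(sum(abs2, x, dims = 2))    # Euclidean norm of each row of x
    ny = sqrt.(sum(abs2, y, dims = 2))    # Euclidean norm of each row of y
    return vec(dots ./ (nx .* ny))        # one similarity per row
end

# Small CPU example: rows 1 and 3 are parallel, row 2 is orthogonal.
x = Float32[1 0; 0 2; 1 1]
y = Float32[2 0; 3 0; 1 1]
sims = rowwise_cosine(x, y)
```

Here sims has one entry per row, approximately 1, 0 and 1. In eval_loss, such a function could stand in for the cosine_similarity.(eachrow(x), eachrow(x̂)) line.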