Calculating cross entropy with vectors of varying size on the GPU

I’m attempting to calculate the cross entropy of a set of samples of different lengths.

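using CUDA
using Flux  # provides softmax and gpu

# Cross entropy between a prediction π̂ and a target π
entropy(π̂, π) = -sum(π .* log.(π̂))

m = 3
n = 3
w = ones(m) |> gpu

# Equal-sized samples, batched as the columns of an m×n matrix
a1 = rand(Float32, m, n)
a1 = softmax(a1) |> gpu

# Samples stored as a vector of vectors (all of length m here, but of different lengths in general)
a2 = [rand(Float32, m) for i in 1:n] |> gpu

@time entropy(a1, a1)
@time entropy(a2, a2)  # log.(π̂) broadcasts over the outer vector, not over the elements of each sample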

Equal-sized vectors are no problem, since they can be batched into a matrix. For vectors of unequal size, however, I’m left with a vector of CuArrays. How can I efficiently calculate the cross entropy of these on my GPU?
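For reference, one possible direction is to concatenate the ragged samples into a single flat vector, so that the total over all samples needs only one broadcast and one reduction. The following is a minimal sketch rather than a definitive solution: `total_entropy` is a hypothetical helper name, the example runs on plain CPU arrays, and it assumes that only the sum over all samples is needed (per-sample values would additionally require tracking the segment boundaries).

```julia
# Cross entropy of one sample; generic over the array type (Array or CuArray).
entropy(π̂, π) = -sum(π .* log.(π̂))

# Hypothetical helper: flatten the ragged samples into one contiguous vector,
# then compute the total cross entropy with a single broadcast and reduction.
function total_entropy(π̂s, πs)
    π̂_flat = reduce(vcat, π̂s)
    π_flat = reduce(vcat, πs)
    return entropy(π̂_flat, π_flat)
end

# Ragged example on the CPU: uniform distributions of different lengths.
lengths = [2, 3, 4]
samples = [fill(1f0 / len, len) for len in lengths]

total = total_entropy(samples, samples)

# Reference value: loop over the samples one at a time.
per_sample = sum(entropy(s, s) for s in samples)
```

Both values should agree up to floating-point error. On the GPU, the same idea would presumably be to concatenate on the CPU first and move the flat vector over once (for example with `gpu` or `cu`), which avoids launching a separate set of kernels per sample; that part is an assumption and is not shown or benchmarked here.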