I want to evaluate using julia for a project and so far it was straightforward. The setting is the following, I have a bunch of functions (moment-generating functions over different parameters), for which I want to compute all the taylor coeffcients up to a certain order. My first try looks roughly like this:
using TaylorSeries
using LinearAlgebra
using CUDA
function moment_generating_function(u, vs)
# more complicated in reality
return simplex_volume(vs) * norm(u)
end
function simplex_volume(vs)
"""
Calculate the volume of a simplex given its vertices.
Parameters:
vs (Matrix): A matrix of size (n+1, n) representing the vertices of the simplex,
where n is the dimension of the simplex.
Returns:
Float64: The volume of the simplex.
"""
n = size(vs, 2)
return abs(det(vs[2:end, :] .- vs[1, :])) / factorial(n)
end
function moment_generating_taylor_coefficients(vs, order)
"""
Calculate the Taylor series expansion of the moment generating function of a simplex around 0.
"""
d = size(vs, 2)
u = set_variables("u1 u2", numvars=d, order=order)
println("Taylor series variables: ", u)
F = moment_generating_function(u, vs)
c = F.coeffs
println("Taylor series coefficient size" , size(c))
return c
end
# Main script
df = [1.0 1.0; 2.0 5.0; 3.0 2.0]
println(size(df))
order = 7
taylor_coeffs = moment_generating_taylor_coefficients(df, order)
so far, this works great! But I have up to a few thousands of these simplices, and therefore I want to compute the taylor coefficients on the gpu. In torch, I would now call vmap(lambda df: moment_generating_taylor_coefficients(df, order), axis=0)(simplices) to map the moment_generating_taylor_coefficients along axis=0. How can I achieve something similar in julia? I am a bit lost with all the packages.
I guess I also need to change the data-type of set_variables.
I looked into gpu kernels but that looked extremely low-level.
tbh, I even struggle to compute the volume of the determinant of a batch of simplices. Afaik there’s no batch-dimension in the determinant function and as far as I know the for loops are not vectorized (otherwise I could just loop here).
looking at TaylorSeries package, there is no external to CUDA or other gpu packages and nothing in docs about it, same for TaylorDiff, so, for now, I guess you can’t ? Note that you can make your code a lottt faster using
How have you tried it? I tried it like in the issue and get the same error. Maybe I need to switch to another package like AcceleratedKernels.jl you mentioned
using TaylorSeries
using LinearAlgebra
using CUDA
function moment_generating_function(u, vs)
# more complicated in reality
return simplex_volume(vs) * sum(abs2,u)
end
function simplex_volume(vs)
"""
Calculate the volume of a simplex given its vertices.
Parameters:
vs (Matrix): A matrix of size (n+1, n) representing the vertices of the simplex,
where n is the dimension of the simplex.
Returns:
Float64: The volume of the simplex.
"""
n = size(vs, 2)
return abs(sum(@views(vs[2:end, :] .- vs[1, :]))) / factorial(n)
end
function moment_generating_taylor_coefficients(vs, order)
"""
Calculate the Taylor series expansion of the moment generating function of a simplex around 0.
"""
d = size(vs, 2)
u = set_variables("u1 u2", numvars=d, order=order)
println("Taylor series variables: ", u)
F = moment_generating_function(u, vs)
# println("Taylor series coefficient size" , size(c))
return F
end
# Main script
df = cu([1.0 1.0; 2.0 5.0; 3.0 2.0])
println(size(df))
order = 7
taylor_coeffs = moment_generating_taylor_coefficients(df, order)
but its not what you wanted to do since I removed det and norm which broke the code. This is closer but I don’t know about the performance (note that I diff around 1)
using TaylorSeries
using LinearAlgebra
using CUDA
function moment_generating_function(u, vs)
# more complicated in reality
return simplex_volume(vs) * sqrt(sum(abs2,u))
end
function simplex_volume(vs)
"""
Calculate the volume of a simplex given its vertices.
Parameters:
vs (Matrix): A matrix of size (n+1, n) representing the vertices of the simplex,
where n is the dimension of the simplex.
Returns:
Float64: The volume of the simplex.
"""
n = size(vs, 2)
return abs(det(@views(vs[2:end, :] .- vs[1, :]))) / factorial(n)
end
function moment_generating_taylor_coefficients(vs, order)
"""
Calculate the Taylor series expansion of the moment generating function of a simplex around 0.
"""
d = size(vs, 2)
u = set_variables("u1 u2", numvars=d, order=order)
println("Taylor series variables: ", u)
CUDA.@allowscalar F = moment_generating_function(1 .+u, vs)
# println("Taylor series coefficient size" , size(c))
return F
end
# Main script
df = cu([1.0 1.0; 2.0 5.0; 3.0 2.0])
println(size(df))
order = 7
CUDA.@time taylor_coeffs = moment_generating_taylor_coefficients(df, order)
GPU arrays implement generic julia functions as much as possible, including the broadcasting and mapreduce machinery, reducing the need for dedicated constructs like vmap. I haven’t tried it, but in this case, my first attempt would be the following:
where simplices has shape (n + 1, n, b), with the batch dimension last such that the slices are contiguous in memory. (Julia arrays are column major, i.e., dimensions are ordered from “inner” to “outer”, opposite to python.)
If your n is fairly small (<10 maybe) then you can try a CuVector of SMatrix, using StaticArrays.jl . That’s the same memory as a size (n + 1, n, b) CuArray, but these static arrays are like scalars (or complex numbers) as far as CUDA.jl is concerned, which allows some mapslices-like things to be done.