Is there anything like vmap to vectorize a computation?

I want to evaluate Julia for a project, and so far it has been straightforward. The setting is the following: I have a bunch of functions (moment-generating functions over different parameters) for which I want to compute all the Taylor coefficients up to a certain order. My first try looks roughly like this:

using TaylorSeries
using LinearAlgebra
using CUDA

function moment_generating_function(u, vs)
    # more complicated in reality
    return simplex_volume(vs) * norm(u)
end

"""
    simplex_volume(vs)

Calculate the volume of a simplex given its vertices.

`vs` is a matrix of size (n+1, n) whose rows are the vertices of the simplex,
where n is the dimension of the simplex. Returns the volume as a Float64.
"""
function simplex_volume(vs)
    n = size(vs, 2)
    # subtract the first vertex (as a 1×n row) from the remaining rows
    return abs(det(vs[2:end, :] .- vs[1:1, :])) / factorial(n)
end


"""
    moment_generating_taylor_coefficients(vs, order)

Calculate the Taylor series expansion of the moment generating function of a simplex around 0.
"""
function moment_generating_taylor_coefficients(vs, order)
    d = size(vs, 2)
    u = set_variables("u", numvars=d, order=order)
    println("Taylor series variables: ", u)
    F = moment_generating_function(u, vs)
    c = F.coeffs
    println("Taylor series coefficient size: ", size(c))
    return c
end

# Main script
df = [1.0 1.0; 2.0 5.0; 3.0 2.0]
println(size(df))
order = 7
taylor_coeffs = moment_generating_taylor_coefficients(df, order)

So far, this works great! But I have up to a few thousand of these simplices, and therefore I want to compute the Taylor coefficients on the GPU. In PyTorch, I would now call vmap(lambda df: moment_generating_taylor_coefficients(df, order), in_dims=0)(simplices) to map moment_generating_taylor_coefficients along the batch dimension. How can I achieve something similar in Julia? I am a bit lost with all the packages.
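On the CPU I could of course just map over a plain list of simplices (a sketch; the simplices vector here is a made-up stand-in holding one (n+1)×n matrix per simplex):

# hypothetical CPU-only batch: a Vector with one (n+1)×n vertex matrix per simplex
simplices = [df, df]
all_coeffs = map(s -> moment_generating_taylor_coefficients(s, order), simplices)

but with thousands of simplices I want the whole thing to run batched on the GPU.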

I guess I also need to change the data type of set_variables.

I looked into GPU kernels, but that looked extremely low-level.

Tbh, I even struggle to compute the volumes (determinants) of a batch of simplices. Afaik there is no batch dimension in the determinant function, and as far as I know plain for loops are not vectorized on the GPU (otherwise I could just loop here).

Looking at the TaylorSeries package, there is no extension for CUDA or other GPU packages and nothing in the docs about it; the same goes for TaylorDiff. So, for now, I guess you can't? Note that you can make your code a lot faster using

abs(det(@view(vs[2:end, :]) .- @view(vs[1:1, :]))) / factorial(n)

or, even better, use a loop (if you stay on the CPU). AcceleratedKernels.jl can help on the GPU, but not for TaylorSeries.
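For instance, a plain loop over the batch (a sketch, assuming the batch is stored as an (n+1, n, b) array with the batch dimension last, and batch_volumes is just an illustrative name):

# CPU sketch: volumes of a batch of b simplices stored as an (n+1, n, b) array
function batch_volumes(simplices)
    b = size(simplices, 3)
    vols = Vector{Float64}(undef, b)
    for k in 1:b
        vols[k] = simplex_volume(@view simplices[:, :, k])
    end
    return vols
end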

About your code: note that norm is not compatible with TaylorSeries (F ends up a plain float); you should use sqrt(sum(abs2, u)) instead, and do not expand around 0, since sqrt is not differentiable there.
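Something like this (a sketch of the replacement, shifting the expansion point to 1):

using TaylorSeries

u = set_variables("u", numvars=2, order=7)
# sum(abs2, ...) keeps the result a TaylorN; expand around 1 because sqrt is not analytic at 0
F = sqrt(sum(abs2, 1 .+ u))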

Thanks for the tip. Actually, judging from the issues, TaylorSeries is compatible? Looks like it broke recently though: Fix from #44 to work with CUDA arrays Does Not Work in Latest Version · Issue #93 · JuliaDiff/TaylorDiff.jl · GitHub

It works fine on the GPU, I just tested it. I have a little issue with indexing inside det for some reason, but if you replace det with something else, it works.

How did you try it? I tried it like in the issue and got the same error. Maybe I need to switch to another package, like the AcceleratedKernels.jl you mentioned.

This works:

using TaylorSeries
using LinearAlgebra
using CUDA

function moment_generating_function(u, vs)
    # more complicated in reality
    return simplex_volume(vs) * sum(abs2,u)
end

"""
    simplex_volume(vs)

Calculate the volume of a simplex given its vertices.

`vs` is a matrix of size (n+1, n) whose rows are the vertices of the simplex,
where n is the dimension of the simplex. Returns the volume as a Float64.
"""
function simplex_volume(vs)
    n = size(vs, 2)
    # det replaced by sum here, just to check that the rest runs on the GPU
    return abs(sum(@views(vs[2:end, :] .- vs[1:1, :]))) / factorial(n)
end


"""
    moment_generating_taylor_coefficients(vs, order)

Calculate the Taylor series expansion of the moment generating function of a simplex around 0.
"""
function moment_generating_taylor_coefficients(vs, order)
    d = size(vs, 2)
    u = set_variables("u", numvars=d, order=order)
    println("Taylor series variables: ", u)
    F = moment_generating_function(u, vs)
    # println("Taylor series coefficient size: ", size(c))
    return F
end

# Main script
df = cu([1.0 1.0; 2.0 5.0; 3.0 2.0])
println(size(df))
order = 7
taylor_coeffs = moment_generating_taylor_coefficients(df, order)

but it's not what you wanted to do, since I removed the det and norm that broke the code. This is closer, but I don't know about the performance (note that I expand around 1):

using TaylorSeries
using LinearAlgebra
using CUDA

function moment_generating_function(u, vs)
    # more complicated in reality
    return simplex_volume(vs) * sqrt(sum(abs2,u))
end

"""
    simplex_volume(vs)

Calculate the volume of a simplex given its vertices.

`vs` is a matrix of size (n+1, n) whose rows are the vertices of the simplex,
where n is the dimension of the simplex. Returns the volume as a Float64.
"""
function simplex_volume(vs)
    n = size(vs, 2)
    # subtract the first vertex (as a 1×n row) from the remaining rows
    return abs(det(@views(vs[2:end, :] .- vs[1:1, :]))) / factorial(n)
end


"""
    moment_generating_taylor_coefficients(vs, order)

Calculate the Taylor series expansion of the moment generating function of a simplex around 1.
"""
function moment_generating_taylor_coefficients(vs, order)
    d = size(vs, 2)
    u = set_variables("u", numvars=d, order=order)
    println("Taylor series variables: ", u)
    # scalar indexing happens inside det; allow it for now
    CUDA.@allowscalar F = moment_generating_function(1 .+ u, vs)
    return F
end

# Main script
df = cu([1.0 1.0; 2.0 5.0; 3.0 2.0])
println(size(df))
order = 7
CUDA.@time taylor_coeffs = moment_generating_taylor_coefficients(df, order)
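To get the coefficients back out of the result, the field access from your original script should still work (a sketch):

c = taylor_coeffs.coeffs  # one HomogeneousPolynomial per order, 0 through 7
println(size(c))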

Oh wow, that's progress! Thank you, really cool.

I guess now I just have to figure out how to go from a single simplex (of shape (n+1, n)) to a batch of simplices of shape (b, n+1, n).

GPU arrays implement generic Julia functions as much as possible, including the broadcasting and mapreduce machinery, reducing the need for dedicated constructs like vmap. I haven't tried it, but in this case, my first attempt would be the following:

mapslices(df -> moment_generating_taylor_coefficients(df, 7), simplices; dims=[1, 2])

where simplices has shape (n+1, n, b), with the batch dimension last so that the slices are contiguous in memory. (Julia arrays are column major, i.e., dimensions are ordered from "inner" to "outer", opposite to Python.) Note that mapslices takes the dimensions that make up each slice, dims=[1, 2], rather than the batch dimension.
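For concreteness, a sketch with a toy batch built from two copies of your example simplex (the cat call and the batch size are just for illustration):

# toy batch of b = 2 simplices, shape (3, 2, 2), batch dimension last
simplices = cat(df, df; dims=3)
# each slice simplices[:, :, k] is one (n+1)×n simplex
coeffs = mapslices(s -> moment_generating_taylor_coefficients(s, 7), simplices; dims=[1, 2])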

According to this thread:

this is not really working, afaik.

I see, that makes sense: mapping and broadcasting can't replace, as @maleadt said in that thread,

a tensor compiler that knows how to analyze the nested array ops and generate a single kernel

This sounds similar to what Reactant.jl aims to become. I haven’t tried it and don’t know whether it could solve your problem today.

If your n is fairly small (< 10, maybe), then you can try a CuVector of SMatrix, using StaticArrays.jl. That's the same memory layout as a size (n+1, n, b) CuArray, but these static arrays are like scalars (or complex numbers) as far as CUDA.jl is concerned, which allows some mapslices-like things to be done.
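A minimal sketch of that idea for n = 2 (simplex_volume_static, the example vertices, and the batch of 1000 copies are all made up for illustration):

using CUDA, LinearAlgebra, StaticArrays

# volume of a single 2-simplex stored as a static 3×2 matrix (rows are vertices)
function simplex_volume_static(vs)
    edges = vs[SVector(2, 3), :] .- vs[SVector(1), :]  # 2×2 edge matrix, statically sized
    return abs(det(edges)) / 2                         # divide by factorial(n) with n = 2
end

vs = SMatrix{3,2}(1.0, 2.0, 3.0, 1.0, 5.0, 2.0)  # same vertices as df, column-major
simplices = CUDA.fill(vs, 1000)                  # CuVector of 1000 static matrices
vols = map(simplex_volume_static, simplices)     # runs as a single GPU kernel

The det here is StaticArrays' unrolled determinant for small fixed sizes, so no scalar indexing into GPU memory is needed.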