Anything like Distributions.jl for GPU?

Is there any package that supports multivariate distributions and basic operations on them, such as sampling and log pdf, but works on the GPU?


Or, what would it take to get Distributions.jl working on the GPU?

The issue might be that Distributions.jl still uses some C libraries (Rmath, openspecfun). There was some discussion about porting them to Julia, but I'm not sure what the status of that is.

Sometimes I just translate scipy code for probability distributions when I need automatic differentiation.
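For example, a hand-rolled normal log pdf in that style needs no ccalls, so automatic differentiation can trace straight through it (a minimal sketch; normlogpdf is just an illustrative name):

# Normal log pdf: -((x-μ)/σ)^2/2 - log(σ) - log(2π)/2, in pure Julia,
# with no calls into Rmath or other C libraries
normlogpdf(μ, σ, x) = -((x - μ) / σ)^2 / 2 - log(σ) - log(2π) / 2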

Distributions is indeed made with CPUs and Arrays in mind. It would be great to modify the code to work more generally, even if the support on GPUs would initially be sparser than on CPUs. As mentioned, we use Rmath for many of the functions in Distributions, but quite a few of them already have fallbacks written in Julia. However, since the number types on the GPU are the same as the CPU number types, there is no way to dispatch to the Julia implementation based on the element type alone. This is a general problem.
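To make the dispatch problem concrete, here is a small check (assuming CuArrays is installed): the element type alone cannot distinguish the two devices.

using CuArrays

x_cpu = rand(Float32, 3)
x_gpu = cu(x_cpu)

# true: Float32 on both devices, so a scalar method such as
# pdf(d, x::Float32) cannot know whether it runs on the CPU or the GPU
eltype(x_cpu) === eltype(x_gpu)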

We might want to consider a split of Distributions into a package with all the basic API definitions, say DistributionsBase, and then make that a dependency of Distributions. In that way, it would be possible for CuArrays to use (at least most of) the same API as Distributions without depending on all of Distributions.
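Such a split might look roughly like this (a hypothetical sketch of what a DistributionsBase could contain, not an existing package):

# Hypothetical DistributionsBase: only abstract types and generic function
# stubs, with no Rmath-backed implementations, so that a GPU array package
# could extend the API without depending on all of Distributions
module DistributionsBase

abstract type Sampleable end
abstract type Distribution <: Sampleable end

function pdf end
function logpdf end
function cdf end

end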


I considered modifying Distributions.jl to work on the GPU, but it seems easier to go the other way: start with a basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

I also think we should target only multivariate distributions, since putting scalars on the GPU seems quite meaningless.

…but it seems easier to go the other way: start with a basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

True, but it would be a shame to develop a separate API for the GPU. It would be nice to be able to use e.g. Normal(μ,σ) and Binomial(n,p) on the GPU.

I also think we should target only multivariate distributions, since putting scalars on the GPU seems quite meaningless.

I don't understand why you wouldn't want to work with scalar-valued distributions on the GPU, e.g. for things like

rand!(Normal(), dA)

to also work on the GPU.


I don't understand why you wouldn't want to work with scalar-valued distributions on the GPU, e.g. for things like […]

Ah, I was thinking more about generating individual numbers (e.g. something like rand(Normal())) when I mentioned univariate distributions; generating a vector of such numbers on the GPU makes perfect sense to me.

But in the case of rand!(Normal(), dA) it seems like we don't have a problem with dispatching, since we can always dispatch on the type of dA. Or am I missing something?
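For instance, a sketch of dispatching on the array type might look like this (it assumes a CURAND-backed in-place randn! method exists for CuArray; the exact name has varied across versions):

using Random, CuArrays
using Distributions: Normal

# Specialize rand! on the array type: fill with standard normals on the
# device, then rescale, without ever calling a scalar CPU method
function Random.rand!(d::Normal, A::CuArray)
    randn!(A)                # assumption: GPU-side in-place standard normals
    A .= d.μ .+ d.σ .* A     # affine transform to Normal(μ, σ)
    return A
end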

I think the specific example might be fine but I suspect that something like

pdf.(Normal(), dA)

could be a problem because the broadcast function would end up calling e.g. pdf(Normal(), dA[i]), and the elements would be, say, Float32s, so pdf would dispatch to the wrong method.

Indexing of individual elements in CuArrays is inefficient and disabled by default, if I remember correctly, so dispatching is the least of the problems, I would say. On the other hand, in Julia 1.0 broadcasting is relatively easy to override.
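For instance, one could intercept the fused call at the broadcasted level and rewrite it to use a GPU-safe scalar kernel. A rough sketch, assuming CUDAnative exposes erf from libdevice (gpu_normcdf is a hypothetical helper):

using Distributions, CuArrays, CUDAnative

# Hypothetical GPU-safe scalar normal cdf built on libdevice's erf
gpu_normcdf(μ, σ, x) = (1 + CUDAnative.erf((x - μ) / (σ * CUDAnative.sqrt(2f0)))) / 2

# Rewrite cdf.(d, xs) on CuArrays into a broadcast over the GPU-safe kernel
function Base.Broadcast.broadcasted(::typeof(cdf), d::Normal, xs::CuArray)
    μ, σ = params(d)
    gpu_normcdf.(Float32(μ), Float32(σ), xs)
end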

I was probably too brief. The call to pdf(Normal(), dA[i]) would happen on the GPU device, where indexing is, of course, not disabled. This is already what CuArray broadcasting does, so the problem can be illustrated with

julia> cdf.(Normal(), cu(rand(3)))
ERROR: InvalidIRError: compiling #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(cdf),Tuple{CuArrays.CuRefValue{Normal{Float64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
 [1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
 [2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
 [3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
 [4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
 [5] _broadcast_getindex_evalf at broadcast.jl:574
 [6] _broadcast_getindex at broadcast.jl:547
 [7] getindex at broadcast.jl:507
 [8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
 [9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Reason: unsupported call to the Julia runtime (call to jl_type_error_rt)
Stacktrace:
 [1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
 [2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
 [3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
 [4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
 [5] _broadcast_getindex_evalf at broadcast.jl:574
 [6] _broadcast_getindex at broadcast.jl:547
 [7] getindex at broadcast.jl:507
 [8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
 [9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58

It took me some time to learn the internals of Distributions.jl, its dependencies, and the theory behind them. I tried to imagine what it would take to make a single distribution, namely MvNormal, work on CuArrays (or any other form of GPU arrays). By "working" I mean:

  • parameters are represented as CuArrays or wrappers (e.g. Diagonal{T,CuArray{T,1}})
  • logpdf or any similar method is implemented

The most acute problem I've encountered is that under the hood Distributions.jl heavily uses LinearAlgebra types such as UpperTriangular matrices and Cholesky factorizations. These types, however, use the concrete Array type instead of AbstractArray. I didn't dive deeper, but I believe using concrete types there is intentional and will be hard to change in Base. Also, Cholesky factorization for CuArrays is better done via cuSOLVER.

Unless I'm missing some important detail, this means we would have to re-implement half of the LinearAlgebra types, which sounds like a terrible idea. I'll keep thinking about it; fresh ideas are welcome too.

It seems like I misinterpreted what I saw in the REPL while inspecting types in LinearAlgebra; at least Cholesky factorization turns out to work perfectly well:

A = cu(rand(3,3))
A = A*A'   # make positive-definite 
cholesky(A)

I'd argue the right place to start for this effort is not the multivariate distributions but the univariate ones. The multivariate distributions are more complicated and would probably require some work before they work for a broader range of array types.

I'm not so sure about this. As I see it:

  1. You can't represent a single number on the GPU; the closest you can get is a single-element CuArray. Even if you could, it would only make sense as part of processing a larger batch of data, either via broadcasting or via batch operations like matrix multiplication.

  2. Broadcasting on a univariate distribution type, e.g. cdf.(Normal(), cu(rand(3))), is hard since you need to pass a custom Julia struct (like Normal) to the GPU, which is poorly documented in C and seems to be completely missing from CUDAnative. So working towards this approach means improving the CUDAnative runtime itself, which is, well, quite low-level.
    An alternative approach is to overload broadcasting and rewrite such calls into e.g. cdfnormal(mu, sigma, cu(rand(3))).

  3. Multivariate distributions, on the other hand, can operate directly on high-level CuArrays. In the simplest case, you just use CuArrays for the distribution type's fields (e.g. mu and sigma for MvNormal) and apply generic functions to them. Most of the work in this case probably goes into re-implementing CPU-only and third-party code, which sounds to me like an easier starting point than digging into the internals of CUDAnative (see the sketch below).
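To illustrate the third point: a minimal MvNormal-style logpdf can be written entirely in terms of generic LinearAlgebra operations. A sketch, where SimpleMvNormal is a hypothetical type and cholesky, the triangular solve, and dot are all assumed to work for CuArrays:

using LinearAlgebra

# Hypothetical distribution type whose parameters may live on the GPU
struct SimpleMvNormal{V<:AbstractVector,C<:Cholesky}
    μ::V
    chol::C   # Cholesky factorization of the covariance matrix Σ
end

# Generic log density: -(k*log(2π) + logdet(Σ) + ‖L⁻¹(x - μ)‖²)/2
function logpdf(d::SimpleMvNormal, x::AbstractVector)
    k = length(d.μ)
    z = d.chol.L \ (x - d.μ)    # whitened residual
    return -(k * log(2π) + logdet(d.chol) + dot(z, z)) / 2
end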

Perhaps you have another vision, so I'd be happy to learn about it!

Broadcasting on a univariate distribution type, e.g. cdf.(Normal(), cu(rand(3))), is hard since you need to pass a custom Julia struct (like Normal) to the GPU, which is poorly documented in C and seems to be completely missing from CUDAnative.

I think you might be overthinking this. What I'm talking about would be something like

julia> using CuArrays

julia> struct Normal{T}
         μ::T
         σ::T
       end

julia> pdf(D::Normal{T}, x::T) where T = CuArrays.CUDAnative.exp(-((x-D.μ)/D.σ)^2/2)/(sqrt(2*T(π))*D.σ)
pdf (generic function with 1 method)

julia> X = cu(rand(10))
10-element CuArray{Float32,1}:
 0.49907735
 0.0075366762
 0.4265667
 0.044595595
 0.6838826
 0.24200504
 0.95713586
 0.8394181
 0.8209742
 0.46985078

julia> Base.Broadcast.broadcastable(D::Normal) = Ref(D)

julia> pdf.(Normal(0.3f0, 0.9f0), X)
10-element CuArray{Float32,1}:
 0.43255657
 0.42047203
 0.4389076
 0.42577493
 0.40472618
 0.44234982
 0.3395486
 0.37039313
 0.3748919
 0.43544522

CUDAnative can do codegen with broadcast. You can make and return any Julia struct. You can internally use mutable structs as long as they go away. This makes pure-Julia code "just work" on the GPU as long as it is fully inferred and returns isbits types.
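In the example above that condition holds:

julia> isbitstype(Normal{Float32})   # plain bits, so safe to pass into a kernel
true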

The issue that you will run into with CuArrays is that you need to use the RNG from CUDA if you want to do rand.(dist), and without a way to pass an RNG, that won't work. But the other functions should be fine.
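In the meantime, one workaround is to draw the raw randomness with the CURAND-backed generators and transform it, instead of calling rand.(dist) on the device (a sketch; it assumes CuArrays.randn exists, and that name has varied across versions):

using CuArrays

# Sample n draws from Normal(μ, σ) on the GPU: generate standard normals
# with CURAND, then apply the affine transform as an ordinary broadcast
gpu_rand_normal(μ::Float32, σ::Float32, n) = μ .+ σ .* CuArrays.randn(Float32, n)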
