Anything like Distributions.jl for GPU?


#1

Is there any package supporting multivariate distributions and basic operations on them like sampling and log pdf, but working on GPU?


#2

Or, what would it take to get Distributions.jl working on the GPU?


#3

The issue might be that Distributions.jl still uses some C libraries (Rmath, openspecfun). There was some discussion about porting them to Julia, but I'm not sure what the status of that is.

Sometimes I just translate scipy code for probability distributions when I need automatic differentiation.


#4

Distributions is indeed written with CPUs and Arrays in mind. It would be great to modify the code to work more generally, even if the support on GPUs would initially be sparser than on CPUs. As mentioned, we use Rmath for many of the functions in Distributions, but quite a few of them already have fallbacks written in Julia. However, since the number types on the GPU are the same as the CPU number types, it would be a problem to dispatch to the Julia implementations based on the element type alone. This is a general problem.
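To make the problem concrete, the kind of pure-Julia fallback that could run on a GPU looks like the sketch below (the function name is illustrative, not part of Distributions; it avoids any ccall into Rmath, which is what rules out the default scalar methods on the device):

```julia
# Pure-Julia logpdf of Normal(μ, σ): only arithmetic and log, no ccall,
# so it could in principle be compiled into a GPU kernel (hypothetical name):
normal_logpdf(x, μ, σ) = -(abs2((x - μ) / σ) + log(2π)) / 2 - log(σ)

normal_logpdf(0.0, 0.0, 1.0)  # ≈ -0.9189, i.e. -log(√(2π))
```

The dispatch issue is that nothing in the argument types distinguishes "this `Float32` lives on the GPU" from an ordinary CPU `Float32`, so there is no method signature that selects this fallback only on the device.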

We might want to consider a split of Distributions into a package with all the basic API definitions, say DistributionsBase, and then make that a dependency of Distributions. In that way, it would be possible for CuArrays to use (at least most of) the same API as Distributions without depending on all of Distributions.


#5

I considered modifying Distributions.jl to work on the GPU, but it seems easier to go the other way - start with some basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

I'm also thinking of multivariate distributions only, since putting scalars on the GPU seems rather pointless.


#6

…but it seems easier to go the other way - start with some basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

True, but it would be a shame to develop a separate API for the GPU. It would be nice to be able to use e.g. Normal(μ,σ) and Binomial(n,p) on the GPU.

I'm also thinking of multivariate distributions only, since putting scalars on the GPU seems rather pointless.

I don’t understand why you wouldn’t want to work with scalar-valued distributions on the GPU, e.g. for things like

rand!(Normal(), dA)

to also work on the GPU.
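The scalar-distribution case can actually be written generically. A sketch (the helper name is hypothetical, not Distributions API) that relies only on randn! and broadcasting, both of which GPU array packages provide natively:

```julia
using Random

# Fill A with Normal(μ, σ) samples; works for any array type that
# implements randn! and fused broadcasting (Array, and in principle CuArray):
function sample_normal!(μ, σ, A::AbstractArray)
    randn!(A)          # standard normals, device-native for GPU arrays
    A .= μ .+ σ .* A   # affine transform runs as a single broadcast kernel
    return A
end
```

Nothing here touches individual elements from the host side, so the same method body could serve both CPU and GPU arrays.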


#7

I don’t understand why you wouldn’t want to work with scalar-valued distributions on the GPU, e.g. for things like […]

Ah, I was thinking more about generating individual numbers (e.g. something like rand(Normal())) when I mentioned univariate distributions; generating a vector of such numbers on the GPU makes perfect sense to me.

But in the case of rand!(Normal(), dA), it seems like we don’t have a problem with dispatching, since we can always dispatch on the type of dA. Or am I missing something?


#8

I think the specific example might be fine but I suspect that something like

pdf.(Normal(), dA)

could be a problem, because the broadcast would end up calling e.g. pdf(Normal(), dA[i]) where the elements are, say, Float32s, so pdf would dispatch to the wrong method.


#9

Indexing of individual elements of a CuArray is inefficient and disabled by default, if I remember correctly, so dispatching is the least of the problems, I would say. On the other hand, in Julia 1.0 broadcasting is relatively easy to override.
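For reference, the Julia 1.0 broadcast hook alluded to here is the documented BroadcastStyle/similar interface. A minimal CPU-side sketch, where TrackedArray is a made-up wrapper standing in for a GPU array type:

```julia
# A thin array wrapper that takes over broadcasting - the same mechanism
# a GPU array package uses to route pdf.(Normal(), dA) into its own kernels.
struct TrackedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::TrackedArray) = size(A.data)
Base.getindex(A::TrackedArray, i::Int...) = A.data[i...]
Base.setindex!(A::TrackedArray, v, i::Int...) = (A.data[i...] = v)

# Opt in to custom broadcasting: any broadcast involving a TrackedArray
# allocates its result through the similar method below.
Base.BroadcastStyle(::Type{<:TrackedArray}) = Broadcast.ArrayStyle{TrackedArray}()
function Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{TrackedArray}},
                      ::Type{T}) where {T}
    TrackedArray(similar(Array{T}, axes(bc)))
end

sin.(TrackedArray([0.0, 0.0])) isa TrackedArray  # true
```

A GPU package would additionally override copyto! (or the kernel launch) for its style, which is where a call like pdf could be rewritten before it reaches the device.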


#10

I was probably too brief. The call to pdf(Normal(), dA[i]) would happen on the GPU device where indexing is, of course, not disabled. This is already what CuArray broadcasting does so the problem can be illustrated with

julia> cdf.(Normal(), cu(rand(3)))
ERROR: InvalidIRError: compiling #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(cdf),Tuple{CuArrays.CuRefValue{Normal{Float64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
 [1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
 [2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
 [3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
 [4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
 [5] _broadcast_getindex_evalf at broadcast.jl:574
 [6] _broadcast_getindex at broadcast.jl:547
 [7] getindex at broadcast.jl:507
 [8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
 [9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Reason: unsupported call to the Julia runtime (call to jl_type_error_rt)
Stacktrace:
 [1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
 [2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
 [3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
 [4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
 [5] _broadcast_getindex_evalf at broadcast.jl:574
 [6] _broadcast_getindex at broadcast.jl:547
 [7] getindex at broadcast.jl:507
 [8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
 [9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58


#11

It took me some time to learn the internals of Distributions.jl, its dependencies, and the theory behind them. I tried to imagine what it would take to make a single distribution - namely, MvNormal - work on CuArrays (or any other form of GPU arrays). By “working” I mean:

  • parameters are represented as CuArrays or wrappers (e.g. Diagonal{T, CuArray{T,1}})
  • logpdf or any similar method is implemented

The most acute problem I’ve encountered is that under the hood Distributions.jl heavily uses LinearAlgebra types such as UpperTriangular matrices and the Cholesky factorization. These types, however, seem to use the concrete Array instead of AbstractArray. I didn’t dive deeper, but I believe using concrete types there is intentional and will be hard to change in Base. Also, Cholesky factorization for CuArrays is better done using cuSOLVER.

Unless I’m missing some important detail, this means we would have to re-implement half of the LinearAlgebra types, which sounds like a terrible idea. I’ll continue thinking about it; fresh ideas are welcome too.
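For what it's worth, the MvNormal density itself only needs a Cholesky factorization, one triangular solve, and a dot product, all generic operations. A sketch written purely against AbstractArray (hypothetical function name; it assumes cholesky and the triangular solve exist for the given GPU array type):

```julia
using LinearAlgebra

# logpdf of MvNormal(μ, Σ) via the Cholesky factorization Σ = L*L':
#   -(k*log(2π) + log|Σ| + (x-μ)'Σ⁻¹(x-μ)) / 2
# computed with one triangular solve instead of forming Σ⁻¹ explicitly.
function mvnormal_logpdf(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix)
    C = cholesky(Symmetric(Σ))
    z = C.L \ (x - μ)   # whitened residual, so dot(z, z) == (x-μ)'Σ⁻¹(x-μ)
    return -(length(x) * log(2π) + logdet(C) + dot(z, z)) / 2
end
```

If such generic fallbacks were enough, the re-implementation problem would reduce to making cholesky and the solve work for GPU arrays (e.g. via cuSOLVER) rather than duplicating LinearAlgebra's types.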


#12

It seems I misinterpreted what I saw in the REPL while inspecting types in LinearAlgebra - at least Cholesky factorization turns out to work perfectly well:

A = cu(rand(3, 3))
A = A * A'   # make positive-definite
cholesky(A)