# Anything like Distributions.jl for GPU?

Is there any package supporting multivariate distributions and basic operations on them, like sampling and log-pdf, but working on the GPU?

The issue might be that Distributions.jl still uses some C libraries (Rmath, openspecfun). There was some discussion about porting them to Julia, but I'm not sure what the status of that is.

Sometimes I just translate scipy code for probability distributions when I need automatic differentiation.

**andreasnoack**#4

`Distributions` is indeed made with CPUs and `Array`s in mind. It would be great to modify the code to work more generally, even if GPU support would initially be sparser than CPU support. As mentioned, we use `Rmath` for many of the functions in `Distributions`, but quite a few of them already have fallbacks written in Julia. However, since the number types on the GPU are the same as the CPU number types, it would be a problem to dispatch to the Julia implementation. This is a general problem.
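The dispatch problem can be illustrated with a toy example (hypothetical names, not the actual `Distributions` internals): both the C-backed path and the pure-Julia fallback accept a plain `Float64`, so the element type seen inside a GPU kernel gives you nothing to dispatch on.

```
# Hypothetical stand-ins: `normpdf_c` represents an Rmath-backed method
# that a GPU kernel cannot call; `normpdf_julia` is a pure-Julia fallback.
normpdf_c(x::Float64) = error("stand-in for a ccall into Rmath")
normpdf_julia(x::Real) = exp(-x^2 / 2) / sqrt(2π)

# Both methods apply to an ordinary Float64 element, so a broadcast
# kernel running on the GPU cannot pick the fallback by element type.
normpdf_julia(0.0)  # ≈ 0.3989
```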

We might want to consider splitting `Distributions` into a package with all the basic API definitions, say `DistributionsBase`, and then making that a dependency of `Distributions`. That way, it would be possible for `CuArray`s to use (at least most of) the same API as `Distributions` without depending on all of `Distributions`.
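A minimal sketch of what such a `DistributionsBase` split could look like (hypothetical; neither the package nor these exact definitions exist today):

```
# Only abstract types and generic function stubs would live here, so a
# GPU package could depend on the API without pulling in Rmath etc.
abstract type AbstractDistribution end
abstract type AbstractUnivariateDistribution <: AbstractDistribution end

function pdf end
function logpdf end
function cdf end
```

A GPU package would then add methods to these generic functions for its own array types.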

**dfdx**#5

I considered modifying Distributions.jl to work on the GPU, but it seems easier to go the other way: start with some basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

I am also thinking only of multivariate distributions, since putting scalars on the GPU looks fairly meaningless.

**andreasnoack**#6

> …but it seems easier to go the other way: start with some basic GPU-compatible implementation and gradually move toward the existing API of Distributions.jl.

True, but it would be a shame to develop a separate API for the GPU. It would be nice to be able to use e.g. `Normal(μ,σ)` and `Binomial(n,p)` on the GPU.

> I am also thinking only of multivariate distributions, since putting scalars on the GPU looks fairly meaningless.

I don't understand why you wouldn't want to work with scalar-valued distributions on the GPU, e.g. for things like

```
rand!(Normal(), dA)
```

to also work on the GPU.

**dfdx**#7

> I don't understand why you wouldn't want to work with scalar-valued distributions on the GPU, e.g. for things like […]

Ah, when I mentioned univariate distributions I was thinking of generating individual numbers (e.g. something like `rand(Normal())`); generating a vector of such numbers on the GPU makes perfect sense to me.

But in the case of `rand!(Normal(), dA)` it seems like we don't have a problem with dispatching, since we can always dispatch on the type of `dA`. Or am I missing something?
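The point about dispatching on the container can be sketched like this (toy types, not the actual `Distributions` API):

```
struct ToyNormal end

# Generic fallback: fill any AbstractArray with standard-normal draws,
# element by element on the CPU.
function sample!(::ToyNormal, A::AbstractArray)
    for i in eachindex(A)
        A[i] = randn()
    end
    return A
end

# A GPU package would add its own method on its array type, e.g.
# sample!(::ToyNormal, A::CuArray) = ... CURAND-backed path ...
```

Because the array appears in the signature, the GPU method is selected on the host, before any device code runs.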

**andreasnoack**#8

I think the specific example might be fine but I suspect that something like

```
pdf.(Normal(), dA)
```

could be a problem, because the broadcast function would end up calling e.g. `pdf(Normal(), dA[i])` and the elements would be, say, `Float32`s, so `pdf` would dispatch to the wrong method.

**dfdx**#9

Indexing of individual elements in `CuArray`s is inefficient and disabled by default, if I remember correctly, so dispatching is the least of the problems, I would say. On the other hand, in Julia 1.0 broadcasting is relatively easy to override.
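For reference, the Julia 1.0 mechanism alluded to here: a custom array type defines its own `Broadcast.BroadcastStyle`, and the package that materializes the broadcast decides how the kernel is executed. A package could also short-circuit specific fused calls before lowering (sketch; `gpu_normcdf` is hypothetical):

```
import Base.Broadcast: broadcasted

# Hypothetical short-circuit: route `cdf.(Normal(), A::CuArray)` to a
# hand-written GPU implementation instead of the generic per-element kernel.
# broadcasted(::typeof(cdf), d::Normal, A::CuArray) = gpu_normcdf(d, A)
```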

**andreasnoack**#10

I was probably too brief. The call to `pdf(Normal(), dA[i])` would happen on the GPU device, where indexing is, of course, not disabled. This is already what CuArray broadcasting does, so the problem can be illustrated with

```
julia> cdf.(Normal(), cu(rand(3)))
ERROR: InvalidIRError: compiling #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(cdf),Tuple{CuArrays.CuRefValue{Normal{Float64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
[1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
[2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
[3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
[4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
[5] _broadcast_getindex_evalf at broadcast.jl:574
[6] _broadcast_getindex at broadcast.jl:547
[7] getindex at broadcast.jl:507
[8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
[9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Reason: unsupported call to the Julia runtime (call to jl_type_error_rt)
Stacktrace:
[1] erfc at /home/andreasnoack/.julia/packages/SpecialFunctions/fvheQ/src/erf.jl:8
[2] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:15
[3] normcdf at /home/andreasnoack/.julia/packages/StatsFuns/0W2sM/src/distrs/norm.jl:16
[4] cdf at /home/andreasnoack/.julia/packages/Distributions/WHjOk/src/univariates.jl:528
[5] _broadcast_getindex_evalf at broadcast.jl:574
[6] _broadcast_getindex at broadcast.jl:547
[7] getindex at broadcast.jl:507
[8] macro expansion at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:69
[9] #25 at /home/andreasnoack/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
```
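The stack trace bottoms out in `erfc`, which is how `normcdf` is computed. The identity itself is simple and could be reimplemented against a device-side `erfc`; a plain-Julia sketch of the standard formula (hypothetical helper name):

```
using SpecialFunctions: erfc

# Standard normal CDF via the complementary error function:
# Φ(x) = erfc(-x / √2) / 2
mynormcdf(x::Real) = erfc(-x / sqrt(2)) / 2

mynormcdf(0.0)  # = 0.5
```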

**dfdx**#11

It took me some time to learn the internals of Distributions.jl, its dependencies, and the theory behind them. I tried to imagine what it would take to make a single distribution, namely `MvNormal`, work on CuArrays (or any other form of GPU arrays). By "working" I mean:

- parameters are represented as CuArrays or wrappers (e.g. `Diagonal{T, CuArray{T,1}}`)
- `logpdf` or any similar method is implemented

The most acute problem I've encountered is that under the hood Distributions.jl heavily uses LinearAlgebra types such as the `UpperTriangular` matrix and the `Cholesky` factorization. These types, however, use the concrete `Array` instead of `AbstractArray`. I didn't dive deeper, but I believe using concrete types there is intentional and will be hard to change in Base. Also, the Cholesky factorization for CuArrays is better done using cuSOLVER.

Unless I'm missing some important detail, this means we would have to re-implement half of the LinearAlgebra types, which sounds like a terrible idea. I'll continue thinking about it; fresh ideas are welcome too.
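For concreteness, the `MvNormal` log-density can be written against only the generic `AbstractArray`/`cholesky` interface, so it works wherever a Cholesky factorization does (sketch of the standard formula; the function name is made up):

```
using LinearAlgebra

# log N(x; μ, Σ) = -½ (log det Σ + (x-μ)ᵀ Σ⁻¹ (x-μ) + k log 2π)
function mvnormal_logpdf(x::AbstractVector, μ::AbstractVector, Σ::AbstractMatrix)
    C = cholesky(Σ)
    z = C.L \ (x .- μ)          # whitened residual: zᵀz = (x-μ)ᵀ Σ⁻¹ (x-μ)
    k = length(x)
    return -(logdet(C) + dot(z, z) + k * log(2π)) / 2
end
```

Whether this compiles for `CuArray` parameters then hinges exactly on the `cholesky` and triangular-solve support discussed above.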

**dfdx**#12

It seems like I misinterpreted what I saw in the REPL while inspecting types in LinearAlgebra. At least the Cholesky factorization turns out to work perfectly well:

```
using CuArrays, LinearAlgebra

A = cu(rand(3, 3))
A = A * A'  # make positive definite
cholesky(A)
```