Is there a package for "random vectors" like rv in R?

Tamas_Papp · August 14, 2023, 11:35am

Kerman and Gelman (2007) introduced an interesting way to think about posterior simulations, implemented in rv in R. The basic idea is to implement (a vector of) posterior draws as first-class values, and overload arithmetic etc so that basic algebraic manipulations work.

Eg if a and b both contain 5000 posterior draws of a scalar, then a + b would contain the sum, matching draws. This is mainly useful for calculating with the posterior, after obvious convergence checks etc have been performed.

Is there a package that does something similar in Julia?

aplavin · August 14, 2023, 1:09pm

Sounds like MonteCarloMeasurements.jl.

Tamas_Papp · August 14, 2023, 1:34pm

To a certain extent, yes, but the API seems to be focused on univariate quantities. I also want posterior draws of vectors, NamedTuples, etc to “just work”. I am not sure what is the best API for that would be in Julia.

I admit that at this point I am hazy on the specs, and how to mesh that well with Julia. In R, operators like + automagically broadcast, but in Julia one either

defines each function to propagate uncertainty (something like what MonteCarloMeasurements.register_primitive does),
provides a new way of lifting operators (MonteCarloMeasurements.bymap),
hooks into broadcasting, but it is not clear to me is one can make both broadcasting on array dimensions and sampling uncertainty work.

Perhaps I will experiment with (3).

sethaxen · August 16, 2023, 12:41pm

It would certainly not be hard to support the same cases supported by posterior’s rvar – basically expanding MCM to also represent random arrays – but for generic Julia types this is one of these cases where in principle one wants the not-yet-possible RV{SomeType,<:AbstractArray{SomeType}} <: SomeType. Otherwise one needs to define a ton of overloads, and even then the objects probably still won’t behave as intended.

I’m always interested though in the use cases for such types. I only see such RV objects being useful for providing some syntactic sugar, but they’re very fragile when passed to arbitrary Julia code, since they fail as soon as one hits control flow. With mapslices, eachslice, etc, it’s not hard to work with multidimensional arrays of arbitrary Julia objects and make sure the right thing is always done.

Tamas_Papp · August 16, 2023, 1:03pm

I agree with most of what you say, but what I find difficult is treating a vector of vectors (matrices, arrays) as if the were slices of an array of + 1 dimensions for output. That is to say,

a = randn(5)
b = eachrow(randn(5, 4))
map((a, b) -> a .+ b, a, b) # Vector{Vector{Float64}}

In any case, I started [package since deleted] and I am currently working on array-based underlying representations for vectors of arrays. I won’t register this package until I found it useful, it is still an experiment.

EDIT: that approach did not pan out, so I deleted the repo and removed the link. I am working on another one and will announce it here.

bertschi · August 16, 2023, 1:53pm

From my understanding MonteCarloMeasurements already works on vectors, tuples etc, i.e.,

using MonteCarloMeasurements
using Distributions
using TransformVariables

trans = as((; a = asℝ₊, b = as(Vector, 2)))
ps = Particles(10, MvNormal([1.0 0.9 0.0; 0.9 1.0 0.0; 0.0 0.0 1.0]))
θ = TransformVariables.transform(trans, ps)

While it represents ps as a Vector{Particles{Float64, 10}} all samples are kept aligned, i.e., ps[1].particles[7], ps[2].particles[7] and ps[3].particles[7] together form the 7-th vector valued sample and arithmetic works as expected, i.e., [1 2; 3 4] * θ.b multiples each vector valued sample with the given matrix or θ.b * ps' gives a matrix of each 2-dimensional sample in θ.b multiplied with each of the 3-dimensional samples in ps, i.e., providing matrix-valued samples. The only thing that I found lacking is an easy interface to extract the i-th sample which currently requires something like map(v -> v.particles[7], ps).
What other use cases, e.g. where this is insufficient, do you have in mind?

Tamas_Papp · August 17, 2023, 8:35am

I prefer to work with “particles of containers” instead of “containers of particles”. Eg if a posterior is about matrices, I want something analogous to a vector of matrices, not a matrix of vectors. Maybe MonteCarloMeasurements can do this and I missed it? Then an example would help.

What @sethaxen describes above (mapslices, eachslice) is almost there, except for

compact storage of results whenever applicable (eg collect a vector of matrices as an Array{T,3}, etc)
some basic sanity checks to prevent the user from accidentally mixing results from different inference runs, eg a tag for all values read from the same posterior

I will continue to experiment and report back here.

sethaxen · August 17, 2023, 11:28am

InferenceObjects.InferenceData, used in ArviZ, takes this approach. But it actually takes it further in not (currently) supporting draws of arbitrary Julia types, since for diagnostics, summary statistics, plotting, and long-term serialization, one generally needs to flatten Julia types into numeric arrays or tables anyways. There are plans to support interconverting between these compact representations and the original Julia type, but it’s tricky.

julia> using ArviZExampleData, LinearAlgebra, Statistics

julia> idata = load_example_data("centered_eight")
InferenceData with groups:
  > posterior
  > posterior_predictive
  > log_likelihood
  > sample_stats
  > prior
  > prior_predictive
  > observed_data
  > constant_data

julia> idata.posterior
Dataset with dimensions: 
  Dim{:draw} Sampled{Int64} Int64[0, 1, …, 498, 499] ForwardOrdered Irregular Points,
  Dim{:chain} Sampled{Int64} Int64[0, 1, 2, 3] ForwardOrdered Irregular Points,
  Dim{:school} Categorical{String} String[Choate, Deerfield, …, St. Paul's, Mt. Hermon] Unordered
and 3 layers:
  :mu    Float64 dims: Dim{:draw}, Dim{:chain} (500×4)
  :theta Float64 dims: Dim{:school}, Dim{:draw}, Dim{:chain} (8×500×4)
  :tau   Float64 dims: Dim{:draw}, Dim{:chain} (500×4)

with metadata Dict{String, Any} with 6 entries:
  "created_at"                => "2022-10-13T14:37:37.315398"
  "inference_library_version" => "4.2.2"
  "sampling_time"             => 7.48011
  "tuning_steps"              => 1000
  "arviz_version"             => "0.13.0.dev0"
  "inference_library"         => "pymc"

julia> dropdims(mean(idata.posterior.theta; dims=(:draw, :chain)); dims=(:draw, :chain))  # reduce dims
8-element DimArray{Float64,1} theta with dimensions: 
  Dim{:school} Categorical{String} String[Choate, Deerfield, …, St. Paul's, Mt. Hermon] Unordered
and reference dimensions: 
  Dim{:draw} Sampled{Float64} Float64[249.5] ForwardOrdered Irregular Points,
  Dim{:chain} Sampled{Float64} Float64[1.5] ForwardOrdered Irregular Points
 "Choate"            6.46006
 "Deerfield"         5.02755
 "Phillips Andover"  3.93803
 "Phillips Exeter"   4.87161
 "Hotchkiss"         3.66684
 "Lawrenceville"     3.97469
 "St. Paul's"        6.58092
 "Mt. Hermon"        4.77241

julia> mapslices(normalize, idata.posterior.theta; dims=(:draw, :chain))  # keep dims
8×500×4 DimArray{Float64,3} theta with dimensions: 
  Dim{:school} Categorical{String} String[Choate, Deerfield, …, St. Paul's, Mt. Hermon] Unordered,
  Dim{:draw} Sampled{Int64} Int64[0, 1, …, 498, 499] ForwardOrdered Irregular Points,
  Dim{:chain} Sampled{Int64} Int64[0, 1, 2, 3] ForwardOrdered Irregular Points
[:, :, 1]
                      0          1           2          3          4          …  496           497            498           499
  "Choate"            0.0315722  0.0289199   0.0146283  0.0257209  0.0234451       0.0243186    -0.000547943    0.0266568     0.0170699
  "Deerfield"         0.0316057  0.0291295   0.0183722  0.0281077  0.0183947       0.0260931     0.00432398     0.0220399     0.0236556
  "Phillips Andover"  0.0483347  0.0101484   0.035381   0.0320071  0.022679        0.00721072    0.0225732     -0.0160478    -0.0301376
  "Phillips Exeter"   0.0352315  0.0301817   0.0188626  0.0184575  0.0501963       0.00659421    0.011892       0.0100332     0.00861284
  "Hotchkiss"         0.0202402  0.0283366   0.03625    0.032876   0.0112359  …    0.00439817    0.0193147     -0.00806875   -0.00182223
  "Lawrenceville"     0.0578453  0.00819018  0.02787    0.0238147  0.0411574       0.00631814    0.0238403     -0.00970274   -0.014562
  "St. Paul's"        0.0354355  0.0269973   0.020418   0.0276498  0.0303843       0.0116846     0.0132371      0.0144745     0.0203155
  "Mt. Hermon"        0.0451373  0.018511    0.0262757  0.0094561  0.0510854       0.0140476     0.00916629     0.0191239     0.0299318
[and 3 more slices...]

I’m interested to see what you come up with.

bertschi · August 17, 2023, 3:57pm

Fair enough, also had this feeling when using MonteCarloMeasurements at the beginning. At least for my use cases, its “container of particles” approach worked reasonably well and seems to interact rather nicely with broadcasting etc.
In any case, curious what you come up with as your other MCMC libraries are great.

Topic		Replies	Views
[ANN] MonteCarloMeasurements.jl Package Announcements announcement , monte-carlo , measurements	34	4757	January 22, 2020
Random variables in Julia (working list) Statistics distributions	36	6106	November 27, 2022
Working with posteriors Statistics question , splitapplycombine	7	834	January 22, 2019
Function rand() or sample() for MCMC sampling and similar? Probabilistic Programming	19	1629	September 23, 2019
Rand/logpdf semantic consistency Probabilistic Programming	12	1109	December 3, 2020

Is there a package for "random vectors" like rv in R?

Related topics