[ANN] ArraysOfArrays.jl and ShapesOfVariables.jl

oschulz · March 9, 2019, 9:35pm

Dear all,

I’d like to announce the packages ArraysOfArrays (somewhat belatedly, it was registered in November) and ShapesOfVariables (brand new). Both packages are about presenting a duality of view for Arrays, and they complement each other.

ArraysOfArrays

ArraysOfArrays makes it possible to view data both as a flat array and as a nested array of arrays. The package supports:

N-dimensional arrays of M-dimensional same-sized arrays (ArrayOfSimilarArrays), backed by an N+M-dimensional flat array.
Vectors of arrays of different size but same dimensionality (VectorOfArrays), backed by a flat vector.

You can switch between the flat and nested view, resp. use both views at the same time, without copying any data:

using ArraysOfArrays, Random

A = rand(2,3,4,5,6)
A_nested = nestedview(A, Val(2))
typeof(A_nested) <: AbstractArray{<:AbstractArray{Float64,2},3}
flatview(A_nested) === A

B_nested = VectorOfArrays{Float64,2}()
typeof(B_nested) <: AbstractVector{<:AbstractArray{Float64,2}}
push!(B_nested, rand(2,2))
push!(B_nested, rand(2,3))
push!(B_nested, rand(3,2))
push!(B_nested, rand(1,2))
B = flatview(B_nested)
typeof(B) == Vector{Float64}
size(B) == (18,)

ArraysOfArrays also provides a function consgroupedview that creates a VectorOfVectors representing a zero-copy (and very fast) group-by view on consecutive group-labels:

using ArraysOfArrays, TypedTables, Random

table = Table(a = [1, 1, 2, 2, 1, 1, 1, 3], b = nestedview(rand(4, 8)))
grouped_tables = consgroupedview(table.a, table)

grouped_tables[1] isa Table
flatview(grouped_tables) === table

Nesting of arrays at multiple levels is also possible:

using ArraysOfArrays, Random

A = rand(2,3,4,5,6)
A_deepnested = nestedview(nestedview(A, Val(2)), Val(1))

typeof(A_deepnested) <: AbstractArray{
    <:AbstractArray{
        <:AbstractArray{Float64, 2},
        1
    },
    2
}

ArraysOfArrays continues a story that began with ElasticArrays and UnsafeArrays: Using an ElasticArray (which is resizeable in it’s last dimension) allows you to push arrays to a VectorOfSimilarArrays:

using ArraysOfArrays, ElasticArrays

C_nested = nestedview(ElasticArray{Float64}(undef, 2,3,0), Val(2))
size(flatview(C_nested)) == (2, 3, 0)
typeof(C_nested) <: AbstractVector{<:AbstractMatrix}
for i in 1:4 push!(C_nested, rand(2, 3)) end
size(flatview(C_nested)) == (2, 3, 4)

UnsafeArrays.@uviews enables access to elements of ArraysOfArrays nested arrays as allocation-free pointer-based views:

using UnsafeArrays

@uviews C_nested begin
    isbits(C_nested[1])
end

Note: As the name implies, UnsafeArrays should be used with care and only when allocation costs of normal views becomes a performance-limiting factor (typically only in tight loops in multi-threaded applications).

ShapesOfVariables

ShapesOfVariables allows viewing a flat vector (of numerical real values) as a named tuple representing variables/parameters of mixed kind and size (real or complex scalars, arrays and constants), creating a zero-copy dual view of the underlying data. The intention is to provide a bridge between user code operating on named structured variables and algorithms operating on flat vectors of anonymous real values (often the case for, e.g. fitting/optimization algorithms).

ShapesOfVariables has some overlap in functionality with @Tamas_Papp’s TransformVariables, but provides a duality of view instead of transformations (and therefore uses data views instead of data copies).

Given a definition of a set of variables

using ShapesOfVariables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)

you can determine the total number of degrees of freedom

totalndof(varshapes) == 7

and allocate a random flat data vector of appropriate size:

data = Vector{Float64}(undef, varshapes)
size(data) == (7,)

rand!(data)

The function tupleview creates a zero-copy view of the flat data vector as a NamedTuple:

tupleview = varshapes(data)
println(tupleview)

will result in

(a = 0.8715053286676477, b = [0.750959 0.954615 0.331813; 0.779548 0.846969 0.111558], c = [1 2; 3 4])

Since ShapesOfVariables uses views, changes in the data vector immediately visible in the named tuple. Note that c, as a constant with zero degrees of freedom, is not stored in the data vector at all (implicitly pins a variable/parameter during a fit/optimization).

Note: @Tamas_Papp’s EponymTuples makes it very easy to define functions that take such tuples as parameters and deconstruct them, so that the variable names can be used directly inside the function body.

ShapesOfVariables also supports handling of multiple parameter sets, i.e. vectors of parameter vectors. Here, ArraysOfArrays.VectorOfSimilarVectors can be used as an efficient data structure that is backed by a single matrix. The named-variable view is a TypedTables.Table:

using ShapesOfVariables, ArraysOfArrays, TypedTables, Tables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)

multidata = VectorOfSimilarVectors{Int}(varshapes)
resize!(multidata, 10)

typeof(flatview(multidata)) <: AbstractMatrix
rand!(flatview(multidata), 0:99)

table = varshapes(multidata)
keys(Tables.columns(table)) == (:a, :b, :c)
println(table)

results in a

Table with 3 columns and 10 rows:
      a   b                     c
    ┌─────────────────────────────────────
 1  │ 43  [6 27 58; 60 44 88]   [1 2; 3 4]
 2  │ 77  [48 81 21; 63 69 82]  [1 2; 3 4]
 3  │ 88  [58 3 46; 71 84 43]   [1 2; 3 4]
 4  │ 21  [50 95 26; 79 12 83]  [1 2; 3 4]
 5  │ 16  [48 18 82; 92 54 89]  [1 2; 3 4]
 6  │ 77  [72 48 10; 62 83 83]  [1 2; 3 4]
 7  │ 18  [45 56 64; 54 87 73]  [1 2; 3 4]
 8  │ 27  [6 43 41; 34 20 8]    [1 2; 3 4]
 9  │ 48  [95 72 25; 5 12 42]   [1 2; 3 4]
 10 │ 35  [76 65 63; 86 43 7]   [1 2; 3 4]

This table is a structured view into the underlying flat data matrix:

@show flatview(multidata)

shows

7×10 ElasticArrays.ElasticArray{Int64,2,1}:
 43  77  88  21  16  77  18  27  48  35
  6  48  58  50  48  72  45   6  95  76
 60  63  71  79  92  62  54  34   5  86
 27  81   3  95  18  48  56  43  72  65
 44  69  84  12  54  83  87  20  12  43
 58  21  46  26  82  10  64  41  25  63
 88  82  43  83  89  83  73   8  42   7

ShapesOfVariables somewhat completes (for now) the series I started with ElasticArrays, UnsafeArrays and ArraysOfArrays. I’m sure that there is lot’s of room for improvements and additional features, though (and, alas, likely bug fixes as well) - please let me know!

Next to the two array-related dualities flat/nested and flat/structured, there is to me one more duality that is important: arrays-of-structs vs. structs-of-arrays. Luckily, we’ve got that one covered by the Tables interface, particularly by the Tables implementations StructArrays by @piever (for native structs) and TypedTables by @andyferris (for NamedTuples).

Cheers,

Oliver

Tamas_Papp · March 10, 2019, 6:51am

Both packages look very interesting and may answer problems I have been struggling with for a while. Specifically, I think that TransformVariables.jl does two things: slice up an array, then transform parts of it. I have been thinking about decoupling the two for a while, especially for coding log posteriors that just calculate a gradient manually. ShapesOfVariables may provide an answer, I just need to come up with a nice API for Jacobian adjustments.

ArraysOfArrays could help with working with log posteriors, especially if it combines well with ShapesOfVariables.

Thanks for the detailed announcement, I will look at both packages, and will contribute and rework my own to use them if that’s a good match.

oschulz · March 10, 2019, 11:23am

Thanks @Tamas_Papp!

I think we have similar motivations here: This was partly built to support parameter handling for BAT.jl (currently being registered). Into which, in turn, I’d like to integrate your DynamicHMC and I think I’ll need functionality from your TransformVariables as well. If slicing/restructuring and transformations were decoupled, that would be really nice!

oschulz · March 10, 2019, 11:48am

@ChrisRackauckas, ShapesOfVariables also contains the “descibe the shape of an array so that another piece of code can instantiate it” functionality we talked about at JuliaCon:

using ShapesOfVariables
shape = ArrayShape{Real}(2, 3)

specifies a 2x3 real-valued array, and

A = Array(undef, shape)

instantiates it. If the element-type of the array shape is abstract (Real, in this example) or a union, ArraysOfArrays will try to find a more-specific type (e.g. Float64, for Real):

typeof(A) == Array{Float64,2}

This default can also be overridden during array construction:

B = Array{Float32}(undef, shape)
typeof(B) == Array{Float32,2}

oschulz · March 10, 2019, 11:52am

@Tamas_Papp, if you find these packages suitable in principle to be used in TransformVariables , but need additional functionality, please let me know - I’d be happy to collaborate.

Tamas_Papp · March 10, 2019, 1:43pm

Thanks. First I will factor out the relevant part of TransformVariables (within that package), and then see what can be combined.

chakravala · March 10, 2019, 3:06pm

Yes, i am possibly interested in using this for Grassmann.jl, where I have defined my own vector type which is also an array of arrays.

However, I am using StaticArrays instead of standard Arrays, is it compatible?

oschulz · March 10, 2019, 4:47pm

@chakravala, yes, there is some explicit support for StaticArrays in ArraysOfArrays.jl (currently uses reshape and reinterpret, not a custom type):

julia> using ArraysOfArrays, StaticArrays
julia> A = rand(3, 7)
julia> A_nested = nestedview(A, StaticVector{3})
7-element reshape(reinterpret(SArray{Tuple{3},Float64,1,3}, ::Array{Float64,2}), 7) with eltype SArray{Tuple{3},Float64,1,3}:
 [0.323916, 0.345113, 0.0632536]
 [0.762203, 0.426843, 0.67546]  
 [0.531716, 0.897119, 0.950428] 
 [0.970133, 0.115665, 0.0952931]
 [0.271295, 0.359833, 0.756045] 
 [0.646595, 0.477087, 0.318215] 
 [0.704045, 0.440413, 0.310705]

Would this work for you?

oschulz · March 10, 2019, 5:01pm

Btw. ArraysOfArrays.jl also provides explicit support for Statistics and StatsBase for vectors of vectors:

julia> using ArraysOfArrays, Random, Statistics, StatsBase
julia> paramvalues = nestedview(randn(4,100))

100-element ArrayOfSimilarArrays{Float64,1,1,2,Array{Float64,2}}:
 [-0.434986, 0.366588, 0.109901, -0.525764]  
 [2.09369, 0.0882477, 1.44772, 1.71197]      
 [0.472904, -0.685981, -0.548965, -0.358208] 
 ⋮                                           
 [0.175075, 0.414118, 0.404391, 0.413817]    
 [1.10377, -0.616455, -0.632265, 3.44201]    
 [-0.0103048, -0.81376, -0.331084, 0.628358] 

julia> weights = FrequencyWeights(rand(100))
julia> cov(paramvalues, weights)

4×4 Array{Float64,2}:
  1.01295    -0.248277  -0.0797657   0.0835061
 -0.248277    1.02367    0.111126   -0.268146 
 -0.0797657   0.111126   1.61915     0.0641023
  0.0835061  -0.268146   0.0641023   0.923732

ChrisRackauckas · March 10, 2019, 5:20pm

What do you use for the shape of a sparse matrix?

oschulz · March 10, 2019, 8:31pm

@ChrisRackauckas: What do you use for the shape of a sparse matrix?

Uhm, nothing, so far … I only implemented shapes for standard dense arrays at this point (also no StaticArray support in ShapesOfVariables yet, but I definitely want to add that).

I guess we’ll need specific support for special matrix types, like triangular, banded, etc. at some point. In principle, types describing additional matrix shapes can be added easily (directly in ShapesOfVariables.jl, or packages that define special matrix types could depend on ShapesOfVariables and add shape types for them).

For generic sparse matrices - what would be the number of degrees of freedom? The full matrix size? Or a fixed/maximum number of values that can be non-zero?

Topic		Replies	Views
[RFC/ANN] ComponentArrays.jl for building composable models without a modeling language Package Announcements diffeq , optim , componentarrays	22	2496	April 25, 2020
Help with simplified StaticArrays implementation General Usage array	16	2604	September 12, 2020
Quick Longitudinal Access to arrays General Usage	34	2261	March 3, 2018
Recent developments in StaticArrays.jl land Package Announcements performance , array , linearalgebra , arrays	12	3718	March 11, 2021
Resize!(matrix) General Usage question	44	11027	April 22, 2018

[ANN] ArraysOfArrays.jl and ShapesOfVariables.jl

ArraysOfArrays

ShapesOfVariables

Related topics