[ANN] ArraysOfArrays.jl and ShapesOfVariables.jl

Dear all,

Iโ€™d like to announce the packages ArraysOfArrays (somewhat belatedly, it was registered in November) and ShapesOfVariables (brand new). Both packages are about presenting a duality of view for Arrays, and they complement each other.

ArraysOfArrays

ArraysOfArrays makes it possible to view data both as a flat array and as a nested array of arrays. The package supports:

  • N-dimensional arrays of M-dimensional same-sized arrays (ArrayOfSimilarArrays), backed by an N+M-dimensional flat array.
  • Vectors of arrays of different size but same dimensionality (VectorOfArrays), backed by a flat vector.

You can switch between the flat and nested view, resp. use both views at the same time, without copying any data:

using ArraysOfArrays, Random

A = rand(2,3,4,5,6)
A_nested = nestedview(A, Val(2))
typeof(A_nested) <: AbstractArray{<:AbstractArray{Float64,2},3}
flatview(A_nested) === A

B_nested = VectorOfArrays{Float64,2}()
typeof(B_nested) <: AbstractVector{<:AbstractArray{Float64,2}}
push!(B_nested, rand(2,2))
push!(B_nested, rand(2,3))
push!(B_nested, rand(3,2))
push!(B_nested, rand(1,2))
B = flatview(B_nested)
typeof(B) == Vector{Float64}
size(B) == (18,)

ArraysOfArrays also provides a function consgroupedview that creates a VectorOfVectors representing a zero-copy (and very fast) group-by view on consecutive group-labels:

using ArraysOfArrays, TypedTables, Random

table = Table(a = [1, 1, 2, 2, 1, 1, 1, 3], b = nestedview(rand(4, 8)))
grouped_tables = consgroupedview(table.a, table)

grouped_tables[1] isa Table
flatview(grouped_tables) === table

Nesting of arrays at multiple levels is also possible:

using ArraysOfArrays, Random

A = rand(2,3,4,5,6)
A_deepnested = nestedview(nestedview(A, Val(2)), Val(1))

typeof(A_deepnested) <: AbstractArray{
    <:AbstractArray{
        <:AbstractArray{Float64, 2},
        1
    },
    2
}

ArraysOfArrays continues a story that began with ElasticArrays and UnsafeArrays: Using an ElasticArray (which is resizeable in itโ€™s last dimension) allows you to push arrays to a VectorOfSimilarArrays:

using ArraysOfArrays, ElasticArrays

C_nested = nestedview(ElasticArray{Float64}(undef, 2,3,0), Val(2))
size(flatview(C_nested)) == (2, 3, 0)
typeof(C_nested) <: AbstractVector{<:AbstractMatrix}
for i in 1:4 push!(C_nested, rand(2, 3)) end
size(flatview(C_nested)) == (2, 3, 4)

UnsafeArrays.@uviews enables access to elements of ArraysOfArrays nested arrays as allocation-free pointer-based views:

using UnsafeArrays

@uviews C_nested begin
    isbits(C_nested[1])
end

Note: As the name implies, UnsafeArrays should be used with care and only when allocation costs of normal views becomes a performance-limiting factor (typically only in tight loops in multi-threaded applications).

ShapesOfVariables

ShapesOfVariables allows viewing a flat vector (of numerical real values) as a named tuple representing variables/parameters of mixed kind and size (real or complex scalars, arrays and constants), creating a zero-copy dual view of the underlying data. The intention is to provide a bridge between user code operating on named structured variables and algorithms operating on flat vectors of anonymous real values (often the case for, e.g. fitting/optimization algorithms).

ShapesOfVariables has some overlap in functionality with @Tamas_Pappโ€™s TransformVariables, but provides a duality of view instead of transformations (and therefore uses data views instead of data copies).

Given a definition of a set of variables

using ShapesOfVariables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)

you can determine the total number of degrees of freedom

totalndof(varshapes) == 7

and allocate a random flat data vector of appropriate size:

data = Vector{Float64}(undef, varshapes)
size(data) == (7,)

rand!(data)

The function tupleview creates a zero-copy view of the flat data vector as a NamedTuple:

tupleview = varshapes(data)
println(tupleview)

will result in

(a = 0.8715053286676477, b = [0.750959 0.954615 0.331813; 0.779548 0.846969 0.111558], c = [1 2; 3 4])

Since ShapesOfVariables uses views, changes in the data vector immediately visible in the named tuple. Note that c, as a constant with zero degrees of freedom, is not stored in the data vector at all (implicitly pins a variable/parameter during a fit/optimization).

Note: @Tamas_Pappโ€™s EponymTuples makes it very easy to define functions that take such tuples as parameters and deconstruct them, so that the variable names can be used directly inside the function body.

ShapesOfVariables also supports handling of multiple parameter sets, i.e. vectors of parameter vectors. Here, ArraysOfArrays.VectorOfSimilarVectors can be used as an efficient data structure that is backed by a single matrix. The named-variable view is a TypedTables.Table:

using ShapesOfVariables, ArraysOfArrays, TypedTables, Tables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)

multidata = VectorOfSimilarVectors{Int}(varshapes)
resize!(multidata, 10)

typeof(flatview(multidata)) <: AbstractMatrix
rand!(flatview(multidata), 0:99)

table = varshapes(multidata)
keys(Tables.columns(table)) == (:a, :b, :c)
println(table)

results in a

Table with 3 columns and 10 rows:
      a   b                     c
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
 1  โ”‚ 43  [6 27 58; 60 44 88]   [1 2; 3 4]
 2  โ”‚ 77  [48 81 21; 63 69 82]  [1 2; 3 4]
 3  โ”‚ 88  [58 3 46; 71 84 43]   [1 2; 3 4]
 4  โ”‚ 21  [50 95 26; 79 12 83]  [1 2; 3 4]
 5  โ”‚ 16  [48 18 82; 92 54 89]  [1 2; 3 4]
 6  โ”‚ 77  [72 48 10; 62 83 83]  [1 2; 3 4]
 7  โ”‚ 18  [45 56 64; 54 87 73]  [1 2; 3 4]
 8  โ”‚ 27  [6 43 41; 34 20 8]    [1 2; 3 4]
 9  โ”‚ 48  [95 72 25; 5 12 42]   [1 2; 3 4]
 10 โ”‚ 35  [76 65 63; 86 43 7]   [1 2; 3 4]

This table is a structured view into the underlying flat data matrix:

@show flatview(multidata)

shows

7ร—10 ElasticArrays.ElasticArray{Int64,2,1}:
 43  77  88  21  16  77  18  27  48  35
  6  48  58  50  48  72  45   6  95  76
 60  63  71  79  92  62  54  34   5  86
 27  81   3  95  18  48  56  43  72  65
 44  69  84  12  54  83  87  20  12  43
 58  21  46  26  82  10  64  41  25  63
 88  82  43  83  89  83  73   8  42   7

ShapesOfVariables somewhat completes (for now) the series I started with ElasticArrays, UnsafeArrays and ArraysOfArrays. Iโ€™m sure that there is lotโ€™s of room for improvements and additional features, though (and, alas, likely bug fixes as well) - please let me know!

Next to the two array-related dualities flat/nested and flat/structured, there is to me one more duality that is important: arrays-of-structs vs. structs-of-arrays. Luckily, weโ€™ve got that one covered by the Tables interface, particularly by the Tables implementations StructArrays by @piever (for native structs) and TypedTables by @andyferris (for NamedTuples).

Cheers,

Oliver

16 Likes

Both packages look very interesting and may answer problems I have been struggling with for a while. Specifically, I think that TransformVariables.jl does two things: slice up an array, then transform parts of it. I have been thinking about decoupling the two for a while, especially for coding log posteriors that just calculate a gradient manually. ShapesOfVariables may provide an answer, I just need to come up with a nice API for Jacobian adjustments.

ArraysOfArrays could help with working with log posteriors, especially if it combines well with ShapesOfVariables.

Thanks for the detailed announcement, I will look at both packages, and will contribute and rework my own to use them if thatโ€™s a good match.

2 Likes

Thanks @Tamas_Papp!

I think we have similar motivations here: This was partly built to support parameter handling for BAT.jl (currently being registered). Into which, in turn, Iโ€™d like to integrate your DynamicHMC and I think Iโ€™ll need functionality from your TransformVariables as well. If slicing/restructuring and transformations were decoupled, that would be really nice!

@ChrisRackauckas, ShapesOfVariables also contains the โ€œdescibe the shape of an array so that another piece of code can instantiate itโ€ functionality we talked about at JuliaCon:

using ShapesOfVariables
shape = ArrayShape{Real}(2, 3)

specifies a 2x3 real-valued array, and

A = Array(undef, shape)

instantiates it. If the element-type of the array shape is abstract (Real, in this example) or a union, ArraysOfArrays will try to find a more-specific type (e.g. Float64, for Real):

typeof(A) == Array{Float64,2}

This default can also be overridden during array construction:

B = Array{Float32}(undef, shape)
typeof(B) == Array{Float32,2}
1 Like

@Tamas_Papp, if you find these packages suitable in principle to be used in TransformVariables , but need additional functionality, please let me know - Iโ€™d be happy to collaborate.

Thanks. First I will factor out the relevant part of TransformVariables (within that package), and then see what can be combined.

Yes, i am possibly interested in using this for Grassmann.jl, where I have defined my own vector type which is also an array of arrays.

However, I am using StaticArrays instead of standard Arrays, is it compatible?

@chakravala, yes, there is some explicit support for StaticArrays in ArraysOfArrays.jl (currently uses reshape and reinterpret, not a custom type):

julia> using ArraysOfArrays, StaticArrays
julia> A = rand(3, 7)
julia> A_nested = nestedview(A, StaticVector{3})
7-element reshape(reinterpret(SArray{Tuple{3},Float64,1,3}, ::Array{Float64,2}), 7) with eltype SArray{Tuple{3},Float64,1,3}:
 [0.323916, 0.345113, 0.0632536]
 [0.762203, 0.426843, 0.67546]  
 [0.531716, 0.897119, 0.950428] 
 [0.970133, 0.115665, 0.0952931]
 [0.271295, 0.359833, 0.756045] 
 [0.646595, 0.477087, 0.318215] 
 [0.704045, 0.440413, 0.310705] 

Would this work for you?

Btw. ArraysOfArrays.jl also provides explicit support for Statistics and StatsBase for vectors of vectors:

julia> using ArraysOfArrays, Random, Statistics, StatsBase
julia> paramvalues = nestedview(randn(4,100))

100-element ArrayOfSimilarArrays{Float64,1,1,2,Array{Float64,2}}:
 [-0.434986, 0.366588, 0.109901, -0.525764]  
 [2.09369, 0.0882477, 1.44772, 1.71197]      
 [0.472904, -0.685981, -0.548965, -0.358208] 
 โ‹ฎ                                           
 [0.175075, 0.414118, 0.404391, 0.413817]    
 [1.10377, -0.616455, -0.632265, 3.44201]    
 [-0.0103048, -0.81376, -0.331084, 0.628358] 

julia> weights = FrequencyWeights(rand(100))
julia> cov(paramvalues, weights)

4ร—4 Array{Float64,2}:
  1.01295    -0.248277  -0.0797657   0.0835061
 -0.248277    1.02367    0.111126   -0.268146 
 -0.0797657   0.111126   1.61915     0.0641023
  0.0835061  -0.268146   0.0641023   0.923732 

What do you use for the shape of a sparse matrix?

@ChrisRackauckas: What do you use for the shape of a sparse matrix?

Uhm, nothing, so far โ€ฆ I only implemented shapes for standard dense arrays at this point (also no StaticArray support in ShapesOfVariables yet, but I definitely want to add that).

I guess weโ€™ll need specific support for special matrix types, like triangular, banded, etc. at some point. In principle, types describing additional matrix shapes can be added easily (directly in ShapesOfVariables.jl, or packages that define special matrix types could depend on ShapesOfVariables and add shape types for them).

For generic sparse matrices - what would be the number of degrees of freedom? The full matrix size? Or a fixed/maximum number of values that can be non-zero?