Dear all,
I'd like to announce the packages ArraysOfArrays (somewhat belatedly, it was registered in November) and ShapesOfVariables (brand new). Both packages provide dual views of array data, and they complement each other.
ArraysOfArrays
ArraysOfArrays makes it possible to view data both as a flat array and as a nested array of arrays. The package supports:

- N-dimensional arrays of M-dimensional same-sized arrays (`ArrayOfSimilarArrays`), backed by an N+M-dimensional flat array.
- Vectors of arrays of different size but same dimensionality (`VectorOfArrays`), backed by a flat vector.
You can switch between the flat and the nested view, or use both views at the same time, without copying any data:
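```julia
using ArraysOfArrays, Random

# A flat 5-dimensional array ...
A = rand(2,3,4,5,6)

# ... viewed as a 3-dimensional array of 2-dimensional arrays (no copy)
A_nested = nestedview(A, Val(2))
typeof(A_nested) <: AbstractArray{<:AbstractArray{Float64,2},3}
flatview(A_nested) === A  # the flat view is the original array

# A vector of matrices of different sizes, backed by a single flat vector
B_nested = VectorOfArrays{Float64,2}()
typeof(B_nested) <: AbstractVector{<:AbstractArray{Float64,2}}
push!(B_nested, rand(2,2))
push!(B_nested, rand(2,3))
push!(B_nested, rand(3,2))
push!(B_nested, rand(1,2))
B = flatview(B_nested)
typeof(B) == Vector{Float64}
size(B) == (18,)  # 4 + 6 + 6 + 2 elements
```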
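To make "backed by a flat vector" a bit more concrete, here is a minimal Base-Julia sketch of one way to store matrices of different sizes back to back in a single flat vector, plus a start offset and a size per element. The type `FlatMatrices` and its fields are hypothetical and only illustrate the idea; this is not how ArraysOfArrays implements `VectorOfArrays` internally.

```julia
# Hypothetical sketch (not the ArraysOfArrays implementation): matrices of different
# sizes stored back to back in one flat vector, plus their start offsets and sizes.
struct FlatMatrices{T}
    data::Vector{T}               # all elements, concatenated in column-major order
    offsets::Vector{Int}          # element i starts at data[offsets[i] + 1]
    sizes::Vector{NTuple{2,Int}}  # size of each stored matrix
end

FlatMatrices{T}() where {T} = FlatMatrices{T}(T[], Int[], NTuple{2,Int}[])

# Append (a copy of) matrix M to the flat storage.
function Base.push!(fm::FlatMatrices, M::AbstractMatrix)
    push!(fm.offsets, length(fm.data))
    push!(fm.sizes, size(M))
    append!(fm.data, vec(M))
    return fm
end

Base.length(fm::FlatMatrices) = length(fm.sizes)

# Return element i as a zero-copy matrix view into the flat data.
function Base.getindex(fm::FlatMatrices, i::Integer)
    r = fm.offsets[i] .+ (1:prod(fm.sizes[i]))
    return reshape(view(fm.data, r), fm.sizes[i])
end

fm = FlatMatrices{Float64}()
push!(fm, rand(2, 2))
push!(fm, rand(2, 3))
push!(fm, rand(3, 2))
push!(fm, rand(1, 2))

length(fm.data) == 18   # same total as size(B) above
fm[3][1, 1] = 42.0      # write through the nested view ...
fm.data[11] == 42.0     # ... and the flat data changes (element 3 starts after 4 + 6 values)
```

The price of this layout is that elements can only be appended cheaply at the end, while the payoff is that the whole collection is one contiguous vector that flat-vector algorithms can consume directly.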
ArraysOfArrays also provides a function `consgroupedview` that creates a `VectorOfVectors` representing a zero-copy (and very fast) group-by view on consecutive group labels:
```julia
using ArraysOfArrays, TypedTables, Random

table = Table(a = [1, 1, 2, 2, 1, 1, 1, 3], b = nestedview(rand(4, 8)))

# Group the rows of `table` by runs of equal consecutive values in column `a`
grouped_tables = consgroupedview(table.a, table)
grouped_tables[1] isa Table
flatview(grouped_tables) === table
```
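The reason a group-by on *consecutive* labels can be zero-copy is that each group is just a contiguous index range. A minimal Base-Julia sketch of that idea: one linear pass over the labels finds the run boundaries, and each group is then a view into the original data. The helper `consecutive_groups` is hypothetical and not part of ArraysOfArrays.

```julia
# Hypothetical helper (not part of ArraysOfArrays): split `labels` into runs of
# equal consecutive values and return one index range per run.
function consecutive_groups(labels::AbstractVector)
    ranges = UnitRange{Int}[]
    isempty(labels) && return ranges
    start = firstindex(labels)
    for i in (firstindex(labels) + 1):lastindex(labels)
        if labels[i] != labels[i - 1]
            push!(ranges, start:(i - 1))    # close the current run
            start = i
        end
    end
    push!(ranges, start:lastindex(labels))  # close the final run
    return ranges
end

a = [1, 1, 2, 2, 1, 1, 1, 3]
data = rand(4, 8)

ranges = consecutive_groups(a)               # [1:2, 3:4, 5:7, 8:8]
groups = [view(data, :, r) for r in ranges]  # one zero-copy view per group
```

Note that label `1` yields two separate groups in this sketch, since only consecutive runs are grouped; that restriction is what makes a single pass and plain range views sufficient.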
Nesting of arrays at multiple levels is also possible:
```julia
using ArraysOfArrays, Random

A = rand(2,3,4,5,6)
A_deepnested = nestedview(nestedview(A, Val(2)), Val(1))
typeof(A_deepnested) <: AbstractArray{
    <:AbstractArray{
        <:AbstractArray{Float64, 2},
        1
    },
    2
}
```
ArraysOfArrays continues a story that began with ElasticArrays and UnsafeArrays: using an `ElasticArray` (which is resizable in its last dimension) allows you to push arrays to a `VectorOfSimilarArrays`:
```julia
using ArraysOfArrays, ElasticArrays

C_nested = nestedview(ElasticArray{Float64}(undef, 2,3,0), Val(2))
size(flatview(C_nested)) == (2, 3, 0)
typeof(C_nested) <: AbstractVector{<:AbstractMatrix}

# Each push! appends a 2×3 matrix by growing the last dimension of the flat ElasticArray
for i in 1:4
    push!(C_nested, rand(2, 3))
end
size(flatview(C_nested)) == (2, 3, 4)
```
`UnsafeArrays.@uviews` gives access to the elements of ArraysOfArrays nested arrays as allocation-free, pointer-based views:
```julia
using UnsafeArrays

@uviews C_nested begin
    isbits(C_nested[1])
end
```
Note: As the name implies, UnsafeArrays should be used with care, and only when the allocation cost of normal views becomes a performance-limiting factor (typically only in tight loops in multi-threaded applications).
ShapesOfVariables
ShapesOfVariables allows viewing a flat vector (of real numerical values) as a named tuple representing variables/parameters of mixed kind and size (real or complex scalars, arrays and constants), creating a zero-copy dual view of the underlying data. The intention is to provide a bridge between user code operating on named, structured variables and algorithms operating on flat vectors of anonymous real values (as is often the case for, e.g., fitting and optimization algorithms).
ShapesOfVariables has some overlap in functionality with @Tamas_Papp's TransformVariables, but provides dual views instead of transformations (and therefore uses data views instead of data copies).
Given a definition of a set of variables,

```julia
using ShapesOfVariables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)
```
you can determine the total number of degrees of freedom (1 for the scalar `a`, 2 × 3 = 6 for the array `b` and none for the constant `c`)

```julia
totalndof(varshapes) == 7
```
and allocate a random flat data vector of appropriate size:

```julia
data = Vector{Float64}(undef, varshapes)
size(data) == (7,)
rand!(data)
```
Applying `varshapes` to the flat data vector creates a zero-copy view of it as a `NamedTuple`:
```julia
tupleview = varshapes(data)
println(tupleview)
```

will result in something like (the values of `a` and `b` are random):

```
(a = 0.8715053286676477, b = [0.750959 0.954615 0.331813; 0.779548 0.846969 0.111558], c = [1 2; 3 4])
```
Since ShapesOfVariables uses views, changes in the data vector are immediately visible in the named tuple. Note that `c`, as a constant with zero degrees of freedom, is not stored in the data vector at all (which implicitly pins that variable/parameter during a fit or optimization).
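Conceptually, such a named-tuple view boils down to slicing the flat vector and wrapping the pieces in a `NamedTuple`. The following Base-Julia sketch mirrors the `a`/`b`/`c` example by hand, with index ranges written out manually (the package does this bookkeeping for you based on `varshapes`). Representing the scalar as a zero-dimensional view, so that it also tracks the data, is a choice of this sketch and not necessarily what ShapesOfVariables does.

```julia
# Hand-written sketch of a zero-copy named-tuple view onto a flat vector,
# mirroring the a/b/c example above (layout chosen by hand, not via ShapesOfVariables).
data = rand(7)

nt = (
    a = view(data, 1),                   # 0-dimensional view: the scalar variable
    b = reshape(view(data, 2:7), 2, 3),  # 2×3 matrix view over data[2:7], column-major
    c = [1 2; 3 4],                      # constant: not stored in `data` at all
)

data[1] = 0.5
data[3] = 0.25
nt.a[] == 0.5       # the scalar view sees the change
nt.b[2, 1] == 0.25  # data[3] is the second element of b, i.e. b[2, 1]
```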
Note: @Tamas_Papp's EponymTuples makes it very easy to define functions that take such tuples as parameters and deconstruct them, so that the variable names can be used directly inside the function body.
ShapesOfVariables also supports handling of multiple parameter sets, i.e. vectors of parameter vectors. Here, `ArraysOfArrays.VectorOfSimilarVectors` can be used as an efficient data structure that is backed by a single matrix. The named-variable view is a `TypedTables.Table`:
```julia
using ShapesOfVariables, ArraysOfArrays, TypedTables, Tables, Random

varshapes = VarShapes(
    a = ScalarShape{Real}(),
    b = ArrayShape{Real}(2, 3),
    c = ConstValueShape([1 2; 3 4])
)

multidata = VectorOfSimilarVectors{Int}(varshapes)
resize!(multidata, 10)
typeof(flatview(multidata)) <: AbstractMatrix
rand!(flatview(multidata), 0:99)

table = varshapes(multidata)
keys(Tables.columns(table)) == (:a, :b, :c)
println(table)
```
results in

```
Table with 3 columns and 10 rows:
     a   b                     c
   ┌─────────────────────────────────────
 1 │ 43  [6 27 58; 60 44 88]   [1 2; 3 4]
 2 │ 77  [48 81 21; 63 69 82]  [1 2; 3 4]
 3 │ 88  [58 3 46; 71 84 43]   [1 2; 3 4]
 4 │ 21  [50 95 26; 79 12 83]  [1 2; 3 4]
 5 │ 16  [48 18 82; 92 54 89]  [1 2; 3 4]
 6 │ 77  [72 48 10; 62 83 83]  [1 2; 3 4]
 7 │ 18  [45 56 64; 54 87 73]  [1 2; 3 4]
 8 │ 27  [6 43 41; 34 20 8]    [1 2; 3 4]
 9 │ 48  [95 72 25; 5 12 42]   [1 2; 3 4]
10 │ 35  [76 65 63; 86 43 7]   [1 2; 3 4]
```
This table is a structured view into the underlying flat data matrix:

```julia
@show flatview(multidata)
```

shows

```
7×10 ElasticArrays.ElasticArray{Int64,2,1}:
 43  77  88  21  16  77  18  27  48  35
  6  48  58  50  48  72  45   6  95  76
 60  63  71  79  92  62  54  34   5  86
 27  81   3  95  18  48  56  43  72  65
 44  69  84  12  54  83  87  20  12  43
 58  21  46  26  82  10  64  41  25  63
 88  82  43  83  89  83  73   8  42   7
```
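The correspondence between the table and the matrix can be checked by hand: each column of the 7×10 matrix is one parameter set, row 1 holds `a`, and rows 2 to 7 hold `b` in column-major order. The Base-Julia sketch below rebuilds row 8 of the table from column 8 of the matrix printed above; `row_view` is a hypothetical helper for illustration, not part of the ShapesOfVariables API.

```julia
# Flat 7×10 data matrix, copied from the output above (one parameter set per column).
M = [43 77 88 21 16 77 18 27 48 35;
      6 48 58 50 48 72 45  6 95 76;
     60 63 71 79 92 62 54 34  5 86;
     27 81  3 95 18 48 56 43 72 65;
     44 69 84 12 54 83 87 20 12 43;
     58 21 46 26 82 10 64 41 25 63;
     88 82 43 83 89 83 73  8 42  7]

# Hypothetical hand-written view of parameter set j (not the ShapesOfVariables API):
# row 1 is `a`, rows 2:7 are `b` reshaped to 2×3, and `c` is a constant.
row_view(M, j) = (a = M[1, j], b = reshape(view(M, 2:7, j), 2, 3), c = [1 2; 3 4])

r8 = row_view(M, 8)
r8.a == 27                  # matches row 8, column a
r8.b == [6 43 41; 34 20 8]  # matches row 8, column b
```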
ShapesOfVariables somewhat completes (for now) the series I started with ElasticArrays, UnsafeArrays and ArraysOfArrays. I'm sure that there is lots of room for improvements and additional features, though (and, alas, likely bug fixes as well), so please let me know!
Besides the two array-related dualities flat/nested and flat/structured, there is one more duality that I consider important: arrays-of-structs vs. structs-of-arrays. Luckily, we've got that one covered by the Tables interface, particularly by the Tables implementations StructArrays by @piever (for native `struct`s) and TypedTables by @andyferris (for `NamedTuple`s).
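For readers new to this distinction, here is a minimal Base-Julia sketch of the two layouts using plain vectors and named tuples: an array of structs stores one record per element, a struct of arrays stores one vector per field. The packages mentioned above present both views of the same storage; this sketch simply copies between the layouts to show the difference.

```julia
# Array of structs: one NamedTuple per record.
aos = [(x = 1, y = 2.0), (x = 3, y = 4.0), (x = 5, y = 6.0)]

# Struct of arrays: one vector per field, holding the same information.
soa = (x = [r.x for r in aos], y = [r.y for r in aos])

# Converting back: rebuild the records from the columns.
aos2 = [(x = soa.x[i], y = soa.y[i]) for i in eachindex(soa.x)]

aos2 == aos         # both layouts hold the same data
sum(soa.y) == 12.0  # column-wise operations work on one contiguous vector
```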
Cheers,
Oliver