How to convert wrapped matrix type of KeyedArray

jtackm · January 18, 2023, 12:07pm

Say I have a KeyedArray wrapping a dense matrix:

using AxisKeys, Random
n = 100
A = KeyedArray(rand(0:1, n, n), dim1=["a$i" for i in 1:n], dim2=["b$i" for i in 1:n])

2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   dim1 ∈ 100-element Vector{String}
→   dim2 ∈ 100-element Vector{String}
And data, 100×100 Matrix{Int64}:
            ("b1")  ("b2")  ("b3")  …  ("b97")  ("b98")  ("b99")  ("b100")
  ("a1")     0       0       0          0        1        1        1
  ("a2")     1       0       0          1        0        1        0
  ("a3")     1       1       0          1        0        0        0
  ("a4")     0       1       0          1        1        0        0
  ("a5")     1       1       1      …   0        0        0        0
  ("a6")     0       1       0          0        1        1        1
   ⋮                                ⋱                     ⋮       
  ("a94")    1       1       0          1        0        1        1
  ("a95")    0       0       1          0        1        1        1
  ("a96")    1       0       1      …   0        1        0        0
  ("a97")    0       0       1          0        0        1        0
  ("a98")    1       1       1          1        0        0        0
  ("a99")    0       0       0          0        1        0        0
  ("a100")   0       1       1          0        1        1        1

What is the easiest way to convert the underlying matrix type (say from dense to sparse) without changing axis information? My best solution so far is quite unwieldy:

using SparseArrays
A_sparse = KeyedArray(sparse(A); (dimnames(A) .=> axiskeys(A))...)

2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   dim1 ∈ 100-element Vector{String}
→   dim2 ∈ 100-element Vector{String}
And data, 100×100 SparseMatrixCSC{Int64, Int64} with 5028 stored entries:
            ("b1")  ("b2")  ("b3")  …  ("b97")  ("b98")  ("b99")  ("b100")
  ("a1")     0       0       0          0        1        1        1
  ("a2")     1       0       0          1        0        1        0
  ("a3")     1       1       0          1        0        0        0
  ("a4")     0       1       0          1        1        0        0
  ("a5")     1       1       1      …   0        0        0        0
  ("a6")     0       1       0          0        1        1        1
   ⋮                                ⋱                     ⋮       
  ("a94")    1       1       0          1        0        1        1
  ("a95")    0       0       1          0        1        1        1
  ("a96")    1       0       1      …   0        1        0        0
  ("a97")    0       0       1          0        0        1        0
  ("a98")    1       1       1          1        0        0        0
  ("a99")    0       0       0          0        1        0        0
  ("a100")   0       1       1          0        1        1        1

Of course I could define a function, but I’m thinking there must be a built-in option I’m missing.
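A minimal sketch of such a helper, assuming the array has named dimensions (as A does here). The name rewrap_with is hypothetical and not part of AxisKeys. It uses AxisKeys.keyless_unname to reach the plain matrix and named_axiskeys to carry the names and keys over:

```julia
using AxisKeys, SparseArrays

# Hypothetical helper (not part of AxisKeys): apply `f` to the plain array
# stored inside `A`, then rewrap the result with the same dimension names
# and axis keys. Assumes `A` has named dimensions and that `f` returns an
# array of the same size.
function rewrap_with(f, A::KeyedArray)
    data = f(AxisKeys.keyless_unname(A))   # strip keys and names, then transform
    return KeyedArray(data; named_axiskeys(A)...)
end

n = 100
A = KeyedArray(rand(0:1, n, n), dim1=["a$i" for i in 1:n], dim2=["b$i" for i in 1:n])

A_sparse = rewrap_with(sparse, A)   # dense storage to sparse storage
```

Passing a different function, for example Matrix, would convert the storage back to dense with the same wrapper logic.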

Dan · January 18, 2023, 12:52pm

Does:

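# Rebuild a NamedDimsArray with the same dimension names X but new data m
refill(na::NamedDimsArray{X}, m) where X =
  NamedDimsArray{X,eltype(m),ndims(m),typeof(m)}(m)

# parent(A) is the NamedDimsArray layer inside the KeyedArray
A_sparse2 = KeyedArray(refill(parent(A), sparse(A)), axiskeys(A))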

seem less unwieldy?

A second method lets refill take the KeyedArray directly:

# Swap the data of a KeyedArray, keeping its names and keys
refill(ka::KeyedArray, m) = KeyedArray(refill(parent(ka), m), axiskeys(ka))

A_sparse3 = refill(A, sparse(A))

and verifying:

julia> A_sparse == A_sparse2 == A_sparse3
true
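refill works because of how the keyword constructor nests the wrappers, as the KeyedArray(NamedDimsArray(...)) header indicates: the KeyedArray holds the keys, its parent is a NamedDimsArray holding the dimension names (the tuple that refill captures as the type parameter X), and that in turn wraps the plain matrix. A small sketch for inspecting those layers on a 2×2 example:

```julia
using AxisKeys

A = KeyedArray([1 0; 0 1], dim1=["a1", "a2"], dim2=["b1", "b2"])

# Outer layer: the KeyedArray, which stores the axis keys
keys_of_A = axiskeys(A)       # (["a1", "a2"], ["b1", "b2"])

# Middle layer: the NamedDimsArray, which stores the dimension names
na = parent(A)
names_of_A = dimnames(na)     # (:dim1, :dim2)

# Inner layer: the plain Matrix, which is what refill swaps out
m = parent(na)
```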

aplavin · January 18, 2023, 4:01pm

With Accessors, you can conveniently apply sparse to AxisKeys.keyless_unname(A), and store the result back:

julia> using AccessorsExtra

julia> A_sparse = @modify(sparse, AxisKeys.keyless_unname(A))
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   dim1 ∈ 100-element Vector{String}
→   dim2 ∈ 100-element Vector{String}
And data, 100×100 SparseMatrixCSC{Int64, Int64} with 5001 stored entries:
            ("b1")  ("b2")  ("b3")  …  ("b98")  ("b99")  ("b100")
  ("a1")     0       0       1          0        0        0
...

Btw, even without extra packages, your KeyedArray(sparse(A); (dimnames(A) .=> axiskeys(A))...) can be simplified to KeyedArray(sparse(A); named_axiskeys(A)...).
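For reference, named_axiskeys returns a NamedTuple pairing each dimension name with its keys, which is why it can be splatted straight into the keyword constructor. A small sketch on a 3×3 array, converting dense to sparse and back (the back-conversion via Matrix is an illustration, not something from the thread):

```julia
using AxisKeys, SparseArrays

A = KeyedArray([1 0 0; 0 1 0; 0 0 1], dim1=["a1", "a2", "a3"], dim2=["b1", "b2", "b3"])

# A NamedTuple: (dim1 = ["a1", "a2", "a3"], dim2 = ["b1", "b2", "b3"])
nk = named_axiskeys(A)

# Dense to sparse: sparse(A) returns a plain SparseMatrixCSC, which is rewrapped
A_sparse = KeyedArray(sparse(A); nk...)

# Sparse to dense: unwrap the stored data first, then rewrap with the same keys
A_dense = KeyedArray(Matrix(AxisKeys.keyless_unname(A_sparse)); named_axiskeys(A_sparse)...)
```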

jtackm · January 19, 2023, 3:09pm

Thanks for the suggestions! I didn’t know about named_axiskeys; that is pretty much what I had in mind.
