Hello everyone!

I wrote a package UnitfulTensors.jl for working efficiently with arrays of physical quantities.

It implements most of the LinearAlgebra functionality, but with physical units (including arrays with mixed units).

The inclusion of units results in a moderate runtime overhead, often negligible. There is also a zero-overhead mode: check the units on a small-scale problem > turn them off with `units_off()`

> proceed with a large-scale problem.

## Showcase

```
using UnitfulTensors, LinearAlgebra
# h-parameters
H = UnitfulTensor([3.7u"kΩ" 1.3e-4
120 8.7u"μS"])
Zₛ = 50u"Ω" # Source impedance
Yₗ = 20u"mS" # Load admittance
Hₛ = deepcopy(H)
Hₛ[1, 1] += Zₛ
Zout = inv(Hₛ)[2, 2] # Output impedance
Hₛₗ = H + Diagonal(UnitfulTensor([Zₛ, Yₗ]))
Gₜ = 4 * Zₛ * Yₗ * (Hₛₗ[2, 1] / det(Hₛₗ))^2 # Transducer power gain
Gₜ₂ = 4 * Zₛ * Yₗ * (inv(Hₛₗ)[2, 1])^2 # Equivalent expression
println((Zout, value(Zout), Zout/1u"kΩ"))
println((Gₜ, Gₜ₂))
# output
(220264.31718061675 kg m^2 A^-2 s^-3, 220264.31718061675, 220.26431718061676)
(10.2353526224919, 10.2353526224919)
```

## Performance

### Dense 3×3 matrix

```
using UnitfulTensors, BenchmarkTools
N = 3
A = randn(N, N)
b = randn(N)
uA = UnitfulTensor(A, 𝐋 * nodims(N, N))
ub = UnitfulTensor(b, 𝐌 * nodims(N))
println("Matrix * vector:")
@btime $A * $b
@btime $uA * $ub
println("Matrix * matrix:")
@btime $A * $A
@btime $uA * $uA
println("Solving a linear system:")
@btime $A \ $b
@btime $uA \ $ub
# output
Matrix * vector:
119.563 ns (1 allocation: 80 bytes)
160.231 ns (1 allocation: 80 bytes)
Matrix * matrix:
82.297 ns (1 allocation: 128 bytes)
115.884 ns (1 allocation: 128 bytes)
Solving a linear system:
671.548 ns (3 allocations: 288 bytes)
861.569 ns (5 allocations: 576 bytes)
```

### Laplacian on a 10×10 grid, dense and sparse

```
using UnitfulTensors, LinearAlgebra, SparseArrays, BenchmarkTools
Nx = 10
N = Nx^2
Δ1d = Tridiagonal(fill(1., Nx - 1), fill(-2., Nx), fill(1., Nx - 1))
𝟙 = I(Nx)
Δ = kron(Δ1d, 𝟙) + kron(𝟙, Δ1d)
Δs = sparse(Δ)
ρ = randn(N)
uΔ = UnitfulTensor(Δ, 𝐋^-2 * nodims(N, N))
uΔs = UnitfulTensor(Δs, 𝐋^-2 * nodims(N, N))
uρ = UnitfulTensor(ρ * u"C/m^3")
println("Matrix * vector:")
@btime $Δ * $ρ
@btime $uΔ * $uρ
@btime $Δs * $ρ
@btime $uΔs * $uρ
println("Matrix * matrix:")
@btime $Δ * $Δ
@btime $uΔ * $uΔ
@btime $Δs * $Δs
@btime $uΔs * $uΔs
println("Solving a linear system:")
@btime $Δ \ $ρ
@btime $uΔ \ $uρ
@btime $Δs \ $ρ
@btime $uΔs \ $uρ
# output
Matrix * vector:
1.618 μs (1 allocation: 896 bytes)
2.038 μs (1 allocation: 896 bytes)
755.763 ns (1 allocation: 896 bytes)
1.128 μs (1 allocation: 896 bytes)
Matrix * matrix:
56.211 μs (2 allocations: 78.17 KiB)
57.104 μs (2 allocations: 78.17 KiB)
16.128 μs (6 allocations: 35.50 KiB)
16.492 μs (6 allocations: 35.50 KiB)
Solving a linear system:
150.873 μs (4 allocations: 79.92 KiB)
153.623 μs (6 allocations: 85.67 KiB)
455.481 μs (125 allocations: 253.85 KiB)
455.188 μs (127 allocations: 259.60 KiB)
```

## Brief API overview

Feature requests and other suggestions are welcome!

Pinging those who might be interested: @ChrisRackauckas, @briochemc