Special purpose subtypes (of arrays)

array
type

#1

Hey there,
I am trying to leverage Julia’s type system to get as many compile time protections as I can while still being performant. I’m using static arrays of reals for a project. Say I have a function f1 which can take an array a, and perform some operation on it once a has been sorted. Is it possible to create a subtype of Array which represents a sorted array? Then f1 in use might look like

function sorted(a :: Array{Real}) :: SortedArray{Real}
...
end

function f1(a :: SortedArray{Real})
...
end

# so that this fails since it's not guaranteed sorted
f1(a)
# but this works
f1(sorted(a))

In a similar spirit, I am working with 3D vectors represented like SVector{3, N} where N, it’d be awesome if I could define functions which only accepted normalized vectors (you can imagine the analogy in code).

For both cases, I could probably write a struct/wrapper around the array, but that brings two problems for me:

  1. Will there be a performance hit?
  2. Is there a neat way to ensure all functions which worked with the original type also work for the new type?

So, can I create a direct subtype of Array without changing the data representation underneath? Thanks for bearing with, I’m still learning Julia’s type system.

Cheers,
Alex


#2

I have an (unregistered) package

for this purpose.

If they original functions work for an <: AbstractArray via its interface, they should keep working.


#3

Right on @Tamas_Papp that’s pretty nearly exactly what I was looking for! Thanks!

I notice that the implementation does put the array containing your data inside a struct. I was interested to see what the impact would be on performance so I wrote the following test which measures time it takes to operate/allocate them (so the actual sort time is excluded). For very small collections (size 2-4), it’s at least 4x slower, and for larger ones (size 1024) the performance is closer (30% slower).

using SortedVectors

for log_size in 1:10
    size = 1 << log_size
    println("size = ", size)
    # standard vectors
    collection_1 :: Vector{Vector{Float64}} = []
    for i in 1: size
        push!(collection_1, Vector(rand(size)))
    end
    @time reduce(+, collection_1)

    # SortedVectors
    collection_2 :: Vector{SortedVector{Float64}} = []
    for i in 1: size
        push!(collection_2, SortedVector(SortedVectors.AssumeSorted(), <, sort!(rand(size))))
    end
    @time reduce(+, collection_2)
    println()
end

Results:

size = 2
  0.000000 seconds (1 allocation: 96 bytes)
  0.000003 seconds (1 allocation: 96 bytes)

size = 4
  0.000000 seconds (3 allocations: 336 bytes)
  0.000002 seconds (3 allocations: 336 bytes)

size = 8
  0.000001 seconds (7 allocations: 1008 bytes)
  0.000004 seconds (7 allocations: 1008 bytes)

size = 16
  0.000002 seconds (15 allocations: 3.047 KiB)
  0.000006 seconds (15 allocations: 3.047 KiB)

size = 32
  0.000004 seconds (31 allocations: 12.109 KiB)
  0.000009 seconds (31 allocations: 12.109 KiB)

size = 64
  0.000007 seconds (63 allocations: 38.391 KiB)
  0.000020 seconds (63 allocations: 38.391 KiB)

size = 128
  0.000027 seconds (127 allocations: 144.859 KiB)
  0.000051 seconds (127 allocations: 144.859 KiB)

size = 256
  0.000116 seconds (255 allocations: 541.875 KiB)
  0.000207 seconds (255 allocations: 541.875 KiB)

size = 512
  0.000529 seconds (511 allocations: 2.058 MiB)
  0.000454 seconds (511 allocations: 2.058 MiB)

size = 1024
  0.003511 seconds (1.02 k allocations: 8.117 MiB, 55.97% gc time)
  0.004643 seconds (1.02 k allocations: 8.117 MiB, 57.17% gc time)

With that in mind, is there a way to achieve this such a subtyping without a container? Trying to work with normalized 3D vectors with an analogous implementation would be a pretty big penalty.


#4

These benchmarks aren’t particularly precise, since you’re only running each function once. If you want meaningful results, try using https://github.com/JuliaCI/BenchmarkTools.jl and running @benchmark reduce(+, $collection_1), etc.

But to answer your last question, no: you cannot subtype a concrete type in Julia. Fortunately, the cost of a container should be infinitesimal or even zero in many cases, so I would try benchmarking more thoroughly before you give up on encapsulation.

Finally, if you actually have 3D vectors, you should definitely check out https://github.com/JuliaArrays/StaticArrays.jl . The performance benefit of using SVectors from StaticArrays can be immense, and they are ideal for representing 3D vectors.


#5

@rdeits thanks for pointing out the better instrumentation - @btime did confirm a performance gap in this case.

As for using containers I’ll play around a bit more - whatever I make will definitely be based on StaticArrays, they clearly are the right tool for the job.