[ANN] IndexFunArrays.jl - Fun with indices (and functions on them)

roflmaostc · March 24, 2021, 11:54am

Hey,

You can try it with:

julia> ] add IndexFunArrays

Initially we wanted a tool calculating output arrays using the index position. For that purpose we created a IndexFunArray (with size independent constant memory allocation) which calculates the value once accessed with getindex. It needs an index function and a output size:

julia> z = IndexFunArray(x -> sum(abs2.(x)), (3, 3))   # directly using the constructor and supplying a function to store in the array
  3×3 IndexFunArray{Int64, 2, var"#184#185"}:
    2   5  10
    5   8  13
   10  13  18

julia> typeof(z) <: AbstractArray
true

julia> z[3,3] # use it like a normal array which then calculates the value
18

For convenience we already created several distance related wrapper and some window functions:

julia> rr2((4, 4))
4×4 IndexFunArray{Float64, 2, IndexFunArrays.var"#4#5"{Float64, Tuple{Float64, Float64}, Tuple{Int64, Int64}}}:
 8.0  5.0  4.0  5.0
 5.0  2.0  1.0  2.0
 4.0  1.0  0.0  1.0
 5.0  2.0  1.0  2.0

julia> rr2((3,3), offset=CtrCorner)
  3×3 IndexFunArray{Float64, 2, IndexFunArrays.var"#4#5"{Float64, Tuple{Float64, Float64}, Tuple{Int64, Int64}}}:
   0.0  1.0  4.0
   1.0  2.0  5.0
   4.0  5.0  8.0

julia> rr2((3,3), offset=(4,4))
3×3 IndexFunArray{Float64, 2, IndexFunArrays.var"#4#5"{Float64, Tuple{Int64, Int64}, Tuple{Int64, Int64}}}:
 18.0  13.0  10.0
 13.0   8.0   5.0
 10.0   5.0   2.0

The package works with other packages like LazyArrays or CUDA.
We were curious if there isn’t something similar already? Maybe some of you know more about.

We would be happy about feedback!

Thanks,

Felix

sijo · March 24, 2021, 1:17pm

Interesting!

If #37648 gets merged you could do the following in plain Julia:

julia> z = (sum(abs2.((i,j))) for i in 1:3, j in 1:3);

julia> z[3,3]
18

or

julia> z = (sum(abs2.(Tuple(x))) for x in CartesianIndices((3, 3)));

julia> z[3,3]
18

roflmaostc · March 24, 2021, 1:22pm

That’s cool and would it have been present, we probably didn’t start this package

But the important part for us are the convenient wrappers. We might remove the IndexFunArray and switch to the built-in one in the future.

rafael.guerra · March 24, 2021, 1:34pm

Thanks for sharing.
Wondering if using the map function and CartesianIndices one couldn’t do some of this nice stuff?

For instance, your first example could be done as:

ix(n,m) = map(i-> [i[1],i[2]], CartesianIndices((1:n,1:m)))

map( x -> sum(abs2.(x)), ix(3,3) )

3×3 Matrix{Int64}:
  2   5  10
  5   8  13
 10  13  18

roflmaostc · March 24, 2021, 1:37pm

Yes, but mapping allocates memory.
One then could use MappedArrays.jl but which does need an beforehand allocated array as input (which IndexFunArray does not need).

logankilpatrick · March 24, 2021, 11:20pm

Accidentally converted this to a PM on mobile, sorry about that!

affans · March 25, 2021, 6:23am

Cool package. I was reading over the src code and have a quick question: The definition

struct IndexFunArray{T, N, F} <: AbstractArray{T, N} where {F}

is parametrized by types T, N, F so shouldn’t it be where {T, N, F} at the end of the line?

Another question I had was that the some functions have a signature where the argument is IndexFunArray{T,N}, e.g.

Base.getindex(A::IndexFunArray{T,N}, I::Vararg{Int, N}) where {T,N} = return A.generator(I)

but isn’t the object actually IndexFunArray{T,N,F} so how come multiple dispatch gets it correct?

roflmaostc · March 25, 2021, 8:42am

That’s I good question I don’t know either.

But we can try the following:

julia> struct Foo{T, N, F} <: AbstractArray{T, N} where {T, N, F}
           bar::Int
       end
ERROR: invalid subtyping in definition of Foo
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> struct Foo{T, N, F} <: AbstractArray{T, N} where {F}
           bar::Int
       end

So the first one is not possible. My interpretation would be that the T,N are already fixed by the AbtractArray{T, N} and therefore are not free anymore.
In the documentation I could find this part about Free variables.

And regarding the second one:

julia> AbstractArray{Int}
AbstractArray{Int64, N} where N

julia> IndexFunArray{Int}
IndexFunArray{Int64, N, F} where F where N

julia> IndexFunArray{Int} == IndexFunArray{Int64, N, F} where F where N
true

julia> IndexFunArray{Int} === IndexFunArray{Int64, N, F} where F where N
true

So the notation without the trailing types might be a syntactic sugar for the full notation?

carstenbauer · March 25, 2021, 8:44am

You’re essentially right about both things

Having said that, let me add that, in some sense, struct signatures differ from function signatures here a bit

julia> struct MyStruct{C<:AbstractArray{T}} where T end
ERROR: syntax: invalid type signature around REPL[4]:1
Stacktrace:
 [1] top-level scope at REPL[4]:1

julia> f(x::AbstractArray{T}) where T = "whatever" # works
f (generic function with 1 method)

julia> struct MyStruct{C<:AbstractArray{T} where T} end # works

julia> struct MyStruct{T, C<:AbstractArray{T}} end # works

carstenbauer · March 25, 2021, 9:00am

julia> struct MyStruct{T<:Real} end # does what you want

julia> struct MyStruct{T} where {T<:Real} end # doesn't work
ERROR: syntax: invalid type signature around REPL[3]:1
Stacktrace:
 [1] top-level scope at REPL[3]:1

julia> struct MyStruct{T} <: Any where {T<:Real} end # doesn't do what you want

julia> MyStruct{ComplexF64}
MyStruct{Complex{Float64}}

Interesting

rafael.guerra · March 25, 2021, 11:29am

@roflmaostc, thanks for the explanation, it is clear.
Your code is very clever and too advanced for my current level but it is impressive to see that the output views not only do not allocate but their computing time does not depend on n in the example below?! How come? Thanks in advance for shedding some light.

PS: I have compared it against a simple solution that broadcasts over CartesianIndices but that runs out of computer memory for large arrays. Any expert advice that may improve this situation would be appreciated too.

using IndexFunArrays, BenchmarkTools

fci(f,n,m) = f.(Tuple.(CartesianIndices((1:n,1:m)))) #broadcasting over CartesianIndices

f(x) = sum(abs2.(x))

n = 3; m = 3;
@btime fci($f,$n,$m)  # 203.833 ns (3 allocations: 256 bytes)
@btime IndexFunArray(f, ($n, $m)) # 76.386 ns (2 allocations: 64 bytes)

n = 10_000_000; m = 10_000_000;
@btime fci($f,$n,$m)  # ERROR: OutOfMemoryError()
@btime IndexFunArray(f, ($n, $m)) # 79.257 ns (2 allocations: 64 bytes)

roflmaostc · March 25, 2021, 11:38am

You run out of memory because your allocated array with CartesianIndices has (10_000_000)^2 entries.

Our object is an array in some sense. When you use arr[2, 4] with a normal array, it fetches the entry from the memory. But with the IndexFunArray we really overwrite the arr[2, 4] to not fetch the value from the memory but instead calculate it when accessed (from the generator function and the indices [2,4]). So we do not allocate memory beforehand.

Here you can see that getindex calculate the entry value from the index position I:

Base.getindex(A::IndexFunArray{T,N}, I::Vararg{Int, N}) where {T,N} = return A.generator(I)

Edit:
Maybe this minimal example helps. The documentation for defining own array types is here. Below an array returning 0 or 42 randomly. It can be of arbitrary size:

julia> struct RandArray{T, N} <: AbstractArray{T, N}
           size::NTuple{N, Int}
       end

julia> Base.getindex(A::RandArray{T,N}, I::Vararg{Int, N}) where {T,N} = rand((0, 42))

julia> Base.size(A::RandArray) = A.size

julia> x = RandArray{Int, 2}((1000000000000, 1000000000000));

julia> x[1,1]
0

julia> x[1,1]
42

julia> x[1,2]
0

stevengj · March 25, 2021, 12:15pm

This seems equivalent to

julia> z = mappedarray(x -> sum(abs2.(Tuple(x))), CartesianIndices((3,3)))
3×3 mappedarray(var"#3#4"(), ::CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}) with eltype Int64:
  2   5  10
  5   8  13
 10  13  18

julia> z[3,3]
18

If you want similar syntax to your IndexFunArray, you could define:

indexfunarray(f, dims) = mappedarray(x -> f(Tuple(x)), CartesianIndices(dims))

CartesianIndices is like a range object—it doesn’t allocate.

roflmaostc · March 25, 2021, 1:05pm

That makes totally sense, thanks for pointing that out!

We probably fix that in the future and get rid of IndexFunArray type completely

cgeoga · March 25, 2021, 1:09pm

Another similar package is KernelMatrices.jl. Not exactly the same, because it’s really designed for passing in something like pts::Vector{SVector{D,T}} and then working lazily with arrays A[j,k] = K(pts[j], pts[k], ...) in an efficient way. So not O(1) allocations like it looks like this is designed for, but certainly similar. As a disclaimer, this is my project.

rafael.guerra · March 25, 2021, 1:30pm

@stevengj, your solution seems to be even faster. What is the right way to benchmark it?

PS: tried it below but the @btime result does not make sense to me (too many floating point operations in no time)…

using BenchmarkTools

n = 10_000_000; m = 10_000_000;

using IndexFunArrays
@btime IndexFunArray($f, ($n, $m)) # 76.36 ns (2 allocations: 64 bytes)

# solution by @stevengj :
using MappedArrays
indexfunarray(f, dims) = mappedarray(x -> f(Tuple(x)), CartesianIndices(dims))

@btime indexfunarray($f, ($n,$m))  # 0.001 ns (0 allocations: 0 bytes)

roflmaostc · March 25, 2021, 1:40pm

The reason is, that you call a wrapper to create the IndexFunArray.
If you remove that wrapper, IndexFunArray has roughly the same.

@rafael.guerra keep in mind, that you don’t calculate anything yet. You just store the information how you can calculate it when accessing it via getindex or [1,2].

You probably want to check something like sin.(indexfunarray($f, ($n,$m)) which then really operates on all array elements, but performance is roughly the same for both methods since they do roughly the same. (in your case, the arrays are way too large to calculate and allocate sin.(...) explicitly)

rafael.guerra · March 25, 2021, 1:53pm

@roflmaostc, thanks a lot for the explanation, which together with the MappedArrays.jl ReadMe file, helped me understand this a bit better.
The fact that some mapped array values are displayed in the terminal for QC, made me believe that all values were computed, which is not the case.

Topic		Replies	Views
Indexing on a pseudo-dimension General Usage indexing , arrays	10	330	July 4, 2023
Functors instead of arrays, matrices etc Performance question , memory-allocation	22	743	January 30, 2024
Getting CartesianIndices to propagate Numerics metaprogramming , indexing , arrays	8	549	March 20, 2021
Iterating over 2D array and placing results in 1D array with the same number of elements New to Julia indexing	20	2164	July 3, 2020
Crazy allocations using CartesianIndices General Usage question , indexing , memory-allocation	14	1454	March 29, 2021

[ANN] IndexFunArrays.jl - Fun with indices (and functions on them)

Related topics