Note that if you hide the index from the compiler, `begin+` is actually faster. Computers are 0-based, so every 1-based index in Julia carries an implicit subtraction; the apparent addition of `begin+` just cancels that subtraction.
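To make the cancellation concrete, here is an illustrative sketch (not how Base actually lowers indexing; `load_1based`/`load_0based` are made-up names):

```julia
# unsafe_load(ptr, i) reads from ptr + (i - 1)*sizeof(Float64):
# the 1-based convention costs an implicit subtraction.
load_1based(ptr::Ptr{Float64}, i::Int) = unsafe_load(ptr, i)

# A 0-based read needs no correction: the address is just ptr + i*sizeof(Float64).
load_0based(ptr::Ptr{Float64}, i::Int) = unsafe_load(ptr + 8i)

# With A[begin + i] on a 1-based array, the compiler sees (1 + i) - 1 == i,
# so the addition and the subtraction cancel, leaving the 0-based form.
```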
```julia
julia> using StrideArraysCore, OffsetArrays

julia> A = rand(2,2,2,2,2);

julia> As = StrideArraysCore.zero_offsets(StrideArray(A));

julia> Ao = OffsetArrays.Origin(0)(A);

julia> foo(A,i,j,k,l,m) = @inbounds A[i,j,k,l,m]
foo (generic function with 1 method)

julia> bar(A,i,j,k,l,m) = @inbounds A[begin+i,begin+j,begin+k,begin+l,begin+m]
bar (generic function with 1 method)
```
```
julia> @code_native syntax=:intel debuginfo=:none foo(A, 1, 1, 1, 1, 1)
.text
.file "foo"
.globl julia_foo_469 # -- Begin function julia_foo_469
.p2align 4, 0x90
.type julia_foo_469,@function
julia_foo_469: # @julia_foo_469
# %bb.0: # %top
push rbp
mov rbp, rsp
mov r10, qword ptr [rdi]
mov r11, qword ptr [rdi + 24]
dec rdx
imul rdx, r11
imul r11, qword ptr [rdi + 32]
dec r9
imul r9, qword ptr [rdi + 48]
lea rax, [r8 + r9]
dec rax
imul rax, qword ptr [rdi + 40]
add rax, rcx
dec rax
imul rax, r11
add rdx, rsi
add rdx, rax
vmovsd xmm0, qword ptr [r10 + 8*rdx - 8] # xmm0 = mem[0],zero
pop rbp
ret
.Lfunc_end0:
.size julia_foo_469, .Lfunc_end0-julia_foo_469
# -- End function
.section ".note.GNU-stack","",@progbits
```
```
julia> @code_native syntax=:intel debuginfo=:none bar(A, 1, 1, 1, 1, 1)
.text
.file "bar"
.globl julia_bar_520 # -- Begin function julia_bar_520
.p2align 4, 0x90
.type julia_bar_520,@function
julia_bar_520: # @julia_bar_520
# %bb.0: # %top
push rbp
mov rbp, rsp
mov rax, qword ptr [rdi + 24]
imul rdx, rax
imul rax, qword ptr [rdi + 32]
mov r10, qword ptr [rdi]
imul r9, qword ptr [rdi + 48]
add r9, r8
imul r9, qword ptr [rdi + 40]
add r9, rcx
imul r9, rax
add rdx, rsi
add rdx, r9
vmovsd xmm0, qword ptr [r10 + 8*rdx] # xmm0 = mem[0],zero
pop rbp
ret
.Lfunc_end0:
.size julia_bar_520, .Lfunc_end0-julia_bar_520
# -- End function
.section ".note.GNU-stack","",@progbits
```
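To see the first claim in a timing rather than in assembly, a benchmark along these lines should work (a sketch assuming BenchmarkTools; the `Ref` interpolation is what hides the index values from the compiler, so the extra subtractions actually have to execute):

```julia
using BenchmarkTools

i1 = Ref(1);  # opaque 1-based indices for foo
i0 = Ref(0);  # opaque offsets for bar, so both load A[1,1,1,1,1]

@btime foo($A, $i1[], $i1[], $i1[], $i1[], $i1[]);
@btime bar($A, $i0[], $i0[], $i0[], $i0[], $i0[]);
```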
```
julia> @code_native syntax=:intel debuginfo=:none foo(As, 0, 0, 0, 0, 0)
.text
.file "foo"
.globl julia_foo_535 # -- Begin function julia_foo_535
.p2align 4, 0x90
.type julia_foo_535,@function
julia_foo_535: # @julia_foo_535
# %bb.0: # %top
push rbp
mov rbp, rsp
imul r9, qword ptr [rdi + 32]
add r9, r8
imul r9, qword ptr [rdi + 24]
add r9, rcx
imul r9, qword ptr [rdi + 16]
add r9, rdx
imul r9, qword ptr [rdi + 8]
add r9, rsi
mov rax, qword ptr [rdi]
vmovsd xmm0, qword ptr [rax + 8*r9] # xmm0 = mem[0],zero
pop rbp
ret
.Lfunc_end0:
.size julia_foo_535, .Lfunc_end0-julia_foo_535
# -- End function
.section ".note.GNU-stack","",@progbits
```
```
julia> @code_native syntax=:intel debuginfo=:none foo(Ao, 0, 0, 0, 0, 0)
.text
.file "foo"
.globl julia_foo_537 # -- Begin function julia_foo_537
.p2align 4, 0x90
.type julia_foo_537,@function
julia_foo_537: # @julia_foo_537
# %bb.0: # %top
push rbp
mov rbp, rsp
push r15
push r14
push r12
push rbx
mov r11, qword ptr [rdi + 8]
mov r10, qword ptr [rdi + 16]
mov r12, qword ptr [rdi + 24]
mov rbx, qword ptr [rdi + 32]
mov rax, qword ptr [rdi + 40]
mov rdi, qword ptr [rdi]
not r11
mov r14, qword ptr [rdi]
mov r15, qword ptr [rdi + 24]
not r10
add r10, rdx
imul r10, r15
imul r15, qword ptr [rdi + 32]
not r12
add r12, rcx
not rbx
add rbx, r8
not rax
add rax, r9
imul rax, qword ptr [rdi + 48]
add rax, rbx
imul rax, qword ptr [rdi + 40]
add rax, r12
imul rax, r15
add r11, rsi
add r11, r10
add r11, rax
vmovsd xmm0, qword ptr [r14 + 8*r11] # xmm0 = mem[0],zero
pop rbx
pop r12
pop r14
pop r15
pop rbp
ret
.Lfunc_end0:
.size julia_foo_537, .Lfunc_end0-julia_foo_537
# -- End function
.section ".note.GNU-stack","",@progbits
```
Note all the `dec` instructions with normal, 1-based indexing; they disappear with `begin+`. (The `- 8` displacement in `foo`'s final `vmovsd` is the remaining subtraction, folded into the addressing mode.)
Note also that `OffsetArray` keeps the offsets dynamic, so you actually get extra additions there (the `not`/`add` pairs above) – that's why it has even more instructions than regular arrays.
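You can see the dynamic offsets directly in the REPL (the `offsets` field is an internal detail of OffsetArrays, shown here only for illustration): they are stored as runtime values and do not appear in the type, so the compiled code must load and apply them.

```julia
julia> Ao.offsets   # runtime data: Origin(0) shifts each 1-based axis by -1
(-1, -1, -1, -1, -1)

julia> typeof(Ao)   # the offsets are not part of the type
OffsetArray{Float64, 5, Array{Float64, 5}}
```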
`StrideArraysCore.jl` allows for both compile-time-known offsets and runtime offsets, so here I used compile-time offsets to show that the indexing needs relatively few instructions.
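For comparison with the `OffsetArray` above, the zero-offset `StrideArray`'s offsets are static (a sketch; the `offsets` accessor comes from ArrayInterface, which StrideArraysCore builds on, and the exact printing may differ across versions):

```julia
julia> using ArrayInterface

julia> ArrayInterface.offsets(As)   # StaticInts: encoded in the type, free at runtime
(static(0), static(0), static(0), static(0), static(0))
```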
Note that I optimized the number of instructions in an indexing operation like this in StrideArraysCore 0.4.16 (released less than an hour ago); 0.4.15 and older will have a lot more instructions (but do still support compile-time and runtime offsets).
Note that the advantage of runtime offsets is avoiding code duplication: you don't need to compile several almost-identical versions of a function just because a few of these values change.
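For example, every origin of an `OffsetArray` shares a single type, so a function like `foo` is compiled exactly once no matter which offsets you use:

```julia
julia> typeof(OffsetArrays.Origin(0)(A)) === typeof(OffsetArrays.Origin(17)(A))
true
```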
In any loop, the optimizer should be able to hoist the offsetting out, meaning this is a cost you only really pay once per non-inlined function call. That is, it is extremely cheap.
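As a sketch of what that hoisting looks like (`sumall` is a made-up example, not from the post):

```julia
function sumall(A)
    s = zero(eltype(A))
    @inbounds for I in CartesianIndices(A)
        # Ao's offsets depend only on A, not on I, so LLVM can fold them into
        # the base pointer once, before the loop, instead of on every iteration.
        s += A[I]
    end
    return s
end

sumall(Ao)  # pays the offset arithmetic once, not once per element
```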
On the other hand, compile times can be arbitrarily expensive.
Dynamic offsets are a good choice if you're likely to use more than one offset value. If not, then of course fixing them at compile time is a little nicer.