How can I easily skip indices in an array?

I have two-dimensional arrays whose elements are considered valid only for certain index values. For example, consider the arrays A and B below:

A = round.(rand(10, 2) .* 100)
A[1:3, 2] .= NaN
B = round.(rand(10, 2) .* 100)
B[1:3, 2] .= NaN

If it helps, you can think of the arrays as recording the number of people, where the first index represents the age group and the second index represents the education level (HS and college). Younger individuals, say those at indices 1 to 3, are too young to have a college degree, so those entries are illegal.

I have to perform operations on the arrays, say sum or prod. How can I skip the illegal indices in those operations? I can keep track of those indices manually, as below:

s = sum(A[a, e]*B[a, e] for a in 1:10, e in 1:2 if (e == 1) || ((e == 2) && (a > 3)))
p = prod(A[a, e]*B[a, e] for a in 1:10, e in 1:2 if (e == 1) || ((e == 2) && (a > 3)))

Is there a better, less cumbersome solution?

julia> sum(x -> isnan(x) ? zero(x) : x, A[i]*B[i] for i in eachindex(A))
65709.0

julia> C = A .* B; sum(@view C[:,1]) + sum(@view C[4:end, 2])
65709.0
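Another option along the same lines is a Boolean mask, which works for both sum and prod. This is only a minimal sketch: the data is rebuilt as in the question (so the values are random), and the name `valid` is my own.

```julia
# Stand-in data built the same way as in the question (values are random).
A = round.(rand(10, 2) .* 100)
A[1:3, 2] .= NaN
B = round.(rand(10, 2) .* 100)
B[1:3, 2] .= NaN

# `valid` is true wherever neither array holds a NaN.
valid = .!(isnan.(A) .| isnan.(B))

C = A .* B
s = sum(C[valid])    # sum over the valid entries only
p = prod(C[valid])   # the same mask works for prod
```

Logical indexing `C[valid]` allocates a new vector, so for large arrays a generator such as `sum(C[i] for i in eachindex(C) if valid[i])` may be preferable.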

The Missing type is useful for denoting missing data. Example:

A = [a ∈ 1:3 && e == 2 ? missing : 100*rand() for a = 1:10, e = 1:2]
B = [a ∈ 1:3 && e == 2 ? missing : 100*rand() for a = 1:10, e = 1:2]
sum(skipmissing(A .* B))
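If the arrays already exist with NaN markers, as in the original question, they can be converted first. A rough sketch, where `nan_to_missing` is a hypothetical helper name and not a library function:

```julia
# Stand-in data in the NaN convention from the question (values are random).
A = round.(rand(10, 2) .* 100); A[1:3, 2] .= NaN
B = round.(rand(10, 2) .* 100); B[1:3, 2] .= NaN

# Hypothetical helper: NaN becomes missing, everything else is kept as-is.
nan_to_missing(X) = [isnan(x) ? missing : x for x in X]

Am = nan_to_missing(A)
Bm = nan_to_missing(B)

# missing propagates through .*, and skipmissing drops it from the reduction.
s = sum(skipmissing(Am .* Bm))
p = prod(skipmissing(Am .* Bm))
```

The comprehension keeps the 10-by-2 shape of the input, so `Am .* Bm` lines up element by element just like `A .* B`.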

You can also try CartesianIndices:

CI = filter(x -> x ∉ CartesianIndices((1:3,2:2)), CartesianIndices(A))
s = sum(A[I]*B[I] for I in CI)

How can I build the indices from given array sizes? For example, I defined A above to be 10-by-2. If I have variables like x = 10 and y = 2, how can I build CI using those instead?

Not sure if this is what you are asking for:

CI = filter(x -> x ∉ CartesianIndices((1:3,2:2)), CartesianIndices((1:x,1:y)))
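The same idea can be wrapped in a small function so that the sizes and the cutoff are parameters. This is just a sketch; `valid_indices` and its `cutoff` keyword are names I made up for illustration.

```julia
# Hypothetical helper: all (age, education) indices of an x-by-y array,
# minus the block of ages 1:cutoff at education level 2.
function valid_indices(x, y; cutoff = 3)
    excluded = CartesianIndices((1:cutoff, 2:2))
    return filter(I -> I ∉ excluded, CartesianIndices((x, y)))
end

x, y = 10, 2
CI = valid_indices(x, y)   # Vector of CartesianIndex{2}

# Usage with stand-in data of matching size.
A = rand(x, y)
B = rand(x, y)
s = sum(A[I] * B[I] for I in CI)
```

Passing integers to `CartesianIndices((x, y))` is equivalent to passing the ranges `(1:x, 1:y)`.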

Can I iterate over that CartesianIndices object inside a sum? Something like

sum(a*e for (a, e) in CI)

Is that possible?

Please check this other post.

I got it working like this:

sum( *(Tuple(i)...) for i in CI)

I'm not at a computer right now, but doesn't this work?

sum(c[1]*c[2] for c in CI)
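For reference, here is a short sketch of the variants side by side. As far as I know, a CartesianIndex deliberately does not support iteration, which is why `for (a, e) in CI` fails directly, but converting each index to a tuple first makes destructuring possible.

```julia
CI = filter(I -> I ∉ CartesianIndices((1:3, 2:2)), CartesianIndices((10, 2)))

# Three equivalent ways to multiply the two components of each index.
s1 = sum(c[1] * c[2] for c in CI)          # integer indexing into the CartesianIndex
s2 = sum(prod(Tuple(c)) for c in CI)       # convert to a tuple, then reduce
s3 = sum(a * e for (a, e) in Tuple.(CI))   # destructure after converting to tuples
```

`Tuple.(CI)` allocates a vector of tuples, so the first form is probably the lightest if only a couple of components are needed.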

It does! I didn’t think of it that way.


How can I streamline this?

using StatsBase

L = rand(2, 9, 2, 2, 5)
idx0 = sample(0:1, ProbabilityWeights([0.4, 0.6]), size(L))
L .= L .* idx0

skipzeros = filter(x -> x ∉ findall(L .== 0), CartesianIndices((1:2, 1:9, 1:2, 1:2, 1:5)))

gaek = ones((2, 9, 2, 5))

for idx in skipzeros
    g, a, e, _, k = Tuple(idx)
    t1 = L[g, a, e, 1, k]
    t2 = L[g, a, e, 2, k]
    gaek[g, a, e, k] = t1 + t2
end

gek = ones((2, 2, 5))

for idx in skipzeros
    g, _, e, _, k = Tuple(idx)
    tt1 = gaek[1, :, 1, k]
    tt2 = gaek[2, :, 1, k]
    gek[g, e, k] = sum(tt1 + tt2)
end

The problem is that the second loop is wasteful: I assign the same tt1, tt2 and gek several times. A more specific question would be: how can I “extract” the g, e, and k components from skipzeros?
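As a side note on the construction of skipzeros itself, `findall` with a predicate should give the same indices without the inner `filter`. A minimal sketch, where plain `rand` stands in for the StatsBase sampling (my substitution, to keep the example dependency-free):

```julia
# Stand-in for the data above: roughly 40% of the entries are set to zero.
L = rand(2, 9, 2, 2, 5)
L .*= rand(size(L)...) .< 0.6

# Indices of the nonzero entries, as a Vector{CartesianIndex{5}}.
skipzeros = findall(!iszero, L)
```

This avoids rebuilding `findall(L .== 0)` for every candidate index, which the `filter` version does inside its anonymous function.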

You can broadcast getindex() for that purpose:

g = getindex.(skipzeros, 1)
e = getindex.(skipzeros, 3)
k = getindex.(skipzeros, 5)

I was hoping to get another collection of CartesianIndices. Is it possible?

Also, how can I access all indices along a dimension, restricted to those in skipzeros? For example, in gaek[1, :, 1, k] the : should range only over the second-dimension indices that are present in skipzeros.

The following comprehension works, but there should be a simpler way:

CI = [CartesianIndex(i[1],i[3],i[5]) for i in skipzeros]

I guess I can also create a generator:

CI = (CartesianIndex(i[1],i[3],i[5]) for i in skipzeros)
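Since several entries of skipzeros share the same (g, e, k), wrapping the generator in `unique` may help with the repeated assignments in the second loop. A sketch with stand-in data (plain `rand` replaces the StatsBase sampling, and `gek_idx` is a name I chose):

```julia
# Stand-in data: random values with roughly 40% of entries zeroed out.
L = rand(2, 9, 2, 2, 5)
L .*= rand(size(L)...) .< 0.6
skipzeros = findall(!iszero, L)

# Keep dimensions 1, 3 and 5 (g, e, k) and drop repeated triples.
gek_idx = unique(CartesianIndex(i[1], i[3], i[5]) for i in skipzeros)

for I in gek_idx
    g, e, k = Tuple(I)
    # per-(g, e, k) work goes here; each triple is visited exactly once
end
```

`gek_idx` has at most 2*2*5 = 20 elements, compared with up to 360 in skipzeros.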

Any ideas about the other question?
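One possible approach, offered only as a sketch: collect the second-dimension indices that occur in skipzeros for a fixed g, e and k, and use that vector in place of the colon. `present_a` is a hypothetical helper name, and the small array below is made-up data to show the behavior.

```julia
# Hypothetical helper: sorted second-dimension indices that appear in
# `skipzeros` for the given g (dim 1), e (dim 3) and k (dim 5).
present_a(skipzeros, g, e, k) =
    sort!(unique(Int[i[2] for i in skipzeros if i[1] == g && i[3] == e && i[5] == k]))

# Tiny deterministic example: only a few nonzero entries.
L = zeros(2, 9, 2, 2, 5)
L[1, 3, 1, 1, 2] = 1.0
L[1, 5, 1, 2, 2] = 1.0
L[1, 3, 1, 2, 2] = 1.0
skipzeros = findall(!iszero, L)

gaek = ones(2, 9, 2, 5)
k = 2
as = present_a(skipzeros, 1, 1, k)   # indices 3 and 5 for this example
tt1 = gaek[1, as, 1, k]              # replaces gaek[1, :, 1, k]
```

The typed comprehension `Int[...]` keeps the result a `Vector{Int}` even when no index matches, so the indexing expression still works and returns an empty vector.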