Is there a function for creating n (nearly) evenly-spaced integers for indexing?

Hi all,

I’d like to downsample arrays with something like

longarray = rand(100)
numsamples = 30
sampleindices = evenintegers(1, length(longarray), numsamples)
downsampled = longarray[sampleindices]

I realize it's not obvious what “even” means here, since the differences between consecutive integers cannot all be exactly equal, and it's not clear whether the first and last integers should be included.

What I do now instead is this, which feels like I’m overlooking some built-in functionality.

sampleindices = convert.(
    Int,
    round.(range(1, length(longarray), numsamples), digits=0)
)

Can I reduce these last three lines to a single statement using built-in Julia?

Thanks!

Those three lines could be written more concisely:

round.(Int, range(1, length(longarray), numsamples))
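If a reusable name is wanted, the one-liner above can be wrapped in a small helper. This is only a sketch: `evenintegers` is the hypothetical name from the original question, not a function in Base, and it uses the `length` keyword of `range` so that it also runs on older Julia 1.x versions.

```julia
# Hypothetical helper (not part of Base): n nearly evenly spaced integers
# from lo to hi, with both endpoints included.
evenintegers(lo, hi, n) = round.(Int, range(lo, hi, length=n))

longarray = rand(100)
numsamples = 30
sampleindices = evenintegers(1, length(longarray), numsamples)
downsampled = longarray[sampleindices]
```

Because the values are rounded, the gaps between consecutive indices are not all identical; they differ by at most one.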
julia> collect(Iterators.partition(1:37, 5))
8-element Vector{UnitRange{Int64}}:
 1:5
 6:10
 11:15
 16:20
 21:25
 26:30
 31:35
 36:37
round.(Int,LinRange(1,100,30))

shorter but basically the same.


For what the OP wants here, I think you’d actually want to write

(first(chunk) for chunk in Iterators.partition(longarray, numsamples))

Even better: put this into the array indexing expression so you can use begin and end directly:

A[round.(Int, range(begin, end, numsamples))]

firstindex(xs) and lastindex(xs) can be used instead of 1 and length(xs) to avoid problems with OffsetArrays.
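A minimal sketch of that suggestion, assuming a one-dimensional array; the function name `evenindices` is made up for illustration and is not a Base function:

```julia
# Hypothetical helper: n nearly evenly spaced indices into xs, computed from
# firstindex/lastindex rather than assuming that indexing starts at 1.
evenindices(xs, n) = round.(Int, range(firstindex(xs), lastindex(xs), length=n))

xs = collect(1:100)
idx = evenindices(xs, 30)
samples = xs[idx]
```

For an ordinary `Vector` this gives the same indices as using `1` and `length(xs)`; the difference only matters for arrays with custom index ranges.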


Maybe you are looking for chunks from the ChunkSplitters.jl package?

The docs are here:

Use it perhaps as (there is much more in the docs):

julia> using ChunkSplitters

julia> map(first∘first, chunks(longarray, 30, :batch))
30-element Vector{Int64}:
  1
  5
  9
 13
 17
 21
 25
 29
 33
 37
  ⋮
 92
 95
 98

might be the goal?

The indices can then be used as:

longarray[map(first∘first, chunks(longarray, 30, :batch))]

This allows a completely independent sample to be produced as:

using ChunkSplitters
using IterTools: nth
longarray[map(Base.Fix2(nth, 2)∘first, chunks(longarray, 30, :batch))]

The replies by @oheil, @rafael.guerra, and @mbauman are nice because they use Base functions and look simple to an occasional programmer, but I guess the answer is effectively no: There’s no (single) function to index my long arrays evenly at n points.


What’s wrong with Iterators.partition? It’s almost exactly what you’re looking for.

How so? It’s just breaking the array up into chunks, not sampling it.

Julia tries to spread simple functions across packages rather than build a monolithic Base (AFAIK; I’m not a core dev).

So the ChunkSplitters and IterTools usage I mentioned looks pretty natural to Julians.

It can also be wrapped in a simple one-line function, with a name of your choice, if needed.

I mean, you can partition eachindex. I guess it’s not an exact match.
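A rough sketch of what partitioning `eachindex` looks like, and why it is not an exact match: `Iterators.partition` takes a chunk size, not a chunk count, so the number of samples depends on how the length divides. The choice of `cld` for the chunk size here is an assumption made for illustration.

```julia
longarray = rand(100)
numsamples = 30

# partition takes a chunk *size*; cld(100, 30) == 4
chunksize = cld(length(longarray), numsamples)

# First index of every chunk: 1, 5, 9, ...
starts = [first(chunk) for chunk in Iterators.partition(eachindex(longarray), chunksize)]

# 100 indices in chunks of 4 give 25 chunks, so 25 samples rather than 30
downsampled = longarray[starts]
```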

Another nice combination with ChunkSplitters:

using ChunkSplitters
longarray = rand(100);

indices = first(getchunk(longarray, 1, length(longarray)÷30, :scatter),30)
samples = longarray[indices]

This method is also quite efficient (as the indices returned from getchunk and first are integer ranges).

Sure - I certainly don’t wanna speak for Julians. I’ll speak for my future self ~3 months from now when I next code something and need to down-sample a long array: I’m not gonna remember the composition of first with first, nor what the right parameters were for chunks.

I was hoping for a simple builtin because that’s something I might remember. In Python’s numpy I remember it, because it’s “just” a call to “as type Int” on the linspace.

If you only need one such subsample (and not several), I think this might be good (though not superefficient because of randomness):

using StatsBase
sample(longarray, 30; replace=false)

and as the Romans used to say:

if you haven’t imported StatsBase, you haven’t done anything.

Okay, joke aside, the above is not a downsample, so maybe:

longarray[first(1:length(longarray)÷30:end, 30)]

which is short enough to remember and in Base.

Isn’t just

longarray[begin:30:end]

enough here? (Maybe just adjusting the step if the number of samples is given)

This is also what I expected to see suggested

It doesn’t seem to be enough as OP wants 30 indices nearly evenly distributed, while that will give him only 4.

A[range(start=begin, step=end ÷ 30, length=30)]
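To make the trade-off between the step-based and the rounding-based suggestions concrete, here is a small comparison for 100 elements and 30 samples. It is only a sketch that computes the indices outside of an indexing expression, so `1` and `n` stand in for `begin` and `end`.

```julia
n = 100
numsamples = 30

# Fixed integer step: exactly even spacing, but the tail of the array is not covered
stepped = range(1, step = n ÷ numsamples, length = numsamples)   # 1:3:88

# Rounded linear range: spans both endpoints, spacing varies by at most one
rounded = round.(Int, range(1, n, length = numsamples))
```

The fixed step gives perfectly regular indices that stop at 88, while the rounded range reaches index 100 at the cost of slightly uneven gaps.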