parb
January 12, 2024, 8:07pm
1
Hi all,
I’d like to downsample arrays with something like
longarray = rand(100)
numsamples = 30
sampleindices = evenintegers(1, length(longarray), numsamples)
downsampled = longarray[sampleindices]
I realize it’s not obvious what “even” means here, since it cannot give exactly the same difference between the integers, and it’s not clear if the first and last integer should be included.
What I do now instead is this, which feels like I’m overlooking some built-in functionality.
sampleindices = convert.(
    Int,
    round.(range(1, length(longarray), numsamples), digits=0)
)
Can I reduce these last three lines to a single statement using built-in Julia?
Thanks!
Those three lines could be written more concisely:
round.(Int, range(1, length(longarray), numsamples))
3 Likes
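To make the behaviour of that one-liner concrete, here is a minimal sketch that wraps it in a helper. The name `evenintegers` comes from the question and is hypothetical, not a Base function; the three-argument positional `range(start, stop, length)` assumes Julia 1.7 or later.

```julia
# Hypothetical helper; `evenintegers` is the name used in the question,
# not something that exists in Base.
evenintegers(start, stop, n) = round.(Int, range(start, stop, n))

longarray = rand(100)
numsamples = 30
sampleindices = evenintegers(1, length(longarray), numsamples)
downsampled = longarray[sampleindices]

# Both endpoints are included. The exact spacing would be 99/29 (about 3.41),
# so after rounding, neighbouring indices differ by either 3 or 4.
spacings = unique(diff(sampleindices))
```

With these inputs the first and last index are always part of the sample, and "even" in practice means a mix of steps of 3 and 4.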
julia> collect(Iterators.partition(1:37, 5))
8-element Vector{UnitRange{Int64}}:
1:5
6:10
11:15
16:20
21:25
26:30
31:35
36:37
oheil
January 12, 2024, 8:33pm
4
round.(Int,LinRange(1,100,30))
shorter but basically the same.
1 Like
Mason
January 12, 2024, 8:43pm
5
For what the OP wants here, I think you’d actually want to write
(first(chunk) for chunk in Iterators.partition(longarray, numsamples))
1 Like
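One thing to watch with this approach: the second argument of `Iterators.partition` is the chunk size, not the number of chunks. The sketch below uses stand-in data (`collect(1:100)`, so each value equals its index) and an assumed `cld`-based chunk size to show the difference; it is an illustration, not necessarily what the poster intended.

```julia
longarray = collect(1:100)   # stand-in data where longarray[i] == i
numsamples = 30

# Chunks of 30 elements each: 100 elements give only 4 chunks.
by_size = [first(chunk) for chunk in Iterators.partition(longarray, numsamples)]

# Choosing the chunk size from the desired count instead (an assumption for
# illustration): cld(100, 30) == 4, which gives 25 chunks, not exactly 30.
chunksize = cld(length(longarray), numsamples)
by_count = [first(chunk) for chunk in Iterators.partition(longarray, chunksize)]
```

Because the chunk size must be an integer, partitioning cannot in general hit an arbitrary sample count exactly.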
Even better: put this into the array indexing expression so you can use begin and end directly:
A[round.(Int, range(begin, end, numsamples))]
6 Likes
jar1
January 12, 2024, 8:52pm
7
firstindex(xs) and lastindex(xs) can be used instead of 1 and length(xs) to avoid problems with OffsetArrays.
1 Like
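A minimal sketch of the two index-agnostic spellings side by side, using only Base. The helper name `downsample` is hypothetical, and the positional `range(start, stop, length)` assumes Julia 1.7 or later.

```julia
# Hypothetical helper name; firstindex/lastindex keep it valid for arrays
# whose indices do not start at 1 (for example those from OffsetArrays).
downsample(xs, n) = xs[round.(Int, range(firstindex(xs), lastindex(xs), n))]

A = collect(1:100)   # stand-in data where A[i] == i
viafunction = downsample(A, 30)

# Inside the brackets, begin and end resolve to A's first and last index.
viaindexing = A[round.(Int, range(begin, end, 30))]
```

Both forms pick the same elements; the function form is handy when the array is passed around, the indexing form when it is used once in place.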
Dan
January 12, 2024, 8:59pm
8
Maybe you are looking for chunks from the ChunkSplitters.jl package?
The docs are here:
Use it perhaps as (there is much more in the docs):
julia> using ChunkSplitters
julia> map(first∘first, chunks(longarray, 30, :batch))
30-element Vector{Int64}:
1
5
9
13
17
21
25
29
33
37
⋮
92
95
98
might be the goal?
The indices can then be used as:
longarray[map(first∘first, chunks(longarray, 30, :batch))]
This allows a completely independent sample to be produced as:
using ChunkSplitters
using IterTools: nth
longarray[map(Base.Fix2(nth, 2)∘first, chunks(longarray, 30, :batch))]
parb
January 12, 2024, 9:08pm
9
The replies by @oheil, @rafael.guerra, and @mbauman are nice because they use Base functions and look simple to an occasional programmer, but I guess the answer is effectively no: There’s no (single) function to index my long arrays evenly at n points.
1 Like
adienes
January 12, 2024, 9:09pm
10
what’s wrong with Iterators.partition? it’s almost exactly what you’re looking for
Mason
January 12, 2024, 9:13pm
11
How so? It’s just breaking the array up into chunks, not sampling it.
Dan
January 12, 2024, 9:14pm
12
Julia tries to spread simple functions across packages rather than grow a monolithic Base (AFAIK, as I’m not a core dev).
So the ChunkSplitters and IterTools usage I’ve mentioned looks pretty nice to Julians.
It can also be wrapped in a simple one-line function if needed, with a name of your choice.
adienes
January 12, 2024, 9:22pm
13
I mean, you can partition eachindex. I guess it’s not an exact match
Dan
January 12, 2024, 9:29pm
14
Another nice combination with ChunkSplitters:
using ChunkSplitters
longarray = rand(100);
indices = first(getchunk(longarray, 1, length(longarray)÷30, :scatter),30)
samples = longarray[indices]
This method is also quite efficient (as the indices returned from getchunk and first are integer ranges).
parb
January 12, 2024, 9:30pm
15
Sure - I certainly don’t wanna speak for Julians. I’ll speak for my future self ~3 months from now when I next code something and need to down-sample a long array: I’m not gonna remember the composition of first with first, nor what the right parameters were for chunks.
I was hoping for a simple builtin because that’s something I might remember. In Python’s numpy I remember it, because it’s “just” a call to “as type Int” on the linspace.
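For comparison with that NumPy habit, here is a small sketch assuming the NumPy version is `linspace` followed by `astype(int)`, which truncates toward zero rather than rounding. The closest Julia spelling of that is `trunc.(Int, ...)`; the `round.(Int, ...)` form suggested earlier in the thread picks slightly different indices.

```julia
n, numsamples = 100, 30

# Rounds each position to the nearest integer (the form suggested in this thread).
rounded = round.(Int, range(1, n, numsamples))

# Drops the fractional part instead, mirroring an "as type Int" conversion.
truncated = trunc.(Int, range(1, n, numsamples))
```

Either way it is a single broadcast call on a range, so the Julia version is about as short as the NumPy one; the two only disagree on which side of a fractional position the index falls.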
Dan
January 12, 2024, 9:35pm
16
If you only need one such subsample (and not several), I think this might be good (though not superefficient because of randomness):
using StatsBase
sample(longarray, 30; replace=false)
and as the Romans used to say:
if you haven’t imported StatsBase, you haven’t done anything.
Okay, joke aside, the above is not a downsample, so maybe:
longarray[first(1:length(longarray)÷30:end, 30)]
which is short enough to remember and in Base.
lmiq
January 13, 2024, 1:26pm
17
Isn’t just
longarray[begin:30:end]
enough here? (Maybe just adjusting the step if the number of samples is given)
tbeason
January 13, 2024, 2:36pm
18
This is also what I expected to see suggested
lmiq:
enough here?
It doesn’t seem to be enough as OP wants 30 indices nearly evenly distributed, while that will give him only 4.
3 Likes
A[range(start=begin, step=end ÷ 30, length=30)]
1 Like
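The step-based and rounding-based suggestions in this thread pick different indices, which the sketch below makes visible. It uses stand-in data where each value equals its index, so the sampled values are the indices themselves; the keyword form `range(start=..., step=..., length=...)` and the positional three-argument `range` assume Julia 1.7 or later.

```julia
A = collect(1:100)   # stand-in data where A[i] == i
n = 30

# Fixed step of 30: only 4 samples.
fixedstep = A[begin:30:end]

# Integer step 100 ÷ 30 == 3, keeping the first 30: evenly spaced, stops at 88.
intstep = A[first(1:length(A)÷n:end, n)]

# Keyword form of the same idea: start at the first index, step 3, 30 samples.
kwrange = A[range(start=begin, step=end ÷ n, length=n)]

# Rounded linear spacing: steps of 3 or 4, but the last element is included.
rounded = A[round.(Int, range(begin, end, n))]
```

So the trade-off is between exactly equal steps that leave the tail of the array unsampled and slightly uneven steps that cover the array from first to last element.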