How simple get vector from ranges?

question
performance

#1

In Matlab I have vector and vector of ranges.

a = [10., 9., 8., 7., 6., 5., 4., 3., 2., 1.]
dr = [1:2, 4:7]

If I want a new vector from ranges I will call.

mo = a(dr)
mo = 10 9 7 6 5 4

In Julia I started with the function of map() and I got this.

jHelp = map(x -> a[x], dr)
JHelp = Array{Array{Float64,1},1}[[10., 9.], [7., 6., 5., 4.]]

After much effort :slight_smile: , I worked on the same result as Matlab with the help of mapreduce().
.

jo = mapreduce(x -> a[x], vcat, dr)
jo = [10., 9., 7., 6., 5., 4.]

There is some simpler alternative or this is the best possible solution?
Thanks.


#2

Does this help:

julia> [collect(1:4); collect(6:9)]                                                                                                                                                                                
8-element Array{Int64,1}:                                                                                                                                                                                          
 1                                                                                                                                                                                                                 
 2                                                                                                                                                                                                                 
 3                                                                                                                                                                                                                 
 4                                                                                                                                                                                                                 
 6                                                                                                                                                                                                                 
 7                                                                                                                                                                                                                 
 8                                                                                                                                                                                                                 
 9                                                                                                                                                                                                                 

collect turns the range into a vector, then the ; inside the [] concatenates. That should help you make the dr into a plain vector which you can use to index a.


#3

How about

using Base.Iterators: flatten
[x for (index, x) in enumerate(a) if index in flatten(dr)]

or

[a[i] for i in flatten(dr)]

although I quite like your mapreduce solution.


#4
julia> a = [10., 9., 8., 7., 6., 5., 4., 3., 2., 1.]
10-element Array{Float64,1}:
d 10.0
  9.0
  8.0
  7.0
  6.0
  5.0
  4.0
  3.0
  2.0
  1.0

julia> dr = [1:2; 4:7]
6-element Array{Int64,1}:
 1
 2
 4
 5
 6
 7

julia> a[dr]
6-element Array{Float64,1}:
 10.0
  9.0
  7.0
  6.0
  5.0
  4.0

#5

And note that you don’t have a vector of ranges in matlab. Ranges isn’t even a (user visible) type in matlab AFAICT.


#6

Sure, but creating a dense index vector in this manner would result in double the allocations asymptotically compared to the mapreduce or flatten approaches.


#7

@mauro3 and @yuyichao It seems that the semicolon could be a solution, even without the use of collect() function. The question remains how to create / transmit such a vector that is unnecessarily long. In the sense that I am working with big data and transfer the vector of ranges through multiple functions.

Array{UnitRange{Int64},1}[1:2, 4:7]
Array{Int64,1}[1:2; 4:7] # with semicolon ';'

A possible solution would be to transfer a vector as Array {UnitRange {Int64}, 1} and call it as Array {Int64,1} directly with vcat(). What do you think?

julia> a[vcat(dr...)]
6-element Array{Float64,1}:
 10.0
  9.0
  7.0
  6.0
  5.0
  4.0

Thanks and thank @yuyichao for correction about AFAICT.


#8

vcat(dr...) produces the same vector [1:2; 4:7], so it does not save memory. Also your mapreduce solution produces two vectors which then get vcated. I think @tkoolen’s second solution is best.


#9

Ah yes of course, I missed that.


#10

EDIT: Just realized this is identical to @yuyichao’s opst
I may be misunderstanding the question completely, but is the issue not just that you use semicolon rather than comma for concatenation in Julia?

a = [10., 9, 8, 7, 6, 5, 4, 3, 2, 1]
dr = [1:2; 4:7]
a[dr]
#6-element Array{Float64,1}:
# 10.0
#  9.0
#  7.0
#  6.0
#  5.0
#  4.0

#11

For fun I benchmarked the vcat-the-indices approach and the flatten approach for large a and dr (with dr a Vector{UnitRange{Int}} as in the original post, to allow for fair comparison).


using BenchmarkTools
using Base.Iterators: flatten

a = rand(10_000_000)
dr = [10 : 100, 150 : 6_000, 10_000 : 8_000_000]

function f1(a, dr)
    a[vcat(dr...)]
end

function f2(a, dr)
    [a[i] for i in flatten(dr)]
end

@benchmark f1($a, $dr)

@benchmark f2($a, $dr)

Results for f1 (vcat-the-indices):

BenchmarkTools.Trial: 
  memory estimate:  122.01 MiB
  allocs estimate:  7
  --------------
  minimum time:     42.596 ms (21.99% GC)
  median time:      43.553 ms (24.15% GC)
  mean time:        44.995 ms (26.19% GC)
  maximum time:     110.922 ms (69.96% GC)

and for f2 (flatten):

BenchmarkTools.Trial: 
  memory estimate:  65.00 MiB
  allocs estimate:  28
  --------------
  minimum time:     72.419 ms (0.00% GC)
  median time:      74.665 ms (0.00% GC)
  mean time:        86.345 ms (8.86% GC)
  maximum time:     233.150 ms (54.92% GC)
  --------------
  samples:          43
  evals/sample:     1

So about double the allocations for f1 as I expected, but f1 is surprisingly faster!


#12

The surprising results above are because length is not defined for the Flatten object returned by flatten, which means that the size of the result cannot be determined in advance. If I define:

Base.length(fl::Base.Iterators.Flatten{<:AbstractVector{<:UnitRange}}) = sum(length, fl.it)
Base.iteratorsize(::Base.Iterators.Flatten{<:AbstractVector{<:UnitRange}}) = Base.HasLength()

then the benchmark results for the flatten approach become

BenchmarkTools.Trial: 
  memory estimate:  61.00 MiB
  allocs estimate:  5
  --------------
  minimum time:     21.147 ms (1.61% GC)
  median time:      27.357 ms (20.71% GC)
  mean time:        28.169 ms (22.88% GC)
  maximum time:     90.278 ms (76.97% GC)
  --------------
  samples:          178
  evals/sample:     1

Maybe a more general version of the methods above could make it into Base? We’re getting a lot of mileage out of this simple question.

(As an aside, how awesome is it that you can just tinker like this in Julia!)

Edit: Julia issue: https://github.com/JuliaLang/julia/issues/23431.


#13

Yes, it is awesome.
Thanks, I did not even know that Iteration utilities exist. This is the new feature of Julia 0.6. In Julia 0.5, mapreduce is a good choice in term of speed, but the allocation is a little more than f1. For completeness, I add the benchmark for the mapreduce in the version 0.6.

function f3(a, dr)
    mapreduce(x -> a[x], vcat, dr)
end

julia> @benchmark f1($a, $dr)
BenchmarkTools.Trial:
  memory estimate:  122.01 MiB
  allocs estimate:  8
  --------------
  minimum time:     70.672 ms (16.28% GC)
  median time:      72.660 ms (16.37% GC)
  mean time:        76.670 ms (19.92% GC)
  maximum time:     158.054 ms (61.80% GC)
  --------------
  samples:          66
  evals/sample:     1

julia> @benchmark f2($a, $dr)
BenchmarkTools.Trial:
  memory estimate:  65.00 MiB
  allocs estimate:  28
  --------------
  minimum time:     108.974 ms (0.00% GC)
  median time:      113.256 ms (0.00% GC)
  mean time:        123.818 ms (8.49% GC)
  maximum time:     234.858 ms (49.67% GC)
  --------------
  samples:          32
  evals/sample:     1

julia> @benchmark f3($a, $dr)
BenchmarkTools.Trial:
  memory estimate:  122.05 MiB
  allocs estimate:  10
  --------------
  minimum time:     61.970 ms (18.74% GC)
  median time:      63.719 ms (18.74% GC)
  mean time:        66.872 ms (22.33% GC)
  maximum time:     144.829 ms (64.45% GC)
  --------------
  samples:          75
  evals/sample:     1

After adding hack.

julia> @benchmark f2($a, $dr)
BenchmarkTools.Trial:
  memory estimate:  61.00 MiB
  allocs estimate:  5
  --------------
  minimum time:     30.712 ms (0.91% GC)
  median time:      37.693 ms (15.98% GC)
  mean time:        40.333 ms (19.18% GC)
  maximum time:     118.468 ms (71.25% GC)
  --------------
  samples:          124
  evals/sample:     1