Vector of Contiguous Booleans to Vector of Unit Ranges

So consider the following vector: [true true, false, true, true , false] we would want the code to return [1:2,3:3,4:5,6:6]

or: [true true, true, true, true, false] -> [1:5, 6:6]

Basically its a groupby or chunk operation, but often times when this shows up you don’t really want/need the overhead of a dataframe.

Anyone else find themselves writing this many times? hoping there’s a better way then the way they’ve written it in the past… I wrote this again in Julia for probably the 10th time and still hate the way I did it.

anyone know of a better less edge case handling on every LOC way?

Not a full solution but my first thought would be to use findall(conditional, x) to get the locations of the trues and then you can iterate on that to stitch them together, filling in where the falses live as you go along. Basically would just need to keep track of the start of the sequence, and as long each time you pull the next value you are incrementing by 1 then you are still within a sequence. If not, you know that the sequence was cut and based on how many spots you skipped you know where to put the false elements.

Are you looking for the actual value [1:5, 6:6] or the corresponding elements in a?

1 Like

I think you might as well do the last part directly, without first using findall.

How about this:

julia> v = [true, true, false, true, true , false];

julia> ends = [i for i in 1:length(v)-1 if v[i] != v[i+1]]
3-element Array{Int64,1}:

julia> starts = ends .+ 1

julia> push!(ends, length(v))

julia> pushfirst!(starts, 1)

julia> UnitRange.(starts, ends)
4-element Array{UnitRange{Int64},1}:
1 Like

You are almost asking for run-length encoding, which is implemented in StatsBase. Here is a solution based on StatsBase.rle:

using StatsBase, IterTools

function chunks(x)
    _, lens = rle(x)
    [(from+1):to for (from, to) in partition(Iterators.flatten(((0,),cumsum(lens))), 2, 1)]


Not sure what you mean Zach?

I am looking for the vector of unit ranges so the [1:5, 6:6] vector in that example, so I can do something with a, or another vector of the same dimenstionality as a. I hope that makes sense.

@dpsanders - nice solution I’ve done roughly that way at some point, but you handled some of the edge stuff better than I did (mine was a bit uglier going this route). But I do like this approach probably the most now!

@yha - what version of Julia are you running when I run your code with StatsBase loaded I had to add, Iterators.partition, and even then I get the following error:

ERROR: LoadError: MethodError: no method matching partition(::Base.Iterators.Flatten{Tuple{Tuple{Int64},Array{Int64,1}}}, ::Int64, ::Int64)
Closest candidates are:
  partition(::Any, ::Integer) at iterators.jl:1084

Sorry for being a bit cryptic. So if you have the indices 1:6 and you want to do some fancy stuff with that what I do is.

So if I wanted the to have something like [1:3, 5:6] I could do one of these:

julia> using StaticRanges

julia> find_all(or(<(4), >(4)), 1:6). # or find_all(!=(4), 1:6)
5-element GapRange{Int64,UnitRange{Int64},UnitRange{Int64}}:

Which returns a vector type (GapRange) that strictly contains ranges that are non-continuous a a specific spot to avoid unnecessarily allocating for things like this.

If I wanted to get the elements of a corresponding to those indices then I’d do

julia> using AxisIndices

julia> AxisVector([1,2,3, 5,6,7, 9])[or(<(5), ==(6))]
5-element AxisArray{Int64,1}
 • dim_1 - 1:5
  1   1  
  2   2  
  3   3  
  4   5  
  5   7  
1 Like

I’m using Julia 1.4.2, and partition in the code above comes from IterTools, not Iterators. Looks like the version in Iterators doesn’t support the third argument.

1 Like

@yha now it works like a charm. Thanks

@Zach_Christensen - really cool solutions. They don’t explicity answer what I wanted but! I think I see a solution using them. Had no idea those libraries existed, thank you for sharing this.

Okay I think I’ll try to implement a version using your solution, then benchmark some stuff and see what is fastest/easiest to read etc. Then maybe pick a solution(I will not pick my versions of anyones entries you all did the cool stuff I just asked a question).

Thanks everyone

1 Like

Not sure if it helps, but StructArrays has something similar (the code will look a bit more complex than necessary because it handles data within a range and with a permutation). You can see the code here, in particular the iterate method.

EDIT: got nerdsniped, this is what that code would look like for your case (without the additional range and permutation):

julia> struct Chunk{V<:AbstractVector}

julia> Base.IteratorSize(::Type{Chunk{V}}) where {V}  = Base.SizeUnknown()

julia> Base.eltype(::Type{Chunk{V}}) where {V}  = UnitRange{Int}

julia> function Base.iterate(g::Chunk, i = firstindex(g.vec))
           i1, l = i+1, lastindex(g.vec)
           i > l && return nothing
           @inbounds while i1 <= l && g.vec[i] == g.vec[i1]
               i1 += 1
           return (i:(i1-1), i1)
1 Like

Yeah, I really need to pull all the “find” related stuff out into its own package so people actually know about it (but for some reason my school insists I keep working on my PhD :wink:).

1 Like

@piever very cool approach. My first version of this was an iterator as well. I had huge vectors and just needed to do chunk -> emit -> process stuff. My current use case is not huge at all, so I was just tinkering.

I’d love to open up benchmarking and stuff but my brain is total soup… The code in my OP is totally wrong and I have no idea what happened. Wrote it was getting the right solutions, then I changed it? Not sure things are a blur right now… Anyways, so be it, I’m gonna call this solved with dpsanders solution because its succinct but too succinct. I

same problem right now I have to drum up a project for … life reasons can’t keep making these pit stops. Finish your PhD trust me you’ll be way happier when its all said and done lol. Not that life after a PhD is soooo much better, but it has its perks.