Sort indices based on value

Is there an existing method that can get a value -> index map of a vector? For example, something that works like this:

function findalls(A::Vector)
    d = Dict{eltype(A), Vector{Int}}()
    for (i,v) in enumerate(A)
        if !haskey(d, v)
            d[v] = Vector{Int}()
        end
        push!(d[v], i)
    end
    d
end

EDIT:
My benchmark results are:

  • @bertschi 's approach is 2x faster than the second candidate at least on my use case (large vector, very few unique entries).
  • My own approach and @Dan 's one-line approach are the same, except that haskey is faster than get!, making my approach 1/3 faster. (if i use get! in my approach then they have the same performance)
  • FlexiGroups.groupfind is on par with @Dan 's approach while having more number of allocations and less overall size of allocations.
  • SplitApplyCombine.groupfind is 3.5x slower with 500x more allocations.

I’m not sure if this is what you want, but maybe:

julia> x = ["a", "b", "b", "c"];

julia> collect(unique(first, zip(x, eachindex(x))))
3-element Vector{Tuple{String, Int64}}:
 ("a", 1)
 ("b", 2)
 ("c", 4)

Or to get a Dict:

julia> Dict(key => val for (key, val) in unique(first, zip(x, eachindex(x))))
Dict{String, Int64} with 3 entries:
  "c" => 4
  "b" => 2
  "a" => 1

In your example I expect the result to be

"c" => [4]
"b" => [2,3]
"a" => [1]

Either tuple or dict works for me.

How about:

indexall(x) = foldl((d,(k,v))->(push!(get!(()->Int[],d,v),k); d),
  pairs(x); init=Dict{eltype(x),Vector{Int}}())

which gives:

julia> x = ["a", "b", "b", "c"];

julia> indexall(x)
Dict{String, Vector{Int64}} with 3 entries:
  "c" => [4]
  "b" => [2, 3]
  "a" => [1]
1 Like

Maybe not the fastest, but rather short:

map(e -> e => findall(==(e), x), unique(x))
2 Likes

Honestly, what you’ve written in the start is great, @putianyi888 (just make the dict be a Dict{eltype(A), Vector{Int}}()). There’s nothing magical about built-in/existing functionality — the stuff you write can perform just as well.

Run with it!

6 Likes

If you go with function in OP, the following:

needs to be:

d = Dict{eltype(A), Vector{Int}}()
1 Like

Also,

findalls2(x) = begin
    d = Dict{eltype(x),Vector{Int}}()
    push!.(get!.(()->Int[], Ref(d), values(x)), keys(x))
    return d
end

does the same, but exercises broadcasting a bit.

Try:

using FlexiGroups

groupfind(A)
1 Like

What is the difference with:

using SplitApplyCombine
groupfind(A)
1 Like

The function is clearly inspired by SplitApplyCombine (:
For regular Vectors the difference is basically just performance, FlexiGroups are typically (always?) faster.
More generally, see a list of main differences at Alexander Plavin / FlexiGroups.jl · GitLab. Better support for different collection types, some helper features around grouping. I tried getting some of these into SplitApplyCombine, but didn’t manage to – so, created a more focused package.

1 Like
using FlexiGroups
help?> groupfind
search: groupfind

  No documentation found.

Yeah I didn’t expect groupfind to actually be useful (:
I use group()/groupview()/groupmap() functions, but almost never really needed groupfind() myself. Added it simply because such a function already existed in SplitApplyCombine. Others (group()/…) have docstrings, maybe now it’s a sign that I should add one for groupfind as well.

I might experiment with some names. eg groupfindall, groupindices.