Some_thing in some_list

jiang_ming_zhang · January 27, 2024, 3:32am

I am from matlab. Codes like

if  some_thing in some_list

is new to me. It of course makes coding simple, but what is the price? I guess julia just checks the elements of some_list one by one to see whether some_thing is in it. That is very time consuming, right?

adienes · January 27, 2024, 3:37am

if you have no information about the structure of some_list, then there is no other option to find an element

if you do have information about the structure of some_list, that can be put in the type and in can be specialized to be more performant on that type

mkitti · January 27, 2024, 3:40am

It really depends on what some_thing and some_list are.

For example, if some_thing is 5 and some_list is 1:4 which a UnitRange in Julia, then the code used is as follows.

in(x::Integer, r::AbstractUnitRange{<:Integer}) = (first(r) <= x) & (x <= last(r))

In Julia, 1:4 is not [1, 2, 3, 4] exactly but rather a UnitRange, a struct with two fields. Only the code that is needed execute is a comparison with those two fields.

Oscar_Smith · January 27, 2024, 4:29am

Matlab has the ismember Array elements that are members of set array - MATLAB ismember function which basically does the same thing. It’s not going to be especially fast, and you should likely consider using a Set if you have an algorithm that is bottle-necked by queries.

mike.ingold · January 28, 2024, 2:44am

Is the RHS of this substantively different than first(r) <= x <= last(t) ? Git blame pegs this code from the v0.6 era, so maybe that syntax wasn’t supported yet?

danielwe · January 28, 2024, 5:00am

Just to spell this out explicitly: if some_list is a Set (despite the name), that’s another example where some_thing in some_list has a more efficient implementation than checking elements one by one; specifically, it does a hash table lookup.

danielwe · January 28, 2024, 5:18am

The difference is the lack of short-circuiting in in since it uses & and not &&. Not sure why that’s preferable. Maybe something to do with SIMDability when broadcasting? Anyway, once & is replaced with && the two variants generate the same lowered code:

function shortcircuitingin(x::Integer, r::AbstractUnitRange{<:Integer})
    return (first(r) <= x) && (x <= last(r))
end

function chainingin(x::Integer, r::AbstractUnitRange{<:Integer})
    return first(r) <= x <= last(r)
end

julia> @code_lowered in(1, 2:3)
CodeInfo(
1 ─ %1 = Base.first(r)
│   %2 = %1 <= x
│   %3 = Base.last(r)
│   %4 = x <= %3
│   %5 = %2 & %4
└──      return %5
)

julia> @code_lowered shortcircuitingin(1, 2:3)
CodeInfo(
1 ─ %1 = Main.first(r)
│   %2 = %1 <= x
└──      goto #3 if not %2
2 ─ %4 = Main.last(r)
│   %5 = x <= %4
└──      return %5
3 ─      return false
)

julia> @code_lowered chainingin(1, 2:3)
CodeInfo(
1 ─ %1 = Main.first(r)
│   %2 = %1 <= x
└──      goto #3 if not %2
2 ─ %4 = Main.last(r)
│   %5 = x <= %4
└──      return %5
3 ─      return false
)

Oscar_Smith · January 28, 2024, 6:08am

in general, branches are slow so Julia’s version uses 1 branch instead of 2 (although realistically LLVM probably could do this for us)

StevenSiew · January 28, 2024, 6:16am

You mean like this?

julia> for k in 1:10
           if isodd(k)
                println(k)
           end
       end
1
3
5
7
9

mnemnion · January 28, 2024, 4:28pm

Others have pointed out that the particular code executed depends on the type of some_list, I wanted to add an observation, which is that a Set is not necessarily more performant than a Vector for small collections of values.

A Set will use a hash lookup of some_thing in the Set, which scales well: it takes about as long to check some_thing against a Set with thousands of elements as it does to check it against a Set with only a few.

But if some_list is a short Vector (where “short” is strictly empirical), testing membership of some_thing might be faster than a Set. The Set membership has to hash some_thing and probe it, this is branchy code, whereas a small Vector of primitive values is probably laid out in contiguous memory and may be scanned with good cache locality. But the average time to do this always increases as the Vector gets longer, and at some point a Set is clearly the better choice. But if you need the data to be in a Vector, you can’t save time converting it into a Set just to check membership.

The meta-lesson here is that performance is empirical, when it’s important you must profile and run benchmarks to have confidence that the code is optimal.

Palli · January 28, 2024, 10:52pm

That applies to the default Set (and Dict it’s built on), but Julia has others (smallintdict in Julia, only in C it seems). FYI: OrderedSet might already be faster, for small sets, from:

It also implements LittleDict which is a ordered dictionary, that is much faster than any other AbstractDict (ordered or not) for small collections.

[You can use that for sets already; must prevent possibilities of duplicates manually.]

I’ve thought of implementing a HybridDict of that one and a regular one, would be helpful for small dicts, keeping scalability of large ones, also helpful for sets. Python dicts are already ordered, I don’t know of its set implementation, might have such an optimization already, and you can use:

PySet: PythonCall.jl/src/Wrap/PySet.jl at 2165f91602e7f1de862384f442c2c9814766c1e6 · JuliaPy/PythonCall.jl · GitHub

mnemnion · January 29, 2024, 5:13pm

LittleDict uses a pair of Vectors under the hood (or Tuples if frozen/imutable), so it’s doing the same thing if you squint

The ordered data structures are very nice if you need to preserve insertion order! My point is that a question like “is checking membership of a scalar in a vector fast or slow” can’t be answered without more information. LittleDict says the heuristic should be about 30 elements, so let’s go with that: if your Vector has length(v) < 30 or so, switching to a Set is unlikely to result in a speedup. Still might be the correct data structure, if you need to compute unions or intersections for example. It all depends on what you need.

Topic		Replies	Views
Julian way of MATLAB ismember function? General Usage	16	6788	June 20, 2019
Julia's in.() seems slow compared to R's %in% Performance	40	3019	May 9, 2019
Fast in.(x, Ref(y)) Performance	4	520	May 18, 2020
Set performance question New to Julia	3	576	May 8, 2020
Fastest way to filter when right hand sinde of `in` is large Performance	3	396	December 28, 2020

Some_thing in some_list

Related topics