is new to me. It of course makes coding simple, but what is the price? I guess Julia just checks the elements of some_list one by one to see whether some_thing is in it. That would be very time consuming, right?
In Julia, 1:4 is not [1, 2, 3, 4] exactly but rather a UnitRange, a struct with two fields. The only code that needs to execute is a comparison with those two fields.
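To make that concrete, here is a small Base-only sketch: membership in a huge UnitRange is just two comparisons, not a scan (the sizeof check assumes a 64-bit machine).

```julia
r = 1:1_000_000_000          # a UnitRange: stores only its endpoints
# `in` for integer ranges only compares against first(r) and last(r),
# so this is O(1) no matter how long the range is
println(5 in r)              # true
println(0 in r)              # false
println(sizeof(r))           # 16 on 64-bit: just two Int64 fields
```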
Matlab has the ismember function (Array elements that are members of set array - MATLAB ismember), which basically does the same thing. It’s not going to be especially fast, and you should likely consider using a Set if you have an algorithm that is bottlenecked by membership queries.
Is the RHS of this substantively different from first(r) <= x <= last(r)? Git blame pegs this code to the v0.6 era, so maybe that syntax wasn’t supported yet?
Just to spell this out explicitly: if some_list is a Set (despite the name), that’s another example where some_thing in some_list has a more efficient implementation than checking elements one by one; specifically, it does a hash table lookup.
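A minimal illustration of that case, using hypothetical values for the names from the question:

```julia
some_list = Set(["apple", "banana", "cherry"])
some_thing = "banana"
# `in` on a Set hashes some_thing and does a hash-table lookup,
# O(1) on average, rather than scanning element by element
println(some_thing in some_list)   # true
println("durian" in some_list)     # false
```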
The difference is the lack of short-circuiting in in, since it uses & and not &&. I'm not sure why that’s preferable; maybe something to do with SIMDability when broadcasting? Anyway, once & is replaced with &&, the two variants generate the same lowered code:
function shortcircuitingin(x::Integer, r::AbstractUnitRange{<:Integer})
    return (first(r) <= x) && (x <= last(r))
end

function chainingin(x::Integer, r::AbstractUnitRange{<:Integer})
    return first(r) <= x <= last(r)
end
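As a quick sanity check that the two variants agree with each other and with Base's in for integer ranges (redefined here so the snippet is self-contained):

```julia
shortcircuitingin(x::Integer, r::AbstractUnitRange{<:Integer}) = (first(r) <= x) && (x <= last(r))
chainingin(x::Integer, r::AbstractUnitRange{<:Integer}) = first(r) <= x <= last(r)

for x in (-1, 1, 3, 7), r in (1:5, 2:2, 5:1)  # 5:1 is an empty range
    @assert shortcircuitingin(x, r) == chainingin(x, r) == (x in r)
end
println("all three agree")
```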
Others have pointed out that the particular code executed depends on the type of some_list. I want to add an observation: a Set is not necessarily more performant than a Vector for small collections of values.
A Set will use a hash lookup of some_thing in the Set, which scales well: it takes about as long to check some_thing against a Set with thousands of elements as it does to check it against a Set with only a few.
But if some_list is a short Vector (where “short” is strictly empirical), testing membership of some_thing might be faster than with a Set. Set membership has to hash some_thing and probe for it, which is branchy code, whereas a small Vector of primitive values is probably laid out in contiguous memory and can be scanned with good cache locality. The average time for that scan always increases as the Vector gets longer, though, and at some point a Set is clearly the better choice. But if you need the data to be in a Vector, you can’t save time by converting it into a Set just to check membership.
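One way to find that crossover empirically is a rough timing sketch like the one below. It uses plain @elapsed to stay dependency-free; BenchmarkTools.jl's @btime would give more trustworthy numbers, and the exact crossover depends on your machine and element type.

```julia
using Random

# Count hits over many repetitions so each short `in` query is measurable,
# and so the result is used (preventing the work from being optimized away)
function scan_count(needle, coll, reps)
    hits = 0
    for _ in 1:reps
        hits += (needle in coll)
    end
    return hits
end

for n in (4, 32, 256)
    v = shuffle!(collect(1:n))
    s = Set(v)
    needle = n + 1                    # absent: worst case for the linear scan
    scan_count(needle, v, 1)          # warm up (compile) before timing
    scan_count(needle, s, 1)
    tv = @elapsed scan_count(needle, v, 100_000)
    ts = @elapsed scan_count(needle, s, 100_000)
    println("n=$n  Vector: $(round(tv * 1e9 / 100_000, digits=1)) ns/query  " *
            "Set: $(round(ts * 1e9 / 100_000, digits=1)) ns/query")
end
```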
The meta-lesson here is that performance is empirical: when it’s important, you must profile and run benchmarks to have confidence that the code is optimal.
That applies to the default Set (and the Dict it’s built on), but Julia has others (smallintdict, which seems to exist only in C in the Julia runtime). FYI: OrderedSet might already be faster for small sets, from:
It also implements LittleDict, which is an ordered dictionary that is much faster than any other AbstractDict (ordered or not) for small collections.
[You can use that for sets already; you just have to prevent duplicates manually.]
I’ve thought of implementing a HybridDict combining that one and a regular Dict; it would be helpful for small dicts while keeping the scalability of large ones, and also helpful for sets. Python dicts are already ordered; I don’t know about its set implementation, which might have such an optimization already, and you can use:
LittleDict uses a pair of Vectors under the hood (or Tuples if frozen/immutable), so it’s doing the same thing if you squint.
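To make that concrete, here is a toy sketch of the idea (the TinyDict name is hypothetical, and this is not the actual OrderedCollections.jl implementation): keys and values live in parallel Vectors, lookups are linear scans, and insertion order is preserved for free.

```julia
# Toy parallel-vector dict illustrating the LittleDict idea
struct TinyDict{K,V}
    keys::Vector{K}
    vals::Vector{V}
end
TinyDict{K,V}() where {K,V} = TinyDict(K[], V[])

function Base.setindex!(d::TinyDict, v, k)
    i = findfirst(isequal(k), d.keys)
    if i === nothing
        push!(d.keys, k)          # new key: append, preserving insertion order
        push!(d.vals, v)
    else
        d.vals[i] = v             # existing key: overwrite in place
    end
    return d
end

Base.haskey(d::TinyDict, k) = findfirst(isequal(k), d.keys) !== nothing
Base.getindex(d::TinyDict, k) = d.vals[findfirst(isequal(k), d.keys)]

d = TinyDict{String,Int}()
d["a"] = 1; d["b"] = 2; d["a"] = 10
println(haskey(d, "a"), " ", d["a"], " ", d["b"])  # true 10 2
```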
The ordered data structures are very nice if you need to preserve insertion order! My point is that a question like “is checking membership of a scalar in a vector fast or slow?” can’t be answered without more information. LittleDict says the heuristic should be about 30 elements, so let’s go with that: if your Vector has length(v) < 30 or so, switching to a Set is unlikely to result in a speedup. A Set still might be the correct data structure, if you need to compute unions or intersections for example. It all depends on what you need.
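For example, set algebra is where Set earns its keep regardless of size (a small Base-only sketch):

```julia
a = Set([1, 2, 3, 4])
b = Set([3, 4, 5])
# These operations are what Set is really for; raw membership speed is secondary
println(sort!(collect(intersect(a, b))))   # [3, 4]
println(sort!(collect(union(a, b))))       # [1, 2, 3, 4, 5]
println(sort!(collect(setdiff(a, b))))     # [1, 2]
```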