Question: does Julia have an analog to C++'s std::includes? issubset
doesn’t do what I’m looking for (I want following to return false, as the first array isn’t a sub-array of the 2nd):
julia> issubset([4,4,1],[1,4,2])
true
There’s a Stack Overflow post asking for something similar, but it’s several years old and the code samples there (issubvec
, subset2
) don’t appear to be compatible with a more-or-less current version of Julia (1.5.1).
I wrote a function to do what I want:
julia> function subset4(a::Array{Int},b::Array{Int})
a_unq=unique(a)
b_unq=unique(b)
subset::Bool=true
if issubset(a_unq, b_unq)
for val in a_unq
if count(x->x==val,a)>count(y->y==val,b)
subset=false
break
end
end
else
subset=false
end
return subset
end
subset4 (generic function with 1 method)
It behaves correctly, but is much slower than either issubset
or a C++ version of the thing.
issubset
(after compile):
julia> @time for ii in 1:1000000 issubset(rand(1:10,3),rand(1:10,10)) end
0.168592 seconds (2.00 M allocations: 259.399 MiB, 9.08% gc time)
subset4
(after compile):
julia> @time for ii in 1:1000000 subset4(rand(1:10,3),rand(1:10,10)) end
0.795771 seconds (15.00 M allocations: 1.461 GiB, 11.04% gc time)
This is almost 5x slower.
C++ std::includes
:
#include <iostream>
#include <array>
#include <random>
#include <algorithm>
int main() {
std::array<int, 3> a;
std::array<int, 10> b;
bool subset;
std::mt19937_64 randomGenerator;
randomGenerator.seed(std::random_device{}());
std::uniform_int_distribution<int> digitDistro{1, 10};
for(int j=0; j<1000000; j++) {
for(int i=0; i<3; i++) a[i]=digitDistro(randomGenerator);
for(int i=0; i<10; i++) b[i]=digitDistro(randomGenerator);
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
subset = std::includes(b.begin(), b.end(), a.begin(), a.end());
}
return 0;
}
$ g++ -o includesTest includesTest.cpp -O3
$ time ./includesTest
real 0m0.166s
user 0m0.166s
sys 0m0.000s
issubset
is about as fast as std::includes
.
I need something ideally as fast as the “fast” functions in this post, as I’m going to be iterating through a large number of combinations of numbers and checking if they’re a subset of a master list.