Subvector pattern matching

Adrien · December 28, 2022, 12:45pm

I’m looking for a way to check whether a vector contains a sub-vector matching a certain conditional pattern. I wrote a quick and dirty macro doing this, here’s an example to illustrate what I mean:

julia> f=@match i, j, k (i==k) && (j>= k+1)
#194#f (generic function with 1 method)
julia> f([1,3,1,4]) #the sub-vector [1,3,1] matches the condition
true
julia> f([1,0,1,4])
false

Basically my question is: is there an existing package doing this ? I had a look at Match.jl, Metatheory.jl and MLStyle.jl and if doesn’t seem they can. Below is the code for my macro. I tried manipulating the AST directly but that was kind of a pain, so I end up converting the expression into a string, replacing and parsing back, which I guess is not very elegant. Do you see a better way to do this ?

function subst(s, w::String)
        o=w
        for i in 1:length(s)
                o=replace(o, string(s[i])=>"v[i+$(i-1)]")
        end
        return o
end
macro match(seq,cond)
        p=length(seq.args)
        str=string(cond)
        str=subst(seq.args,str) 
        c=Meta.parse(str)
        quote
                v-> begin
                        for i in 1:length(v)-$p+1
                                if $c
                                        return true
                                end
                        end
                        return false
                end
        end
end

stevengj · December 28, 2022, 12:57pm

Converting to/from a string is not only inelegant, but it is also extremely fragile to breakage. Don’t do it. Learn to use the AST if you are going to do metaprogramming.

Adrien · December 28, 2022, 1:03pm

Sure, this was basically just a quick proof of concept, eventually i’ll use the AST directly (I guess part of the question is whether one of those package can make my life easier on that point as well).

stevengj · December 28, 2022, 1:12pm

Are you doing this so often that it’s worth the effort to design a specialized syntax, versus just writing a loop?

Or is it more that you think this is something many people would use that you want to contribute to the Julia ecosystem?

Adrien · December 28, 2022, 1:59pm

A bit of both, I suppose. What I’m trying to do is more involved than that little example, and would definitely very tedious to do by hand.

Adrien · December 28, 2022, 2:36pm

If you’re curious, ultimately I’m trying to do a bit of algebraic rewriting, ie have linear combinations of products of non-commuting variables x_i and then apply rules like x_i x_j => x_j x_i +1 if i<j . I know there are Julia packages for symbolic computations in math but it doesn’t seem they can handle this kind of rule.

Dan · December 28, 2022, 6:28pm

A simple way to do this would be:

using IterTools

any(((i,j,k),) -> (i == k) && (j >= k+1), partition([1, 3, 1, 4], 3, 1))
# true

any(((i,j,k),) -> (i == k) && (j >= k+1), partition([1, 0, 1, 4], 3, 1))
# false

Most of the parenthesis are necessary (sorry). The vector is clearly seen as a parameter to partition, and the other parameters are the window size (3) and stepping amount (1). The (i,j,k) takes advantage of automatic destructuring of tuples. Overall this would probably be also a most efficient way to achive the OP results (it short-circuits if it finds a match).

uniment · December 28, 2022, 7:54pm

Is there a reason to avoid generator syntax here?

julia> using IterTools

julia> g(arr) = any(i==k && j>k for (i,j,k) ∈ partition(arr,3,1))
g (generic function with 1 method)

julia> g([1,3,1,4])
true

julia> g([1,0,1,4])
false

Adrien · December 29, 2022, 10:39am

Thanks, that’s interesting. However, as I said, ultimately I want to do something more complicated than just checking if such a subarray exists (I also need to know where it is, and to be ale to check several conditions in one loop and returning different things depending on which condition is satisfied). Also, at least on rather small vectors (which is the case I’m interested in) it seems the function generated by my macro is much faster (and both seem to be pretty much as fast for larger ones though mine still have fewer allocations).

julia> f=@match i, j, k (i==k) && (j>= k+1)
#191 (generic function with 1 method)

julia> g(arr) = any(i==k && j>k for (i,j,k) ∈ partition(arr,3,1))
g (generic function with 1 method)

julia> @btime f(rand(-10:10,10))
  89.124 ns (1 allocation: 160 bytes)
false

julia> @btime g(rand(-10:10,10))
  1.001 μs (19 allocations: 1.47 KiB)
false

julia> @btime f(rand(-10:10,10000))
  44.575 μs (2 allocations: 78.20 KiB)
true

julia> @btime g(rand(-10:10,10000))
  45.430 μs (7 allocations: 78.59 KiB)
true

Dan · December 29, 2022, 11:49am

Thats weird. I guess partition desn’t work as cleanly as I had hoped. In any case, when benchmarking, it doesn’t make sense to benchmark the rand function as well which takes quite a bit of effort.

Try the following, which didn’t do any allocations on my tests:

arr = rand(-10:10,1000);

hh(arr) = any((i == k) && (j >= k+1) for (i,j,k) in (@view(arr[s:s+2]) for s in 1:length(arr)-2))

@btime hh($arr)

and to get the index of pattern match, findfirst can be used:

ff(arr) = findfirst((i == k) && (j >= k+1) for (i,j,k) in (@view(arr[s:s+2]) for s in 1:length(arr)-2))

PS it is best to benchmark on a false (no match) case, to make sure a uniform run over all positions is taken.

If an annoying error and stacktrace come out of findfirst version, it may be needed to add:

Base.pairs(g::Base.Generator{T,I}) where {T<:Base.Iterators.Zip, I} =
                Iterators.map(((i,v),) -> i=>v, enumerate(g.iter))

before.

abulak · December 29, 2022, 7:24pm

well, that sounds to me like pattern matching for term rewriting; If the language of lhses of your rules doesn’t change all the time “compile” it to indexing-automaton, or Aho-Corasick (D)FSA. then you can trace the vector (word) through the FSA and return as soon as terminal (accepting) state is encountered.

You can also do this using my KnuthBendix.jl (sorry, no docs, since I use it only through Groups.jl)

Well kind of, since you’re replacing terms by polynomials which might get complicated (and may not stop It seems to me that you’re thinking probably in terms of monoid algebra but then it’s a pretty hard problem. If you tell me that you’re doing some quantum stuff and/or optimization on top, I have a bingo

Here’s an example how to create a similar rws:

julia> using KnuthBendix

julia> A = Alphabet([[Symbol(:x, i) for i in 1:3]; [Symbol(:y, j) for j in 1:4]])
Alphabet of Symbol
  1. x1
  2. x2
  3. x3
  4. y1
  5. y2
  6. y3
  7. y4

julia> x, y = let gens = [Word([i]) for i in 1:length(A)]
          gens[1:3], gens[4:7]
       end
(Word{UInt16}[1, 2, 3], Word{UInt16}[4, 5, 6, 7])

julia> ord = LenLex(A, invperm([1,4,2,5,3,6,7]))
LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4

julia> rules = [x[i]*y[j] => y[j]*x[i] for i in 1:3 for j in 1:4 if i<=j]
9-element Vector{Pair{Word{UInt16}, Word{UInt16}}}:
 1·4 => 4·1
 1·5 => 5·1
 1·6 => 6·1
 1·7 => 7·1
 2·5 => 5·2
 2·6 => 6·2
 2·7 => 7·2
 3·6 => 6·3
 3·7 => 7·3

julia> rws = RewritingSystem(rules, ord)
Rewriting System ordered by LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4:
1. y1*x1         →      x1*y1
2. y2*x1         →      x1*y2
3. y3*x1         →      x1*y3
4. y4*x1         →      x1*y4
5. y2*x2         →      x2*y2
6. y3*x2         →      x2*y3
7. y4*x2         →      x2*y4
8. y3*x3         →      x3*y3
9. y4*x3         →      x3*y4

julia> R = knuthbendix(rws)
Rewriting System ordered by LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4:
1. y1*x1         →      x1*y1
2. y2*x1         →      x1*y2
3. y3*x1         →      x1*y3
4. y4*x1         →      x1*y4
5. y2*x2         →      x2*y2
6. y3*x2         →      x2*y3
7. y4*x2         →      x2*y4
8. y3*x3         →      x3*y3
9. y4*x3         →      x3*y4

julia> isconfluent(R)
true

julia> w = Word(rand(1:length(alphabet(R)), 15))
Word{UInt16}: 3·4·2·2·3·1·6·7·2·3·3·6·5·5·2

julia> KnuthBendix.print_repr(stdout, w, alphabet(R))
x3*y1*x2^2*x3*x1*y3*y4*x2*x3^2*y3*y2^2*x2
julia> v = KnuthBendix.rewrite(w, R)
Word{UInt16}: 3·4·2·2·3·1·2·3·3·2·6·7·6·5·5

julia> KnuthBendix.print_repr(stdout, v, alphabet(R))
x3*y1*x2^2*x3*x1*x2*x3^2*x2*y3*y4*y3*y2^2

This can be much more pleasant to read if you create a monoid on top of the rws; I have code lying around, ping me if you want it!

Adrien · December 30, 2022, 12:19pm

Thanks ! I did come across your package at some point, and I’ll definitely look at it for inspiration. Indeed there is some quantum stuff going on That being said, I did precisely want to be able to handle “abstract” rules in terms of indices rather than generating all possibles rules for all possible indices values (and in particular I don’t want to fix the max value for indices in advance) because I thought this won’t be efficient, but maybe it’s dumb. I was also planning to implement some version of Aho-Corasick that could also handle those abstract rules…

abulak · December 30, 2022, 10:23pm

haha, I did get a bingo Maybe Metatheory will help you somehow, but if I remember correctly it also applies only to term rewriting, i.e. x[i]*y[j] => y[j]*x[i] - 1 is no go on this level.
This needs to be implemented on the level of monoid algebra e.g. if you’re doing optimization as a linear constraint. If you have higher order constraints this quickly becomes a normalization problem (in a monoid algebra), which is harder than ordinary commutative Gröbner basis…

Why not generating all the rules you need? If the set of rules is small You gain nothing; If it is large it’s necessary to make rewriting fast, unless you create a “lazy” indexing automaton. This might be possible with some work (at least when the rules form such “abstract families”, but is the rewriting really the bottleneck of your computation?

what do you gain by not fixing the values for indices? It seems to me that You can easily recreate the monoid later if you decide to change them…

And no idea here is dumb It’s hard to know what is fast and what is not unless you’ve attempted to solve the problem once or twice. Groups.jl are after two major redesigns and I’m still not exactly satisfied, but maybe the third will make the charm (not in the near future though)

Topic		Replies	Views
Finding patterns in vectors General Usage array , pattern-matching , vector	9	996	June 20, 2022
4 major problems of pattern matching in Julia Community pattern-matching	19	7508	February 10, 2020
Compare patterns in byte array New to Julia	4	509	February 19, 2021
An elegant and efficient way to extract something from ASTs Package Announcements package	2	1700	January 23, 2019
Correct usage of regex matches New to Julia regex	5	712	May 9, 2021

Subvector pattern matching

Related topics