Subvector pattern matching

I’m looking for a way to check whether a vector contains a sub-vector matching a certain conditional pattern. I wrote a quick and dirty macro doing this, here’s an example to illustrate what I mean:

julia> f=@match i, j, k (i==k) && (j>= k+1)
#194#f (generic function with 1 method)
julia> f([1,3,1,4]) #the sub-vector [1,3,1] matches the condition
julia> f([1,0,1,4])

Basically my question is: is there an existing package doing this ? I had a look at Match.jl, Metatheory.jl and MLStyle.jl and if doesn’t seem they can. Below is the code for my macro. I tried manipulating the AST directly but that was kind of a pain, so I end up converting the expression into a string, replacing and parsing back, which I guess is not very elegant. Do you see a better way to do this ?

function subst(s, w::String)
        for i in 1:length(s)
                o=replace(o, string(s[i])=>"v[i+$(i-1)]")
        return o
macro match(seq,cond)
                v-> begin
                        for i in 1:length(v)-$p+1
                                if $c
                                        return true
                        return false

Converting to/from a string is not only inelegant, but it is also extremely fragile to breakage. Don’t do it. Learn to use the AST if you are going to do metaprogramming.

1 Like

Sure, this was basically just a quick proof of concept, eventually i’ll use the AST directly (I guess part of the question is whether one of those package can make my life easier on that point as well).

Are you doing this so often that it’s worth the effort to design a specialized syntax, versus just writing a loop?

Or is it more that you think this is something many people would use that you want to contribute to the Julia ecosystem?

A bit of both, I suppose. What I’m trying to do is more involved than that little example, and would definitely very tedious to do by hand.

If you’re curious, ultimately I’m trying to do a bit of algebraic rewriting, ie have linear combinations of products of non-commuting variables x_i and then apply rules like x_i x_j => x_j x_i +1 if i<j . I know there are Julia packages for symbolic computations in math but it doesn’t seem they can handle this kind of rule.

A simple way to do this would be:

using IterTools

any(((i,j,k),) -> (i == k) && (j >= k+1), partition([1, 3, 1, 4], 3, 1))
# true

any(((i,j,k),) -> (i == k) && (j >= k+1), partition([1, 0, 1, 4], 3, 1))
# false

Most of the parenthesis are necessary (sorry). The vector is clearly seen as a parameter to partition, and the other parameters are the window size (3) and stepping amount (1). The (i,j,k) takes advantage of automatic destructuring of tuples. Overall this would probably be also a most efficient way to achive the OP results (it short-circuits if it finds a match).

1 Like

Is there a reason to avoid generator syntax here?

julia> using IterTools

julia> g(arr) = any(i==k && j>k for (i,j,k) ∈ partition(arr,3,1))
g (generic function with 1 method)

julia> g([1,3,1,4])

julia> g([1,0,1,4])

Thanks, that’s interesting. However, as I said, ultimately I want to do something more complicated than just checking if such a subarray exists (I also need to know where it is, and to be ale to check several conditions in one loop and returning different things depending on which condition is satisfied). Also, at least on rather small vectors (which is the case I’m interested in) it seems the function generated by my macro is much faster (and both seem to be pretty much as fast for larger ones though mine still have fewer allocations).

julia> f=@match i, j, k (i==k) && (j>= k+1)
#191 (generic function with 1 method)

julia> g(arr) = any(i==k && j>k for (i,j,k) ∈ partition(arr,3,1))
g (generic function with 1 method)

julia> @btime f(rand(-10:10,10))
  89.124 ns (1 allocation: 160 bytes)

julia> @btime g(rand(-10:10,10))
  1.001 μs (19 allocations: 1.47 KiB)

julia> @btime f(rand(-10:10,10000))
  44.575 μs (2 allocations: 78.20 KiB)

julia> @btime g(rand(-10:10,10000))
  45.430 μs (7 allocations: 78.59 KiB)

Thats weird. I guess partition desn’t work as cleanly as I had hoped. In any case, when benchmarking, it doesn’t make sense to benchmark the rand function as well which takes quite a bit of effort.

Try the following, which didn’t do any allocations on my tests:

arr = rand(-10:10,1000);

hh(arr) = any((i == k) && (j >= k+1) for (i,j,k) in (@view(arr[s:s+2]) for s in 1:length(arr)-2))

@btime hh($arr)

and to get the index of pattern match, findfirst can be used:

ff(arr) = findfirst((i == k) && (j >= k+1) for (i,j,k) in (@view(arr[s:s+2]) for s in 1:length(arr)-2))

PS it is best to benchmark on a false (no match) case, to make sure a uniform run over all positions is taken.

If an annoying error and stacktrace come out of findfirst version, it may be needed to add:

Base.pairs(g::Base.Generator{T,I}) where {T<:Base.Iterators.Zip, I} =
      ,v),) -> i=>v, enumerate(g.iter))


well, that sounds to me like pattern matching for term rewriting; If the language of lhses of your rules doesn’t change all the time “compile” it to indexing-automaton, or Aho-Corasick (D)FSA. then you can trace the vector (word) through the FSA and return as soon as terminal (accepting) state is encountered.

You can also do this using my KnuthBendix.jl (sorry, no docs, since I use it only through Groups.jl)

Well kind of, since you’re replacing terms by polynomials which might get complicated (and may not stop :wink: It seems to me that you’re thinking probably in terms of monoid algebra :wink: but then it’s a pretty hard problem. If you tell me that you’re doing some quantum stuff and/or optimization on top, I have a bingo :smiley:

Here’s an example how to create a similar rws:

julia> using KnuthBendix

julia> A = Alphabet([[Symbol(:x, i) for i in 1:3]; [Symbol(:y, j) for j in 1:4]])
Alphabet of Symbol
  1. x1
  2. x2
  3. x3
  4. y1
  5. y2
  6. y3
  7. y4

julia> x, y = let gens = [Word([i]) for i in 1:length(A)]
          gens[1:3], gens[4:7]
(Word{UInt16}[1, 2, 3], Word{UInt16}[4, 5, 6, 7])

julia> ord = LenLex(A, invperm([1,4,2,5,3,6,7]))
LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4

julia> rules = [x[i]*y[j] => y[j]*x[i] for i in 1:3 for j in 1:4 if i<=j]
9-element Vector{Pair{Word{UInt16}, Word{UInt16}}}:
 1·4 => 4·1
 1·5 => 5·1
 1·6 => 6·1
 1·7 => 7·1
 2·5 => 5·2
 2·6 => 6·2
 2·7 => 7·2
 3·6 => 6·3
 3·7 => 7·3

julia> rws = RewritingSystem(rules, ord)
Rewriting System ordered by LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4:
1. y1*x1         →      x1*y1
2. y2*x1         →      x1*y2
3. y3*x1         →      x1*y3
4. y4*x1         →      x1*y4
5. y2*x2         →      x2*y2
6. y3*x2         →      x2*y3
7. y4*x2         →      x2*y4
8. y3*x3         →      x3*y3
9. y4*x3         →      x3*y4

julia> R = knuthbendix(rws)
Rewriting System ordered by LenLex: x1 < y1 < x2 < y2 < x3 < y3 < y4:
1. y1*x1         →      x1*y1
2. y2*x1         →      x1*y2
3. y3*x1         →      x1*y3
4. y4*x1         →      x1*y4
5. y2*x2         →      x2*y2
6. y3*x2         →      x2*y3
7. y4*x2         →      x2*y4
8. y3*x3         →      x3*y3
9. y4*x3         →      x3*y4

julia> isconfluent(R)

julia> w = Word(rand(1:length(alphabet(R)), 15))
Word{UInt16}: 3·4·2·2·3·1·6·7·2·3·3·6·5·5·2

julia> KnuthBendix.print_repr(stdout, w, alphabet(R))
julia> v = KnuthBendix.rewrite(w, R)
Word{UInt16}: 3·4·2·2·3·1·2·3·3·2·6·7·6·5·5

julia> KnuthBendix.print_repr(stdout, v, alphabet(R))

This can be much more pleasant to read :wink: if you create a monoid on top of the rws; I have code lying around, ping me if you want it!

Thanks ! I did come across your package at some point, and I’ll definitely look at it for inspiration. Indeed there is some quantum stuff going on :slight_smile: That being said, I did precisely want to be able to handle “abstract” rules in terms of indices rather than generating all possibles rules for all possible indices values (and in particular I don’t want to fix the max value for indices in advance) because I thought this won’t be efficient, but maybe it’s dumb. I was also planning to implement some version of Aho-Corasick that could also handle those abstract rules…

haha, I did get a bingo :wink: Maybe Metatheory will help you somehow, but if I remember correctly it also applies only to term rewriting, i.e. x[i]*y[j] => y[j]*x[i] - 1 is no go on this level.
This needs to be implemented on the level of monoid algebra e.g. if you’re doing optimization as a linear constraint. If you have higher order constraints this quickly becomes a normalization problem (in a monoid algebra), which is harder than ordinary commutative Gröbner basis…

Why not generating all the rules you need? If the set of rules is small You gain nothing; If it is large it’s necessary to make rewriting fast, unless you create a “lazy” indexing automaton. This might be possible with some work (at least when the rules form such “abstract families”, but is the rewriting really the bottleneck of your computation?

what do you gain by not fixing the values for indices? It seems to me that You can easily recreate the monoid later if you decide to change them…

And no idea here is dumb :wink: It’s hard to know what is fast and what is not unless you’ve attempted to solve the problem once or twice. Groups.jl are after two major redesigns and I’m still not exactly satisfied, but maybe the third will make the charm :wink: (not in the near future though)