4 major problems of pattern matching in Julia

Despite of the notation issues of existing Julia macro libraries for pattern matching,
based on my experience as a concerned user and developer, I hereby propose 4 major problems preventing us from bringing out good enough pattern matching, which I hope the authors and contributors of existing and future pattern matching libraries could pay attention to:

  1. Optimizations: merging and splitting (EDIT addition) cases/patterns using algorithms of analysing decision trees, thus users won’t need aware how to write efficient code but how to write readable and maintainable code.

  2. Portability. Expanding match macros usually needs non trivial analysis of the code, and this could be really time-consuming. As a result, debugging can be really slow and disturbing. My personal suggestion is making a small set of a pattern matching library “develop time only”, which just statically generates julia code without @match macros.

  3. Extensibility. The capability of defining custom patterns counts, because the demands of practical developments vary from the fields we work on, and one can never say he/she has known all fields of software developments and she/he has already made the useful pattern matching facilities for all those fields. One way to address this is providing a convenient way to define domain-specific patterns, which is called extensible pattern matching. And something called first class pattern matching is also helpful to this case.

  4. Error reporting(Maintainability). The failure information of matching helps us to understanding the incorrectness in our code, and if the information doesn’t make sense, users will feel awful about the experience. It also closely concerns the maintainability and reliability of our code. IMO, we need source code positions, representation of the pattern in which the failure occurred, and the potential solutions to solve.

The idea was originally posted at https://github.com/RelationalAI-oss/Rematch.jl/issues/27#issuecomment-583914936 , and I appreciate Match.jl, Rematch.jl and other libraries whose contributors have spent time with this field.

Feel free to come up with more potential problems of Julia pattern matching.

3 Likes

I am somewhat surprised about the great selection of pattern matching libraries in Julia. The fact that there are so many suggests that people find pattern matching in Julia useful.

I feel like I am missing something because I don’t (except when I work with ASTs, for which MacroTools.jl is perfect). I am not opposed to the idea in general: eg in Haskell, I recognize that it is an indispensable idiom.

It’s just that in Julia, I find that parametric multiple dispatch (with the common idioms) is powerful enough for my needs. Whenever I need something resembling pattern matching, I factor out the relevant code to a small kernel function.

This is not meant to be a criticism of programming styles that use pattern matching, I am just wondering what I am missing.

2 Likes

I think there are many situations where the pattern matching depends on runtime values and therefore has a lot of overhead when it happens in an inner loop. In this cases the multiple dispatch pattern can have significant overhead.

Interestingly, after spending some time for a larger project in Rust and using its Enums+Pattern matching extensively, I had for the first time feeling that there was a feature in another programming language that Julia did not have and which I wished it had. see rust docs

However, after spending some time thinking I realized that a lot of the pattern matching convenience in Rust came from the fact that Rust is statically compiled, so the compiler could do tests if all cases were covered etc and warn you about missing cases in all the match calls in your program whenever you extended an Enum.

So in general I agree with @Tamas_Papp that I try to stick with idiomatic Julia and use parametric dispatch instead of pattern matching whenever possible. I think the main problem using one of the pattern matching libraries is that they are simply not used often enough in the Julia ecosystem to become idiomatic Julia code. So, the main goal (more readable code) is not achieved because the user looking at the code first has to get familiar with the pattern matching syntax used in that particular case. Maybe it is a circular problem, but I would use more pattern dispatch if it were more abundant in other Julia code I am reading.

1 Like

@Tamas_Papp Hi, I don’t think here is a good place to ask about what you’re missing. You did miss something, which might not appear in your tasks very often(or you don’t know you can use pattern macthing to solve it), but I’d say generally pattern matching does have many use cases.

@fabiangans What you expect from pattern matching is exhaustiveness checking, which is almost only applicable in static programming languages. I’m recently considering about making a static checked Julia which can have this feature. Julia itself didn’t end the problem “dynamic or static, which better?”, in static language you can ideally handle map(f, []) or have pattern matching which warns if you forgot some cases to match, but expressiveness got restricted, e.g., supporting funtion (x) function (y::Vector{T}) where T; ... end end(where is nested) in statically typed languages is kind of difficult.

Julia pattern matching is useful, and one use case can be verifying the shape of data and accessing the contents.

For Instance

something extremely useful called GraphQL/graph query language can be implemented in Julia in very few lines when using pattern matching:

struct Query{T, F} # query an array T[...] by function F
    f :: F
end

Query(t, f) = Query{t, typeof(f)}(f)

@active ArrayTo{T}(x) begin
    x isa T ? collect(x) : nothing
end

function concat!(xs, init)
    reduce(append!, xs, init=init)
    nothing
end

function better_graphql(query::Query{T}, data) where T
    if query.f(data)
        init = T[data]
    else
        init = T[]
    end

    @match data begin
        # traverse dict
        ArrayTo{Dict}([kv for better_graphql(query, kv.second) in res]) => concat!(res, init)
        # traverse tuple
        ArrayTo{Tuple}([v for better_graphql(query, v) in res]) => concat!(res, init)
        # traverse array
        [v for better_graphql(query, v) in res] => concat!(res, init)
        leaf =>
            if query.f(leaf)
                push!(init, leaf)
            end
    end

    init
end

How to use? Say I want data in these shapes:

1. Dict(:a => whatever, :b => whatever, ...)  e.g., Dict(:a=>[], :b=>2)
2. [1, whatever...]   e.g., [1], [1, 2, 3, 4]
3. (whatever1, 42, whatever2)  e.g., ((:a, :b), 42, "ccc")

whatever means we don’t care about the value, but we need the shape.

How could you make this query with pattern matching?

myquery = Query(Any,
    x -> @match x begin
        # shape of data I want
        Dict(:a => _, :b => _) ||
        [1, tl...] ||
        (_, 42, _) => true

        _ => false
    end
)

julia> better_graphql(myquery, Dict(:a=>[2, 3], :b=>(2, "a"), :c=>[1, 2, (0, 42, 5), 5]))
3-element Array{Any,1}:
 Dict{Symbol,Any}(:a => [2, 3],:b => (2, "a"),:c => Any[1, 2, (0, 42, 5), 5])
 Any[1, 2, (0, 42, 5), 5]
 (0, 42, 5)

How could you make this query without pattern matching?

I think we all know pattern matching is not idiomatic in Julia, but the demand is more important, let alone it’s readable and maintainable after a quick study on pattern matching.

2 Likes

Something like

myquery = let
    f(_) = false
    f(x::Dict) = Set(keys(x)) == Set([:a, :b])
    f(x::AbstractVector) = !isempty(x) && first(x) == 1
    f(x::NTuple{3}) = x[2] == 42
    Query(Any, f)
end

(also, if I find myself testing for keys of Dicts etc often, I would factor these out to a tiny utility function).

Note that I am not trying to derail this discussion, and I have nothing against ML-style pattern matching.

I just find multiple dispatch very powerful, readable, and composable. Regarding your points above, it would take care of (1) like any other Julia code, be trivially portable (2), extensible without going to a DSL (3), and (4) is taken care of by the language again.

just try more deep data structure…

I am happy to do that, do you have an MWE?

I also think if possible you should use vanilla multiple dispatch. However there are other use cases. If possible you should avoid DSLs, but if there’re convenient tools we shall not deny them.

You mentioned portability, yes, this is very important, and this is what I’m recent working on. Making pattern matching a develop time only framework and generating vanilla julia code will be attractive, and almost applicable in MLStyle. I want to reduce the use of macros in Julia.

I of course recognize that a DSL will be more compact for an example like this. But this is from a benchmark/test suite, do you actually test for patterns like this in practice?

If I understand correctly, this kind of pattern matching combines the comparisons of types and values, with the focus on equality comparisons for values (this reminds me of the EQL specializer in CLOS, but is of course much more sophisticated).

I am skeptical of the utility of this in Julia because for idiomatic, efficient Julia code, the type of things would be pinned down by the container element type or something similar, so the compiler would focus on a particular branch immediately. Then it is just about comparing values.

Here is some in practice, but I’m not sure if you will feel things like this is convenient.

1 Like

I admit that pattern matching might not be appealing to many, but however I
would always think it kind of helpful to address the complexity of “viewing data”.

It’s okay for you to think pattern matching is an extension to multiple dispatch, making the dispatch capable of dispatching on values instead of merely on types.

If the shape of value is related to the computation logic, you cannot avoid things similar to pattern matching. However if only types matter, I don’t think it worthwhile to use pattern mathing.

Pattern matching is afterall a heavy macro, and yet no library gives good optimizations. It’s a higher level abstraction, but if you don’t think it necessary, I think keeping away from it is okay.

I don’t have much to add here, other than that I would really love to see some form of first-class pattern matching in Julia.
After spending a bunch of time programming in Elixir, I’ve really missed the feature in most other languages. I like to think of it as multiple dispatch for values… and I think most of us are sold on the value of multiple dispatch for types :slightly_smiling_face:.

6 Likes

Matching seems super-useful for expressions, as digging through Expr(...) types is messy, and we already know the “DSL” provided by the parser.

For use on expressions, what do these tools offer over MacroTools? Last time I looked it was not so easy to figure out how to use them, and I ran into issues with capital letters being special (I forget in which package). Is it easy to make a faster drop-in replacement for MacroTools.@capture?

Would you mind giving a small example in code for each problem where the problem shows up with existing pattern-matching libraries?

Also, are you suggesting basically a templating code-generation engine with point 2? Moving down that path seems decidedly unlikely to be beneficial from my point-of-view compared to judicious use of @macroexpand and structuring macros to be relatively comprehensible when expanded. Doing so seems tantamount to saying that pattern-matching isn’t actually a useful idiom for understanding code since code is read much more often than written in general.

MacroTools is already good to use, but some might not like its notation of capturing via underlines(struct a_ ... end). In this regarding pattern matching may help.

Last time I looked it was not so easy to figure out how to use them

If you feel this way it shows that pattern matching is not that successful, because the point of pattern matching is you should be able to use it, understand it without learning about it.

I’m not sure if you’ve ever used Expr or things like :(a + $b) to build expressions, but if so, you should be able to use and understand libraries like MLStyle because deconstructions are almost same as constructions.

a = :(f(1))
b = :(g(1))
@match :($a + $b) begin
     :($c + $d) => 

At here, you deconstruct things like how they’re constructed, so what makes you think it’s not easy? I think your answer could unveil a frequent question that I didn’t realized these years. Thanks in advance.

I ran into issues with capital letters being special (I forget in which package)

It may be MLStyle, and it’s a routine to use capitalized identifiers for constructors/deconstructors.

Is it easy to make a faster drop-in replacement for MacroTools.@capture ?

This depends on what you expect. We’d say using pattern matching is more extensible and customizable, but MacroTools is already a successful encapsulation so using MacroTools might be better. If you want, you can check
this 2 implementation of capture: https://thautwarm.github.io/MLStyle.jl/latest/tutorials/capture/ ,
but the notations are slightly different.

There was a recent complaint on Slack about how annoying it is when multiple dispatch can’t solve a problem (which in this case was referring to code inside a macro). I’d love to have concise beautiful code that is performant for this sort of thing. Lately I’m bumping into some situations where pattern matching might be a key part of the solution but I honestly just don’t know enough about computer science to have any idea if it would be useful.

Looking forward to learning more.

1 Like

Would you mind giving a small example in code for each problem where the problem shows up with existing pattern-matching libraries?

I’d love to, but only the point 3 and 4 are related to actual code.

So other than example in code, I’ll still give some hints for point 1 and 2.

  • Extensibility.
    Say you want a pattern, which can match any structure who has field foo, and view the field foo, like this

    struct Example; foo :: Int end
    @match Example(1) begin
           FieldFoo(1) => ...
    

    FieldFoo(1) is kind of stupid and inconvenient, but if we can make this pattern parametric, like
    HasField{:foo}(1) which means matching a structure who has a field foo valued 1, I bet some guys will like it.

    @match Example("1") begin
           HasField{:foo}(a) => ...
    

    However, we know now no pattern matching libraries provided this.
    Users have their own ideas and their specific demands, and the real world is so impredictive. Hence, the authors of libraries are not supposed to do everything for users, but they should promise the functionalities are full-featured and if needed users can make extensions themselves in a convenient way.
    In MLStyle you can use extensible patterns to implement above HasField pattern, which is called active pattern:

      @active HasField{field}(x) begin
             hasfield(typeof(x), field) ? getfield(x, field) : nothing
      end
    

    However it’s still insufficient, but I don’t want to expand more unless you’d ask.

  • Error reporting

    Just check the README of Rematch.jl, if I make some mistakes in the code, like

     using Rematch
    
     struct Foo
         x::Int64
         y::String
     end
    
     f(x) = @match x begin   # line 8
         _::String => :string
         [a,a,a] => (:all_the_same, a)
         [a,bs...,c] => (:at_least_2, a, bs, c)
         # here if I made a mistake, and typed `&x` and `x`
         Foo(&x, "foo") where x > 1 => :foo
     end
    

    The error will be

    ERROR: LoadError: LoadError: Unrecognized pattern syntax: &x
    in expression starting at xxx.jl:8
    in expression starting at xxx.jl:8
    

    What’s the problem? The problem is the error reporting didn’t lead you to where your made the error, it says line 8, which is the start of match, but the error can raise at line 18, line 80 or line 180, because pattern matching shows more advantages(readability and performance) when the cases grow pretty large.

    Also, if there’s only one case, giving out the reason why match got failed is also useful:

      @match S(x, [y, z]) = S(x, [y, z, 1, 2])
    

    It’s possible to give the message [y, z] not matches [y, z, 1, 2], but no one did this now.

  • Optimizations: as using pattern-matching is now usually very fast and even faster than hand-written, we might not give some examples, but if we know the implementation, we can know the performance can still get improved. The compilation of pattern matching needs some algorithms used in decision trees to simplify the routes, merge the branches and so on. For instance, to match following 2 cases:

       (1, 10)
       (1, "a")
    

    we know the first element of the 2 tuple pattern is the same, and we can merge it, to avoid checking the first element in 2 branches.

    If the shape got more complex, the performance can usually greatly bet the hand-written code because pattern matching compilation automatically finds a faster route to try all cases. If you have interests you could check out some related researches, such as https://dl.acm.org/doi/abs/10.1145/1411304.1411311 .

  • Portability. Pattern matching compilation is time-consuming. You exchange convenience with module loading time, and if you’re debugging, macro expanding will kill your time.

Moving down that path seems decidedly unlikely to be beneficial from my point-of-view compared to judicious use of @macroexpand and structuring macros to be relatively comprehensible when expanded. Doing so seems tantamount to saying that pattern-matching isn’t actually a useful idiom for understanding code since code is read much more often than written in general.

I didn’t mean this. The reason why I want this is just to avoid adding a pattern matching library dependency, and rule out the use and overhead of expanding macros. You should read the code containing macros, but when distributing the libraries, you can distribute the code without macros.

2 Likes