Multiple dispatch "default case"?

frylock · August 30, 2023, 1:52pm

I’m asking because I think I am probably going about this the wrong way. I have a “configuration file” with sections that I wanted to parse and store in a dictionary. I’m reading the sections like this:

function read_section(file_handle)
    name = String(read(file_handle, 32))
    s_len = ntoh(read(file_handle, UInt32))
    data = read(file_handle, s_len)

    parse_section(Val(Symbol(name)), IOBuffer(data))
end

The section parsers look like this:

function parse_section(::Val{:type1}, data)
    # ... do stuff
end

function parse_section(::Val{:type2}, data)
    # ... do different stuff
end

However, when I get a section name for which I haven’t written a parse_section(::Val{:something_i_didnt_code}, data) method, I get a MethodError. I could use a try/catch block … or … perhaps I am just doing things wrong?

mbauman · August 30, 2023, 2:17pm

That’d be parse_section(::Val, data) = nothing. But I’d also not use dispatch for this, and instead use a sequence of ifs or the like. But really, I’d try to use a standard file format (TOML or the like) if that’s at all a possibility.

abraemer · August 30, 2023, 2:18pm

I mean, you can just add a default method that is unconstrained, e.g. parse_section(x, data) = error("Unknown section: $x"), or maybe something like parse_section(::Val{T}, data) where T = error("Unknown section: $T")
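A minimal sketch of the fallback idea from the two replies above. The parser bodies and their return values (`:parsed_type1`, `:parsed_type2`) are placeholders for illustration, not the original code. Julia calls the most specific matching method, so the `Val{T}` fallback only runs when no `Val{:name}` method exists.

```julia
# Specific parsers (bodies are placeholders for illustration).
parse_section(::Val{:type1}, data) = :parsed_type1
parse_section(::Val{:type2}, data) = :parsed_type2

# Fallback: matches any Val{T} that has no more specific method.
# Returning `nothing` silently skips unknown sections; replace the body
# with error("Unknown section: $T") to fail loudly instead.
parse_section(::Val{T}, data) where {T} = nothing

known = parse_section(Val(:type1), IOBuffer())
unknown = parse_section(Val(:something_i_didnt_code), IOBuffer())
```

With these definitions `known` holds the placeholder result of the `:type1` parser and `unknown` is `nothing`, so no try/catch is needed around the call.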

frylock · August 30, 2023, 2:22pm

I started out with a sequence of if-s but wanted to explore the modularity of having separate functions. I guess with the if-s that the else block would handle the missing case.
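One way to keep the modularity of separate functions while branching on the value is sketched below. The names `parse_type1`, `parse_type2`, and `parse_section_by_name` are hypothetical, and the bodies are placeholders.

```julia
# Hypothetical per-section parsers; bodies are placeholders.
parse_type1(data) = :parsed_type1
parse_type2(data) = :parsed_type2

# Branch on the section name as an ordinary value (no Val, no dispatch).
function parse_section_by_name(name::AbstractString, data)
    if name == "type1"
        return parse_type1(data)
    elseif name == "type2"
        return parse_type2(data)
    else
        # The "default case": unknown sections are skipped.
        return nothing
    end
end

known = parse_section_by_name("type1", IOBuffer())
unknown = parse_section_by_name("something_i_didnt_code", IOBuffer())
```

Each section still has its own function, and the else branch plays the role of the default method.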

Is there a bad performance penalty for using the parse_section(::Val{}, data) overloads? I guess I could code up the same with an if block and try it out myself …

I’m willing to rewrite it all, but I wanted to try out some different ways so that I could get some hands-on experience. I find that I often don’t learn well from documentation - I have to code up something and evaluate it “live” so to speak.

abraemer · August 30, 2023, 2:36pm

Quick benchmark:

using BenchmarkTools
f(x) = f(Val(x)) # convenience method
f(::Val) = return 0 # default
f(::Val{:x}) = return 1
f(::Val{:y}) = return 2
f(::Val{:z}) = return 3

function g(val)
    if val == :x
        return 1
    elseif val == :y
        return 2
    elseif val == :z
        return 3
    else
        return 0
    end
end

testvec = [:x, :y, :z, :y, :x, :z]

julia> @btime f.($testvec)
  1.956 μs (8 allocations: 400 bytes)
julia> @btime f.($(Val.(testvec)))
  429.236 ns (1 allocation: 128 bytes)
julia> @btime g.($testvec)
  38.903 ns (1 allocation: 128 bytes)

So even when the Val instances are built ahead of time (skipping the convenience wrapper), dispatch is about an order of magnitude slower than the if chain (roughly 430 ns vs. 39 ns). When you also count the overhead of wrapping each value in Val, it is more like 50x slower (about 1.96 μs). Here Julia can’t infer which method to dispatch to, so the call is resolved at run time and the allocation of the Val wrapper cannot be avoided. Whether that timing difference matters in your use case is for you to decide.

Henrique_Becker · August 30, 2023, 3:38pm

Your functions just return a value. Theirs will operate on a file handle (basically the slowest primitive thing you can do in any language), so I don’t think this benchmark is really representative. What you are measuring is just the overhead of multiple dispatch, so “how many times slower” is not the relevant question; what matters is whether the absolute difference is significant relative to the real cost of each of their functions.
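To put the absolute difference in numbers, here is a small calculation using only the timings reported in the benchmark above (6-element test vector). The variable names are made up for this sketch.

```julia
# Timings from the benchmark above, in nanoseconds, for 6 elements.
t_val_wrapped = 1956.0    # f.(testvec): wrap in Val, then dispatch
t_val_prebuilt = 429.236  # f.(Val.(testvec)): dispatch only
t_branch = 38.903         # g.(testvec): if/elseif chain
n = 6

# Absolute extra cost per call relative to the if/elseif version.
overhead_wrapped = (t_val_wrapped - t_branch) / n    # about 320 ns
overhead_prebuilt = (t_val_prebuilt - t_branch) / n  # about 65 ns
```

Whether roughly 0.3 μs per section matters depends on how it compares with the time spent reading and parsing each section, which this benchmark does not measure.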

ToucheSir · August 31, 2023, 12:29am

That’s what ValSplit.jl (ztangent/ValSplit.jl on GitHub: “Compile away dynamic dispatch on Val-typed arguments via value-splitting”) was designed for. I haven’t read this thread in enough detail to say if it’d make sense over just using an if-else chain, but it exists.

kevbonham · September 1, 2023, 2:43am

I posted about this pattern in the #appreciation channel on Slack a few weeks ago because, to me, it feels really elegant.

BUT I got schooled (very nicely) by Tim Holy and Claire Foster about the problem: yes, there’s a big performance penalty (or at least there can be, I think). If you have a Slack account, you can read their points here, and I’ll quote them below (with permission) so they don’t get lost in the Slack hole. For context, I had made a function called fixcol() with different methods dispatching on Val types based on the name of the column, very similar to what you have.

In response to me saying it was still fast enough for my purposes:

tim.holy:

Right, performance is only a concern some of the time. The only other issue to keep in your head is the risk of invalidation and its impact on TTFX: your implementation is not inferrable so is at risk for invalidation, but one that branches on values might not be. In that case, even if you use a precompile workload to reduce TTFX, combining your package with something else might conceivably result in your code needing to be recompiled. Again, only a major issue in packages that are widely used in combinations with others, but worth keeping in mind.

In other words, if you’re writing an implementation that is intended to be “library-grade” you’re best off avoiding the fixcol pattern in most cases (exception: interfaces that are designed to be extended). But it’s perfectly fine for personal projects as long as you’re good with the performance.

Then later, from Claire:

c42f:

To expand a little on what Tim said… I’d generally avoid pushing “data” like the column name n into the type domain (as you do this in Val{Symbol(n)}()) for various reasons. One reason is that it generates a number of types which is bounded not based on the program, but on the input data in the table. The Julia runtime generates internal data structures related to types and the methods they’re passed to, not all of which can be garbage collected. So putting data in the type domain can sometimes generate an unbounded amount of uncollectible garbage.

Another reason is it’s just hard for the compiler to reason about if you ever want efficiency. The compiler would have to see that fixcol is semantically an if-else statement where n is wrapped in order to put it through the (very complicated) generic function dispatch machinery, only to be effectively unwrapped again. I doubt the compiler can reason about this. Relatedly, I think involving dispatch is harder to reason about for a random person coming across such code, just because there’s that world of possibility (one has to read carefully to realise you’re only using a small part of the dispatch machinery)

So on a personal level I’d be happier to both write and read a small if-else

But regardless of all that, I can see how this can feel neat. If you find it expressive and readable and it makes you happy … then I think you should go for it
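The point about data in the type domain can be seen directly: every distinct name that passes through Val(Symbol(n)) becomes its own type. A small sketch, with made-up section names standing in for input data:

```julia
# Hypothetical section names as they might arrive from input data.
names_from_file = ["type1", "type2", "typo_section", "type1"]

# Each distinct name becomes its own concrete type Val{:name}.
val_types = unique([typeof(Val(Symbol(n))) for n in names_from_file])

# The number of types is set by the data, not by the program.
n_types = length(val_types)
```

Here three distinct names produce three distinct types, including one for `typo_section`, which no parser was ever written for.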

frylock · September 1, 2023, 12:17pm

Thanks for re-posting that! And thanks also to Tim & Claire for comments on your question!
I did find the dispatch-on-Val fun to try out, and I’m pretty sure in my use-case it is not a problem - but in general I like to take the route of efficiency. I think for “real” work, based on the input here, I would either use if-else or ValSplit.

I’m sure that, as for a lot of people here, the neatness of Julia doesn’t really become apparent until you try to code something nontrivial and new.
