Consider the following functions that add the elements of a vector:
slow(v) = begin
    s = v |> eltype |> zero
    for ele in v
        try
            s += ele
        catch
            rethrow()
        end
    end
    s
end
fast(v) = begin
    s = v |> eltype |> zero
    for ele in v
        s += ele
    end
    s
end
We can time these functions with:
using BenchmarkTools
const v = rand(Int64, 1_000_000)
@btime slow(v) # 14.6 milliseconds
@btime fast(v) # 155.9 microseconds
slow is almost 100 times slower even though the catch clause is never executed!
Looking around the internet, other languages like C# and Java apparently don’t have this problem (I did not check for myself, so take that with a grain of salt). Is there any optimization to be done here to fix this? Otherwise try-catch clauses become almost impossible to use in very high-performance applications that may only very rarely need to execute the slow catch clause.
For my use case the try would need to be in an inner loop. Specifically, the try is inside a function that is called while processing streams of data. In the case of an error, that function resets any state that was altered, to provide a transaction-like interface.
So I guess we should limit our discussion to the poor performance of try-catch inside loops.
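To make that pattern concrete, here is a minimal sketch (Node, state, step!, and flow_item! are all hypothetical names, not my real API): the function snapshots the state before touching it, and the catch restores the snapshot.

mutable struct Node
    state::Vector{Float64}
end

step!(node::Node, item) = push!(node.state, item)  # stand-in for the real per-item work

# Called once per item from the inner loop, so the try is entered per item.
flow_item!(node::Node, item) = begin
    snapshot = copy(node.state)  # remember the state before this item
    try
        step!(node, item)        # may throw partway through its work
    catch
        node.state = snapshot    # undo any partial changes (transaction-like)
        rethrow()
    end
end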
From your benchmark it seems that the try/catch has an overhead of 14.6 nanoseconds per try*. Are you sure that really is important in your case? I have the impression that it cannot get much faster.
*14.6 milliseconds = 14.6e6 nanoseconds; over 1,000,000 iterations that is 14.6 nanoseconds per try.
I am trying to build a DAG based workflow abstraction that is nearly as performant as native code.
Example use might look like:
# Control
control_sum_benchmark(items) = begin
    v = Vector{Float64}(undef, length(items))
    for (i, item) in enumerate(items)
        v[i] = item[1] + item[2]
    end
    v
end
# Experiment
fs = FlowSource(Tuple{Int64, Float64})
x, y = Split(fs)
s = Sum(x, y)
materialize(s)
flow_all_items!(fs, items) = begin
    for item in items
        flow!(fs, item)
    end
end
# Setup
N = 1_000_000
items1 = rand(Int64, N)
items2 = rand(Float64, N)
items = [i for i in zip(items1, items2)]
# Benchmark
@btime flow_all_items!(fs, items)
@btime control_sum_benchmark(items);
Obviously, real use cases would use nodes more complicated than a simple Sum. Great care has been taken to achieve this level of performance. The transaction-like feature was a nice addition, but it would destroy the native-performance promise (it slows this code down about 50 times).
So basically, if flow! takes about 15 ns per item anyway (possible with very large and complex workflows), then the try statement won’t matter. But otherwise all the time is wasted on setting up a try-catch, which seems unreasonably more expensive than a comparable if-else.
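For comparison, one could time an if-else guarded variant of the summation (the sentinel check below is just a stand-in condition, chosen so the compiler cannot delete the branch); I haven’t run this yet, so I won’t guess at the numbers:

guarded(v) = begin
    s = v |> eltype |> zero
    for ele in v
        ele == typemin(eltype(v)) && error("sentinel hit")  # stand-in error check
        s += ele
    end
    s
end

@btime guarded(v)  # compare against slow(v) and fast(v)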
I haven’t checked, but I’d be willing to bet that this is due to a difference in vectorization. I’d assume the try/catch doesn’t allow the loop to be vectorized, while the “naive” version does.
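One way to check that (assuming the slow/fast definitions from the first post) is to look at the generated LLVM IR for a vector block:

using InteractiveUtils
@code_llvm fast(v)  # a "vector.body" label means the loop was vectorized
@code_llvm slow(v)  # if no vector block shows up, the try blocked SIMD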
The usual recommendation, irrespective of language, is “don’t do exception management in a hot loop”. Be aware that both C# and Java are (for this purpose) static languages, so they can get a lot more guarantees out of code that doesn’t throw. Very high-performance algorithms tend to check the necessary conditions before entering the hot loop, instead of relying on a compiler optimization to do that for them.
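Applied to the original example, that would mean hoisting the try out of the hot loop, so the exception frame is set up once per call rather than once per element. A minimal sketch, which I’d expect (but haven’t verified) to recover most of the speed:

fast_guarded(v) = begin
    s = v |> eltype |> zero
    try
        for ele in v
            s += ele  # hot loop carries no per-iteration exception setup
        end
    catch
        rethrow()
    end
    s
end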
If it takes less than that, we are talking about just a few CPU operations, where alignment and vectorization come into play. In that case, as pointed out above, one would really want to avoid testing inside the loop.
Yes. Perhaps it is best to forget about performance for such a trivial example and focus on more practical use cases where workflows are larger and more complex (and where performance would remain high even with the try-catch).