Jl_call - function call latency

Hello,
My first benchmarks show that calling jl_call comes with a latency of about 100 nanoseconds (measured by calling an empty Julia function) on a Xeon Gold processor.
Is this expected?
In Julia, BenchmarkTools gives a computation time of 50 ns for the function I intend to embed, so paying an extra 100 ns on the C side is annoying. Note that my whole loop is only a few microseconds, so I am looking for any performance improvement.
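For context, here is a minimal sketch of the kind of measurement I mean (not my exact benchmark; the noop function, the iteration count and the clock_gettime timing are just illustrative):

#include <julia.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    jl_init();
    jl_eval_string("noop() = nothing");
    jl_function_t *noop = jl_get_function(jl_main_module, "noop");

    jl_call0(noop); /* warm-up call so compilation is not timed */

    const long N = 1000000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        jl_call0(noop);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("mean jl_call0 latency: %.1f ns\n", ns / N);

    jl_atexit_hook(0);
    return 0;
}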

Have you tried calling and timing it twice to account for compilation latency?

I wouldn’t use jl_call in a low-latency situation, since (I think?) it will still do dynamic dispatch. Instead, if possible, you should simply ask Julia for a C function pointer with the specific type signature you want to call. Then the latency will be the same as for any other C function pointer.

See the paragraph on @cfunction in the Julia embedding manual.
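Roughly, the pattern looks like this (a minimal sketch, assuming a Julia function addone defined via jl_eval_string; the names are illustrative, and production code should GC-root the boxed pointer before unboxing it):

#include <julia.h>
#include <stdio.h>

int main(void)
{
    jl_init();
    jl_eval_string("addone(x::Cdouble) = x + 1.0");

    /* Ask Julia for a C-callable pointer with a concrete signature */
    jl_value_t *ptr_boxed = jl_eval_string("@cfunction(addone, Cdouble, (Cdouble,))");
    double (*addone)(double) = (double (*)(double))jl_unbox_voidpointer(ptr_boxed);

    printf("addone(41.0) = %f\n", addone(41.0)); /* plain C call, no dynamic dispatch */

    jl_atexit_hook(0);
    return 0;
}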

I submitted a documentation PR to clarify this: additional clarification on cfunction embedding by stevengj · Pull Request #52315 · JuliaLang/julia · GitHub


Yes, the benchmark does a warmup call, so the timing does not include JAOT compilation. The 100 ns is the mean duration over a million jl_call invocations.

OK, I will investigate the @cfunction solution. Thank you.

By using @cfunction I am getting better performance. Thanks for the tip. I had a bit of a hard time finding the right way of passing arrays without incurring heap allocation. The solution was to use jl_alloc_array_1d / jl_alloc_array_2d to allocate the arrays and a cfunction defined as follows:

jl_value_t *cfunc = jl_eval_string("@cfunction(gemv!, Cvoid, (Ref{Array{Cdouble,1}}, Ref{Array{Cdouble,2}}, Ref{Array{Cdouble,1}}))");
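Putting it together, something along these lines works (a simplified sketch, not my exact code: the gemv! definition and the sizes are illustrative, the array contents are left uninitialized, and the arrays are GC-rooted with JL_GC_PUSH3 while in use):

#include <julia.h>

int main(void)
{
    jl_init();
    jl_eval_string("using LinearAlgebra; gemv!(y, A, x) = (mul!(y, A, x); nothing)");

    /* Element type Float64 -> array types Vector{Float64} and Matrix{Float64} */
    jl_value_t *vec_t = jl_apply_array_type((jl_value_t*)jl_float64_type, 1);
    jl_value_t *mat_t = jl_apply_array_type((jl_value_t*)jl_float64_type, 2);

    jl_array_t *y = jl_alloc_array_1d(vec_t, 3);
    jl_array_t *A = jl_alloc_array_2d(mat_t, 3, 3);
    jl_array_t *x = jl_alloc_array_1d(vec_t, 3);
    JL_GC_PUSH3(&y, &A, &x);

    /* Ref{Array{...}} arguments arrive on the C side as jl_array_t* */
    jl_value_t *cfunc = jl_eval_string(
        "@cfunction(gemv!, Cvoid, (Ref{Array{Cdouble,1}},"
        " Ref{Array{Cdouble,2}}, Ref{Array{Cdouble,1}}))");
    void (*gemv)(jl_array_t*, jl_array_t*, jl_array_t*) =
        (void (*)(jl_array_t*, jl_array_t*, jl_array_t*))jl_unbox_voidpointer(cfunc);

    gemv(y, A, x); /* no jl_call, no per-call heap allocation */

    JL_GC_POP();
    jl_atexit_hook(0);
    return 0;
}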

The “jl_call” curve seems a bit suspicious, though. I will investigate why its behavior is different.