StaticCompiler: Debugging and calling basic math functions

I am relatively new to Julia, but already managed to speed up a piece of code by factors of 2 to 5 compared to my NumPy version. Now I need to deploy this code as a shared library on a cluster in a corporate environment without a Julia installation. I looked into PackageCompiler first, but did not like the mandatory package creation and overhead; it seems extremely messy. A much cleaner solution seems to be StaticCompiler. I got enthusiastic when I managed to run the corresponding example from the docs, but trying to use my own code fails already at the most trivial of lines:

function my_example(a::RefValue{MallocMatrix{Float64}},
						b::RefValue{MallocMatrix{Float64}},
						c::RefValue{MallocVector{ComplexF64}})
	@inbounds for ib in 1:size(b[],1)
		c[][ib] = unsafe_trunc(exp(-2. *pi*im*(a[][1,1]*b[][ib,1] + a[][1,2]*b[][ib,2])))
	end
end

This will compile, but crash when trying to load the .so file in Python with ctypes. The error message is “undefined symbol: ijl_apply_generic”. It will not crash if I do something even simpler, like writing some elements of b into c.

I understand that this kind of error occurs when you use “runtime” features like exceptions. However, I have no idea what “apply_generic” might be or how to avoid it.

I would be thankful for any help on how to debug this kind of problem in general, and on the nature of my specific problem.

If you haven’t seen it already you probably want to read Successful Static Compilation of Julia Code for use in Production.

This particular problem isn’t discussed there. In general the best way to debug these kinds of issues is to try to simplify the code as much as possible until the problem goes away. @code_llvm (or the function form) is helpful to spot when problematic calls appear.
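To make that inspection systematic, you can dump the IR to a string and search it for runtime symbols. Here is a small helper of my own (`runtime_calls` is a hypothetical name, not a StaticCompiler API), using only Base and InteractiveUtils:

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical helper (not a StaticCompiler API): dump the LLVM IR to a
# string and collect runtime symbols such as ijl_throw or ijl_apply_generic.
function runtime_calls(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)
    ir = String(take!(io))
    return unique(m.match for m in eachmatch(r"i?jl_(throw|apply_generic|invoke)\w*", ir))
end
```

An empty result is no proof that every nested call is clean, but a non-empty one pinpoints the offending symbol immediately, e.g. `runtime_calls(+, (Float64, Float64))` is empty while a function containing a `throw` is not.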

For this code you can eliminate almost everything without getting rid of apply_generic, until you realize that

julia> unsafe_trunc(1 + 2im)
ERROR: MethodError: no method matching unsafe_trunc(::Complex{Int64})

So well, the problem is that the code won’t run and needs to do dynamic things to tell you that. In other words, first verify that the code works before you try to debug static compilation failures.
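You can verify this from the method table before ever invoking the compiler; `unsafe_trunc` needs a target integer type and only accepts real floats:

```julia
# unsafe_trunc(T, x) truncates a real float toward zero without the
# potential InexactError (and thus without throwing):
@assert unsafe_trunc(Int, 3.9) == 3
@assert unsafe_trunc(Int, -3.9) == -3

# There is no method at all for Complex arguments:
@assert !hasmethod(unsafe_trunc, Tuple{Type{Int}, ComplexF64})
@assert !hasmethod(unsafe_trunc, Tuple{ComplexF64})
```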


Thanks for looking into the problem. So if I follow the workflow of the example code, where compilation happens in the same file as the function itself, it would be best to add a little test call which will crash before compilation even starts if any “normal” errors occur.

The reason I did not do this here was that I do not know how to generate the appropriate “reference-to-MallocVector” arguments to call the function from within Julia for testing.
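(One way to sidestep this: the kernel logic can be checked against plain `Vector`s first, since `Ref` dereferencing and indexing work identically; `MallocVector` is only needed for the compiled artifact. A minimal sketch, with `fun_test` a made-up stand-in for the real function:)

```julia
using Base: RefValue

# Stand-in test: plain Vector supports the same indexing as MallocVector,
# so the numerical logic can be validated in ordinary Julia before compiling.
function fun_test(a::RefValue{Vector{Float64}}, b::RefValue{Vector{ComplexF64}})
    @inbounds for ia in 1:size(a[], 1), ib in 1:size(b[], 1)
        b[][ib] += exp(1im * a[][ia])
    end
end

a = Ref([0.0, pi / 2])
b = Ref(zeros(ComplexF64, 1))
fun_test(a, b)
# b[][1] is now exp(0im) + exp(im*pi/2) ≈ 1 + 1im
```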

Actually, I added the unsafe_trunc because before that there was another unresolved symbol, “ijl_throw”. Seemingly the exp call wants to throw an exception, and this will not work with static compilation either. unsafe_trunc was the suggested solution for floats, I think. So maybe the real / next question is whether there is any other way to circumvent the throwing of exceptions in this case…

That is something I do discuss in my post, see the part introducing CustomVector. (There it’s assumed that the code is going to be called from Python and that all needed memory can be allocated from Python and passed in the call to Julia.)

Now I’m lost. unsafe_trunc is an exception-free alternative to floor or trunc, not a way to conjure away exceptions from unrelated functions.

When I do

code_llvm(exp, (ComplexF64,))

I see no calls to ijl_throw, so I suspect the problem was something else than complex exp.

I have reduced my code further to see more clearly what the problem is. It is now also short enough to share the whole working/non-working example.

The Julia code, test.jl (the other files are listed below):

using StaticTools
using Base: RefValue
myf(a::ComplexF64) = a*a
#myf(a::ComplexF64) = cos(a)+im*sin(a) #will also NOT work
function fun( a::RefValue{MallocVector{Float64}}, b::RefValue{MallocVector{ComplexF64}} )
	@inbounds for ia in 1:size(a[],1)
		@inbounds for ib in 1:size(b[],1)
			b[][ib] += myf(1im*a[][ia])    #does work
			#b[][ib] += exp(1im*a[][ia])  #does NOT work
		end
	end
end

It is based mostly on the .so example from the StaticTools documentation.

By switching out a single line I can see that the problem is definitely the exp function (or likewise cos+im*sin), since it works when I call my simple complex-valued “myf” function instead of exp. This also shows that I use the right syntax to pass the arguments from Python and to dereference them with an additional [].

I read in the StaticCompiler docs that there is a problem with “functions that don’t want to inline”. This would be another guess of mine concerning the problem. Maybe it is implemented recursively. If that is the problem, then maybe I could call some C stdlib version of exp…

Edit: For sin and cos I see both “@noinline” and “throw” in trig.jl

Edit 2: With exp, and more or less similarly for cis or a manual cos+im*sin, I see this in the nm output:
U ijl_invoke
U ijl_throw
0000000000000510 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000001240 t julia_exp_677
00000000000010b0 T julia_integrate_poly
0000000000000660 t julia_paynehanek_673
0000000000000940 t julia_sincos_669
0000000000001360 t julia_sincos_domain_error_675

The “makefile” / build.jl

using StaticTools
using Base: RefValue
using StaticCompiler
include("test.jl")
tt = ( RefValue{MallocVector{Float64}}, RefValue{MallocVector{ComplexF64}} )
compile_shlib(fun, tt, "./", "fun", filename="libtest")

The python file

import ctypes as ct
import numpy as np

class MallocVector(ct.Structure):
	_fields_ = [("pointer", ct.c_void_p),
				("length", ct.c_int64),
				("s1", ct.c_int64)]
def mmptr(A):
	ptr = A.ctypes.data_as(ct.c_void_p)
	a = MallocVector(ptr, ct.c_int64(A.size), ct.c_int64(A.shape[0]))
	return ct.byref(a)

lib = ct.CDLL("./libtest.so")
a = np.linspace(-np.pi, np.pi)
b = np.zeros(3).astype(complex)
lib.julia_fun(mmptr(a),mmptr(b))
print(b)

Ah, right, exceptions deeper down in the call chain. The bad news is that you have to work around this. The good news is that you can.

julia> sincos(Inf)
ERROR: DomainError with Inf:
sincos(x) is only defined for finite x.
Stacktrace:
 [1] sincos_domain_error(x::Float64)
   @ Base.Math ./special/trig.jl:164
 [2] sincos(x::Float64)
   @ Base.Math ./special/trig.jl:183
 [3] top-level scope
   @ REPL[1]:1

julia> code_llvm(sincos, (Float64,))
[...Long output, including a call to j_sincos_domain_error_100...]

Reading the code we can see

function sincos(x::T) where T<:Union{Float32, Float64}
    if abs(x) < T(pi)/4
        if x == zero(T)
            return x, one(T)
        end
        return sincos_kernel(x)
    elseif isnan(x)
        return T(NaN), T(NaN)
    elseif isinf(x)
        sincos_domain_error(x)
    end
    [...]
end

So, let’s redefine it without the corner cases:

julia> function Base.Math.sincos(x::T) where T<:Union{Float32, Float64}
           if abs(x) < T(pi)/4
               if x == zero(T)
                   return x, one(T)
               end
               return Base.Math.sincos_kernel(x)
           end
           n, y = Base.Math.rem_pio2_kernel(x)
           n = n&3
           # calculate both kernels at the reduced y...
           si, co = Base.Math.sincos_kernel(y)
           # ... and use the same selection scheme as above: (sin, cos, -sin, -cos) for
           # for sin and (cos, -sin, -cos, sin) for cos
           if n == 0
               return si, co
           elseif n == 1
               return co, -si
           elseif n == 2
               return -si, -co
           else
               return -co, si
           end
       end

Now we get a nonsensical result

julia> sincos(Inf)
(-0.9307036206040147, 0.36577420712042863)

but can no longer see any call to j_sincos_domain_error_100 in the output of

julia> code_llvm(sincos, (Float64,))

Comments:

  • I don’t really recommend redefining the function in Base, even if you can. Better make your own variants of the functions you need, starting from the code in Base and modifying where needed.
  • There are probably more similar workarounds needed; it will be detective work to locate them all.
  • There’s a reason PackageCompiler is recommended over StaticCompiler; it’s way more mature. But if you have the persistence (and sufficiently simple code) to work around all issues, you will be rewarded with a very small and nice library to deploy.

Okay, fair enough. I really like the idea of true static compilation of Julia code, though, and I hope that we will see more mature versions of StaticCompiler in the future. Of course, coming from Python it is kind of unfair to complain that you need the Julia runtime for so many things, when you would also need the Python interpreter for Python. Still, it seems like a waste to have a language which is “kind of” compiled and then not be able to do proper AOT compilation.

I will try the route of replacing the exp function with an exception-free variant and see whether I get frustrated. I have a bit of hope, since the complex exponential is my only library call; the remainder of my real code consists of basic arithmetic and loops…

Okay, so your solution did indeed work for the simple test case :-). On top of that, I learned that I can use cis instead of exp and that this will supposedly be faster too (I only have exp(i*real_a) calls).

I also found another variant of the cis function in this forum post, which is less extra code and allegedly faster because it calls an optimized library:

const A = Ref{Cdouble}()
const B = Ref{Cdouble}()
function cis2_(x::Float64)
    ccall((:sincos, Base.Math.libm), Cvoid, (Cdouble, Ptr{Cdouble}, Ptr{Cdouble}), x, A, B)
    return Complex(B[], A[])
end

Sadly, this will simply give me a segfault when compiled (it works inside Julia). I suspect that the calls to Ref{Cdouble}() amount to GC memory allocation. I would have thought that this could also be circumvented, since A and B are not arrays and it should be possible to provide memory for scalar values in some other way. Sadly, I did not manage to implement any version of my idea that would work even in normal Julia (syntax errors).
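(For reference, a variant that allocates the Refs locally per call, instead of as global constants, also works in normal Julia; this is a sketch assuming Base.Math.libm points at openlibm, which provides a C sincos. Whether the ccall itself survives static compilation is a separate question.)

```julia
# Allocate the output slots locally instead of as global Refs.
function cis3_(x::Float64)
    s = Ref{Cdouble}()  # sincos writes sin(x) here...
    c = Ref{Cdouble}()  # ...and cos(x) here
    ccall((:sincos, Base.Math.libm), Cvoid,
          (Cdouble, Ref{Cdouble}, Ref{Cdouble}), x, s, c)
    return Complex(c[], s[])
end
```

In an interactive session, `cis3_(x)` agrees with `cis(x)` to floating-point accuracy.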

That hasn’t been faster since 2019; our cis is now pretty good.

If I’m not mistaken, ccall is not compatible with static compilation at this point.

Okay, then I am happy to roll with the modified sincos.

Another thing I just learned is that broadcasting does not seem to work, at least in the case that I tried:

# works (c is as long as a):
@inbounds for ia in 1:size(a[],1)
	c[][ia] = cis_(a[][ia])
end
# does not work, ijl_throw again:
@inbounds c[] .= cis_.(a[])
