I’m involved with a package that provides a Julia wrapper (SCIP.jl) to a C library (https://scip.zib.de/).
In a current pull request (#100), I found that our tests pass most of the time, but not always. This non-deterministic behavior shows up both locally and on Travis.
See, for example, these two builds: #424.3 passed and #425.3 failed, both on Julia nightly. They correspond to the pr and pull runs of Travis, but use identical code, since this PR starts off master.
When I run the tests locally, I estimate that they fail in about 1 out of 10 runs.
I have been able to narrow the problem down to a specific function of the libscip.so library that I call: SCIPexprCreate. One of the arguments is a Cdouble, which is stored in a struct by the C library. When I retrieve it right afterwards, I can (sometimes!) see that the value is different and looks like uninitialized memory. I have set up an MWE at this gist: first in C, which always works, and then a faithful(?) recreation in Julia, which sometimes shows the failing behavior.
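To make the check concrete, here is a minimal C sketch of the store-then-read-back pattern described above. It is not the code from the gist and does not use the SCIP API: the struct demo_expr and the helpers demo_expr_create, demo_expr_get_value and count_mismatches are hypothetical stand-ins, made up purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node that stores a constant.
   This is NOT the SCIP struct; all names here are illustrative. */
typedef struct {
    double value;
} demo_expr;

/* Allocate a node and store the value in it.
   Returns 0 on success, -1 if the allocation fails. */
int demo_expr_create(demo_expr **out, double value)
{
    assert(out != NULL);
    demo_expr *e = malloc(sizeof *e);
    if (e == NULL) {
        return -1;
    }
    e->value = value;
    *out = e;
    return 0;
}

/* Read the stored value back. */
double demo_expr_get_value(const demo_expr *e)
{
    return e->value;
}

/* Repeat the create / read back / compare cycle `runs` times.
   Returns the number of mismatches, or -1 on allocation failure.
   The comparison is exact, so passing NaN would always count
   as a mismatch. */
int count_mismatches(int runs, double value)
{
    int mismatches = 0;
    for (int i = 0; i < runs; ++i) {
        demo_expr *e = NULL;
        if (demo_expr_create(&e, value) != 0) {
            return -1;
        }
        double got = demo_expr_get_value(e);
        if (got != value) {
            printf("run %d: stored %g but read back %g\n", i, value, got);
            ++mismatches;
        }
        free(e);
    }
    return mismatches;
}
```

Calling count_mismatches(1000, 3.5) from a small main should report no mismatches in plain C, which is consistent with the C version of the MWE always working. The failure only appears when the equivalent calls are made from Julia.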
At first, I thought that the problem might be caused by the Julia GC, and by my failing to protect some of the Julia objects that are passed into the ccalls. But I found that even if I put GC.enable(false) and GC.enable(true) around the call of the main(runs) function in my MWE script, the behavior is still non-deterministic.
By the way, it does not matter whether I start a new Julia process for each run of the script or reuse an existing session and include the script several times.
So, finally, my question would be: How would I go about debugging this problem?
I already tried running valgrind using the recommended flags, but its output tells me little that relates to my own code.