I had two versions of the same operation in a function. Version 1 works just fine:
action = length(output) > 0 && output > 0 ? 2 : 1
But this other version silently modifies a value which is part of a struct that isn’t even part of the operation. It is in the scope, though.
action = (sum(output) % 2) + 1
When I add forced casts the error goes away:
action = (Int64(sum(output)) % 2) + 1
I caught it by pure luck, just because it was modifying a value that I was monitoring. Any clue why this might happen?
No, there’s no reason for that to be happening, and I’ve never observed anything like that. What is the minimum set of code needed to show this issue, and what is your julia
I’m afraid I cannot provide a minimum set of code for replicating the issue (yet). I will try to find it, but right now this is part of a quite large project, in which multi-threading, asynchronous computation and data race conditions are taking place.
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Ryzen 5 5600H with Radeon Graphics
LLVM: libLLVM-11.0.1 (ORCJIT, generic)
I would say it is very likely that the code you quoted above is innocent. The reason for the strange behaviour might be that your changes affect the execution time which in turn affects a race condition somewhere else.
I thought that at the beginning too, but the fact that I could pinpoint the issue to that exact line, and that I could switch it on and off deterministically, made me think otherwise. Also, notice that I can switch the bug off just by making the casts explicit: does that incur such a big timing difference? I came to this forum because I thought it might have to do with internal (compiler) mechanisms. A colleague of mine pointed out that an incorrect or erratic cast might end up modifying unwanted positions in memory. I don’t know if that’s the case, but I assumed I could find someone here who might know something about it.
That strange kind of error is usually associated with an incorrect memory access somewhere else in the code, which by chance happens to manifest in this line. I wouldn’t make hasty changes to the code before having an MWE.
When you change the code so that action is an Int64 instead of a UInt64, that changes the return type of your function, which will cause the calling functions to recompile, which will change the memory location of the machine code, which can affect a bug elsewhere in the code…
Or it could be the fact that the return value is a UInt64 that directly triggers the bug.
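The type difference described above can be checked directly. This is only a minimal sketch: it assumes that output holds unsigned integers (here a Vector{UInt64}), which the thread never shows, so the concrete values are made up for illustration.

```julia
# Assumption: `output` has an unsigned element type; the real type is not shown in the thread.
output = UInt64[3, 4]

# `sum` of UInt64 values is a UInt64, and `% 2` and `+ 1` promote the Int64
# literals to UInt64, so the result stays unsigned.
action_rem = (sum(output) % 2) + 1

# Converting first keeps the whole expression in Int64.
action_cast = (Int64(sum(output)) % 2) + 1

println(typeof(action_rem))
println(typeof(action_cast))
println(action_rem == action_cast)   # same numeric value, different type
```

Both expressions produce the same number, so printing the value alone would not reveal the difference; only typeof (or @code_warntype on the enclosing function) does.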
Thanks Per! You actually discovered the issue. It turns out that I was using that action to perform a step in an environment. Something like step!(env, action), where the action argument did not have any type annotation, so the step! function was implemented expecting an Int64 as input, while it received a UInt64, and that changed how lots of subsequent operations behaved.
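To illustrate how an untyped argument can let a UInt64 change behaviour silently, here is a hedged sketch. The Env struct and both step functions are hypothetical stand-ins, since the real step! and environment are not shown in this thread; the arithmetic inside is invented only to show unsigned wraparound.

```julia
# Hypothetical environment; the real one is not shown in the thread.
mutable struct Env
    position::Int64
end

# Untyped `action`: any integer type is accepted without complaint.
function step_untyped!(env::Env, action)
    direction = action - 2        # intended: -1 for action 1, 0 for action 2
    if direction < 0              # never true when `direction` is unsigned
        env.position -= 1
    else
        env.position += 1
    end
    return env
end

# Annotated `action`: passing a UInt64 fails loudly with a MethodError.
function step_typed!(env::Env, action::Int64)
    return step_untyped!(env, action)
end

env_signed = step_untyped!(Env(0), 1)            # Int64 action
env_unsigned = step_untyped!(Env(0), UInt64(1))  # UInt64 action, 1 - 2 wraps around

println(env_signed.position)
println(env_unsigned.position)
```

With the annotation in place, the mismatch would have surfaced as an error at the call site instead of as a changed value in the struct. An alternative design is to accept action::Integer and convert with Int64(action) at the top of the function.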
So we can say that it’s solved, and that it is unrelated to compiler internals.