I have a lengthy piece of code that has recently stopped working on my Linux machine (Julia 0.6.2), although it’s still fine on Windows. I don’t have an MWE; the site of the error moves around with small changes to the code. Does anyone have experience with this? Any idea how to chase it down?
ERROR: LoadError: ReadOnlyMemoryError()
Stacktrace:
[1] handle_eval!(::conic_solve_cohesive.B_Handle, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Int64) at /u4/vavasis/cohesive/conic_jl/conic28.jl:1312
[2] evalfuncgradhess(::conic_solve_cohesive.B_Handle, ::Array{Float64,1}) at /u4/vavasis/cohesive/conic_jl/conic28.jl:1850
[3] evalfuncgradhess(::conic_solve_cohesive.BarrierObjectiveFH{Array{conic_solve_cohesive.PiecewiseQuadratic{3},1},conic_solve_cohesive.DeltansT6}, ::Array{Float64,1}, ::Bool) at /u4/vavasis/cohesive/conic_jl/conic28.jl:5023
(etc)
Have you tried running in a single process with --check-bounds=yes?
Am I understanding you right that your code loads alright, but you need to call into the compiler during computation with some eval, and this fails at various points? Have you tried rebuilding your sysimage and/or reinstalling julia?
Thanks for the response! I reran the code with the flag --check-bounds=yes, and it did not crash; it ran normally, at least for the first five minutes. Should I always run with that flag? What is the performance hit from that setting?
There is no call to eval in my code. My code is relatively straightforward Julia. There are a few home-made macros in it, and there is one ccall to SuiteSparse, but that is not where the code crashes. It uses the JLD package, but that is also not where the crash occurs.
Since this version of Julia was installed by the sysadmin, I don’t have adequate privileges to reinstall it myself, but I can ask for a reinstallation to be done during the next few days.
This is weird. The performance hit from that setting depends very much on the code, ranging from negligible (most cases) to catastrophic (preventing SIMD). Normally, every array access incurs a bounds-checking penalty, unless you annotate it with @inbounds; the command-line flag overrides all @inbounds annotations, which helps if you mistakenly thought that an access was in bounds.
My guess was that you maybe have an out-of-bounds write that happens to hit mapped non-writable memory; in that case, you should now see an out-of-bounds exception that you can debug nicely. You don’t see this exception, so either you are catching it somewhere or something stranger is happening.
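As a minimal illustration (not taken from your code) of how such a write can slip through in a normal run, and how the flag turns it into something debuggable:

```julia
# An @inbounds write one element past the end of an array: silent memory
# corruption (or a crash) in a normal run, but a clean BoundsError when the
# process is started with `julia --check-bounds=yes`.
function bad_write!(v::Vector{Float64}, i::Int)
    @inbounds v[i] = 1.0    # bounds check skipped unless --check-bounds=yes
    return v
end

bad_write!(zeros(3), 4)     # index 4 of a length-3 vector
```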
AFAIK the “default way” of getting a ReadOnlyMemoryError is writing into a non-writable mmapped array (are you using mmap to read data? Are you using multiple Julia processes / shared arrays? Are you running multithreaded?). In that case, however, the command-line flag should not help at all.
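For reference, the canonical way to trigger that error looks roughly like this (written for a recent Julia where Mmap is a stdlib; the file name is just a placeholder and has to exist with at least the mapped number of bytes):

```julia
using Mmap

io = open("data.bin", "r")                # file opened read-only (placeholder name)
A  = Mmap.mmap(io, Vector{Float64}, 100)  # read-only mapping (needs >= 800 bytes in the file)
A[1] = 0.0                                # writing into the mapping throws ReadOnlyMemoryError()
close(io)
```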
My third guess was that your code does something weird with eval (the error would have been during some code generation), and something got confused about which pages are executable or writable; this guess came from the fact that Windows and Linux behaved differently.
Now I’m stumped. My primary guess would be that your installation is broken; my secondary guess would be that your code was weirdly broken all along (race conditions? race conditions with pointer games?); and my third guess is that you triggered a bug in Julia.
Try running it on JuliaBox, without the flag, to see whether a clean Linux Julia also makes trouble?
PS. Maybe try with julia --precompiled=no ? This will slow down startup considerably, but should induce no runtime overhead; if this fixes your problem, then your system image is likely broken.
I ran the code on JuliaBox with no error either. So the best guess is that the local installation of 0.6.2 is somehow flawed. I’ll ask my sysadmin to reinstall, and I’ll repost in a few days if the problem is not solved. Thanks for all the suggestions!
Another possibility is a shared library conflict, which could arise from environment variables like LD_LIBRARY_PATH. Shared libraries are mmapped, and if your Julia was built against a version different from the one that actually gets loaded, pointer or object-size confusion may result in the ReadOnlyMemoryError. This situation may be diagnosed by running lsof -p $PID where $PID is the process id of the troubled Julia, and hunting for anomalies in the output.
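From inside the troubled session, another quick way to see which shared libraries actually got loaded (and from which directories) is Libdl, a stdlib in recent Julia versions:

```julia
using Libdl

# Print every shared library mapped into the current Julia process; look for
# copies of libopenblas, libstdc++, etc. picked up from LD_LIBRARY_PATH
# directories rather than from Julia’s own lib/julia folder.
foreach(println, Libdl.dllist())
```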
What iterative algorithm are you using? Are you only solving overdetermined systems (more rows than columns)?
You might try using @btime (from BenchmarkTools) to see how the allocations grow with dimensions and/or running on a different computer with more/less memory.
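For example, a rough scaling check along those lines might look like this (the sizes here are placeholders, not taken from your setup):

```julia
using BenchmarkTools, LinearAlgebra

# Time the backslash solve for a few column counts and watch how the
# reported allocations grow with the problem size.
for n in (10^4, 10^5, 10^6)
    A = rand(Float32, 10, n)
    b = rand(Float32, 10)
    @btime $A \ $b
end
```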
I am benchmarking various iterative algorithms, which did not crash. The problem occurs with the BLAS call via
julia> @btime $x\$y
ERROR: ReadOnlyMemoryError()
x is a wide matrix with 10 rows and 10^7 columns in Float32, which is 400 MB of memory (for the matrix allocation). The compute node I am running on has 200 GB of RAM, so I guess the problem is not the total memory allocation but “how” the memory is being allocated for such matrices. In other experiments with @btime, I found it needs on average about 50% more RAM than the matrix size (for the solution). I tried this via Slurm and interactively, and the same issue stands.
When I ran this again, I got
signal (11): Segmentation fault
in expression starting at REPL[35]:1
scopy_k_SKYLAKEX at /~/Julia/1.5.1-linux-x86_64/bin/../lib/julia/libopenblas64_.so (unknown line)
Allocations: 9815481 (Pool: 9812755; Big: 2726); GC: 10
Segmentation fault
Your problem is underdetermined, so I think there’s an SVD at the backend of the BLAS call. What happens if you do the same problem in Float64? The Float32 <-> Float64 conversions within BLAS (which might happen even if you do not ask for them) could be part of your problem.
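Concretely, something along these lines (reusing the x and y from your benchmark; just a sketch of the check, not a fix) would tell you whether the Float32 path is the culprit:

```julia
# Promote the same system to Float64 before solving, so BLAS never touches
# the Float32 code path.
x64 = Float64.(x)
y64 = Float64.(y)
@btime $x64 \ $y64
```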
Yes, it is underdetermined; I forgot to clarify that. I append the exact code below. It seems like BLAS crashes with Float64 as well, so yes, it is probably a problem with the SVD.
Wow! Indeed, without MersenneTwister and with Float64 it is solved (with a trivially small solution error).
Yes, MersenneTwister is what I use for random number generation (and reproducibility). However, without Float32, even without MersenneTwister, it crashes the whole session.
I also tried StableRNGs, and it worked in Float32 and Float64. Then I retried Float64 with MersenneTwister and it worked. Then I tried the same code with MersenneTwister and Float64 another time, and it crashes again. It is quite weird, as the initialization is the same.
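For anyone wanting to reproduce the comparison, here is a sketch of the RNG swap described above (StableRNGs.jl is a separate package that has to be installed first; the sizes match the earlier posts, and the seed is arbitrary):

```julia
using Random, LinearAlgebra, StableRNGs

# Same underdetermined solve with two different RNGs; only the data
# generation differs, so any difference in behaviour points at the data,
# not at the solve itself.
for rng in (MersenneTwister(1234), StableRNG(1234))
    x = rand(rng, Float32, 10, 10^7)
    y = rand(rng, Float32, 10)
    sol = x \ y                          # minimum-norm least-squares solution
    @show typeof(rng) norm(x * sol - y)  # residual should be tiny in both cases
end
```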