I tried to process images in parallel but the load operation results in a Segmentation Fault. Is loading images not thread safe?
using Images, FileIO
Threads.@threads for path in readdir("img"; join=true)
    load(path)
end
The img directory contains only PNG images. Running the above script with a single thread works, but with multiple threads (julia --threads 4) it gives me a segmentation fault:
signal (11): Segmentation fault
in expression starting at none:1
in expression starting at none:1
unknown function (ip: 0x7fbd79b0d298)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0e80a)
unknown function (ip: 0x7fbd79b0d294)
unknown function (ip: 0x7fbd79b0d725)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0c8b7)
unknown function (ip: 0x7fbd79b0e80a)
unknown function (ip: 0x7fbd79b0d766)
unknown function (ip: 0x7fbd79b0d725)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0c8b7)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0d766)
unknown function (ip: 0x7fbd79b0e80a)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0d725)
unknown function (ip: 0x7fbd79b0d32d)
unknown function (ip: 0x7fbd79b0d9ec)
unknown function (ip: 0x7fbd79b0e80a)
unknown function (ip: 0x7fbd79b0dafc)
unknown function (ip: 0x7fbd79b0d725)
unknown function (ip: 0x7fbd79b0ea0a)
unknown function (ip: 0x7fbd79b0d9ec)
unknown function (ip: 0x7fbd79b12347)
unknown function (ip: 0x7fbd79b0dafc)
jl_restore_incremental at /usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7fbd79b0ea0a)
unknown function (ip: 0x7fbd79b12347)
jl_restore_incremental at /usr/bin/../lib/libjulia.so.1 (unknown line)
_include_from_serialized at ./loading.jl:681
_require_search_from_serialized at ./loading.jl:782
_require_search_from_serialized at ./loading.jl:782
Errors encountered while loading "/path/img/000811.png".
All errors:Errors encountered while loading "/path/img/000001.png".
===========================================Errors encountered while loading "/path/img/001081.png".
All errors:
===========================================
All errors:
===========================================
_require at ./loading.jl:1007
_require at ./loading.jl:1007
require at ./loading.jl:928
require at ./loading.jl:928
require at ./loading.jl:923
require at ./loading.jl:923
unknown function (ip: 0x7fbd79b1cac0)
unknown function (ip: 0x7fbd79b1cac0)
unknown function (ip: 0x7fbd79b1e16e)
unknown function (ip: 0x7fbd79b1e16e)
jl_toplevel_eval_in at /usr/bin/../lib/libjulia.so.1 (unknown line)
jl_toplevel_eval_in at /usr/bin/../lib/libjulia.so.1 (unknown line)
eval at ./boot.jl:331 [inlined]
topimport at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:13
eval at ./boot.jl:331 [inlined]
topimport at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:13
checked_import at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:30
checked_import at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:30
#load#28 at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:195
#load#28 at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:195
load at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:184 [inlined]
#load#14 at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133 [inlined]
load at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133 [inlined]
macro expansion at /path/open_images.jl:4 [inlined]
#3#threadsfor_fun at ./threadingconstructs.jl:81
load at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:184 [inlined]
#load#14 at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133 [inlined]
load at /home/someuser/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133 [inlined]
macro expansion at /path/open_images.jl:4 [inlined]
#3#threadsfor_fun at ./threadingconstructs.jl:81
#3#threadsfor_fun at ./threadingconstructs.jl:48
#3#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x7fbd49e314ac)
unknown function (ip: 0x7fbd49e314ac)
unknown function (ip: 0x7fbd79b053b9)
unknown function (ip: 0x7fbd79b053b9)
unknown function (ip: (nil))
nknown function (ip: (nil))
Allocations: 11229080 (Pool: 11225888; Big: 3192); GC: 8
Allocations: 11229080 (Pool: 11225888; Big: 3192); GC: 8
Segmentation fault (core dumped)
Perhaps you could give my new actor library YAActL a try (see the announcement):
The following installs a very simple file server:
julia> using YAActL, FileIO
julia> fs = Actor(load)
Channel{YAActL.Message}(sz_max:32,sz_curr:0)
then you can do:
julia> Threads.@threads for file in readdir("img"; join=true)
           img = call!(fs, file)
           # then do the processing in parallel
       end
which is the same as your code above, but it works (without a lock). The actor opens the files sequentially and serves their contents over the fs channel, so the threads can then compute in parallel. This is thread-safe by design.
The loading is done in one thread (the one the actor resides on). It serves the images over the fs channel to the threads that called it.
The threads then do the processing of the images (in your case 4 at a time) in parallel.
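The same pattern (one task doing all the reading, several tasks doing the processing) can be sketched with a plain Channel from Base, without any actor library. This is only an illustration of the idea, not how YAActL is implemented. `read_then_process` and `process_bytes` are hypothetical names, and `process_bytes` stands in for whatever decoding or processing you actually do:

```julia
# Hypothetical placeholder for the real per-file work (decoding, filtering, ...).
process_bytes(bytes::Vector{UInt8}) = length(bytes)

function read_then_process(paths; nworkers = Threads.nthreads())
    # Single producer: every file access happens inside this one task.
    # The channel is closed automatically when the producer task finishes.
    jobs = Channel{Tuple{Int,Vector{UInt8}}}(32; spawn = true) do ch
        for (i, path) in enumerate(paths)
            put!(ch, (i, read(path)))
        end
    end

    results = Vector{Int}(undef, length(paths))

    # Consumers: each task takes items until the channel is closed and drained.
    @sync for _ in 1:nworkers
        Threads.@spawn for (i, bytes) in jobs
            results[i] = process_bytes(bytes)   # each index is written by exactly one task
        end
    end
    return results
end
```

Each worker writes to a distinct index of `results`, so no lock is needed for the output either.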
This only makes sense if the processing takes significantly longer than the loading; otherwise you won’t gain much from multithreading (see Amdahl’s law). The reading of the files can’t be parallelized. The same is true with a lock: it creates a queue that makes tasks wait until they can read the file.
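For comparison, here is a minimal sketch of the lock variant using only Base. The names `locked_read_process` and `process_raw` are made up for the example, and `process_raw` is a placeholder for the real work:

```julia
# Hypothetical placeholder for the real per-file work.
process_raw(bytes::Vector{UInt8}) = length(bytes)

function locked_read_process(paths)
    io_lock = ReentrantLock()
    results = Vector{Int}(undef, length(paths))
    Threads.@threads for i in eachindex(paths)
        # Only one task at a time is inside this block; the others queue up here.
        bytes = lock(io_lock) do
            read(paths[i])
        end
        # Everything after the read runs concurrently across threads.
        results[i] = process_raw(bytes)
    end
    return results
end
```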
You can parallelize all operations after the file access, since only the reading needs to happen in a single thread. But this must be implemented in a library or by the user. Amdahl’s law suggests keeping the part that cannot be parallelized as small as possible.
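To put numbers on that, Amdahl’s law gives the best-case speedup as 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of threads. A small helper (the name `amdahl_speedup` is just for illustration) shows how quickly the serial part dominates:

```julia
# Best-case speedup with n threads when a fraction p of the run time is parallelizable.
amdahl_speedup(p, n) = 1 / ((1 - p) + p / n)

amdahl_speedup(0.5, 4)    # 1.6: if half the time is serial loading, 4 threads give less than 2x
amdahl_speedup(0.9, 4)    # about 3.08
amdahl_speedup(0.9, Inf)  # about 10, the upper bound 1 / (1 - p)
```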
That’s probably true in most situations, including mine, but there’s still the decoding step that could be parallelized. I guess I could separate the loading and decoding, but that doesn’t seem to be very easy since load only accepts a filename or a stream.
Anyway I was able to run my process. And surprisingly saving the resulting images in multiple threads does work.
I perhaps should have written that it is not practical:
Hardware file access works sequentially. If you allow parallel cores to access the file system concurrently, then somewhere in between the switching has to be managed, continually saving and restoring the state of each concurrent file access. This is possible, but it involves considerable complexity and may be error-prone.
So no reason not to try to split the load over a couple of threads then, in particular as there’s decoding to do as well. It seems to me like there’s some thread unsafety in FileIO because reading directly with PNGFiles works fine for me.
using PNGFiles
filenames = filter(x->endswith(x, ".png"), readdir("."; join=true));
x = Vector{Any}(undef, length(filenames));
@sync for (i, path) in enumerate(filenames)
    Threads.@spawn begin
        x[i] = PNGFiles.load(path)
    end
end
This is probably not a good way to get performance but at least there should be no fundamental problem with multithreading it.
Update: When I try this with 100 PNG files of size 4096x4096 and 8 threads, I get a 4x speedup over sequential reading. Seems fairly decent.
FileIO and ImageIO do lazy loading of the underlying file IO packages, so what may be happening is multiple threads trying to load the package at the same time, if it hasn’t happened already in the session.
The first example may work if the underlying IO package is first loaded on the main thread, either by invoking a single image load first, or by explicitly using ImageIO, PNGFiles, ImageMagick, etc.:
using Images, FileIO, ImageIO, PNGFiles
Threads.@threads for path in readdir("img"; join=true)
    load(path)
end
Note that if the file format changes, it may try to load different packages and hit the same problem.
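The other option mentioned above, doing a single load before the threaded loop, could look roughly like this. It is a sketch under the assumption that one call is enough to trigger the lazy package loading. `load_all` is a hypothetical helper, and the loader is passed in as an argument so the function itself does not depend on any particular IO package:

```julia
# Call the loader once on the current task before starting the threaded loop, so that
# any lazy one-time setup (such as importing a backend package) is already done.
function load_all(loader, paths)
    results = Vector{Any}(undef, length(paths))
    isempty(paths) && return results
    results[1] = loader(paths[1])              # warm-up call, single-threaded
    Threads.@threads for i in 2:length(paths)
        results[i] = loader(paths[i])          # remaining files are loaded in parallel
    end
    return results
end
```

With FileIO loaded, this would be called as `load_all(load, readdir("img"; join=true))`. It only warms up the format of the first file, so a directory with mixed formats would still need one warm-up load per format.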