Using threads with I/O to process many files in parallel


#1

In many cases I want to read thousands of files and do some processing, or conversion, on each one. I’m assuming I can use @threads to do this; however, the test case below crashes Julia (segfault output below).

function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"   # output directory; must already exist
    Threads.@threads for i = 1:N
        # zero-padded file name, e.g. /tmp/Jason/file_000000000000001
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))   # write a timestamp
        close(f)
    end
end

@time parallelFile(10000)

Crash output:
signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 184
unknown function (ip: 0x11453452f)
Allocations: 1467627 (Pool: 1466754; Big: 873); GC: 0
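
(For context: @threads only runs iterations in parallel when Julia is started with more than one thread, set via the JULIA_NUM_THREADS environment variable; the exact count used for this run isn't shown above. It can be checked with:

julia> Threads.nthreads()   # reports how many threads @threads will use
)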


#2

I tried this. It seems to work for a smaller number of files.

Running it for 1000, though, I end up with only about 730-750 files, and after rerunning a few times I get the crash.

I am wondering if you're running into the OS limit on the maximum number of open files per process. Maybe the GC is slow to fully close and release some of the file handles.
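
If dangling handles are the issue, one thing worth trying (just a sketch on my side, not tested at this scale) is the do-block form of open, which closes the file deterministically when the block exits rather than leaving it to a finalizer:

open(tFile, "w") do f
    # f is closed automatically when this block exits, even if @printf throws
    @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
end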


#3

Threaded IO is not (currently) supported.
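
If the goal is still to use threads, a possible workaround (just a sketch, untested; parallelFileSplit is a made-up name) is to restrict @threads to the in-memory work and do all the file I/O serially on the main thread, so the unsupported threaded I/O path is never hit:

function parallelFileSplit(N::Int64)
    dPath = "/tmp/Jason/"
    contents = Vector{String}(N)        # one pre-allocated slot per file
    Threads.@threads for i = 1:N
        # CPU-only work under @threads; no file handles touched here
        contents[i] = Dates.format(now(), "SS.ssss")
    end
    for i = 1:N                         # all I/O happens serially here
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", contents[i])
        close(f)
    end
end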


#4

I’ve now run the same code but using @parallel (code below), and it worked fine: it was able to generate 100,000 files in 26 seconds.

function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"   # output directory; must already exist
    # @parallel spreads iterations across worker processes; @sync waits for all
    @sync @parallel for i = 1:N
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
        close(f)
    end
end

addprocs(4)                 # add 4 worker processes before timing the run
@time parallelFile(100000)