Using threads with I/O to process many files in parallel

In many cases I want to read thousands of files and do some processing, or conversion, on each file. I assumed I could use @threads for this, but the test case below crashes Julia (segfault output below).

function parallelFile(N::Int64)
    dPath="/tmp/Jason/"
    Threads.@threads for i=1:N
        tFile=*(dPath,"file_",@sprintf("%015d",i))
        f=open(tFile,"w")
        @printf(f,"%s\n",Dates.format(now(),"SS.ssss"))
        close(f)
    end
end

@time parallelFile(10000)

Crash output:
signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 184
unknown function (ip: 0x11453452f)
Allocations: 1467627 (Pool: 1466754; Big: 873); GC: 0

I tried this. It seems to work for a smaller number of files.

However, running it for 1000 files, I end up with only about 730-750 files on disk. Rerunning it a few times eventually produces the crash.

I am wondering if you're running into the OS limit on the maximum number of open files per process. Maybe the GC is slow to fully close and release some of the file handles.
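To check whether that limit could plausibly be in play, you can query the per-process open-file limit from the shell (this is a generic check, not something from the original post):

```shell
# Show the soft limit on open file descriptors for the current shell/process.
# A low value (e.g. 256, the default on macOS) can be exhausted when many
# tasks hold files open at the same time.
ulimit -n
```

If the reported value is small relative to the number of concurrently open files, raising it (e.g. `ulimit -n 4096`) or ensuring files are closed promptly may help.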

Threaded IO is not (currently) supported.
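Given that, one common workaround is to keep the I/O on a single task and only funnel work to it through a Channel. Below is a minimal sketch of that pattern (my own sketch, not from this thread; I use mktempdir() as a stand-in for the hard-coded "/tmp/Jason/" so the snippet is self-contained):

```julia
using Dates

# Route all file writes through a single async writer task, so only one
# task ever touches the filesystem.
function serialWriteFiles(N::Int)
    dPath = mktempdir()               # hypothetical stand-in for "/tmp/Jason/"
    ch = Channel{Int}(N)
    writer = @async for i in ch       # the single consumer does all the I/O
        tFile = joinpath(dPath, "file_" * lpad(i, 15, '0'))
        open(tFile, "w") do f
            println(f, Dates.format(now(), "SS.sss"))
        end
    end
    foreach(i -> put!(ch, i), 1:N)    # producers only enqueue indices
    close(ch)                         # closing the channel ends the writer loop
    wait(writer)
    return length(readdir(dPath))     # number of files actually written
end
```

With this structure `serialWriteFiles(1000)` should reliably produce all 1000 files, since no two tasks ever perform I/O concurrently; the cost is that the writes themselves are serialized.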

I’ve now run the same code using @parallel instead (code below) and it worked fine: it generated 100,000 files in 26 seconds.

function parallelFile(N::Int64)
    dPath="/tmp/Jason/"
    @sync @parallel for i=1:N
        tFile=*(dPath,"file_",@sprintf("%015d",i))
        f=open(tFile,"w")
        @printf(f,"%s\n",Dates.format(now(),"SS.ssss"))
        close(f)
    end
end
addprocs(4)
@time parallelFile(100000)
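For anyone reading this on a current Julia release: @parallel was renamed to Distributed.@distributed, and @printf/@sprintf moved into the Printf stdlib. A rough port of the code above might look like this (this port is my assumption, and I parametrize the directory rather than hard-coding "/tmp/Jason/"):

```julia
using Distributed
addprocs(4)                                  # spawn 4 local worker processes
@everywhere using Dates, Printf              # load deps on every worker

function parallelFile(dPath::String, N::Int)
    # @sync blocks until all @distributed iterations finish.
    @sync @distributed for i in 1:N
        tFile = joinpath(dPath, "file_" * @sprintf("%015d", i))
        open(tFile, "w") do f
            @printf(f, "%s\n", Dates.format(now(), "SS.sss"))
        end
    end
end
```

Usage would be e.g. `@time parallelFile(mktempdir(), 100_000)`; each worker is a separate process with its own file descriptors, which sidesteps the threaded-I/O problem entirely.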