Using threads with I/O to process many files in parallel


#1

In many cases I want to read thousands of files and do some processing, or conversion, on each one. I’m assuming I can use @threads to do this; however, the test case below crashes Julia (segfault output below).

function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"   # output directory; must already exist
    Threads.@threads for i = 1:N
        # zero-padded file name, e.g. /tmp/Jason/file_000000000000001
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))   # write a timestamp
        close(f)
    end
end

@time parallelFile(10000)

Crash output:
signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 184
unknown function (ip: 0x11453452f)
Allocations: 1467627 (Pool: 1466754; Big: 873); GC: 0
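
(For context: @threads only runs iterations in parallel when Julia is started with more than one thread, set via the JULIA_NUM_THREADS environment variable; the exact count used for this run isn't shown above. It can be checked with:

julia> Threads.nthreads()   # reports how many threads @threads will use
)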


#2

I tried this. It seems to work for a smaller number of files.

Running it for 1000, though, I end up with only about 730-750 files, and after rerunning a few times I get the crash.

I am wondering if you're running into the OS limit on the maximum number of open files per process. Maybe the GC is slow to fully close and release some of the file handles.
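
If dangling handles are the issue, one thing worth trying (just a sketch on my side, not tested at this scale) is the do-block form of open, which closes the file deterministically when the block exits rather than leaving it to a finalizer:

open(tFile, "w") do f
    # f is closed automatically when this block exits, even if @printf throws
    @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
end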


#3

Threaded IO is not (currently) supported.
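
If the goal is still to use threads, a possible workaround (just a sketch, untested; parallelFileSplit is a made-up name) is to restrict @threads to the in-memory work and do all the file I/O serially on the main thread, so the unsupported threaded I/O path is never hit:

function parallelFileSplit(N::Int64)
    dPath = "/tmp/Jason/"
    contents = Vector{String}(N)        # one pre-allocated slot per file
    Threads.@threads for i = 1:N
        # CPU-only work under @threads; no file handles touched here
        contents[i] = Dates.format(now(), "SS.ssss")
    end
    for i = 1:N                         # all I/O happens serially here
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", contents[i])
        close(f)
    end
end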


#4

I’ve now run the same code but using @parallel (code below), and it worked fine: it was able to generate 100,000 files in 26 seconds.

function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"   # output directory; must already exist
    # @parallel spreads iterations across worker processes; @sync waits for all
    @sync @parallel for i = 1:N
        tFile = dPath * "file_" * @sprintf("%015d", i)
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
        close(f)
    end
end

addprocs(4)                 # add 4 worker processes before timing the run
@time parallelFile(100000)