In many cases I want to read thousands of files and do some processing, or conversion, on each file. I'm assuming I can use @threads to do this; however, the test case below crashes Julia (segfault output below).
function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"
    Threads.@threads for i = 1:N
        tFile = *(dPath, "file_", @sprintf("%015d", i))
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
        close(f)
    end
end
@time parallelFile(10000)
Crash output:
signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 184
unknown function (ip: 0x11453452f)
Allocations: 1467627 (Pool: 1466754; Big: 873); GC: 0
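For reference, Threads.@threads only uses more than one thread if Julia is started with the JULIA_NUM_THREADS environment variable set; the thread count can be confirmed in the session (the value 4 below is just an example):

# Set in the shell before launching Julia, e.g.:
#   export JULIA_NUM_THREADS=4
Threads.nthreads()   # number of threads @threads will actually use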
I tried this. It seems to work for smaller numbers of files. However, running it for 1000, I end up with only about 730-750 files. Rerunning it a few times, I eventually get the crash.
I am wondering if you're running into the OS limit on the maximum number of open files per process. Maybe the GC is slow to fully close and release some of the file handles.
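If files really are being left open until the GC gets to them, one thing worth trying is the do-block form of open, which closes the handle as soon as the block exits, even if the write throws. A sketch of the loop body with that change, keeping the same tFile and format string:

# Do-block form: the file is closed deterministically when the block exits,
# so no handle is left waiting on close() or the GC.
open(tFile, "w") do f
    @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
end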
I’ve now run the same code but using @parallel (code below) and it worked fine, generating 100,000 files in 26 seconds.
function parallelFile(N::Int64)
    dPath = "/tmp/Jason/"
    @sync @parallel for i = 1:N
        tFile = *(dPath, "file_", @sprintf("%015d", i))
        f = open(tFile, "w")
        @printf(f, "%s\n", Dates.format(now(), "SS.ssss"))
        close(f)
    end
end
addprocs(4)
@time parallelFile(100000)
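As a quick sanity check that no files are being dropped silently, the output directory can be counted after the run (a minimal check, assuming the same dPath and the "file_" prefix):

dPath = "/tmp/Jason/"
# Count only the generated files, in case the directory holds other entries.
count(f -> startswith(f, "file_"), readdir(dPath))   # should equal N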