Error opening too many connections

I am trying to write a parallel algorithm to write to many files. Is there a way to detect how many connections I can open? Or a way to rate limit my code?

ios = open.(
"file" .* string.(1:3000),
Ref("w"),
)

gives the error

ERROR: SystemError: opening file "a.jdf\\x2045": Too many open files
Stacktrace:
 [1] #systemerror#44(::Nothing, ::typeof(systemerror), ::String, ::Bool) at .\error.jl:134
 [2] systemerror at .\error.jl:134 [inlined]
 [3] #open#516(::Nothing, ::Nothing, ::Nothing, ::Bool, ::Nothing, ::typeof(open), ::String) at .\iostream.jl:254
 [4] #open at .\none:0 [inlined]
 [5] open(::String, ::String) at .\iostream.jl:310
 [6] _broadcast_getindex_evalf at .\broadcast.jl:630 [inlined]
 [7] _broadcast_getindex at .\broadcast.jl:603 [inlined]
 [8] _getindex at .\broadcast.jl:627 [inlined]
 [9] _broadcast_getindex at .\broadcast.jl:602 [inlined]
 [10] getindex at .\broadcast.jl:563 [inlined]
 [11] macro expansion at .\broadcast.jl:909 [inlined]
 [12] macro expansion at .\simdloop.jl:77 [inlined]
 [13] copyto! at .\broadcast.jl:908 [inlined]
 [14] copyto! at .\broadcast.jl:863 [inlined]
 [15] copy at .\broadcast.jl:839 [inlined]
 [16] materialize at .\broadcast.jl:819 [inlined]
 [17] savejdf(::String, ::DataFrame) at c:\git\JDF\src\JDF.jl:64
 [18] top-level scope at none:0

I feel like this question is too open-ended. There is a finite number of files that can be open at the same time per process. Sorry, there is no way around that. So the real question is: what are you doing with those open files?

Do you need them open at once because you are reading/writing to them simultaneously? If you do, ouch, you are probably out of luck and would have to implement something that opens and closes the files “as needed” and the performance is probably going to suck.

Or can you “batch” your process so it only handles N files at a time? Then you can probably work something out, but you would have to update your code to process the data in these batches.
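A minimal sketch of the batching idea, assuming each file can be written independently. The batch size of 256, the temporary directory, and the placeholder payload are all illustrative choices, not anything from the original code:

```julia
# Write 3000 files while keeping at most `batch_size` of them open at once.
dir = mktempdir()                                   # scratch directory for the example
paths = [joinpath(dir, "file$i") for i in 1:3000]
batch_size = 256                                    # assumed to be below the per-process limit

for batch in Iterators.partition(paths, batch_size)
    ios = [open(p, "w") for p in batch]             # open only this batch
    try
        for io in ios
            write(io, "some data\n")                # placeholder payload
        end
    finally
        foreach(close, ios)                         # release the handles before the next batch
    end
end
```

If the files do not need to be open simultaneously at all, the `open(path, "w") do io ... end` form is simpler, since it closes each file as soon as its block finishes.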

If you know the maximum number of files you will want open, you might be able to just change some limits.
On Linux the default limit is typically 1024 open files, but that can be changed. On Windows (which you appear to be on) I did find:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019

That function seems to have a default of 512 and a hard limit of 8,192 files, and you might be able to raise the limit with a ccall to it… maybe.


On Linux and macOS you can see the limits with

> ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       46553
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  16384
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 46553
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

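And you can raise the limit on open files for the current shell session (up to the hard limit) with

ulimit -n <new limit>

Note that ulimit is a shell builtin, so prefixing it with sudo does not work.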
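To answer the "can I detect the limit" part from inside Julia, one option on Unix-like systems is to ask a child shell, since it inherits the limit of the Julia process. This is only a sketch: `soft_fd_limit` is a hypothetical helper name, and it returns `nothing` on Windows, where `ulimit` is not available.

```julia
# Query the soft limit on open file descriptors by running `ulimit -n` in a child shell.
function soft_fd_limit()
    Sys.isunix() || return nothing                  # no `ulimit` on Windows
    out = strip(read(`sh -c "ulimit -n"`, String))
    return out == "unlimited" ? typemax(Int) : parse(Int, out)
end

limit = soft_fd_limit()
println("soft open-file limit: ", limit)
```

Keep in mind that the process already has some descriptors open (stdin, stdout, stderr, loaded libraries), so the number of additional files you can open is somewhat lower than the reported value.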

Are you sure that writing 3000 files at the same time is a good idea? I know several systems that need a higher ulimit -n to work, but they do not write to thousands of files at the same time; they keep descriptors open for their own needs, mostly for reading.

As @waralex says - what is the use case here? It sounds interesting!

I don’t actually need 3000. I am just writing out the columns of a DataFrame, so I can limit the number of open files to the number of threads I have.
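A rough sketch of that approach, assuming one file per column. The `columns` dictionary is a hypothetical stand-in for the DataFrame, and `Serialization` is used only as a placeholder format. Because each file is opened and closed inside the loop body, no more than `Threads.nthreads()` files should be open at any moment:

```julia
using Serialization
using Base.Threads: @threads, nthreads

# Hypothetical stand-in for a DataFrame: column name => column values.
columns = Dict(
    "a" => collect(1:100),
    "b" => collect(0.5:0.5:50.0),
    "c" => fill("x", 100),
)

outdir = mktempdir()                       # scratch directory for the example
colnames = collect(keys(columns))

# Each thread handles one column at a time; `open ... do` closes the file
# as soon as the column is written.
@threads for i in eachindex(colnames)
    name = colnames[i]
    open(joinpath(outdir, name), "w") do io
        serialize(io, columns[name])
    end
end

println("wrote ", length(colnames), " columns using up to ", nthreads(), " threads")
```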