Hello!
I am writing functions that read multiple files of the same type, perform some analysis, and write the results to a single object. I am unsure whether the way they are currently written is “safe” for multithreading. Here is a simple example:
Let’s say you have three .txt files:
file1.txt
file2.txt
file3.txt
Each file contains some text.
My function would look like this:
# Defining the function
function read_that_file(filepaths::Vector{String})
    n_files = length(filepaths)
    results = Vector{String}(undef, n_files)
    Threads.@threads for i in 1:n_files
        results[i] = open(io->read(io, String), filepaths[i])
    end
    return results
end
# Running the function on a vector of filepaths
read_that_file(["file1.txt","file2.txt","file3.txt"])
I’m unsure whether I need locks in this scenario. No two threads should ever access the same file, because each loop index is handled by exactly one thread. Similarly, all threads write to the same results vector, but never at the same index.
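For comparison, here is a minimal sketch of the pattern where I believe a lock would be needed: every thread calling push! on one shared vector instead of writing to its own preallocated index. The name read_files_with_lock is hypothetical and only for illustration. The file read happens outside the lock, so only the push! is serialized.

```julia
# Hypothetical lock-based variant, for comparison with the preallocated version.
function read_files_with_lock(filepaths::Vector{String})
    results = Tuple{Int,String}[]   # shared vector that grows, so access is guarded
    lk = ReentrantLock()
    Threads.@threads for i in eachindex(filepaths)
        text = read(filepaths[i], String)   # file I/O happens outside the lock
        lock(lk) do
            push!(results, (i, text))       # only one thread at a time may push
        end
    end
    sort!(results; by = first)              # restore the input order
    return last.(results)
end

# Usage on temporary files, so no real files are assumed to exist
mktempdir() do dir
    paths = [joinpath(dir, "file$i.txt") for i in 1:3]
    for (i, p) in enumerate(paths)
        write(p, "text $i")
    end
    println(read_files_with_lock(paths))
end
```

Because push! order depends on thread scheduling, the sketch stores the index alongside the text and sorts at the end. The preallocated version above avoids both the lock and the sort.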
The last, kinda weird use case would be if someone provided the same filename more than once in the filepaths vector, for example:
read_that_file(["file1.txt", "file1.txt"])
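At that point the first rule would be broken, since two threads would read the same file at the same time (they would still write to different indices of results). Might that cause an issue?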
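A quick way I could check the duplicate-filename case is sketched below. It writes one temporary file, passes its path several times, and compares every threaded read against a plain serial read. The names read_paths_threaded and check_duplicate_paths are hypothetical. As far as I understand, each read call opens its own handle, so the reads should not share a file position, but a passing check is only a sanity test and not a proof that there is no race.

```julia
# Same pattern as read_that_file, redefined so this sketch is self-contained.
function read_paths_threaded(filepaths::Vector{String})
    results = Vector{String}(undef, length(filepaths))
    Threads.@threads for i in eachindex(filepaths)
        results[i] = read(filepaths[i], String)   # each call opens its own handle
    end
    return results
end

# Hypothetical check: read one temporary file n_copies times in parallel
# and confirm every result matches a serial read of the same file.
function check_duplicate_paths(n_copies::Int = 8)
    mktempdir() do dir
        path = joinpath(dir, "file1.txt")
        write(path, "some text\n"^1000)
        expected = read(path, String)
        results = read_paths_threaded(fill(path, n_copies))
        return all(==(expected), results)
    end
end

println(check_duplicate_paths())
```

This only covers read-only access. It says nothing about the case where another thread or process writes to the file while it is being read.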
Thanks!
Gus