This is my first time trying this, so here is the problem, my approach, and where I am stuck:
Problem: I have many large files (200+, 50-100 GB each) that I need to read. I run some analysis on each one and then need to save the results to a file.
Approach: I am using a Python library to read the files, and that part works fine. I plan to parallelize the reading and analysis, have each worker write its results to a separate Arrow file, and finally concatenate those files into one master Arrow file.
Where I am stuck: How do I create worker processes in Julia that each read and analyze a file and then write the results to a file? How do I tell each worker which file to work on? Overall, is this a good approach, or am I missing something?
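For concreteness, here is a minimal sketch of the multi-process version using the `Distributed` standard library. It assumes all workers run on the same machine and can see the same paths. `process_file`, the tiny temporary input files, and the `.result` text output are hypothetical stand-ins: the real body would call the file reader and could write an Arrow file (for example with `Arrow.write` from Arrow.jl) instead of a text file.

```julia
using Distributed

# Start two local worker processes if none exist yet (the count is arbitrary here).
nprocs() == 1 && addprocs(2)

# @everywhere defines the function on every process so the workers can call it.
@everywhere function process_file(path::AbstractString, outdir::AbstractString)
    # Placeholder analysis: count the bytes. The real reader call would go here.
    nbytes = length(read(path))
    outpath = joinpath(outdir, basename(path) * ".result")
    # Placeholder output: one small result file per input file.
    write(outpath, string(nbytes))
    return outpath
end

# Tiny stand-in input files so the sketch runs end to end.
indir = mktempdir()
outdir = mktempdir()
files = [joinpath(indir, "file$i.dat") for i in 1:4]
for (i, f) in enumerate(files)
    write(f, "x"^i)
end

# pmap hands each file to the next free worker and returns the results in input order.
outputs = pmap(process_file, files, fill(outdir, length(files)))
```

With this pattern nobody has to assign files to workers by hand: `pmap` schedules them dynamically, which may help when the files differ a lot in size.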
P.S. I read about this on Discourse. Is it as simple as writing something like:
Threads.@threads for f in files
    # do thing in thread
end
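A runnable version of the threaded idea might look like the sketch below. The macro in Base is spelled `Threads.@threads`, and Julia has to be started with more than one thread (for example `julia -t 4`) for the loop to actually run in parallel. `analyze` is a hypothetical stand-in for the real read-and-analyze step, and the file names are made up. One caveat, stated as an assumption: if the reader is a Python library called through PyCall or PythonCall, Python's global interpreter lock may limit or complicate multithreaded calls, in which case separate processes could be the safer route.

```julia
using Base.Threads

# Hypothetical stand-in for "read and analyze one file".
analyze(path::AbstractString) = length(path)

function analyze_all(files::Vector{String})
    # Preallocate one slot per file so each iteration writes to its own index.
    results = Vector{Int}(undef, length(files))
    @threads for i in eachindex(files)
        results[i] = analyze(files[i])
    end
    return results
end

files = ["a.dat", "bb.dat", "ccc.dat"]
results = analyze_all(files)
```

Writing into a preallocated array by index avoids having several threads push to the same collection at once.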