Parallelizing Multiple Workers for File Operations

Hi all,

This is my first post here. Below is my problem, the approach I am planning, and where I am stuck:

Problem: I have many big files (200+, 50-100 GB each) that I need to read in, run some analysis on, and then save the results to a file.

Approach: I am using a Python library to handle reading the files, and that part works fine. My plan is to parallelize reading and analyzing the files, have each worker write its results to a separate Arrow file, and then finally concatenate those files into one master Arrow file.

Where I am stuck: How do I create worker processes in Julia that each read and analyze a file and then write out the results? How do I tell each worker which file to work on? Overall, is this a good approach, or am I missing something?
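For concreteness, here is a rough sketch of what I imagine the Distributed version looking like (untested; `analyze` is a stand-in for my read + analysis step, and `files` is my list of paths). Is this roughly the right shape?

using Distributed
addprocs(8)                          # or start Julia with `julia -p 8`
@everywhere using Arrow, Tables

@everywhere function process(file)
    result = analyze(file)           # stand-in: my read + analysis step, returning a table
                                     # (would need to be defined on all workers too)
    out = file * ".arrow"
    Arrow.write(out, result)         # each worker writes its own Arrow file
    return out
end

outputs = pmap(process, files)       # pmap hands one file to each free worker

# stitch the per-worker files into one master file, one record batch per part
Arrow.write("master.arrow", Tables.partitioner(Arrow.Table, outputs))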

Thank you!

P.S. I read about this on Discourse; is it as simple as writing something like:

Threads.@threads for f in files
    # do the per-file work in this thread
end

?
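i.e., if I fill that in, I picture the threaded version looking something like this (needs Julia started with multiple threads, e.g. `julia -t 8`; `analyze` is the same stand-in as above):

using Arrow

Threads.@threads for f in files
    result = analyze(f)                # stand-in: per-file read + analysis step
    Arrow.write(f * ".arrow", result)  # one output file per input, so threads never share a file
end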

~ tcp 🌳

FileTrees.jl (GitHub: shashi/FileTrees.jl, "Parallel file processing made easy") does this for you if you just want to get going quickly. Just add threads or workers and use the lazy flag.
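Roughly, going by its README (a sketch, not tested against your data; `analyze` stands in for your per-file work):

using Distributed
addprocs(4)                        # FileTrees distributes work across these workers
@everywhere using FileTrees, DataFrames

tree = FileTree("data")            # directory containing your input files
results = FileTrees.load(tree; lazy=true) do file
    analyze(string(path(file)))    # your read + analysis step, returning a table
end
df = exec(reducevalues(vcat, results))   # forces the lazy graph, runs in parallel

A single Arrow.write("master.arrow", df) at the end would then give you the master file directly, without the intermediate per-worker files.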
