Creating an I/O pipeline which feeds a while loop using multiple cores

No code written yet, but I'm getting there. I am having a little performance anxiety, and I think it’s because I’m not seeing the big picture yet. Could someone point me at any Julia modules I should look at for the following?

I have prepared a 500 MB text file that has mixed lines in it:

  1. all lines are about 300 characters long
  2. comment lines
  3. comma-separated lines
  4. space-separated lines
  5. lines can contain floats and integers
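
To make the line mix above concrete, here is a minimal sketch of a per-line classifier and parser using only Base Julia. The function names (`classify`, `parsefield`, `parseline`) are hypothetical, and it assumes comment lines start with `#`, which the description above does not actually specify.

```julia
# Hypothetical helpers, Base Julia only.
# Assumption: comment lines begin with '#'; adjust to the real file format.
function classify(line::AbstractString)
    s = strip(line)
    startswith(s, "#") && return :comment
    occursin(",", s) && return :comma
    return :space
end

# Try Int first, then Float64; returns `nothing` if the field is not numeric.
function parsefield(field::AbstractString)
    v = tryparse(Int, field)
    v !== nothing && return v
    return tryparse(Float64, field)
end

# Returns `nothing` for comment lines, otherwise a vector of parsed fields.
function parseline(line::AbstractString)
    kind = classify(line)
    kind === :comment && return nothing
    s = strip(line)
    fields = kind === :comma ? split(s, ",") : split(s)
    return [parsefield(strip(f)) for f in fields]
end
```

For example, `parseline("1, 2.5, 3")` yields a vector holding an `Int`, a `Float64`, and an `Int`, while `parseline("# note")` yields `nothing`.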

I want to pipe the text file through an I/O stream into a memory buffer.

Another way to look at it:

Using the CLI, pipe a text file into a Julia process.
The Julia process sees the bytes flow in and puts them into a memory structure.
The memory structure fills up until it holds enough work to keep a worker busy.
A free worker picks up the work and removes those bytes from the memory structure.
This process is repeated until all the bytes are processed.
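
The steps above can be sketched with a bounded `Channel` and `Threads.@spawn` from Base Julia. This is only one possible shape under stated assumptions, not a tuned implementation: `run_pipeline`, `batch_size`, and `nworkers` are hypothetical names, work is batched by lines rather than raw bytes, and `process` stands in for whatever parser is applied to each batch.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical sketch: read lines from `io`, hand them to workers in batches.
# `process` is any function that takes a Vector{String} and returns a result.
function run_pipeline(process, io::IO; batch_size::Int = 1_000,
                      nworkers::Int = nthreads())
    # Bounded buffer: the reader blocks when the workers fall behind.
    work = Channel{Vector{String}}(2 * nworkers)
    results = Channel{Any}(Inf)

    # Reader task: fill a batch, then publish it as one unit of work.
    reader = @spawn begin
        try
            batch = String[]
            for line in eachline(io)
                push!(batch, line)
                if length(batch) >= batch_size
                    put!(work, batch)
                    batch = String[]
                end
            end
            isempty(batch) || put!(work, batch)
        finally
            close(work)  # lets the workers' loops terminate
        end
    end

    # Worker tasks: each takes batches until the channel is closed and empty.
    workers = map(1:nworkers) do _
        @spawn begin
            for batch in work
                put!(results, process(batch))
            end
        end
    end

    wait(reader)
    foreach(wait, workers)
    close(results)
    return collect(results)  # result order is not guaranteed
end
```

Assuming the function lives in a script, it could be driven from the shell with something like `julia -t 4 script.jl < data.txt` and a call such as `run_pipeline(length, stdin)`. Starting Julia with more than one thread is what lets the workers run on multiple cores.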

The parsers output to a queue, which feeds other processes.
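
One way to picture a parser feeding a downstream queue is to chain `Channel`s, where each stage reads from one channel and writes to the next. The sketch below assumes "processes" means tasks inside a single Julia process; `stage` and its `buffer` argument are hypothetical names. For separate Julia worker processes, `RemoteChannel` from the `Distributed` standard library plays a similar role, though that is not shown here.

```julia
# Hypothetical helper: apply `f` to every item from `input` and expose the
# results as a new Channel. The output channel is closed automatically when
# the task behind it finishes.
function stage(f, input::Channel; buffer::Int = 32)
    return Channel{Any}(buffer; spawn = true) do out
        for item in input
            put!(out, f(item))
        end
    end
end

# Usage: a tiny source, a parsing stage, and a consumer stage.
source = Channel{Any}(32) do ch
    for line in ["1 2 3", "4 5 6"]
        put!(ch, line)
    end
end

parsed = stage(line -> parse.(Int, split(line)), source)
sums = stage(sum, parsed)
totals = collect(sums)
```

With a single task per stage, items keep their order; running several tasks per stage would need extra care if ordering matters.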

Not exactly what you are looking for, but in the same vein:
https://github.com/baggepinnen/LengthChannels.jl

@baggepinnen
Thanks for the lead. At first glance it doesn’t seem to be a fit, but it’s a great example. I’ll try to clarify my needs, but the concept is:

Using the CLI, pipe a text file into a Julia process.
The Julia process sees the bytes flow in and puts them into a memory structure.
The memory structure fills up until it holds enough work to keep a worker busy.
A free worker picks up the work and removes those bytes from the memory structure.
This process is repeated until all the bytes are processed.

Does that clear it up? If so, I will update the question.

I would recommend writing a first pass of the code, and then optimizing it afterwards as needed. It’s also hard for others to provide help if the code doesn’t already exist :slightly_smiling_face:

@jpsamaroo Hi there, I understand, but I just wanted to set off on the right path. What I was looking for was guidance on which packages might be worth looking at for the tasks outlined. I do see the point you are making, but I thought part of the purpose of Discourse was to offer guidance to people with concepts, not just code. This is the third time I have got a question wrong on Discourse :slight_smile: Ho hum.

I think DataStreams.jl is the way ahead for me.

You seem to feel discouraged by the response. I’m sorry to hear that.

I think the challenge is that without a simple example it’s hard to be sure that we understand your requirements. In particular, performance tuning is very situation-specific, as it depends on both the algorithms and data structures used and the particular features of the Julia language.

Good luck, and thanks for sharing the link for your potential solution!
