Running Julia with native Multithreading vs in Separate Processes

I’ve measured the performance of a script which:

  • Reads from a file in some binary format
  • Does a binary search for each value it read within a big sorted list containing every value
  • Fills in a Vector{UInt8} that was preallocated to the required length

I’m sorry to say that I can’t really share more implementation details than this, so I’d be happy just to know whether this is a general Julia problem or whether it might lie somewhere within my code.
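Since the real implementation can’t be shared, here is a minimal hypothetical sketch of the shape such a `convertFile` might take, just to make the discussion concrete; `SORTED_VALUES`, the record format, and the output mapping are all made up:

```julia
# Hypothetical stand-in for the "big sorted list containing every value".
const SORTED_VALUES = UInt8.(0:255)

# Toy sketch of the conversion: read the binary input, binary-search each
# byte in the sorted table, fill a preallocated Vector{UInt8}, write it out.
function convertFile(inpath::AbstractString, outpath::AbstractString)
    raw = read(inpath)                          # whole file as Vector{UInt8}
    out = Vector{UInt8}(undef, length(raw))     # preallocated to the required length
    for (i, b) in enumerate(raw)
        j = searchsortedfirst(SORTED_VALUES, b) # binary search in the sorted list
        out[i] = UInt8(j - 1)                   # toy mapping: table index back to a byte
    end
    write(outpath, out)
end
```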

We had ~100 files whose names were numbered and that needed to be converted like this, so I thought I’d use Julia’s own multithreading to parallelize that loop, which looked roughly like this:

function main()
    Threads.@threads for i in 1:100
        convertFile("/path/to/input/" * string(i) * ".input", "/path/to/output/" * string(i) * ".output")
    end
end


I then went on to measure how long each part of the script took to execute:

The performance of thread 1 for one file, using Julia’s Threads.@threads macro with the -t 8 option:

Time for:
    Reading:               4818.024800 seconds
    Binary Search:          430.770371 seconds
    Write, part A:         2568.713015 seconds
    Write, part B:        17747.132355 seconds
    Total time until now: 25680.028282 seconds
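(For reference, per-phase timings like these can be accumulated with `time_ns()` deltas around each phase; the sketch below uses a stand-in phase body, not the real read.)

```julia
# Sketch: wrap a phase in time_ns() deltas to get per-phase seconds,
# as opposed to timing only the whole run. The phase body is a stand-in.
function timed_phase(f)
    t0 = time_ns()
    result = f()
    elapsed_s = (time_ns() - t0) / 1e9   # nanoseconds -> seconds
    return result, elapsed_s
end

data, t_read = timed_phase(() -> rand(UInt8, 1024))  # stand-in for "Reading"
```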

Once I noticed that these performance numbers didn’t line up at all with what I had measured for single-threaded performance, I wanted to test running it as 8 separate processes that were completely independent from one another.

The performance of one thread for one file, when started as 8 separate processes (I commented out the Threads.@threads macro, called julia /path/to/script.jl in multiple terminals, and changed the range in the file before each call):

Time for:
    Reading:              2062.276720 seconds
    Binary Search:          91.480939 seconds
    Write, part A:         605.754762 seconds
    Write, part B:        6502.055646 seconds
    Total time until now: 9381.597169 seconds

There are no global variables modified from multiple threads; however, I do read from a few const global variables. The threads don’t need to know anything about each other, as each is basically just working through its own files. CPU utilization on all cores was at 100% both with Julia’s native multithreading and with the separate instances.

I believe I still have quite a few allocations in my code, though I’m unsure how big an impact this has under multithreading compared to running single-threaded. Any insight into why this performance gap between one process with multiple threads and multiple processes with one thread each shows up in every part of the code is appreciated.
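Allocations are a plausible suspect here: under Threads.@threads, every allocation can trigger a stop-the-world GC pass that pauses all threads at once, while separate processes each collect independently. A quick toy way to see how much a hot loop allocates (a sketch, not the actual code in question) is `@allocated`:

```julia
# Allocates a fresh 1 KiB buffer every iteration -- each allocation is a
# potential GC trigger that stalls every thread in a threaded run.
function work_alloc(n)
    s = 0
    for _ in 1:n
        buf = zeros(UInt8, 1024)   # fresh allocation per iteration
        s += Int(buf[end])
    end
    s
end

# Same work with one reused buffer -- (almost) allocation-free in the loop.
function work_prealloc(n)
    buf = zeros(UInt8, 1024)       # one buffer, reused
    s = 0
    for _ in 1:n
        fill!(buf, 0x00)
        s += Int(buf[end])
    end
    s
end

work_alloc(1); work_prealloc(1)             # warm up (compile) before measuring
alloc_a = @allocated work_alloc(10_000)     # roughly 10_000 x 1 KiB
alloc_b = @allocated work_prealloc(10_000)  # roughly one 1 KiB buffer
```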

Julia’s I/O and GC are not best in class right now, so many people are currently observing that for I/O- or GC-intensive workloads, multiple processes do better because you get parallel GC by construction. Since you have no need to share data (memory) between the different tasks, you should use whatever is faster: in this case, multiprocessing.
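One sketch of that multi-process route, using the stdlib Distributed instead of hand-started terminals; the worker count is taken from the post and the elided `convertFile` body is an assumption:

```julia
using Distributed
addprocs(8)   # 8 worker processes, each with its own GC (or start with `julia -p 8`)

# Define the conversion on every worker; body elided, same as the original script.
@everywhere function convertFile(inpath, outpath)
    # ... the conversion code from the script ...
end

# pmap distributes the 100 files across the workers, like the threaded loop did,
# but each worker collects garbage independently of the others.
pmap(1:100) do i
    convertFile("/path/to/input/$i.input", "/path/to/output/$i.output")
end
```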