Best way to Synchronize Threads/Lock variables in a nontrivial computation

Hi everyone.

I have several questions regarding Parallel/Concurrent Programming with Julia.

I have an algorithm which in simple terms does the following:

function main(iters)
  bestVal = 1e9
  fields = generate_fields()
  bestStruct = CustomStruct(fields...)
  for i in 1:iters
    first_part = first_function()
    second_part = second_function(first_part)
    if second_part.value < bestVal
      bestVal = second_part.value
      bestStruct = second_part
    end #if
  end # for
return bestStruct, bestVal
end

main(100)

Where the inner functions are time consuming and perform most of the calculations in a single threaded manner. I have the belief that using Parallel programming will help me speed up the execution of the entire main function.

I looked into the docs and the manual, reading the chapters of Multi-Threading and Parallel Computing, however I still have some doubts. In very simple terms, I would like to parallelize the main for loop in order to have each Thread (or Task) perform an iteration of the loop with both functions, then compare the result with my best incumbent value and then update the value if it’s required.

First and foremost, how can I make my Threads each start an iteration of my main loop, perform the work on the first and later second function properly? I know that by using Threads.@threads before the for the whole for gets parallelized, however I found that most examples online store the results of the computations in a previously defined buffer/array. In my case, I would like NOT to store the results as I fear for OOM errors.

I understand that I must use a lock to avoid race conditions but here is where it gets fuzzy to me: do I need to lock the first_part and second_part variables? It is my understanding that all the threads share the same memory so each thread would overwrite the variables for other threads as they are dynamically scheduled.

But if I lock them, how would I properly store the value? Is it better to just compare the incumbent value as each thread finishes? Do I need to use two arrays: the first one to store the results of first_function and then later use the array contents in second_function? What if the logic gets more complicated and I add more functions, does that approach scale well or will I run into OOM issues?

I read about tasks and spawning and fetching, however I don’t think I can make my functions Tasks as they require arguments and have a specific order of operations.

Sorry if my question is too broad or not specific enough, I will be more than happy to provide additional context and the functional real code that I am trying to parallelize. I really love the Julia language but I find examples of multithreading and parallel programming lacking, as most just simply point to FLoops or Pmap or using simple lock examples.

Thank you very much!

If the first and second functions do not mutate shared buffers, then the only part that needs a lock is the if ... end block.

1 Like

Didn’t think it would be that straightforward, thank you very much!

Imho, programs written in functional style are often easier to parallelize and your loop is a map-reduce, i.e., Transducers.jl seems to be a perfect fit:

# Note: Julia started with 8 threads
julia> using Transducers

julia> process(i) = begin sleep(i/100); 2*i end
process (generic function with 1 method)

# Serial summation
julia> @time foldl(+, Map(process), 1:10)
  0.567136 seconds (51 allocations: 1.703 KiB)
110

# Parallel summation
julia> @time foldxt(+, Map(process), 1:10)
  0.102100 seconds (112 allocations: 6.156 KiB)
110

# Your example
julia> keepbetter(currentbest, candidate) = if candidate.value > currentbest.value candidate else currentbest end
keepbetter (generic function with 1 method)

julia> first_fun() = (; value = rand())
firststep (generic function with 1 method)

julia> second_fun(x) = x  # just an example
secondstep (generic function with 1 method)

julia> foldxt(keepbetter, Map(i -> second_fun(first_fun())), 1:10)
(value = 0.973360931593118,)

Note also that your first_function does not take an argument, i.e., in order to return different results it will need to depend on some hidden state and is therefore prone to further race conditions if that state is not thread-safe.

1 Like

A point of style:

One-liner if, for, and while statements can have parsing surprises. The parsing surprises I’ve encountered thus far are:

A) Parentheses and brackets

julia> if true (1) end
ERROR: syntax: space before "(" not allowed in "true (" at REPL[1]:1

julia> if true [1] end
ERROR: syntax: space before "[" not allowed in "true [" at REPL[2]:1

B) Symbols and quoted expressions

julia> if true :hi end
ERROR: UndefVarError: `hi` not defined

julia> if true :(x+y) end
ERROR: UndefVarError: `x` not defined

C) $ Interpolation

julia> :( if true $x[] = y end )
ERROR: syntax: unexpected "="

julia> @btime if true $x[] = y end
ERROR: syntax: unexpected "="

As a result, for one-liners I’ve adopted a style of adding a strategic semicolon ; after the condition in order to avoid surprises.