Global variables "leak" during multithreading

Using global variables inside a function (executed separately) with multithreading seems to cause multiple threads to incorrectly share/access the global variables.

It’s a bit contrived for me to present a concrete example, but my code looks something like this:

@threads for (i, params) in collect(enumerate(my_list))
    my_function(i, params)
end

Inside my_function I use a global variable. The program is crashing with n>=2 threads because somehow one thread is making reference to the global variable value of another thread. The error message I got was basically: (in thread A) dictionary doesn’t have access to key X where X would be a value in thread B. Logically this shouldn’t be happening since the two threads are separated by different calls to my_function.

The reason I was using a global variable inside my_function was I had a while-loop where I was updating the variable every iteration. After modifying my implementation to use a growing array instead of a global variable, my program no longer crashed.

This would be strange if it’s expected behavior. If it isn’t, it seems like a serious bug with multithreading and global variables.

If it’s global, there’s exactly one copy that all threads can access.

@threads is shared-memory parallelism, so if the global variable is mutable and used in a non-thread-safe way, this is expected to happen.
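To make the "exactly one copy" point concrete, here is a minimal sketch. `STATE` and `seen_ids` are hypothetical names, with `STATE` standing in for whatever global `my_function` touches. Each iteration only records the identity of the global, so the loop itself is race-free:

```julia
# Hypothetical global, standing in for the one used inside my_function.
const STATE = Ref(0)

# Record the identity of STATE as seen from every loop iteration.
function seen_ids(n)
    ids = Vector{UInt}(undef, n)
    Threads.@threads for i in 1:n
        ids[i] = objectid(STATE)   # each iteration writes only its own slot
    end
    return ids
end

# Every iteration saw the very same object: there are no per-thread copies.
all(==(objectid(STATE)), seen_ids(8))  # true
```

Since every thread holds the same object, any unsynchronized write from one iteration is visible to (and can interfere with) all the others.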


I see. Sorry, my mistake. Is there a good way to update variables in for-loops and while-loops other than using global variables, then?

There seems to be a beginner trap here. Trying to update variables in loops like in Python → Google solution → find solution with global variable → use global variable → switch to multithreading and scratch your head (like I did) for 5h trying to debug.

Do you need a per-thread accumulator? Or do you just need a thread-safe data structure that all threads can use?

TL;DR: don’t use global variables. Put the loop in a function and have a variable in that function.
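As a rough sketch of what that could look like (the function name `count_steps` and the doubling loop are made up for illustration, not taken from the original code): the variable updated in the while-loop is local to the function, so each call gets its own copy and no `global` is needed.

```julia
# Hypothetical helper: `steps` and `x` are locals, so every call has its own.
function count_steps(limit)
    steps = 0
    x = 1
    while x < limit
        x *= 2        # stand-in for the real per-iteration update
        steps += 1
    end
    return steps
end

# Each iteration calls the function independently and writes to its own slot.
limits = [10, 100, 1000, 10000]
results = Vector{Int}(undef, length(limits))
Threads.@threads for i in eachindex(limits)
    results[i] = count_steps(limits[i])
end

results  # [4, 7, 10, 14]
```

Because nothing inside `count_steps` is shared, the threaded loop needs no locking at all.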


Each thread is running completely independently.

@threads is arbitrarily splitting the work across different threads, so I’m trying to determine what the point of a thread-specific global would be.

I think I misused global, I was just trying to update a variable in a loop (as I’m sure many new users to Julia would try to do).

I understand, but let’s say you’re doing this:

i = 1
for x in 1:100
    i += 1
end

And you want to parallelize this. Why do you want a copy of i for each thread? Are you bringing them back together at the end somehow?
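If the answer is "yes, bring them back together", one possible shape is sketched below. The name `parallel_count` and the chunking scheme are my assumptions, not something from the original code. Each task keeps its own local counter, and the partial counts are summed at the end:

```julia
# Hypothetical sketch: one chunk of the range per available thread.
function parallel_count(n)
    chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            local_i = 0            # private to this task
            for _ in chunk
                local_i += 1
            end
            local_i                # value returned by fetch
        end
    end
    # Start from the initial i = 1 and add every task's partial count.
    return 1 + sum(fetch, tasks)
end

parallel_count(100)  # 101, same as the serial loop
```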

Correct me if I’m wrong, but it seems like your mental model of what’s happening is that you’re running 8 separate copies of for loops (if you have 8 threads available), instead of one for loop where sets of 8 loop bodies are arbitrarily run in parallel?

So I think you really want a variable in the next-higher scope that can be safely updated by each thread. And that could be passed to my_function as an argument (probably renamed my_function! to indicate that it’ll modify its argument).
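A minimal sketch of that idea, assuming the shared state is a dictionary guarded by a lock. The `results` and `lk` arguments, the `run_all` wrapper, and the `length(params)` "work" are all placeholders I made up:

```julia
# Hypothetical my_function!: mutates `results`, which lives in the caller's scope.
function my_function!(results::Dict{Int,Int}, lk::ReentrantLock, i, params)
    value = length(params)     # stand-in for the real per-item work
    lock(lk) do                # only one task may touch the Dict at a time
        results[i] = value
    end
    return nothing
end

function run_all(my_list)
    results = Dict{Int,Int}()  # shared, but local to this function (not global)
    lk = ReentrantLock()
    Threads.@threads for i in eachindex(my_list)
        my_function!(results, lk, i, my_list[i])
    end
    return results
end

run_all(["a", "bb", "ccc"])  # Dict(1 => 1, 2 => 2, 3 => 3)
```

The lock keeps concurrent writes from corrupting the `Dict`; for a simple counter, something like `Threads.Atomic{Int}` would probably be enough instead.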