# Multi-threading

#1

I’m using Windows 10 and Julia 1.0.3 and testing out the multi-threading functionality.

I understand that when generating random numbers we would need a separate seed for each thread, but ignoring that for the purpose of a very preliminary test, I created the following code. In it I’m trying to assess the overhead of calling a multi-threaded for loop multiple times.

```
set JULIA_NUM_THREADS=4
```

```julia
test = zeros(10000, 1000);

@time for i in 1:10000
    Threads.@threads for j in 1:1000
        test[i, j] = rand(1)[1]
    end
end
```


When I ran the above code, it produced a continuous stream of error messages, their count likely correlated with the number of times the threaded loop was called. A sample of the errors:

```
Error thrown in threaded loop on thread 0: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 2: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 1: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 0: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 3: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 1: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 2: InexactError(func=:check_top_bit, T=Int64, val=-8)
Error thrown in threaded loop on thread 3: InexactError(func=:check_top_bit, T=Int64, val=-8)
```

True enough, when I did a sum to check the results, `sum(sum(test, dims=1), dims=2)`, my output was about 40081, far less than the expected value of 0.5 × 10000 × 1000 = 5,000,000.

Wondering what mistake I may have made…

Edit 1: Come to think of it, I suspect it’s a synchronization issue; some threads may have finished and moved on to the next loop?

#2

That’s precisely the problem. It’s not just that you need separate seeds; you need wholly distinct random number generators! Remember that calling `rand` doesn’t just return some value, it also mutates the state of your generator to move to the next value. When multiple threads try to mutate this state at the same time, you can rapidly run into inconsistencies.
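To make the statefulness concrete, here is a minimal sketch (seed values are illustrative) showing that a `MersenneTwister` is deterministic given its state, and that every call to `rand` advances that state:

```julia
using Random

# Two generators seeded identically produce the same stream...
a = MersenneTwister(1234)
b = MersenneTwister(1234)
@assert rand(a) == rand(b)

# ...because each call to rand advances the generator's internal state:
x1 = rand(a)   # advances `a`
x2 = rand(a)   # a different value, since the state was mutated in between
@assert x1 != x2
```

When several threads advance that same internal state concurrently, the generator’s buffer can be left in an inconsistent state, which is consistent with the `InexactError`s above.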

#3

You can’t. Calling the same random number generator from multiple threads is undefined behavior, and anything can happen.

Practically, this at least means that you can corrupt (or have already corrupted) the random number generator, giving you the error, and you won’t get a correct performance measure. You’ll get very bad performance even without any error.

#4

Ah gosh, I see, this makes sense now. Thanks @mbauman and @yuyichao

#5

I’d take a look at `randjump` if you’d like solutions.

#6

Does this also apply to distributed computing across cores? Say, if instead of `Threads.@threads` I use `@distributed`?

#7

No

#8

By the way, and incidentally: when people write `rand(1)[1]`, what they almost always want is just `rand()`.

`rand(1)[1]` allocates a length-1 vector and then reads out the first number, while `rand()` just returns a random scalar.
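The difference is easy to check in the REPL; a minimal sketch:

```julia
v = rand(1)    # allocates a 1-element Vector{Float64} on the heap
x = rand()     # returns a bare Float64, no vector allocation

@assert v isa Vector{Float64} && length(v) == 1
@assert x isa Float64
```

In a tight loop like the one above, that per-iteration vector allocation adds garbage-collection pressure for no benefit.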

#9

Hi @DNF, @mbauman, @Elrod,

is `Threads.@threads` equivalent to the `parfor` command in MATLAB?

#10

No. AFAICT MATLAB does not have user-controlled threads.

#11

Good question.
@yuyichao, while there may be no race condition issues when generating random numbers with `@distributed`, I guess it’s safe to say that the random numbers also don’t run the risk of overlapping, since the absence of race conditions should mean there is some synchronization involved between workers?
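For what it’s worth, there is no synchronization of generator state at all: each `@distributed` worker is a separate process with its own memory and its own copy of the default RNG, so nothing is shared and nothing can race. Whether the per-worker streams overlap is a separate seeding question. A minimal sketch (worker count is illustrative):

```julia
using Distributed
addprocs(2)   # two worker processes, each with its own memory and default RNG

# Each worker draws from its *own* default generator; no shared state,
# hence no race and no locking between workers.
total = @distributed (+) for i in 1:1_000_000
    rand()
end
println(total / 1_000_000)   # should be close to 0.5
```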

#12

I sometimes toy with the idea of calling it `rand!`. Strictly speaking, `rand` is a lie.

#13

> # parfor
>
> Execute for-loop iterations in parallel on workers

So in Julia, is `Threads.@threads` + `for` == `parfor` in MATLAB?

#14

This works:

```julia
test = zeros(10000, 1000);

using Random

const generators = Dict{Int,Any}()
const mutex = Threads.Mutex()

function randomNumber()
    i = Threads.threadid()
    if !haskey(generators, i)
        lock(mutex)
        println("Creating random number generator for thread $i")
        generators[i] = MersenneTwister(rand(RandomDevice(), UInt8))
        unlock(mutex)
    end
    return rand(generators[i])
end

@time for i in 1:10000
    Threads.@threads for j in 1:1000
        test[i, j] = randomNumber()
    end
end
println(sum(test)/length(test))
```

which prints

```
Creating random number generator for thread 1
Creating random number generator for thread 4
Creating random number generator for thread 2
Creating random number generator for thread 3
  1.071951 seconds (7.12 M allocations: 132.371 MiB, 17.82% gc time)
0.49999995453686974
```

However, I don’t know if it can be improved by using a strongly typed dictionary instead of `Dict{Int,Any}`.

#15

No. It is kind of like `@parallel` with some automatic data transfers. It requires separate processes, so much so that using `parfor` on a cluster requires a new license for each core! That is very different from threading.

#16

Workers and threads are not the same. You can google “threads vs processes” for an explanation. Workers are the same as processes.

#17

No, `parfor` spawns multiple processes.

#18

Wouldn’t it be `using Distributed; @distributed for` these days?

#19

A few changes, namely concretely typing the generators dictionary, hoisting generator initialization out of the loop, and putting everything into a function to avoid global scope:

```julia
using Random
using BenchmarkTools

const generators = Dict{Int,MersenneTwister}()

function dothread(gen)
    for i in 1:Threads.nthreads()
        if !haskey(gen, i)
            println("Creating random number generator for thread $i")
            gen[i] = MersenneTwister(rand(RandomDevice(), UInt8))
        end
    end
    test = zeros(10000, 1000)
    Threads.@threads for j in 1:1000
        tidx = Threads.threadid()
        for i in 1:10000
            test[i, j] = rand(gen[tidx])
        end
    end
    return test
end

@btime dothread($generators)

test = dothread(generators)
println(sum(test)/length(test))
```

```
$ JULIA_NUM_THREADS=4 julia threaded-rand.jl
Creating random number generator for thread 1
Creating random number generator for thread 2
Creating random number generator for thread 3
Creating random number generator for thread 4
  62.128 ms (32 allocations: 76.30 MiB)
0.49998218701926284
```