Update vector without using a counter in a multithreaded loop

Iulian.Cioarca · May 25, 2021, 12:54pm

I would like to convert the following simplified example:

n=10
x = Vector{Float64}(undef,n)
y = 1:10
Threads.@threads for i=1:length(y)
    x[i] = y[i] * 2.0
end

into:

n=10
x = Vector{Float64}(undef,n)
y = 1:10
Threads.@threads for crt_y in y
    x[??] = crt_y * 2.0
end

How can I traverse and update x without using the i counter in a multithreaded loop?

pixel27 · May 25, 2021, 1:21pm

I’m not 100% sure what you are trying to do. In both examples the loop variable goes from 1 to 10, so you can use it directly for indexing. My best guess is that you are asking for something like:

n=10
x = Vector{Float64}(undef,n)
y = 11:20
Threads.@threads for crt_y in y
    x[crt_y - y[1] + 1] = crt_y * 2.0
end

In that case, as long as y is a range of 10 consecutive integers, subtracting the first element of y gives each computed value its own slot in x.

FPGro · May 25, 2021, 1:21pm

Probably axes, so in the worst case something like (ind, val) in zip(axes(x, 1), y)
(not tested, I’m not in front of a PC)
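A minimal sketch of the zip idea, in case it helps. One assumption here that is not from the post above: as far as I know, Threads.@threads needs an iterator it can index into, and a lazy zip is not indexable, so the sketch collects it into a Vector of tuples first. The name index_value_pairs is only illustrative.

```julia
n = 10
x = Vector{Float64}(undef, n)
y = 11:20

# Pair each index of x with the matching element of y.
# collect() materializes the lazy zip into a Vector of (index, value) tuples,
# which is an assumption made so that Threads.@threads can split the work.
index_value_pairs = collect(zip(eachindex(x), y))

Threads.@threads for (ind, val) in index_value_pairs
    x[ind] = val * 2.0   # each iteration writes to its own slot of x
end
```

The cost is one extra allocation of n tuples, which is usually small compared with the work done in the loop body.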

oheil · May 25, 2021, 1:52pm

Another solution could be:

Threads.@threads for crt_y in y
    index=indexin(crt_y,y)[1]
    x[index] = crt_y * 2.0
end

Note that this works only if the elements of y are unique!
The task sounds a bit unusual, though. Perhaps there is a better solution for your actual problem?
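To illustrate the uniqueness caveat, here is a small sketch with a repeated value in y. It uses findfirst as a stand-in for the first-match lookup, and then shows a different technique, collect(enumerate(y)), which carries the position along with the value and so does not depend on uniqueness. The variable names are only illustrative.

```julia
y = [3, 7, 3, 9]                  # note the repeated 3
x = zeros(Float64, length(y))

# A first-match lookup always reports position 1 for the value 3,
# so position 3 would never be written by a value-based lookup.
first_match = findfirst(==(3), y)

# enumerate yields (position, value) pairs; collect makes them indexable,
# which is assumed to be needed for Threads.@threads.
Threads.@threads for (i, crt_y) in collect(enumerate(y))
    x[i] = crt_y * 2.0
end
```

With the position carried alongside the value, every slot of x is filled even though 3 appears twice.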

Iulian.Cioarca · May 25, 2021, 2:28pm

My use case is a bit more complex. I have a dataframe large_df. One of its columns contains IDs (let’s say from 1 to 1e6, in random order); the other columns contain data I need to process. I also have a second dataframe ID_df, which contains some of those IDs (not necessarily all of them), in a totally different order.
The task is to find each ID from ID_df in the ID column of large_df, then take data from that row and store it somewhere (get the first ID from ID_df, check whether it exists in the large_df ID column and, if so, go to that row, extract data from the other columns, and put it in a preallocated array like x).

In order to make this more efficient, I use multithreading, since the search operations are independent.
The code works using a dedicated counter.

I would like to make this change purely for style: the code looks better with something like Threads.@threads for col in eachcol(df)...

using Random, DataFrames

n=10
ID_df = DataFrame(ID=shuffle(1:n))
large_df = DataFrame(ID=shuffle(1:n), data=randn(n))

x = Array{Float64,2}(undef,n,2)

Threads.@threads for i=1:size(ID_df)[1]
   # get current ID from ID_df
   crt_ID_row = ID_df[i,:]
   crt_ID = crt_ID_row.ID

   # find ID in large_df and extract data from that row
   df_row = filter([:ID] => x -> x == crt_ID, large_df)
   x[i,1] = crt_ID
   x[i,2] = df_row.data[1]
end

Threads.@threads for crt_ID_row in eachrow(ID_df)
   # get current ID from ID_df
   crt_ID = crt_ID_row.ID

   # find ID in large_df and extract data from that row
   df_row = filter([:ID] => x -> x == crt_ID, large_df)
   x[??,1] = crt_ID   # no counter i here: which row of x should this write to?
   x[??,2] = df_row.data[1]
end
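One possible direction for the use case above, as a rough sketch rather than a definitive answer. To keep it self-contained it uses plain vectors as stand-ins for the dataframe columns (ids_wanted for ID_df.ID, large_ids for large_df.ID, large_data for large_df.data); these names and the row_of lookup are hypothetical. It also swaps the per-ID filter for a Dict built once up front, and uses collect(enumerate(...)) so the loop body receives both the position and the ID without a separate counter loop. IDs missing from the large table get NaN, which is an assumption about the desired behavior.

```julia
using Random

n = 10
ids_wanted = shuffle(1:n)          # stand-in for ID_df.ID
large_ids  = shuffle(1:n)          # stand-in for large_df.ID
large_data = randn(n)              # stand-in for large_df.data

# Build the ID -> row lookup once, instead of scanning the large table per ID.
row_of = Dict(id => row for (row, id) in enumerate(large_ids))

x = Array{Float64,2}(undef, n, 2)

# collect(enumerate(...)) gives an indexable Vector of (position, ID) tuples.
Threads.@threads for (i, crt_ID) in collect(enumerate(ids_wanted))
    row = get(row_of, crt_ID, nothing)   # the Dict is only read inside the loop
    x[i, 1] = crt_ID
    x[i, 2] = row === nothing ? NaN : large_data[row]
end
```

The Dict is only read, never modified, inside the threaded loop, and each iteration writes to its own row of x, so the iterations stay independent.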
