Update vector without using a counter in a multithreaded loop

I would like to convert the following simplified example:

n=10
x = Vector{Float64}(undef,n)
y = 1:10
Threads.@threads for i=1:length(y)
    x[i] = y[i] * 2.0
end

into:

n=10
x = Vector{Float64}(undef,n)
y = 1:10
Threads.@threads for crt_y in y
    x[??] = crt_y * 2.0
end

How can I traverse and update x without using the i counter in a multithreaded loop?

I’m not 100% sure what you are trying to do. In both examples the loop variable runs from 1 to 10, so you can use it directly for indexing. My best guess is that you are asking for something like:

n=10
x = Vector{Float64}(undef,n)
y = 11:20
Threads.@threads for crt_y in y
    x[crt_y - y[1] + 1] = crt_y * 2.0
end

In that case, as long as y is a range of 10 consecutive integers, each computed value lands in its matching slot of x.
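The same offset idea can be stretched to ranges whose step is not 1. A minimal sketch, assuming y is an integer range; the range 11:2:29 is only an illustrative choice:

```julia
y = 11:2:29                               # hypothetical range: 10 values, step 2
x = Vector{Float64}(undef, length(y))

Threads.@threads for crt_y in y
    # 1-based position of crt_y inside the range
    idx = (crt_y - first(y)) ÷ step(y) + 1
    x[idx] = crt_y * 2.0
end
```

This only works because a range lets you compute the position from the value; for an arbitrary vector some kind of lookup is needed.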

Probably axes, so in the worst case something like (ind, val) in zip(axes(x, 1), y)
(not tested, I’m not in front of a PC)
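A sketch of how the index/value pairing could look in practice. It assumes that Threads.@threads needs an indexable collection, so the lazy zip is collected into a vector first; the name index_value_pairs is made up for this example:

```julia
y = 11:20
x = Vector{Float64}(undef, length(y))

# zip is lazy and not indexable, so collect it into a Vector of tuples
index_value_pairs = collect(zip(eachindex(x), y))

Threads.@threads for pair in index_value_pairs
    ind, val = pair              # unpack (index into x, value from y)
    x[ind] = val * 2.0
end
```

The collect call allocates a temporary vector, so this trades a bit of memory for not writing the counter by hand.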

Another solution could be:

Threads.@threads for crt_y in y
    index = indexin(crt_y, y)[1]
    x[index] = crt_y * 2.0
end

This works only if the elements of y are unique, since indexin returns the first matching index!
That said, the task sounds a bit unusual. Perhaps there is a better solution for your actual problem?
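If searching y on every iteration becomes a concern, one option is to build the value-to-index lookup once, before the loop. A rough sketch, assuming unique values; the vector contents and the name position_of are invented for illustration:

```julia
y = [14, 11, 19, 12]                      # hypothetical unique values, arbitrary order
x = Vector{Float64}(undef, length(y))

# Build value => index once; the Dict is only read inside the threaded loop
position_of = Dict(val => idx for (idx, val) in pairs(y))

Threads.@threads for crt_y in y
    x[position_of[crt_y]] = crt_y * 2.0
end
```

With duplicate values the Dict would keep only one index per value, so the uniqueness caveat still applies.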

My use case is a bit more complex. I have a dataframe large_df. One of its columns contains IDs (let’s say from 1 to 1e6, in random order); the other columns contain data I need to process. I also have a second dataframe ID_df, which contains some of those IDs (not necessarily all of them) in a totally different order.
The task is to find each ID from ID_df in the ID column of large_df, then take the data from that row and store it somewhere. That is: get the first ID from ID_df, check whether it exists in the large_df ID column, and if it does, go to that row, extract the data from the other columns, and put it in a preallocated array like x.

In order to make this more efficient, I use multithreading, since the search operations are independent.
The code works using a dedicated counter.

I would like to make this change purely for style: the code looks better with Threads.@threads for col in eachcol(df)...

using Random, DataFrames

n=10
ID_df = DataFrame(ID=shuffle(1:n))
large_df = DataFrame(ID=shuffle(1:n), data=randn(n))

x = Array{Float64,2}(undef,n,2)

Threads.@threads for i in 1:nrow(ID_df)
   # get current ID from ID_df
   crt_ID_row = ID_df[i,:]
   crt_ID = crt_ID_row.ID

   # find ID in large_df and extract data from that row
   # (the lambda argument is named id so that it does not shadow the array x)
   df_row = filter(:ID => id -> id == crt_ID, large_df)
   x[i,1] = crt_ID
   x[i,2] = df_row.data[1]
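end

What I would like to write instead is something like:

Threads.@threads for crt_ID_row in eachrow(ID_df)
   # get current ID from ID_df
   crt_ID = crt_ID_row.ID

   # find ID in large_df and extract data from that row
   df_row = filter(:ID => id -> id == crt_ID, large_df)
   # i is not defined in this version; it is the counter I would like to avoid
   x[i,1] = crt_ID
   x[i,2] = df_row.data[1]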
end
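
One possible way to drop the explicit counter here, as a sketch rather than a definitive answer: iterate over eachrow(ID_df) and ask each row for its position with rownumber from DataFrames.jl. The sketch also swaps the per-iteration filter for a Dict built once (my substitution, not part of the original code); the name row_of_ID is made up, and storing NaN for IDs that are missing from large_df is just one assumed convention:

```julia
using Random, DataFrames

n = 10
ID_df = DataFrame(ID = shuffle(1:n))
large_df = DataFrame(ID = shuffle(1:n), data = randn(n))

x = Array{Float64,2}(undef, n, 2)

# Map each ID to its row in large_df once, instead of filtering inside the loop
row_of_ID = Dict(id => row for (row, id) in enumerate(large_df.ID))

Threads.@threads for crt_ID_row in eachrow(ID_df)
    i = rownumber(crt_ID_row)             # row index of this row within ID_df
    crt_ID = crt_ID_row.ID

    # look up the matching row; nothing means the ID is absent from large_df
    row = get(row_of_ID, crt_ID, nothing)
    x[i, 1] = crt_ID
    x[i, 2] = row === nothing ? NaN : large_df.data[row]
end
```

The index i still exists, but it comes from the row itself rather than from a loop counter, which may be close enough to the style you are after.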