Hello All
I started using Julia because of its speed, for an application whose workload looks like the pseudo-code I added below. There is no MWE, but hopefully this is diagnostic enough; otherwise I will create one. It could be that this is a Julia-novice mistake that I wasn't able to solve through googling.
There are 4000 CSV files (4-7 MB each). For each file, the code does some data prep and then calls a function that iterates through each row and does a bunch of arithmetic. Unfortunately, the arithmetic operates on time-series data and is state-dependent, so it cannot be vectorized.
When single-threaded, the average iteration takes 7 seconds per CSV file. When using Threads.@threads with 8 threads on the outer loop (Threads.@threads for i in 1:4000), the average iteration takes 7.8 seconds.
I changed the environment variable so that Julia always starts with 8 threads, and I check the thread count before running anything. Most importantly, CPU utilization in the multi-threaded case goes up to 100% on all 8 cores of my CPU, but execution is still slower. Any advice on how to use multi-threading and actually speed up the code execution?
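To make the comparison concrete, the sketch below separates two numbers that are easy to mix up: the time spent inside one iteration and the total wall time for all iterations. With 8 threads, a per-iteration time close to the single-threaded one can still mean a much shorter total run, because several iterations are in flight at once. Everything here is an assumption for illustration: `process_file` is a hypothetical stand-in for reading one CSV and running the row loop, and `n = 64` replaces the 4000 files.

```julia
using Base.Threads

# Hypothetical stand-in for "read one CSV and run the row loop":
# a state-dependent recurrence, so each step depends on the previous one.
function process_file(i::Int)
    state = 0.0
    for k in 1:200_000
        state = 0.999 * state + sin(i + k)
    end
    return state
end

function run_serial(n)
    results = Vector{Float64}(undef, n)
    for i in 1:n
        results[i] = process_file(i)
    end
    return results
end

function run_threaded(n)
    results = Vector{Float64}(undef, n)    # one slot per file, each written by one iteration only
    per_file = Vector{Float64}(undef, n)   # seconds spent inside each iteration
    @threads for i in 1:n
        t0 = time_ns()
        results[i] = process_file(i)
        per_file[i] = (time_ns() - t0) / 1e9
    end
    return results, per_file
end

n = 64
run_serial(2); run_threaded(2)             # warm up so compilation is not timed

t_serial = @elapsed serial = run_serial(n)
t_threaded = @elapsed out = run_threaded(n)
threaded, per_file = out

println("threads:                ", nthreads())
println("serial total (s):       ", t_serial)
println("threaded total (s):     ", t_threaded)
println("mean per-file time (s): ", sum(per_file) / n)
```

If the threaded total is not clearly below the serial total when Julia runs with several threads, the loop body itself is probably contending for something shared, such as memory allocation and garbage collection or disk I/O.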
Any help is much appreciated.
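using CSV, DataFrames

# One result slot per file (a plain vector, since an empty DataFrame cannot be assigned by integer index)
resArr = Vector{Any}(undef, 4000)
for i in 1:4000
    df = DataFrame(CSV.File(file_i))  # file_i: path of the i-th CSV file
    dummy = fun1(df)
    resArr[i] = dummy
end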
function fun1(df)
    # ... some data prep ...
    res = fun2(df)
    return res
end

function fun2(df)
    for (i, row) in enumerate(eachrow(df))
        # ... a lot of arithmetic ...
    end
    return x, y, z  # computed in the loop above
end
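One thing that may matter for the row loop: the compiler does not know the column types of a DataFrame, so a loop over `eachrow(df)` is not type-stable and tends to allocate heavily. Allocation-heavy code can scale poorly across threads because the threads compete for the garbage collector. A common workaround is a function barrier: extract the columns once and pass them to a kernel that works on concretely typed vectors. The sketch below is only an illustration under assumptions. The column names `price` and `volume` and the recurrence inside `kernel` are hypothetical, and a `Dict{Symbol,Any}` stands in for the DataFrame so that the example runs without extra packages.

```julia
# Hypothetical state-dependent recurrence standing in for the real arithmetic.
# It receives concretely typed vectors, so the compiler can specialize the loop.
function kernel(price::AbstractVector{Float64}, volume::AbstractVector{Float64})
    x = 0.0
    y = 0.0
    z = 0.0
    @inbounds for i in eachindex(price, volume)
        x = 0.9 * x + 0.1 * price[i]        # running state depends on the previous step
        y = max(y, x * volume[i])
        z += price[i] > x ? 1.0 : 0.0
    end
    return x, y, z
end

# Columns held in a container that hides their types from the compiler,
# similar in spirit to how a DataFrame holds its columns.
table = Dict{Symbol,Any}(
    :price  => collect(1.0:1.0:5.0),
    :volume => fill(2.0, 5),
)

# Function barrier: look the columns up once, then call the typed kernel.
x, y, z = kernel(table[:price], table[:volume])
println((x, y, z))
```

With a real DataFrame the call would look like `kernel(df.price, df.volume)`, assuming both columns are `Vector{Float64}` without missing values.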