I started using Julia for its speed in an application whose workload, in pseudo-code, looks like what I added below. There is no MWE, but hopefully this is diagnostic enough; otherwise I will create one. It may well be a Julia-novice mistake that I wasn't able to solve through googling.
There are 4000 CSV files (4-7 MB each). For each file, the code does some data prep and then calls a function that iterates through each row and does a bunch of arithmetic. Unfortunately, the arithmetic operates on time-series data and is state-dependent, so it cannot be vectorized.
When single-threaded, the average iteration takes 7 seconds per CSV file. When using Threads.@threads with 8 threads on the first (outer) loop, i.e. Threads.@threads for i in 1:4000, the average iteration takes 7.8 seconds.
I changed the environment variable so that Julia always starts with 8 threads, and I check the thread count before running anything. Most importantly, CPU utilization in the multi-threaded case goes up to 100% on all 8 cores of my CPU, yet execution is still slower. Any advice on how to use multi-threading to actually speed up the code?
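The thread-count check I run first looks roughly like the sketch below. The expected value of 8 is specific to my machine, and the warning message is only illustrative; JULIA_NUM_THREADS is the environment variable Julia reads at startup.

```julia
# Sketch: confirm how many threads this Julia session was started with.
# The count is fixed at startup (e.g. via JULIA_NUM_THREADS or `julia -t 8`).
using Base.Threads: nthreads

n = nthreads()
println("Julia is running with $n thread(s)")

# Assumption for illustration: 8 threads are expected on this machine.
n == 8 || @warn "Expected 8 threads, got $n"
```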
Any help is much appreciated.
```julia
resArr = DataFrame()
for i in 1:4000
    df = DataFrame(CSV.File(file_i))
    dummy = fun1(df)
    resArr[i] = dummy
end

function fun1(df)
    # *someDataPrep*
    res = fun2(df)
    return res
end

function fun2(df)
    for (i, row) in enumerate(eachrow(df))
        # *a lot of arithmetic*
    end
    return x, y, z
end
```
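For clarity, the threaded variant has roughly the shape sketched below. This is a self-contained illustration, not my actual code: process_file is a hypothetical stand-in for fun1/fun2 (it does no CSV reading), the file count is reduced to 16, and the results go into a preallocated vector instead of a DataFrame.

```julia
using Base.Threads

# Hypothetical stand-in for fun1/fun2: a sequential, state-dependent loop
# that cannot be vectorized. The real version would read and process a CSV file.
function process_file(i::Int)
    state = 0.0
    for k in 1:1_000
        state = 0.5 * state + k * i   # each step depends on the previous one
    end
    return state
end

nfiles = 16                                # 4000 in the real workload
results = Vector{Float64}(undef, nfiles)   # preallocated, concretely typed

# Each iteration writes only to its own slot, so no locking is needed.
@threads for i in 1:nfiles
    results[i] = process_file(i)
end

println("Processed $(length(results)) files on $(nthreads()) thread(s)")
```

Because every iteration touches only results[i], the iterations are independent of each other, which is the property the real loop over the 4000 files also has.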