For `transform(df, cols => ByRow(f))` I can parallelize by replacing `ByRow` with a parallel map. Is it possible to parallelize `combine(groupby(df, cols), ...)`?
There are the following things that could be parallelized in your question:
- `groupby`: it is parallelized
- parallelizing multiple operations in `...` - this is already done by default (you can disable it if you want)
- parallelize a single operation that is a custom function that produces one row per group - this is already done by default (you can disable it if you want)
- parallelize a single operation that is a custom function that produces many rows per group - this is currently not supported (the reason is that assembling the output of such an operation is hard to parallelize)
- parallelize a single operation that is a standard function that is optimized (like `mean`, `sum`) - this is currently not supported. We might add it, but we have not done so yet, because the specialized single-threaded aggregations we have now are fast, and it was hard to find a good threshold above which enabling multithreading gave benefits (for tables with fewer than 1,000,000 rows the cost of spawning tasks was certainly bigger than the benefit)
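A minimal sketch of the "already done by default, but you can disable it" cases above. It assumes a recent DataFrames.jl release in which `combine` accepts the `threads` keyword argument, and a Julia session started with several threads (e.g. `julia -t 4`). The data and the `spread` helper are made up for illustration only.

```julia
using DataFrames

# Toy data: 4 groups of 3 rows each (illustrative only).
df = DataFrame(g = repeat(1:4, inner = 3), x = 1.0:12.0)
gdf = groupby(df, :g)

# Hypothetical custom function returning one value (one row) per group.
spread(v) = maximum(v) - minimum(v)

# Default: multiple operations, and groups within a single custom
# one-row-per-group operation, may run in separate tasks.
res_default = combine(gdf,
                      :x => spread => :spread,
                      :x => (v -> sum(abs2, v)) => :sumsq)

# Opt out, e.g. when a function is not thread-safe.
res_serial = combine(gdf,
                     :x => spread => :spread,
                     :x => (v -> sum(abs2, v)) => :sumsq;
                     threads = false)

# Both calls should produce the same table.
@assert res_default == res_serial

# Tasks only run in parallel if Julia has more than one thread.
println("Julia threads available: ", Threads.nthreads())
```

Whether tasks are actually spawned is decided by DataFrames.jl itself, so on small tables like this one you should not expect any speed difference between the two calls.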