Help improving the speed of a DataFrames operation

abelsiqueira · December 18, 2023, 4:50pm

@pdeffebach: As in, these ranges are vcat-ed together?

No, it's a Vector{UnitRange{Int}}, i.e., time blocks are ranges, and the cons_time_blocks and flow_time_blocks variables are vectors of time blocks.
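To make the distinction concrete, here is a minimal sketch. Only the element type Vector{UnitRange{Int}} and the two variable names come from the post; the values and the overlaps helper are made up for illustration.

```julia
# Hypothetical values; only the type Vector{UnitRange{Int}} is from the post.
cons_time_blocks = [1:3, 4:6, 7:12]
flow_time_blocks = [1:2, 3:6, 7:12]

# Each element is still a range, so the block boundaries are preserved.
@show typeof(cons_time_blocks)

# vcat-ing the ranges would instead flatten them into individual time steps,
# losing the information about where each block starts and ends.
flattened = vcat(cons_time_blocks...)
@show typeof(flattened)

# Hypothetical helper: two blocks overlap if their intersection is non-empty.
overlaps(a, b) = !isempty(intersect(a, b))
@show overlaps(cons_time_blocks[1], flow_time_blocks[2])
```

Keeping the blocks as ranges means an overlap check only compares endpoints and never has to materialise the individual time steps.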

Please produce an MWE so I can help debug the join.

Done. I have separated 4 strategies mentioned before:

“Current best”, which uses Tables.rows and idx = findall
“Also decent”, which uses Tables.rows and the more traditional @view
“Older strategy”, which uses a variation of the early strategy that I mentioned
“Leftjoin strategy”, which tries leftjoin with @chain
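For readers who don't want to open the repository, the first two strategies have roughly the following shape. This is only a sketch under assumptions, not the code in mwe.jl: the tables, the column names (asset, time_block, value) and the aggregation (a sum over overlapping blocks) are hypothetical, and it assumes both DataFrames and Tables are installed in the active environment.

```julia
using DataFrames, Tables

# Hypothetical tables; the column names are assumptions, not those of mwe.jl.
df_flows = DataFrame(
    asset = ["a", "a", "b"],
    time_block = [1:2, 3:6, 1:6],
    value = [1.0, 2.0, 3.0],
)
df_cons = DataFrame(asset = ["a", "b"], time_block = [1:3, 4:6])

# Boolean mask of the flow rows that share the asset and overlap the time block.
matching(row, df_flows) =
    (df_flows.asset .== row.asset) .&
    .!isempty.(intersect.(Ref(row.time_block), df_flows.time_block))

# Shape of the "Tables.rows and idx = findall" strategy.
function total_with_findall(df_cons, df_flows)
    totals = zeros(nrow(df_cons))
    for (i, row) in enumerate(Tables.rows(df_cons))
        idx = findall(matching(row, df_flows))
        totals[i] = sum(df_flows.value[idx])
    end
    return totals
end

# Shape of the "Tables.rows and @view" strategy.
function total_with_view(df_cons, df_flows)
    totals = zeros(nrow(df_cons))
    for (i, row) in enumerate(Tables.rows(df_cons))
        totals[i] = sum(@view df_flows.value[matching(row, df_flows)])
    end
    return totals
end

@show total_with_findall(df_cons, df_flows)
@show total_with_view(df_cons, df_flows)
```

Both variants scan every flow row once per constraint row, so in this sketch the cost grows with the product of the two table sizes.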

The data is in the GitHub repository abelsiqueira/TulipaEnergyModel.jl, branch mwe-discourse, file mwe.jl. It might be easier to simply clone it.
Here are the cloning steps on Linux:

cd $(mktemp -d)
git clone https://github.com/abelsiqueira/TulipaEnergyModel.jl .
git checkout mwe-discourse
julia --project
pkg> instantiate
julia> include("mwe.jl")

This will print instructions and the timings on the “Tiny” data:

Current best:
  0.001040 seconds (13.03 k allocations: 1005.875 KiB)
Also decent:
  0.001191 seconds (16.63 k allocations: 1.159 MiB)
Older strategy
  0.003551 seconds (282.17 k allocations: 12.629 MiB)
Leftjoin strategy
  0.023693 seconds (99.80 k allocations: 8.522 MiB, 84.62% gc time)

You can search for input_dir in the file and comment out the line with the “EU” path. The output for me is:

Current best:
 99.465536 seconds (4.59 M allocations: 2.132 GiB, 0.17% gc time)
Also decent:
100.750630 seconds (5.93 M allocations: 2.181 GiB, 0.44% gc time)
Older strategy
# Gave up after maybe 15 minutes

The leftjoin strategy simply kills my VSCode or my terminal after ~1 minute.
