I can’t get the setup argument of @benchmark to work correctly with DataFrames. I’ll use the example in the docs of DataFrames.leftjoin! as an MWE:
julia> using DataFrames, BenchmarkTools
julia> name = DataFrame(ID=[1, 2, 3], Name=["John Doe", "Jane Doe", "Joe Blogs"])
3×2 DataFrame
 Row │ ID     Name
     │ Int64  String
─────┼──────────────────
   1 │     1  John Doe
   2 │     2  Jane Doe
   3 │     3  Joe Blogs
julia> job = DataFrame(ID=[1, 2, 4], Job=["Lawyer", "Doctor", "Farmer"])
3×2 DataFrame
 Row │ ID     Job
     │ Int64  String
─────┼───────────────
   1 │     1  Lawyer
   2 │     2  Doctor
   3 │     4  Farmer
julia> leftjoin(name, job, on = :ID)
3×3 DataFrame
 Row │ ID     Name       Job
     │ Int64  String     String?
─────┼───────────────────────────
   1 │     1  John Doe   Lawyer
   2 │     2  Jane Doe   Doctor
   3 │     3  Joe Blogs  missing
julia> @benchmark leftjoin($name, $job, on = :ID)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 10.000 μs … 6.566 ms ┊ GC (min … max): 0.00% … 98.21%
Time (median): 12.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 14.525 μs ± 113.040 μs ┊ GC (mean ± σ): 13.30% ± 1.71%
▃ █ ▇ ▅ ▃
▁▁▂▂▆▄█▇█████▇█▅▅█▅▇▄▇▄▆▃▅▃▅▃▅▂▂▃▂▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
10 μs Histogram: frequency by time 18.8 μs <
Memory estimate: 13.00 KiB, allocs estimate: 201.
So far so good. But now let’s try leftjoin! instead. Since it modifies its first DataFrame argument, I’ll use the setup argument to reinitialize that argument every sample:
julia> @benchmark leftjoin!(newname, $job, on = :ID) setup=(newname=copy($name))
ERROR: ArgumentError: the following columns are present in both left and right data frames but not listed in `on`: Job. Pass makeunique=true to add a suffix automatically to columns names from the right data frame.
Stacktrace:
[1] leftjoin!(df1::DataFrame, df2::DataFrame; on::Symbol, makeunique::Bool, source::Nothing, matchmissing::Symbol)
@ DataFrames C:\Users\niclas\.julia\packages\DataFrames\LteEl\src\join\inplace.jl:118
[2] var"##core#470"(job#469::DataFrame, newname::DataFrame)
@ Main C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:489
[3] var"##sample#471"(::Tuple{DataFrame}, __params::BenchmarkTools.Parameters)
@ Main C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:497
[4] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:161
[5] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
@ BenchmarkTools C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:152
[6] #invokelatest#2
@ .\essentials.jl:729 [inlined]
[7] invokelatest
@ .\essentials.jl:726 [inlined]
[8] #lineartrial#46
@ C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:35 [inlined]
[9] lineartrial
@ C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:35 [inlined]
[10] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:251
[11] tune! (repeats 2 times)
@ C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:247 [inlined]
[12] top-level scope
@ C:\Users\niclas\.julia\packages\BenchmarkTools\0owsb\src\execution.jl:394
It seems the newname DataFrame has the :Job column even though it gets reinitialized to copy(name) (which lacks the :Job column) every benchmark sample. What’s going on here? Is this a bug or am I using it wrong?
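One possible explanation, offered as a sketch rather than a confirmed diagnosis: BenchmarkTools documents that setup runs once per sample, not once per evaluation, and the stack trace above fails inside tune!, which tries more than one evaluation per sample. Under that assumption, the second evaluation within a sample would see a newname that already gained the :Job column. The variant below pins evals=1 so that each sample performs exactly one call on a fresh copy; the variable name trial is only illustrative.

```julia
using DataFrames, BenchmarkTools

name = DataFrame(ID=[1, 2, 3], Name=["John Doe", "Jane Doe", "Joe Blogs"])
job = DataFrame(ID=[1, 2, 4], Job=["Lawyer", "Doctor", "Farmer"])

# Assumption: setup is executed once per sample, so the mutated newname is
# reused by every further evaluation inside the same sample.
# evals=1 limits each sample to a single evaluation, so leftjoin! always
# receives a fresh copy of name that does not yet have the :Job column.
trial = @benchmark leftjoin!(newname, $job, on = :ID) setup=(newname = copy($name)) evals=1

# The original name is untouched because only the copies are mutated.
println(names(name))
```

If this runs without the ArgumentError, that would suggest the error comes from repeated evaluations per sample rather than from setup failing to reinitialize newname.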