Understanding the performance costs of generating functions on the fly

pdeffebach · October 8, 2020, 6:20pm

I just ran a similar benchmark, and it confirms the logic.

julia> using DataFrames, Statistics, BenchmarkTools

julia> df = DataFrame(a = rand(1_000_000), b = rand(1_000_000));

julia> a = df.a; b = df.b;

julia> function comp(a, b)
       ma = mean(a)
       mb = mean(b)
       
       c = cor(a, b)
       
       z = (a .- mb ./ std(a))
       
       return b .+ mb .- z .* c
       end;

julia> function make_fun_2(df)
       pn = propertynames(df)
       if :a in pn && :b in pn 
           [:a, :b] => function(a, b)       
              ma = mean(a)
              mb = mean(b)
              
              c = cor(a, b)
              
              z = (a .- mb ./ std(a))
              
              return b .+ mb .- z .* c => :c
              end => :c
       else
           [:a, :b] => ((a, b) -> fill("hello", length(a))) => :c
       end
       end;
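
A side note on building the function inside make_fun_2, assuming the worry is that a function created on every call gets recompiled every time. As far as I understand it, an anonymous function written inside a function body has one fixed type per place it appears in the source, so repeated calls return values of the same type and the compiled code can be reused. Here is a minimal sketch using only Base; make_fun_sketch is a hypothetical helper, not something from the benchmark.

```julia
# Hypothetical helper (not from the post): returns one of two anonymous functions.
function make_fun_sketch(flag::Bool)
    if flag
        return x -> x .+ 1
    else
        return x -> fill("hello", length(x))
    end
end

f1 = make_fun_sketch(true)
f2 = make_fun_sketch(true)
g  = make_fun_sketch(false)

# Same definition site, so the same type on every call.
same_site = typeof(f1) === typeof(f2)

# Different branches are different definition sites, so different types.
other_branch = typeof(f1) === typeof(g)

# Two separately written anonymous functions get two distinct types,
# even when their bodies are identical.
written_twice = typeof(x -> x) === typeof(x -> x)

println((same_site, other_branch, written_twice))
```

So the type of the returned function depends on which branch was taken, not on how many times make_fun_sketch was called.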

julia> @btime comp(a, b);
  5.794 ms (4 allocations: 15.26 MiB)

julia> @btime transform!(df, [:a, :b] => comp => :c);
  6.067 ms (88 allocations: 15.26 MiB)  # within the margin of error of the above

julia> @btime transform!(df, make_fun_2(df));
  8.109 ms (97 allocations: 30.52 MiB)
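
A possible reason for the extra 15.26 MiB in the last benchmark, offered as an assumption rather than something measured here: inside make_fun_2 the return line ends in `=> :c`, and since `=>` binds more loosely than the arithmetic operators, the anonymous function returns a Pair (vector => :c) instead of a vector. If transform! then treats that Pair as a single value and repeats it for every row, the new column would hold 1_000_000 Pairs of 16 bytes each on a 64-bit machine, which is about 15.26 MiB on top of the vector itself. The sketch below uses only Base; buggy and fixed are hypothetical names standing in for the return line.

```julia
# Hypothetical stand-ins for the return line inside make_fun_2 (Base only).
buggy(b, mb) = b .+ mb => :c    # parses as (b .+ mb) => :c
fixed(b, mb) = b .+ mb          # returns the vector itself

b = rand(5)

r_buggy = buggy(b, 0.5)
r_fixed = fixed(b, 0.5)

println(typeof(r_buggy))   # a Pair of a vector and a Symbol
println(typeof(r_fixed))   # a plain vector

# Size of one such Pair: two references.
pair_bytes = sizeof(Pair{Vector{Float64},Symbol})

# Estimated size in MiB of a 1_000_000-element column of these Pairs.
est_mib = 1_000_000 * pair_bytes / 2^20
println(round(est_mib, digits = 2))
```

If that is what happened, dropping the inner `=> :c` and keeping only the one after `end` should bring the allocations back in line with the comp version, though I have not rerun the benchmark to confirm.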
