This query was originally going to be me complaining about why I was getting inconsistent performance over some benchmark times, it turns out that it was at some kind of cache threshold and modifying the size of the vector fixed the issue.
My original code was in R and I decided to speed things up by implementing it in C++ using Rcpp:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <functional>
#include <cmath>
#include <Rcpp.h>
using namespace Rcpp;
double bananaObjective(NumericVector x)
{
return 100*std::pow(x[1] - std::pow(x[0], 2.0), 2.0) + std::pow(1 - x[0], 2.0);
}
//[[Rcpp::export]]
NumericVector bananaObjective(NumericMatrix x)
{
int n = x.nrow();
NumericVector output(x.nrow());
for(int i = 0; i < n; ++i)
{
NumericVector tmp = x(i, _);
output[i] = bananaObjective(tmp);
}
return output;
}
//[[Rcpp::export]]
NumericVector benchmarkBanana(int len, int n)
{
NumericVector output(len);
for(int i = 0; i < len; ++i)
{
//arma::mat tmp(n, 2, arma::fill::randu);
NumericVector v = runif(n*2);
v.attr("dim") = Dimension(n, 2);
NumericMatrix tmp = as<NumericMatrix>(v);
arma::wall_clock timer;
timer.tic();
auto x = bananaObjective(tmp);
output[i] = timer.toc();
}
return output;
}
Okay before I carry on, there is this horrible myth that you can get okay performance from R if you pre-cache a vector even in a loop. The performance in an R loop is always bad but sometimes it can be less bad if you pre-cache. Anyway we can get a x10 speedup just by moving the functions to C++, here is the equivalent R code and benchmark for comparison:
.bananaObjective = function(x)
100*(x[2] - (x[1])^2)^2 + (1 - x[1])^2
RBananaObjective = function(x)
{
n = nrow(x)
output = vector(mode = "numeric", length = n)
for(i in 1:n)
output[i] = .bananaObjective(x[i, ])
return(output)
}
require(Rcpp)
sourceCpp("test.cpp")
n = 5E5; x = matrix(runif(n*2), ncol = 2)
system.time(RBananaObjective(x))
# user system elapsed
# 0.540 0.000 0.541
system.time(bananaObjective(x))
# user system elapsed
# 0.047 0.000 0.047
Okay great, now we can run the benchmark function in C++ 100 times maybe the returning to R each time has an overhead - apparently it doesnāt:
x = benchmarkBanana(100, 500000)
mean(x)
# [1] 0.04853786
So what about Julia?
function bananaObjective(x::Array{T, 1}) where {T <: AbstractFloat}
return 100*(x[2] - (x[1])^2)^2 + (1 - x[1])^2
end
function bananaObjective(x::Array{T, 2}) where {T <: AbstractFloat}
n = size(x)[1]
output = zeros(T, n)
for i in 1:n
output[i] = bananaObjective(x[i, :])
end
return output
end
# bench(Float64, 100, 1000_000)
function bench(T::Type{<: AbstractFloat}, len::I, n::I) where {I <: Integer}
times = zeros(T, len)
for i in 1:len
x = rand(T, n, 2)
t1 = time(); bananaObjective(x); t2 = time()
times[i] = t2 - t1
end
return times
end
import Statistics: mean
x = bench(Float64, 100, 500_000)
mean(x)
# 0.019942469596862793
Thatās more than x2 the speed of the code implemented with Rcpp. Whatās more this code is much easier to write than the C++ code and far more flexible and generic.