I designed and implemented a method where the core task is to evaluate some 2D integrals using Gauss-Legendre methods in Julia. For practical reasons, the final code has to be in Java; the translation into Java is about 2.5-3 times faster without any effort at optimization.
I spent some time examining the Julia implementation to find bottlenecks, and the most obvious sign of poor design/performance is that benchmarking shows large number of allocations and a corresponding large memory usage (about 1GB compared to a data array that is only 61 by 61, with Float64 elements).
I’ve isolated a couple of the core functions to create an MWE; the times and memory usage are very small, but the functions are called many times in the real application and I think what is happening is that every time I call these functions there are repeated allocations.
For this reason, I’m looking for advice on how to address this. Any tips would be greatly appreciated. The key point is the that integrand function only has 6 allocations, but the integrator incurs 40 - I’d like to reduce that number.
In the code examples below, I have a function integrator that calls a function f```` that evaluates the integrand.  To simplify setting variables, I also have a file setVals.jl``` that allows setting the parameters to benchmark the functions.   So, my steps are:
include("setVals.jl") to set variables to benchmark functions. Then, I
@benchmark  f($Mmatrix,$x,$yVec,$c,$d,$wInt)
BenchmarkTools.Trial: 
  memory estimate:  832 bytes
  allocs estimate:  40
  --------------
  minimum time:     765.275 ns (0.00% GC)
  median time:      802.927 ns (0.00% GC)
  mean time:        942.206 ns (10.27% GC)
  maximum time:     435.468 ÎĽs (99.77% GC)
  --------------
  samples:          10000
  evals/sample:     109
@benchmark integrator($a,$b,$y1,$y2,$x,$Mmatrix,$c,$d)
BenchmarkTools.Trial: 
  memory estimate:  5.77 KiB
  allocs estimate:  243
  --------------
  minimum time:     5.366 ÎĽs (0.00% GC)
  median time:      5.651 ÎĽs (0.00% GC)
  mean time:        7.423 ÎĽs (19.29% GC)
  maximum time:     8.354 ms (99.86% GC)
  --------------
  samples:          10000
  evals/sample:     6
Here are the contents of the file setting the two functions:
 StaticArrays
function integrator(a::Float64,b::Float64,y1::Float64,y2::Float64,x::Array{Float64,1},M::SMatrix{3,3,Float64},
                    c::Float64,d::Float64)
    limitDiff = (b-a)/2.0
    limitAvg = (b+a)/2.0
    
    integrand = map(theZ->f(M,x,[y1,y2,limitDiff*theZ+limitAvg],c,d,wInt), zInt)
    limitDiff*(wInt'*integrand)
end	 
function f(M::SMatrix{3,3,Float64},x::Array{Float64,1},y::Array{Float64,1},c::Float64,d::Float64,wInt::SVector{5,Float64})
    if ((x[1]-y[1])^2 + (x[2]-y[2])^2)<Base.eps(x[1])
        θ=0.0
    else
        θ=atan(((x[1]-y[1])^2 + (x[2]-y[2])^2)^0.5,  (x[3]-y[3]))
    end
    if abs( x[2]-y[2] )<Base.eps(x[1])
        Ď•=0.0
    else
        Ď• = atan(x[2]-y[2], x[1]-y[1])
    end
    γ = @SVector [sin(θ)cos(ϕ), sin(θ)sin(ϕ) , cos(θ)]
    theVal = (  (-2 + 15Îł[1]^2 - 15Îł[1]^4 + (3-4d)*(1-3Îł[1]^2))*M[1,1]
                + (6Îł[1]Îł[2] - 3(3-4b)Îł[1]Îł[2] - 15Îł[1]^3*Îł[2])*M[1,2]
                + (6Îł[1]Îł[3] - 3(3-4b)Îł[1]Îł[3] - 15Îł[1]^3*Îł[3])*M[1,3]
                + (9Îł[1]Îł[2] - 15Îł[1]^3*Îł[2])*M[2,1]
                + (-1 + 3Îł[1]^2 + 3Îł[2]^2 - 15Îł[1]^2*Îł[2]^2)*M[2,2]
                + (3Îł[2]Îł[3] - 15Îł[1]^2*Îł[2]*Îł[3])*M[2,3]
                + (9Îł[1]Îł[3] - 15Îł[1]^3*Îł[3])*M[3,1]
    + (3Îł[2]Îł[3] - 15Îł[1]^2*Îł[2]Îł[3])*M[3,2]
    + (-1 + 3Îł[1]^2 + 3Îł[3]^2 - 15Îł[1]^2*Îł[3]^2)*M[3,3] )
    # avoid singularity where output point is on fracture (or too close).
    rCubed = r(x,y)^3
    if( rCubed < 10^-5 )
        rCubed=10^-5
    end
    return(theVal/(16*Ď€*c*(1-d)*rCubed))
end
function r(x::Array{Float64,1}, Îľ::Array{Float64,1})
    ((x-Îľ)'*(x-Îľ))^0.5
end
Here are the contents of the setVals.jl file:
using StaticArrays
using FastGaussQuadrature
a = -15.0
b = -14.0
x=[1.,1.,0.]
Mmatrix = @SMatrix [10.0e9 0 0; 0 0 0; 0 0 0]
zInt0,wInt0=gausslegendre(5)
zInt=@SVector [zInt0[i] for i in 1:5]
wInt=@SVector [wInt0[i] for i in 1:5]
c=1.3e9
d=0.25
y1=-15.0
y2=-2.
yVec=[y1; y2; -15.0]


 - this is exactly what I was trying to figure out, as I expected that the array in the map() function was creating issues, but I couldn’t figure out how to achieve what you demonstrated.
- this is exactly what I was trying to figure out, as I expected that the array in the map() function was creating issues, but I couldn’t figure out how to achieve what you demonstrated. ).  I’d also never name a function just f either for the same reason, in “real code”.
).  I’d also never name a function just f either for the same reason, in “real code”.