Help to reduce memory allocations in a function

Hello Julia experts,

I have been working on the function shortest_distance2_t for quite a while, trying to optimise its performance. I do not have very advanced perf tools. Currently I use BenchmarkTools and PProf. I realize that there are too many memory allocations in this function. But I just don’t know how to reduce them.

The function and related definitions are listed below.

import GeometricalPredicates: AbstractPolygon2D, AbstractPoint2D, AbstractLine2D, Line2D, Point2D, getx, gety, geta, getb

struct ExtLine2D{T<:AbstractPoint2D} <: AbstractLine2D

function ExtLine2D(a::T, b::T) where {T<:AbstractPoint2D}
    bx = getx(b) - getx(a)
    by = gety(b) - gety(a)
    ExtLine2D(a, b, bx, by, bx*bx+by*by)

geta(l::ExtLine2D) = l._a
getb(l::ExtLine2D) = l._b
length2(l::ExtLine2D) = l._l2

unpack(l::ExtLine2D) = (l._a._x, l._a._y, l._b._x, l._b._y, l._bx, l._by, l._l2)

struct PolygonInRegion2D{T<:AbstractPoint2D} <: AbstractPolygon2D
    function PolygonInRegion2D{T}(p::T...) where {T<:AbstractPoint2D}
        l = Vector{ExtLine2D}(undef, length(p) - 1)
        for i in range(1, stop=length(p) - 1)
            l[i] = ExtLine2D(p[i], p[i+1])
        push!(l, ExtLine2D(p[end], p[1]))
        new([p...;], l, [false for i = 1:length(l)])  # all edges are not inner edge

getpoints(polygon::PolygonInRegion2D) = polygon._p
getlines(polygon::PolygonInRegion2D) = polygon._l
getlabels(plg::PolygonInRegion2D) = plg._i

function shortest_distance2_t(px::Float64, py::Float64, plg::PolygonInRegion2D)::Tuple{Float64,Float64,Int}
    d2min  = 1e20
    d2     = 0.
    tmin   = 0.0
    imin   = 0
    dx_10  = 0.
    dy_10  = 0.
    proj   = 0.
    t      = 0.
    i      = 0
    for (ll,ii) in zip(plg._l, plg._i)
        i += 1
        if !ii  # not inner edge
            #(d2,t) = _shortest_distance2_t(px, py, plg, i)
            dx_10 = px - ll._a._x
            dy_10 = py - ll._a._y
            proj  = ll._bx * dx_10 + ll._by * dy_10
            t     = proj / ll._l2
            if t > 1
                d2 = (ll._b._x-px)^2 + (ll._b._y-py)^2
                t  = 1.
                d2 = dx_10 * dx_10 + dy_10 * dy_10
                if t < 0
                    t = 0.
                    d2 = max(0, d2 - t * proj)
            if d2 < d2min
                d2min = d2
                tmin  = t
                imin  = i
    if d2min > 9.9e19
        return (1e20, 0.0, -1)
        return (d2min, tmin, imin)

Below is an example:

plg = PolygonInRegion2D{Point2D}(Point2D(0.,0.), Point2D(1.,0.1), Point2D(2.,0.2), Point2D(2.,1.), Point2D(1.,1.1), Point2D(0.,1.2))

using BenchmarkTools

@benchmark shortest_distance2_t(3.3, 0.3, plg)

On my laptop, the benchmark result is

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.617 ΞΌs … 482.226 ΞΌs  β”Š GC (min … max): 0.00% … 98.34%
 Time  (median):     1.818 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   2.132 ΞΌs Β±   6.826 ΞΌs  β”Š GC (mean Β± Οƒ):  5.70% Β±  1.96%

  β–ƒβ–ƒβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–ƒβ–‚β–‚β–ƒβ–ƒβ–ƒβ–„β–„β–ƒβ–„β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  1.62 ΞΌs         Histogram: frequency by time        3.92 ΞΌs <

 Memory estimate: 2.34 KiB, allocs estimate: 132.

Not a Julia expert here :sweat_smile: but there is a type instability in your PolygonInRegion2D struct. The vector _p is parametrized, so PolygonInRegion2D{Point2D} will have a vector of the concrete point type in _p, but the vector of lines _l will still be a vector of the abstract type ExtLine2D.

Using _l::Vector{ExtLine2D{T}} in the struct definition (note the additional T) will get rid of the allocations.

Two more things:

  • In general, a good first approach to spot type instabilities (which are often the cause for unnecessary allocations) is to run @code_warntype shortest_distance2_t(3.3, 0.3, plg). This helps find the types that could not be inferred.
  • @benchmark shortest_distance2_t(3.3, 0.3, plg) will still show a single allocation if it is used with out the dollar sign, as in @benchmark shortest_distance2_t(3.3, 0.3, $plg)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  12.292 ns … 25.960 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     12.721 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   13.588 ns Β±  1.339 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–†β–„β–ˆβ–ƒβ–β–β–‚β–β–β–‚β–β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–…β–‚β–ˆβ–„β–β–β–β–β–β–β–β–β–‚β–‚β–‚β–ƒβ–‚β–β–β–β–β–β–β–β– β–‚
  12.3 ns         Histogram: frequency by time        16.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Awesome! I think the type-instability causes much of the mem-allocs I have observed with PProf (I haven’t shown the result). Thank you @Sevi !

1 Like