Inappropriate memory allocation?

I don’t know why this is happening, but this is what track-allocation is giving me:

        - """
        -     EuclideanDistance
        - 
        - The Euclidean distance ||x-y||₂
        - """
        - struct EuclideanDistance <: AbstractDistance end
 25732880 (d::EuclideanDistance)(x, y) = norm(x .- y)

I am using dot syntax already, but this function is called some thousands of times. Is memory allocation correct here when the result is a simple scalar?

P.S.: What about creating a new category in Discourse called “Code Performance”? Parallel/Distributed is really not a good fit for this question.

From the same run, why is the line with view allocating memory?

        - """
        -     pairwise(metric, X)
        - 
        - Evaluate `metric` between all n² pairs of columns in a m-by-n matrix `X` efficiently.
        - """
        - function pairwise(metric::Function, X::AbstractMatrix)
        0   m, n = size(X)
 26258528   D = zeros(n, n)
        0   for j=1:n
 14115024     xj = view(X, :, j)
        0     for i=j+1:n
 63083616       xi = view(X, :, i)
 21027872       @inbounds D[i,j] = metric(xi, xj)
        -     end
  4705008     @inbounds D[j,j] = metric(xj, xj)
        0     for i=1:j-1
        0       @inbounds D[i,j] = D[j,i] # leveraging the symmetry
        -     end
        -   end
        - 
        0   D
        - end

Yes. x.-y allocates a vector for the solution, and then calls norm on that. To make this not allocate you need to do a bit more, like calculate z = x[i]-y[i] and accumulate the z^2 in a loop, etc.

Views to arrays allocate. This is a known missing optimization.

1 Like

Thanks @ChrisRackauckas, is it reasonable to drop norm and unroll the loop?

I will try to pass a generator instead to see if norm is smart about it…

EDIT: it is not :frowning:

@ChrisRackauckas, what about this one where I am just defining a function, it is allocating more memory than the matrix:

        -   # variogram/covariance
        0   γ = estimator.γ
 51953952   cov(x,y) = γ.sill - γ(x,y)
        - 
        -   # use covariance matrix if possible
        0   if isstationary(γ)
   478336     Γ = pairwise((x,y) -> cov(x,y), X)
        -   else
        0     Γ = pairwise((x,y) -> γ(x,y), X)
        -   end

cov(x,y) is equal to a constant scalar minus another function gamma(x,y).

Are you not clearing the compilation run from the results?

I am, every run is from scratch with a new Julia process:

run(`julia --track-allocation=user --inline=yes foo.jl`)

There is something really bad happening:

419198208     for j=topleft[2]:bottomright[2], i=topleft[1]:bottomright[1]
549638832       @inbounds neighbors[n] = i + nx*(j-1)
        0       n += 1

Why is this for loop allocating memory at all?

This is another one that doesn’t make too much sense to me, I define a path type:

        - """
        -     RandomPath(domain)
        - 
        - A random path on a spatial `domain`.
        - """
        - struct RandomPath{D<:AbstractDomain} <: AbstractPath{D}
        -   domain::D
        -   permut::Vector{Int}
        - 
        -   function RandomPath{D}(domain, permut) where {D<:AbstractDomain}
   240240     @assert length(permut) == npoints(domain) "incorrect dimension"
      192     new(domain, permut)
        -   end
        - end
        0 RandomPath(domain) = RandomPath{typeof(domain)}(domain, randperm(npoints(domain)))
        - Base.start(p::RandomPath)       = Base.start(p.permut)
   455472 Base.next(p::RandomPath, state) = Base.next(p.permut, state)
   455520 Base.done(p::RandomPath, state) = Base.done(p.permut, state)

and in the loop where I use it, more memory is been eaten:

   960000   for location in path

Compiling allocates memory.

Sorry, I forgot to mention that I am taking care of this already, my script looks more or less as follows:

solve(problem, solver)
Profile.clear_malloc_data()
solve(problem, solver)

So the compilation goes away.

Then the lines which are allocating which clearly shouldn’t might have a type-instability? That’s the only thing that I can think of that would make neighbors[n] = i + nx*(j-1), allocate assuming each value is a scalar.

I tried checking for type instability with @code_warntype, but the results don’t look very clear to me:

Variables:
  #self#::GeoStatsBase.#solve
  problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}
  solver::GeoStats.SeqGaussSim
  #175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}
  var::Core.Box
  V::Any
  #temp#@_7::Int64
  varreals::Array{_,1} where _
  #temp#@_9::Int64
  mapper::Core.Box
  realizations::Dict{Symbol,Array{Array{T,1} where T,1}}
  n::Int64
  i::Int64

Body:
  begin 
      mapper::Core.Box = $(Expr(:new, :(Core.Box)))
      NewvarNode(:(realizations::Dict{Symbol,Array{Array{T,1} where T,1}}))
      unless $(Expr(:invoke, MethodInstance for issubset(::Base.KeyIterator{Dict{Symbol,GeoStats.SGSParam}}, ::Base.KeyIterator{Dict{Symbol,DataType}}), :(GeoStats.⊆), :($(Expr(:new, Base.KeyIterator{Dict{Symbol,GeoStats.SGSParam}}, :((Core.getfield)(solver, :params)::Dict{Symbol,GeoStats.SGSParam})))), :($(Expr(:new, Base.KeyIterator{Dict{Symbol,DataType}}, :((Core.getfield)(problem, :targetvars)::Dict{Symbol,DataType})))))) goto 5
      goto 7
      5: 
      (Base.throw)(((Core.getfield)((Core.getfield)(Base.Main, :Base)::Any, :AssertionError)::Any)("invalid variable names in solver parameters")::Any)::Union{}
      7:  # line 79:
      $(Expr(:inbounds, false))
      # meta: location util.jl warn 585
      SSAValue(11) = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), Array{Any,1}, 0, 0, 0))
      # meta: pop location
      $(Expr(:inbounds, :pop))
      $(Expr(:invoke, MethodInstance for #warn#521(::Array{Any,1}, ::Function, ::String, ::Vararg{String,N} where N), :(Base.#warn#521), SSAValue(11), :(GeoStats.warn), "SeqGaussSim is not fully implemented")) # line 82:
      SSAValue(0) = $(Expr(:invoke, MethodInstance for GeoStats.SimpleMapper(::GeoStats.GeoDataFrame{DataFrames.DataFrame}, ::GeoStats.RegularGrid{Float64,2}, ::Dict{Symbol,DataType}), :(GeoStats.SimpleMapper), :((Core.getfield)(problem, :spatialdata)::GeoStats.GeoDataFrame{DataFrames.DataFrame}), :((Core.getfield)(problem, :domain)::GeoStats.RegularGrid{Float64,2}), :((Core.getfield)(problem, :targetvars)::Dict{Symbol,DataType})))
      (Core.setfield!)(mapper::Core.Box, :contents, SSAValue(0))::GeoStats.SimpleMapper{GeoStats.RegularGrid{Float64,2}} # line 85:
      $(Expr(:inbounds, false))
      # meta: location dict.jl Type 104
      SSAValue(17) = $(Expr(:invoke, MethodInstance for fill!(::Array{UInt8,1}, ::UInt8), :(Base.fill!), :($(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{UInt8,1}, svec(Any, Int64), Array{UInt8,1}, 0, 16, 0))), :((Base.checked_trunc_uint)(UInt8, 0)::UInt8)))
      SSAValue(15) = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Symbol,1}, svec(Any, Int64), Array{Symbol,1}, 0, 16, 0))
      SSAValue(13) = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Array{Array{T,1} where T,1},1}, svec(Any, Int64), Array{Array{Array{T,1} where T,1},1}, 0, 16, 0))
      # meta: pop location
      $(Expr(:inbounds, :pop))
      realizations::Dict{Symbol,Array{Array{T,1} where T,1}} = $(Expr(:new, Dict{Symbol,Array{Array{T,1} where T,1}}, SSAValue(17), SSAValue(15), SSAValue(13), 0, 0, :((Base.bitcast)(UInt64, (Base.check_top_bit)(0)::Int64)), 1, 0)) # line 88:
      SSAValue(1) = (Core.getfield)(problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}, :targetvars)::Dict{Symbol,DataType}
      $(Expr(:inbounds, false))
      # meta: location dict.jl start 574
      i::Int64 = $(Expr(:invoke, MethodInstance for skip_deleted(::Dict{Symbol,DataType}, ::Int64), :(Base.skip_deleted), SSAValue(1), :((Core.getfield)(SSAValue(1), :idxfloor)::Int64))) # line 575:
      (Core.setfield!)(SSAValue(1), :idxfloor, i::Int64)::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      #temp#@_9::Int64 = i::Int64
      37: 
      unless (Base.not_int)((Base.slt_int)((Base.arraylen)((Core.getfield)(SSAValue(1), :vals)::Array{DataType,1})::Int64, #temp#@_9::Int64)::Bool)::Bool goto 61
      var::Core.Box = $(Expr(:new, :(Core.Box)))
      SSAValue(20) = $(Expr(:new, Pair{Symbol,DataType}, :((Base.arrayref)((Core.getfield)(SSAValue(1), :keys)::Array{Symbol,1}, #temp#@_9)::Symbol), :((Base.arrayref)((Core.getfield)(SSAValue(1), :vals)::Array{DataType,1}, #temp#@_9)::DataType)))
      SSAValue(21) = $(Expr(:invoke, MethodInstance for skip_deleted(::Dict{Symbol,DataType}, ::Int64), :(Base.skip_deleted), SSAValue(1), :((Base.add_int)(#temp#@_9, 1)::Int64)))
      SSAValue(3) = SSAValue(20)
      SSAValue(22) = (Base.getfield)(SSAValue(3), 1)::Symbol
      SSAValue(23) = (Base.add_int)(1, 1)::Int64
      SSAValue(5) = SSAValue(22)
      (Core.setfield!)(var::Core.Box, :contents, SSAValue(5))::Symbol
      SSAValue(24) = (Base.getfield)(SSAValue(3), 2)::DataType
      SSAValue(25) = (Base.add_int)(2, 1)::Int64
      #temp#@_9::Int64 = SSAValue(21) # line 89:
      #175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim} = $(Expr(:new, GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}, :(problem), :(solver), :(var), :(mapper)))
      SSAValue(18) = (Core.getfield)(problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}, :nreals)::Int64
      SSAValue(8) = $(Expr(:new, UnitRange{Int64}, 1, :((Base.select_value)((Base.sle_int)(1, SSAValue(18))::Bool, SSAValue(18), (Base.sub_int)(1, 1)::Int64)::Int64)))
      SSAValue(9) = $(Expr(:new, Base.Generator{UnitRange{Int64},GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}}, :(#175), SSAValue(8)))
      varreals::Array{_,1} where _ = $(Expr(:invoke, MethodInstance for collect(::Base.Generator{UnitRange{Int64},GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}}), :(Base.collect), SSAValue(9))) # line 91:
      SSAValue(19) = ((Core.getfield)(var::Core.Box, :contents)::Any => varreals::Array{_,1} where _)::Pair{_,_} where _ where _
      (Base.setindex!)(realizations::Dict{Symbol,Array{Array{T,1} where T,1}}, (Core.getfield)(SSAValue(19), :second)::Any, (Core.getfield)(SSAValue(19), :first)::Any)::Dict{Symbol,Array{Array{T,1} where T,1}}
      59: 
      goto 37
      61:  # line 95:
      return $(Expr(:new, GeoStatsBase.SimulationSolution{GeoStats.RegularGrid{Float64,2}}, :((Core.getfield)(problem, :domain)::GeoStats.RegularGrid{Float64,2}), :(realizations)))
  end::GeoStatsBase.SimulationSolution{GeoStats.RegularGrid{Float64,2}}

The function is this one in case you can spot some type instability:

https://github.com/juliohm/GeoStats.jl/blob/master/src/neighborhoods/cube_neighborhood.jl#L34-L87

V::Any
varreals::Array{_,1} where _
realizations::Dict{Symbol,Array{Array{T,1} where T,1}}

Those are non-strictly typed containers which will not infer upon getindex. Additionally,

var::Core.Box
mapper::Core.Box

those are suggestive that its not being inferred (though there are, or at least used to be, places where Core.Box was actually inferred I think, which showed up in the closure issue). However, together @code_warntype is telling you that the function is not type-inferrable.

2 Likes

I am trying to understand the Variables section of the output, could you please help disgest it?

Variables:
  #self#::GeoStatsBase.#solve
  problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}
  solver::GeoStats.SeqGaussSim
  #175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}
  var::Core.Box
  V::Any
  #temp#@_7::Int64
  varreals::Array{_,1} where _
  #temp#@_9::Int64
  mapper::Core.Box
  realizations::Dict{Symbol,Array{Array{T,1} where T,1}}
  n::Int64
  i::Int64

The function is called solve and we can find the name at the top in the self. The arguments are problem and solver, also shown there. Then, something weird follows the arguments, what is it?

Those are the other variables in the function, and the :: is their inferred type.

So the main issue here is with the mapper, I don’t know why though Julia cannot figure out its type.

The rest as well, the variable V should hold a simple DataType.

One thing I noticed: radius is untyped

If you provide some short runnable code, it might be easier to look into it.

1 Like

Yes @cstjean, I think in order to give a type to the radius in Julia v0.6, I would have to add an extra type parameter to the neighborhood, which I was trying to avoid. In Julia v0.7 that won’t be a problem. If you can do some experiments, that would be awesome. The code is simple:

using GeoStats
using DataFrames

# stations monitoring rainfall
df = csv"""
   x,    y,       station, precipitation
25.0, 25.0,     palo alto,           1.0
50.0, 75.0,  redwood city,           0.0
75.0, 50.0, mountain view,           1.0
"""

geodata = GeoDataFrame(df, [:x,:y])
domain = RegularGrid{Float64}(100,100)
problem = SimulationProblem(geodata, domain, :precipitation, 3)

solver = SeqGaussSim(
    :precipitation => @NT(variogram=SphericalVariogram(range=20.), neighradius=10.)
)

@code_warntype solve(problem, solver)

If you can see significant speed-ups or memory savings after adding the type, please share here :slight_smile: