# Inappropriate memory allocation?

#1

I don’t know why this is happening, but this is what track-allocation is giving me:

``````        - """
-     EuclideanDistance
-
- The Euclidean distance ||x-y||₂
- """
- struct EuclideanDistance <: AbstractDistance end
25732880 (d::EuclideanDistance)(x, y) = norm(x .- y)
``````

I am using dot syntax already, but this function is called some thousands of times. Is memory allocation correct here when the result is a simple scalar?

P.S.: What about creating a new category in Discourse called “Code Performance”? Parallel/Distributed is really not a good fit for this question.

#2

From the same run, why is the line with `view` allocating memory?

``````        - """
-     pairwise(metric, X)
-
- Evaluate `metric` between all n² pairs of columns in a m-by-n matrix `X` efficiently.
- """
- function pairwise(metric::Function, X::AbstractMatrix)
0   m, n = size(X)
26258528   D = zeros(n, n)
0   for j=1:n
14115024     xj = view(X, :, j)
0     for i=j+1:n
63083616       xi = view(X, :, i)
21027872       @inbounds D[i,j] = metric(xi, xj)
-     end
4705008     @inbounds D[j,j] = metric(xj, xj)
0     for i=1:j-1
0       @inbounds D[i,j] = D[j,i] # leveraging the symmetry
-     end
-   end
-
0   D
- end
``````

#3

Yes. `x.-y` allocates a vector for the solution, and then calls norm on that. To make this not allocate you need to do a bit more, like calculate `z = x[i]-y[i]` and accumulate the `z^2` in a loop, etc.

Views to arrays allocate. This is a known missing optimization.

#4

Thanks @ChrisRackauckas, is it reasonable to drop `norm` and unroll the loop?

I will try to pass a generator instead to see if norm is smart about it…

EDIT: it is not

#5

@ChrisRackauckas, what about this one where I am just defining a function, it is allocating more memory than the matrix:

``````        -   # variogram/covariance
0   γ = estimator.γ
51953952   cov(x,y) = γ.sill - γ(x,y)
-
-   # use covariance matrix if possible
0   if isstationary(γ)
478336     Γ = pairwise((x,y) -> cov(x,y), X)
-   else
0     Γ = pairwise((x,y) -> γ(x,y), X)
-   end

``````

`cov(x,y)` is equal to a constant scalar minus another function `gamma(x,y)`.

#6

Are you not clearing the compilation run from the results?

#7

I am, every run is from scratch with a new Julia process:

``````run(`julia --track-allocation=user --inline=yes foo.jl`)
``````

#8

There is something really bad happening:

``````419198208     for j=topleft[2]:bottomright[2], i=topleft[1]:bottomright[1]
549638832       @inbounds neighbors[n] = i + nx*(j-1)
0       n += 1

``````

Why is this for loop allocating memory at all?

#9

This is another one that doesn’t make too much sense to me, I define a path type:

``````        - """
-     RandomPath(domain)
-
- A random path on a spatial `domain`.
- """
- struct RandomPath{D<:AbstractDomain} <: AbstractPath{D}
-   domain::D
-   permut::Vector{Int}
-
-   function RandomPath{D}(domain, permut) where {D<:AbstractDomain}
240240     @assert length(permut) == npoints(domain) "incorrect dimension"
192     new(domain, permut)
-   end
- end
0 RandomPath(domain) = RandomPath{typeof(domain)}(domain, randperm(npoints(domain)))
- Base.start(p::RandomPath)       = Base.start(p.permut)
455472 Base.next(p::RandomPath, state) = Base.next(p.permut, state)
455520 Base.done(p::RandomPath, state) = Base.done(p.permut, state)
``````

and in the loop where I use it, more memory is been eaten:

``````   960000   for location in path
``````

#10

Compiling allocates memory.

#11

Sorry, I forgot to mention that I am taking care of this already, my script looks more or less as follows:

``````solve(problem, solver)
Profile.clear_malloc_data()
solve(problem, solver)
``````

So the compilation goes away.

#12

Then the lines which are allocating which clearly shouldn’t might have a type-instability? That’s the only thing that I can think of that would make `neighbors[n] = i + nx*(j-1)`, allocate assuming each value is a scalar.

#13

I tried checking for type instability with @code_warntype, but the results don’t look very clear to me:

``````Variables:
#self#::GeoStatsBase.#solve
problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}
solver::GeoStats.SeqGaussSim
#175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}
var::Core.Box
V::Any
#temp#@_7::Int64
varreals::Array{_,1} where _
#temp#@_9::Int64
mapper::Core.Box
realizations::Dict{Symbol,Array{Array{T,1} where T,1}}
n::Int64
i::Int64

Body:
begin
mapper::Core.Box = \$(Expr(:new, :(Core.Box)))
NewvarNode(:(realizations::Dict{Symbol,Array{Array{T,1} where T,1}}))
unless \$(Expr(:invoke, MethodInstance for issubset(::Base.KeyIterator{Dict{Symbol,GeoStats.SGSParam}}, ::Base.KeyIterator{Dict{Symbol,DataType}}), :(GeoStats.⊆), :(\$(Expr(:new, Base.KeyIterator{Dict{Symbol,GeoStats.SGSParam}}, :((Core.getfield)(solver, :params)::Dict{Symbol,GeoStats.SGSParam})))), :(\$(Expr(:new, Base.KeyIterator{Dict{Symbol,DataType}}, :((Core.getfield)(problem, :targetvars)::Dict{Symbol,DataType})))))) goto 5
goto 7
5:
(Base.throw)(((Core.getfield)((Core.getfield)(Base.Main, :Base)::Any, :AssertionError)::Any)("invalid variable names in solver parameters")::Any)::Union{}
7:  # line 79:
\$(Expr(:inbounds, false))
# meta: location util.jl warn 585
SSAValue(11) = \$(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Any,1}, svec(Any, Int64), Array{Any,1}, 0, 0, 0))
# meta: pop location
\$(Expr(:inbounds, :pop))
\$(Expr(:invoke, MethodInstance for #warn#521(::Array{Any,1}, ::Function, ::String, ::Vararg{String,N} where N), :(Base.#warn#521), SSAValue(11), :(GeoStats.warn), "SeqGaussSim is not fully implemented")) # line 82:
SSAValue(0) = \$(Expr(:invoke, MethodInstance for GeoStats.SimpleMapper(::GeoStats.GeoDataFrame{DataFrames.DataFrame}, ::GeoStats.RegularGrid{Float64,2}, ::Dict{Symbol,DataType}), :(GeoStats.SimpleMapper), :((Core.getfield)(problem, :spatialdata)::GeoStats.GeoDataFrame{DataFrames.DataFrame}), :((Core.getfield)(problem, :domain)::GeoStats.RegularGrid{Float64,2}), :((Core.getfield)(problem, :targetvars)::Dict{Symbol,DataType})))
(Core.setfield!)(mapper::Core.Box, :contents, SSAValue(0))::GeoStats.SimpleMapper{GeoStats.RegularGrid{Float64,2}} # line 85:
\$(Expr(:inbounds, false))
# meta: location dict.jl Type 104
SSAValue(17) = \$(Expr(:invoke, MethodInstance for fill!(::Array{UInt8,1}, ::UInt8), :(Base.fill!), :(\$(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{UInt8,1}, svec(Any, Int64), Array{UInt8,1}, 0, 16, 0))), :((Base.checked_trunc_uint)(UInt8, 0)::UInt8)))
SSAValue(15) = \$(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Symbol,1}, svec(Any, Int64), Array{Symbol,1}, 0, 16, 0))
SSAValue(13) = \$(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Array{Array{T,1} where T,1},1}, svec(Any, Int64), Array{Array{Array{T,1} where T,1},1}, 0, 16, 0))
# meta: pop location
\$(Expr(:inbounds, :pop))
realizations::Dict{Symbol,Array{Array{T,1} where T,1}} = \$(Expr(:new, Dict{Symbol,Array{Array{T,1} where T,1}}, SSAValue(17), SSAValue(15), SSAValue(13), 0, 0, :((Base.bitcast)(UInt64, (Base.check_top_bit)(0)::Int64)), 1, 0)) # line 88:
SSAValue(1) = (Core.getfield)(problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}, :targetvars)::Dict{Symbol,DataType}
\$(Expr(:inbounds, false))
# meta: location dict.jl start 574
i::Int64 = \$(Expr(:invoke, MethodInstance for skip_deleted(::Dict{Symbol,DataType}, ::Int64), :(Base.skip_deleted), SSAValue(1), :((Core.getfield)(SSAValue(1), :idxfloor)::Int64))) # line 575:
(Core.setfield!)(SSAValue(1), :idxfloor, i::Int64)::Int64
# meta: pop location
\$(Expr(:inbounds, :pop))
#temp#@_9::Int64 = i::Int64
37:
unless (Base.not_int)((Base.slt_int)((Base.arraylen)((Core.getfield)(SSAValue(1), :vals)::Array{DataType,1})::Int64, #temp#@_9::Int64)::Bool)::Bool goto 61
var::Core.Box = \$(Expr(:new, :(Core.Box)))
SSAValue(20) = \$(Expr(:new, Pair{Symbol,DataType}, :((Base.arrayref)((Core.getfield)(SSAValue(1), :keys)::Array{Symbol,1}, #temp#@_9)::Symbol), :((Base.arrayref)((Core.getfield)(SSAValue(1), :vals)::Array{DataType,1}, #temp#@_9)::DataType)))
SSAValue(21) = \$(Expr(:invoke, MethodInstance for skip_deleted(::Dict{Symbol,DataType}, ::Int64), :(Base.skip_deleted), SSAValue(1), :((Base.add_int)(#temp#@_9, 1)::Int64)))
SSAValue(3) = SSAValue(20)
SSAValue(22) = (Base.getfield)(SSAValue(3), 1)::Symbol
SSAValue(5) = SSAValue(22)
(Core.setfield!)(var::Core.Box, :contents, SSAValue(5))::Symbol
SSAValue(24) = (Base.getfield)(SSAValue(3), 2)::DataType
#temp#@_9::Int64 = SSAValue(21) # line 89:
#175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim} = \$(Expr(:new, GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}, :(problem), :(solver), :(var), :(mapper)))
SSAValue(18) = (Core.getfield)(problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}, :nreals)::Int64
SSAValue(8) = \$(Expr(:new, UnitRange{Int64}, 1, :((Base.select_value)((Base.sle_int)(1, SSAValue(18))::Bool, SSAValue(18), (Base.sub_int)(1, 1)::Int64)::Int64)))
SSAValue(9) = \$(Expr(:new, Base.Generator{UnitRange{Int64},GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}}, :(#175), SSAValue(8)))
varreals::Array{_,1} where _ = \$(Expr(:invoke, MethodInstance for collect(::Base.Generator{UnitRange{Int64},GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}}), :(Base.collect), SSAValue(9))) # line 91:
SSAValue(19) = ((Core.getfield)(var::Core.Box, :contents)::Any => varreals::Array{_,1} where _)::Pair{_,_} where _ where _
(Base.setindex!)(realizations::Dict{Symbol,Array{Array{T,1} where T,1}}, (Core.getfield)(SSAValue(19), :second)::Any, (Core.getfield)(SSAValue(19), :first)::Any)::Dict{Symbol,Array{Array{T,1} where T,1}}
59:
goto 37
61:  # line 95:
return \$(Expr(:new, GeoStatsBase.SimulationSolution{GeoStats.RegularGrid{Float64,2}}, :((Core.getfield)(problem, :domain)::GeoStats.RegularGrid{Float64,2}), :(realizations)))
end::GeoStatsBase.SimulationSolution{GeoStats.RegularGrid{Float64,2}}
``````

#14

The function is this one in case you can spot some type instability:

https://github.com/juliohm/GeoStats.jl/blob/master/src/neighborhoods/cube_neighborhood.jl#L34-L87

#15
``````V::Any
varreals::Array{_,1} where _
realizations::Dict{Symbol,Array{Array{T,1} where T,1}}
``````

Those are non-strictly typed containers which will not infer upon `getindex`. Additionally,

``````var::Core.Box
mapper::Core.Box
``````

those are suggestive that its not being inferred (though there are, or at least used to be, places where `Core.Box` was actually inferred I think, which showed up in the closure issue). However, together `@code_warntype` is telling you that the function is not type-inferrable.

#16

I am trying to understand the `Variables` section of the output, could you please help disgest it?

``````Variables:
#self#::GeoStatsBase.#solve
problem::GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}}
solver::GeoStats.SeqGaussSim
#175::GeoStats.##175#176{GeoStatsBase.SimulationProblem{GeoStats.GeoDataFrame{DataFrames.DataFrame},GeoStats.RegularGrid{Float64,2}},GeoStats.SeqGaussSim}
var::Core.Box
V::Any
#temp#@_7::Int64
varreals::Array{_,1} where _
#temp#@_9::Int64
mapper::Core.Box
realizations::Dict{Symbol,Array{Array{T,1} where T,1}}
n::Int64
i::Int64
``````

The function is called `solve` and we can find the name at the top in the `self`. The arguments are `problem` and `solver`, also shown there. Then, something weird follows the arguments, what is it?

#17

Those are the other variables in the function, and the `::` is their inferred type.

#18

So the main issue here is with the `mapper`, I don’t know why though Julia cannot figure out its type.

The rest as well, the variable V should hold a simple DataType.

#19

One thing I noticed: radius is untyped

If you provide some short runnable code, it might be easier to look into it.

#20

Yes @cstjean, I think in order to give a type to the radius in Julia v0.6, I would have to add an extra type parameter to the neighborhood, which I was trying to avoid. In Julia v0.7 that won’t be a problem. If you can do some experiments, that would be awesome. The code is simple:

``````using GeoStats
using DataFrames

# stations monitoring rainfall
df = csv"""
x,    y,       station, precipitation
25.0, 25.0,     palo alto,           1.0
50.0, 75.0,  redwood city,           0.0
75.0, 50.0, mountain view,           1.0
"""

geodata = GeoDataFrame(df, [:x,:y])
domain = RegularGrid{Float64}(100,100)
problem = SimulationProblem(geodata, domain, :precipitation, 3)

solver = SeqGaussSim(