I had not heard of JET.jl. I’ll check it out.
There is also a lot of good information in this thread. It helps me feel out if what I am seeing is expected or not.
So what is the correct way to allocate the array in C, pass the pointer to Julia, and copy the pre-existing Ref’d Julia array into the memory behind the C pointer? I thought it was an unsafe_wrap. But the only reason I do that is to enable the copy! from one Julia array to the C array now visible in Julia. If that is causing an allocation, I don’t want it.
On Julia 1.10 you could use Libc.memcpy directly:
arr = Array{Int}(undef, 1024)
Libc.memcpy(arr, cptr, sizeof(arr))
function memcpy(dst::Ptr, src::Ptr, n::Integer)
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end
Regarding code instability: does using the @error macro to print a problem to the screen before returning an error code to the calling function cause any kind of type instability or allocations, even when that path is not taken?
Also, I have noticed that on my home-brewed quaternion and xyz-vector classes, my normalization calls tend to allocate now even though they are immutable composite types. I do a divide-by-zero protection in there. I thought I protected against type instability, but I may not have. On one of them, I think I will just throw a domain error. Could the option to throw a DomainError instead of returning a normalized quaternion cause type instability?
function normalize(q::quaternion{T}) where T<:QUATTYPE
n = norm(q)
n > eps(T) ? quaternion{T}(q.s / n, q.v / n) : throw(DomainError(q,"Normalizing Null Quaternion"))
end
norm(q::quaternion) = sqrt(q.s^2 + q.v.x^2 + q.v.y^2 + q.v.z^2)
Also, I recently expanded QUATTYPE to const QUATTYPE = Number
to allow it to work with Symbolics. Right now it is being defined as Float64.
I didn’t think the normalize calls used to allocate when I first designed the class. But as I’ve been hacking at it for symbolics and other things, I may have introduced an error.
I think I also noticed other strange things, like accessing my vector as v.x was more expensive per @allocated on the REPL than v[1] was: v.x allocated 64 bytes, and v[1] allocated 16. The v.x access calls getfield according to @code_warntype. My v[1] just runs through an if-elseif structure in getindex that returns getproperty(v, :x).
So that is also confusing to me. I need to check with JET. I think these immutable composite-type manipulations allocating unexpectedly are part of the problem, and I wonder if I have broken something trying to make them work with Symbolics or some other patch along the way, or if this is because I’m evaluating on the REPL, and inside a compiled function it wouldn’t have allocated.
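One way to separate REPL effects from genuine allocations is to measure inside a compiled function. A minimal sketch, using a stand-in Vec3 type rather than the actual xyzVecT definition:

```julia
# Field access on an isbits immutable struct should not allocate inside a
# compiled function, even if @allocated at the REPL reports bytes
# (non-const globals are untyped there, so the measurement itself allocates).
struct Vec3{T}   # hypothetical stand-in for the xyzVecT type discussed above
    x::T
    y::T
    z::T
end

getx(v) = v.x            # compiles down to a plain field load

inner(v) = @allocated getx(v)   # measure inside a function, not at top level

v = Vec3(1.0, 2.0, 3.0)
inner(v)                 # warm up / compile
println(inner(v))        # expect 0 once compiled
```

If this prints 0 while the same expression at the REPL reports bytes, the allocations seen interactively are measurement artifacts, not a problem with the type.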
I’ll try JET.jl later today. I think I must have some instability in my xyzvector type (which is an immutable struct) caused by my recent changes to optimize it and make it work better with Symbolics. In the past, it used to have a “Real” parent type, but now it is a “Number”.
abstract type AbstractVec{T<:Number} end
" xyzVecT is the object definition to contain the cartesian information "
struct xyzVecT{T} <:AbstractVec{T}
x::T
y::T
z::T
end
# Constructors
(::Type{xyzVecT{T}})(p::Tuple{Number,Number,Number}) where {T<:Number} = xyzVecT{T}(p...)
(::Type{xyzVecT{T}})() where {T<:Number} = xyzVecT{T}(convert(T,0),convert(T,0),convert(T,0))
(::Type{xyzVecT{T}})(v::AbstractArray{Any,1}) where{T<:Number} = length(v)==0 ? xyzVecT{T}(convert(T,0),convert(T,0),convert(T,0)) : xyzVecT{T}(convert(T,v[1]),convert(T,v[2]),convert(T,v[3]))
xVec(x::T) where {T<:Number} = xyzVecT{T}(x,zero(x),zero(x)) # To help specify unit vectors
yVec(y::T) where {T<:Number} = xyzVecT{T}(zero(y),y,zero(y)) # To help specify unit vectors
zVec(z::T) where {T<:Number} = xyzVecT{T}(zero(z),zero(z),z) # To help specify unit vectors
(::Type{xyzVecT{T}})(v::AbstractArray) where {T<:Number} = xyzVecT{T}(convert(T,v[1]),convert(T,v[2]),convert(T,v[3]))
(::Type{xyzVecT{T}})(v::xyzVecT{T}) where {T<:Number} = v
(::Type{xyzVecT{T}})(v::xyzVecT) where {T<:Number} = xyzVecT{T}(v.x,v.y,v.z)
xyzVecT(x::T,y::T,z::T) where T<:Number = xyzVecT{T}(x,y,z) # This is needed to stop the stack overflowing from
xyzVecT(x::Number,y::Number,z::Number) = xyzVecT(promote(x,y,z)...) # this.
xyzVecT() = xyzVecT{Float64}(0.0,0.0,0.0)
xyzVecT(v::AbstractArray) = xyzVecT(v[1],v[2],v[3])
[there is more]
This was one of the first things I wrote when learning Julia, so I’m sure it is full of problems still.
Yeah, I just did a test, and the quaternion, which uses the xyzVecT as a field, is not allocating when I initialize it with 4 scalars, but when I initialize it with a scalar and a vector it is allocating (this is the default constructor). It may be calling normalize.
const QUATTYPE=Number
struct quaternion{T<:QUATTYPE} <:Number
s::T
v::xyzVecT{T}
end
quaternion(s::T, v1::T, v2::T, v3::T) where T<:QUATTYPE = quaternion{T}(s, xyzVecT{T}(v1, v2, v3))
quaternion(s::QUATTYPE, v1::QUATTYPE, v2::QUATTYPE, v3::QUATTYPE) = quaternion(promote(s,v1,v2,v3)...) # Get everything to the same type
quaternion{T}(s::QUATTYPE, v1::QUATTYPE, v2::QUATTYPE, v3::QUATTYPE) where T<:QUATTYPE = quaternion{T}(convert(T,s), xyzVecT{T}(v1, v2, v3))
quaternion() = quaternion{Float64}(1.0, xyzVecT{Float64}())
quaternion{T}() where T<:QUATTYPE = quaternion{T}(1.0, xyzVecT{T}())
quaternion(v::xyzVecT{T}) where T<:QUATTYPE = quaternion{T}(convert(T,0), v)
quaternion(s::T) where T<:QUATTYPE = quaternion{T}(s, xyzVecT{T}())
quaternion(a::AbstractArray{T,1}) where T<:QUATTYPE = quaternion{T}(a[1], xyzVecT{T}(a[2], a[3], a[4]))
Very strange, I have a lot to learn.
julia> @allocated q=quaternion{Float64}(1.0,2.0,3.0,4.0)
0
julia> @allocated q=quaternion{Float64}(1.0,v)
48
julia> @code_warntype quaternion{Float64}(1.0,v)
MethodInstance for quaternion{Float64}(::Float64, ::xyzVecT{Float64})
from (var"#ctor-self#"::Type{quaternion{T}} where T<:Number)(s, v) @ LMQuaternions C:\Users\bakerar\.julia\packages\LMQuaternions\vxMqo\src\LMQuaternions.jl:68
Arguments
#ctor-self#::Core.Const(quaternion{Float64})
s::Float64
v::xyzVecT{Float64}
Body::quaternion{Float64}
1 ─ %1 = Core.fieldtype(#ctor-self#, 1)::Core.Const(Float64)
│ %2 = Base.convert(%1, s)::Float64
│ %3 = Core.fieldtype(#ctor-self#, 2)::Core.Const(xyzVecT{Float64})
│ %4 = Base.convert(%3, v)::xyzVecT{Float64}
│ %5 = %new(#ctor-self#, %2, %4)::quaternion{Float64}
└── return %5
julia> @code_warntype quaternion{Float64}(1.0,2.0,3.0,4.0)
MethodInstance for quaternion{Float64}(::Float64, ::Float64, ::Float64, ::Float64)
from quaternion{T}(s::Number, v1::Number, v2::Number, v3::Number) where T<:Number @ LMQuaternions C:\Users\bakerar\.julia\packages\LMQuaternions\vxMqo\src\LMQuaternions.jl:75
Static Parameters
T = Float64
Arguments
#self#::Core.Const(quaternion{Float64})
s::Float64
v1::Float64
v2::Float64
v3::Float64
Body::quaternion{Float64}
1 ─ %1 = Core.apply_type(LMQuaternions.quaternion, $(Expr(:static_parameter, 1)))::Core.Const(quaternion{Float64})
│ %2 = LMQuaternions.convert($(Expr(:static_parameter, 1)), s)::Float64
│ %3 = Core.apply_type(LMQuaternions.xyzVecT, $(Expr(:static_parameter, 1)))::Core.Const(xyzVecT{Float64})
│ %4 = (%3)(v1, v2, v3)::xyzVecT{Float64}
│ %5 = (%1)(%2, %4)::quaternion{Float64}
└── return %5
AFAIK printing does cause run-time dispatch, at least as it’s implemented in the standard library. If the path isn’t chosen, there will be no run-time dispatch, so it’s not problematic from a performance perspective, though. You’ll probably get extra noise from JET because of the prints: JET does static analysis, so it doesn’t care whether the prints are actually executed.
Regarding the rest of your message, try creating a self-contained example. For example, you (rightfully) complain that quaternion{Float64}(1.0, v) allocates, but you didn’t provide the source for that constructor, as far as I can see.
Are you aware of StaticArrays.jl? It implements all of that.
I don’t think it was a thing when I started that vector type, but I was trying to do something equivalent by staying in static memory. I also know I only need 3- and 2-element vectors for most of the stuff. I’ve looked at swapping it out before, but it always seemed to allocate more or run slower than what I had every time I tried it. Maybe one day I’ll switch it out under the hood once I’ve had a chance to use it more. Also, not relying on external packages that could change a lot, and that might also have licenses and other legal concerns, can be beneficial.
Yeah, I think it must be the default constructor.
It looks like the try-catch block dynamically dispatches according to JET, as does the logging. I removed all the logging to make sure. Otherwise I don’t see any dynamic dispatches. I think I’ll take out the try-catch and do the direct memory copy for one of the functions.
I tried to do this, but it is not correct. The memory gets corrupted.
I am doing tests where I am calling this wholly in Julia right now, so it is a Julia generated pointer in a test function. This used to give the correct answer when I was using the unsafe_wrap which had the allocations I was trying to get rid of. The memory copy seems to start several values over in the array.
#in module
const outputRef_db_mag=Ref{Vector{Float64}}()
const outputRef_deg_phase=Ref{Vector{Float64}}()
# in setup function
# number is set by another data reader
outputRef_db_mag[]=Vector{Float64}(undef,number)
outputRef_deg_phase[]=Vector{Float64}(undef,number)
# in the calc function (where db_mag and deg_phase are separately calculated by another routine for this example)
map!(m->(voltage2db(abs(m))),outputRef_db_mag[],intermediateRef_magphase[])
map!(p->(angle_deg(p)),outputRef_deg_phase[],intermediateRef_magphase[])
function memcpy(dst::Ptr, src::Ptr, n::Integer)
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end
# The Function called from C
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
try
outlen=length(outputRef_db_mag[])
if outlen == len
memcpy(cmag, pointer_from_objref(outputRef_db_mag[]), sizeof(Cdouble)*len)
memcpy(cphase, pointer_from_objref(outputRef_deg_phase[]), sizeof(Cdouble)*len)
else
#@error "Pointer to Array of length $outlen is required! Function called with len=$len."
return Cint(1)
end
catch err
#@error "Problem copying arrays from Julia to C: $(err.msg)"
Base.invokelatest(Base.display_error, Base.catch_stack())
return Cint(2)
end
return Cint(0)
else
#@error "Results need recalculating first."
return Cint(3)
end
end
pointer_from_objref gives you the pointer to the Array object, not the data itself. What you want is pointer.
You could also not strongly type the memcpy and have the ccall do the conversion for you correctly.
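The difference can be sketched like this, reusing the same memcpy wrapper from earlier in the thread:

```julia
# Minimal sketch: pointer(arr) yields the address of the Float64 data buffer,
# while pointer_from_objref(arr) yields the address of the Array object header,
# so memcpy'ing from the latter reads metadata instead of the first elements.
function memcpy(dst::Ptr, src::Ptr, n::Integer)
    ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end

src = [1.0, 2.0, 3.0, 4.0]
dst = zeros(4)
# GC.@preserve keeps both arrays rooted while raw pointers are in use.
GC.@preserve src dst begin
    memcpy(pointer(dst), pointer(src), sizeof(src))  # correct: data pointers
end
@show dst   # now matches src
```

This also explains the “starts several values over” symptom: copying from the object header means the first few copied values are Array metadata, not data.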
This appears to work with zero allocations.
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
try
outlen=length(outputRef_db_mag[])
if outlen == len
for ii=1:outlen
unsafe_store!(cmag, outputRef_db_mag[][ii],ii)
unsafe_store!(cphase, outputRef_deg_phase[][ii],ii)
end
else
#@error "Pointer to Array of length $outlen is required! Function called with len=$len."
return Cint(1)
end
catch err
#@error "Problem copying arrays from Julia to C: $(err.msg)"
Base.invokelatest(Base.display_error, Base.catch_stack())
return Cint(2)
end
return Cint(0)
else
#@error "Results need recalculating first."
return Cint(3)
end
end
Hopefully this is an ok solution.
I think you could simplify to:
@noinline function report_error(outlen, len)
@error "Pointer to Array of length $outlen is required! Function called with len=$len."
end
@noinline function report_recalculating()
@error "Results need recalculating first."
end
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
db_mag = outputRef_db_mag[]
deg_phase = outputRef_deg_phase[]
outlen = length(db_mag)
if outlen != len
report_error(outlen, len)
return Cint(1)
end
@inbounds for ii=1:outlen
unsafe_store!(cmag, db_mag[ii], ii)
unsafe_store!(cphase, deg_phase[ii], ii)
end
return Cint(0)
else
report_recalculating()
return Cint(3)
end
end
Alternatively, if you are not doing double buffering or something on the C side, you could “just” Base.unsafe_wrap(Array, cmag, len; own=false)
and use that Array for outputRef_db_mag.
Then your ccallable collapses down to the isCalculationValid check, and you don’t do the data movement at all.
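A minimal sketch of that idea, assuming the C side guarantees the buffer outlives the wrap (Libc.malloc stands in here for the C-side allocation):

```julia
# Wrap a C-owned buffer as a Julia Vector without copying and without
# taking ownership (own=false means Julia's GC will never free it).
len  = 8
cbuf = Libc.malloc(len * sizeof(Cdouble))   # stand-in for the C allocation
cmag = convert(Ptr{Cdouble}, cbuf)

mag = unsafe_wrap(Vector{Cdouble}, cmag, len; own=false)
mag .= 1.0:len               # compute results directly in place;
                             # the C side sees them with no copy step
@show unsafe_load(cmag, 3)   # reads back through the raw pointer: 3.0

Libc.free(cbuf)              # in real use, the C side frees its own buffer
```

The wrapped Vector is only valid as long as the C buffer is; freeing the buffer while the wrap is still in use is undefined behavior.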
This has some really cool insight in it. Thanks!
I need to think a bit on this. I’m not sure I understand it, but what I have is not allocating at this point… there… it goes back to the data access I originally proposed.
Ok. State of where I am.
I got rid of all allocations except for the original function call in the original message that accessed the 4D array from the struct. I tried making that struct mutable and it did not matter.
I reworked my code to get rid of the ambiguities at the line.
The allocations are coming from accessing the obj.data array.
This is a read operation. There are no other allocations in any of the functions, and the amount of memory being allocated adds up to all of the calls to these variables.
There was some conversation earlier in this thread, but I couldn’t find the conclusion in all of the discussion.
Here is the trace for the allocations… where I have edited some of the function and variable names just to keep it generic.
0 for ii=1:number
1024 a=obj.data[ii,j0,k0,ifreq]
1024 b=obj.data[ii,j1,k0,ifreq]
1024 c=obj.data[ii,j1,k1,ifreq]
1024 d=obj.data[ii,j0,k1,ifreq]
0 result[ii]=fnc_inlined(a,
- b,
- c,
- d,
- t,u)
If you already told me why this makes sense, can you tell me again?
While I ponder what is going on here… I have disabled GC right before this line and re-enabled it right after. This removes all garbage collection from the @benchmark runs, even for a function that calls this 1e6 times.
To make it sustainable, I guess I need to create a little C-callable function that runs the GC at an idle time, or after the read of the data.
I’m wondering, if I made a, b, c, and d Refs, would that cure all my allocations? It seems a bit hacky.
Thanks again for the tremendous support.
Try using the allocation profiler instead of --track-allocation.
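For reference, a minimal sketch of driving the allocation profiler from the REPL (Julia 1.8+; `work` is a hypothetical placeholder for the interpolation routine above):

```julia
using Profile

# Hypothetical workload standing in for the function being investigated.
work() = sum(sum(rand(10)) for _ in 1:100)

work()                                          # compile first, so codegen
                                                # allocations aren't recorded
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 work()    # sample_rate=1 records every allocation
allocs = Profile.Allocs.fetch().allocs
println(length(allocs), " allocations recorded")
# Each entry carries .type, .size, and .stacktrace,
# which pinpoints the exact allocating line.
```

Unlike --track-allocation, this doesn’t require restarting Julia with a flag, and the stack traces show which caller triggered each allocation.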
Do you have a standalone MWE for that? It would make it easier to show how I analyze where allocations are coming from.
Here is my attempt at a minimal working example. It is strange. When I have used these immutable structures in the past, I did not think that there was any allocation, and I have lots of functions that seem to do this, but when I make an example, it seems to have similar problems to what I am seeing. Now the + seems to be allocating on the heap. Anyone know where my knowledge gap is?
#allocation test
struct mystruct1{T<:Real}
a::T
b::T
c::T
end
@inline Base.:+(a::mystruct1{T},b::mystruct1{T}) where T<:Real=mystruct1{T}(a.a+b.a,a.b+b.b,a.c+b.c)
f=[mystruct1{Float64}(i,j,k) for i=1:10,j=1:10,k=1:3];
x=mystruct1{Float64}(1.0,2.0,3.0)
y=mystruct1{Float64}(4.0,5.0,6.0)
x+y
@show @allocated x+y
c=Array{mystruct1{Float64}}(undef,1)
c[1]=x+y
@show @allocated c[1]=x+y
@show @allocated c[1]=f[1,1,1]+f[1,2,3]
a=10.0
b=20.0
g=[1.0]
@show @allocated a+b
@show @allocated g[1]=a+b
@inline function dostuff(x1,x2)
x1+x2
end
dostuff(a,b)
dostuff(x,y)
dostuff(f[1,1,1],f[1,2,3])
@show @allocated dostuff(a,b)
@show @allocated dostuff(f[1,1,1],f[1,2,3])
Results of include
include("allocation_test.jl")
#= allocation_test.jl:18 =# @allocated(x + y) = 32
#= allocation_test.jl:23 =# @allocated(c[1] = x + y) = 32
#= allocation_test.jl:25 =# @allocated(c[1] = f[1, 1, 1] + f[1, 2, 3]) = 96
#= allocation_test.jl:34 =# @allocated(a + b) = 16
#= allocation_test.jl:36 =# @allocated(g[1] = a + b) = 16
#= allocation_test.jl:46 =# @allocated(dostuff(a, b)) = 16
#= allocation_test.jl:47 =# @allocated(dostuff(f[1, 1, 1], f[1, 2, 3])) = 96