GC occurs at the worst time in tight loop (Garbage Collection)

If you can collect a profile with perf, that may help answer those questions.

Random suspicion: do you have https://github.com/JuliaLang/julia/blob/01f6c4c8b86f62af923a600327f9b6be1a905193/test/embedding/embedding.c#L7 set?

What does JULIA_DEFINE_FAST_TLS do again?

Ok. Thank you. We will try to see what this does.

It makes sure that the thread-local storage (TLS) access Julia does is as fast as possible. If your application doesn’t have it, you can see a real performance regression on Linux.

One other thought: is there a way to figure out if a dynamic dispatch occurs? What if my example script in the PackageCompiler step wasn’t sufficient to fully precompile the libraries? Is there a way to make sure this is happening?

Is it used for shared libraries? Right now, I’m not pulling in <julia.h>, just “julia_init.h” and the header file for my functions. Do I need to find where julia.h is?

It needs to be in the binary so that the TLS access model is local-exec or initial-exec (All about thread-local storage | MaskRay).
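To illustrate the access-model point, here is a minimal C sketch that is not taken from Julia's sources. It declares a hypothetical thread-local counter (`tls_hits`) and requests the initial-exec model through the `tls_model` variable attribute. It assumes a GCC- or Clang-compatible compiler on an ELF platform such as Linux.

```c
#include <stdint.h>

/* Hypothetical per-thread counter (name invented for illustration).
   The tls_model attribute asks GCC/Clang for the initial-exec access
   model instead of leaving the choice to the compiler defaults. */
static __thread uint64_t tls_hits __attribute__((tls_model("initial-exec")));

/* Increment the calling thread's counter and return the new value. */
uint64_t bump_tls_counter(void)
{
    return ++tls_hits;
}
```

In a main executable the compiler can usually choose local-exec on its own. In a shared library built with -fPIC the default is typically general-dynamic, which may go through a function call on each access; that extra cost is presumably what defining the TLS accessor in the binary is meant to avoid.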

I’m still not understanding how to get the JULIA_DEFINE_FAST_TLS into my main.c while using a shared library. I’ve scoured the documentation and only see it being used if <julia.h> is included when not using shared libraries.


extern "C"
{
#include "julia_init.h"
#include "julia_hfa.h"
}

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{

    size_t alen=8;
    double mag[alen], phase[alen];

    char *C_jld2_filename;
    if (argc<2)
    {
        fprintf(stderr, "Please specify path to jld2 file\n");
        return EXIT_FAILURE;
    }
    else
    {
        C_jld2_filename = argv[1];
    }

    init_julia(argc, argv);

[The rest of the code deleted for brevity]

I don’t know if this causes more garbage that needs collecting, but I also noticed that when I pass --compile=no to my compiled function, I get the messages below before anything really happens (before file data is loaded, etc.). I guess it may be the References being initialized or something else, though it doesn’t make sense that they wouldn’t be compiled.
I tried starting the Julia session where I run PackageCompiler with --compile=all, but I still get these messages.
The program still seems to execute fine though. Is this a problem? Is there something else I should do to get “compile=all” to take hold in PackageCompiler?

code missing for get(Base.IdDict{Module, Base.PkgId}, Any, Any) from get(Base.IdDict{K, V}, Any, Any) where {K, V} : sysimg may not have been built with --compile=all
code missing for getindex(Base.IdDict{Module, Base.PkgId}, Any) from getindex(Base.IdDict{K, V}, Any) where {K, V} : sysimg may not have been built with --compile=all
code missing for _string_n(Int64) from _string_n(Integer) : sysimg may not have been built with --compile=all
code missing for unsafe_wrap(Type{Array{UInt8, 1}}, String) from unsafe_wrap(Type{Array{UInt8, 1}}, String) : sysimg may not have been built with --compile=all
code missing for (::Type{String})(Array{UInt8, 1}) from (::Type{String})(Array{UInt8, 1}) : sysimg may not have been built with --compile=all

Sorry, I am not sure what you mean by that. What does using a shared library have to do with this?

Your main.c should look like:
https://docs.julialang.org/en/v1/manual/embedding/#High-Level-Embedding

Where is “julia_init.h” and “julia_hfa.h” coming from?

As far as I know julia.h is the only supported public header.

julia_init.h comes from PackageCompiler.jl’s create_library(). My C program uses functions generated from Julia by it, using the @ccallable macro.

julia_hfa.h is my header file which exposes all of the @ccallable functions.

So even when I load the system image built by PackageCompiler as my Julia system image and call the functions with test code similar to what I call in C, I see 0 allocations from all of the functions, except when I do something like dump debug output to the screen. In the C code calling the same shared libraries, I turn on garbage collection logging and turn off garbage collection, run my loops, and then turn garbage collection back on. It shows that 448 bytes have been allocated per cycle on average, or 4.48 GB in total.

When one of our engineers ran Google’s heap checker, it showed that Julia has a memory leak here.

"
I ran google’s heap checker, it shows a memory leak due to this realloc function https://github.com/JuliaLang/julia/blob/v1.9.2/src/support/ios.c#L201
"

I’m wondering if this is because of the unsafe_store! I use to avoid allocations by defining a new array linked to the C-defined memory.

How do you manage that array linked to C defined memory? When do you free it?

When package compiler runs, it does give us these warnings, which we dutifully ignored. Is it a clue?

/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /mnt/d/linux/julia-1.9.2/lib/julia/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002

The memory doesn’t get deallocated.
The calculations are stored in a Reference that looks like this.

const outputRef_db_mag=Ref{Vector{Float64}}()
const outputRef_deg_phase=Ref{Vector{Float64}}()

Those vectors are allocated outside of the high-rate loop like this.

outputRef_db_mag[]=Vector{Float64}(undef,number)
outputRef_deg_phase[]=Vector{Float64}(undef,number)

The values are assigned during calculation using a map! from another array that is also a const ref. Basically the values are converted into dB and deg.

Then when the C-program asks for the results, it runs this function.

Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
    if isCalculationValid[]
        try
            outlen=length(outputRef_db_mag[])
            if outlen == len
                for ii=1:outlen
                    unsafe_store!(cmag, outputRef_db_mag[][ii],ii)
                    unsafe_store!(cphase, outputRef_deg_phase[][ii],ii)
                end
            else
                @error "Pointer to Array of length $outlen is required! Function called with len=$len."
                return Cint(1)
            end    
        catch err
            @error "Problem copying arrays from Julia to C: $(err.msg)"
            Base.invokelatest(Base.display_error, Base.catch_stack())
            return Cint(2)
        end
        return Cint(0)
    else
        @error "Results need recalculating first."
        return Cint(3)
    end

end
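On the C side, the integer returned by julia_get_mag_phase can be turned into a readable message. The sketch below only mirrors the four return values visible in the Julia function above (0, 1, 2, 3); the enum and function names are hypothetical and are not part of any generated header.

```c
/* Return codes as they appear in julia_get_mag_phase above.
   The enum and its member names are invented for this sketch. */
enum mag_phase_status {
    MAG_PHASE_OK = 0,           /* values were copied into the C buffers   */
    MAG_PHASE_BAD_LENGTH = 1,   /* len did not match the Julia vector size */
    MAG_PHASE_COPY_ERROR = 2,   /* an exception was caught while copying   */
    MAG_PHASE_STALE = 3         /* results need recalculating first        */
};

/* Map a return code to a short description for logging. */
const char *mag_phase_status_str(int rc)
{
    switch (rc) {
    case MAG_PHASE_OK:         return "ok";
    case MAG_PHASE_BAD_LENGTH: return "length mismatch";
    case MAG_PHASE_COPY_ERROR: return "error while copying";
    case MAG_PHASE_STALE:      return "results need recalculating";
    default:                   return "unknown status";
    }
}
```

Checking the code in the C loop makes it easier to tell whether the @error branches, which allocate strings, are being hit during the high-rate loop.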

I have some other functions that allow the C-program to do the loop and make the copy. I’m going to try those in a minute and see if it goes away.
Those look like this.

Base.@ccallable function julia_get_db_mag_index(index::Cint)::Cdouble
    if isCalculationValid[]
        try
            return outputRef_db_mag[][index] 
        catch
            @error "Invalid Index: julia_get_db_mag_index($index)"
            #Base.invokelatest(Base.display_error, Base.catch_stack())
            return Cdouble(0.0)
        end
    else
        @error "Results need recalculating first."
        return Cdouble(0.0)
    end
end

Do you see anything obvious?

As I play around, I have a hypothesis that the allocations are coming from the @ccallable arguments. I don’t know why they would be heap allocations, but I think the amount of memory that needs garbage collecting at the end may be correlated with how many arguments the @ccallable functions take. When I comment functions out, the ones that do all of the work but take no arguments don’t change the number of allocations, and the ones that take more arguments seem to change it by more… I need a better experiment.
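One way to structure such an experiment is to measure the allocation delta around a fixed number of calls to a single function, with GC disabled, and divide by the call count. The C sketch below is only a harness: `bytes_per_call`, the two function-pointer types, and the `fake_*` stand-ins are all hypothetical. In a real run the counter would be whatever allocated-bytes figure the Julia runtime or the GC log makes available, and the call would be one @ccallable function at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: a counter returning total bytes allocated so far,
   and the single call whose allocation cost is being measured. */
typedef int64_t (*alloc_counter_fn)(void);
typedef void (*call_under_test_fn)(void);

/* Average bytes allocated per call over n calls. */
double bytes_per_call(alloc_counter_fn counter, call_under_test_fn call, size_t n)
{
    if (n == 0)
        return 0.0;
    int64_t before = counter();
    for (size_t i = 0; i < n; i++)
        call();
    int64_t after = counter();
    return (double)(after - before) / (double)n;
}

/* Stand-ins so the harness can be exercised without Julia. */
static int64_t fake_total = 0;
int64_t fake_counter(void) { return fake_total; }
void fake_call_448(void) { fake_total += 448; } /* pretends to allocate 448 bytes */
void fake_call_0(void) { }                      /* pretends to allocate nothing   */
```

Running the harness once per exported function, and once per argument count, would show directly whether the per-call cost scales with the number of arguments.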

In this example, we have turned off garbage collection during the loop to help isolate the problem. Yes, I confirmed it: I added 1000 calls to this function in my loop of millions, and it blew things up.

Base.@ccallable function julia_set_dB_gain(dB_gain::Cdouble)::Cvoid 
    isCalculationValid[]=false  # Parameter Set, so invalidate current answer
    inputRef[].dB_gain = dB_gain
    return nothing
end

From C

for (int iii = 0; iii < 1000; iii++)
{
    julia_set_dB_gain(1.0);
}

Does this make sense?

That seems very excessive, but might explain my web browser problems…

For me it’s 21 GB (for even just 1 of my 9 currently running Julia processes; I assume it is similar for the others), which is actually 66% of RAM for some reason. That is still excessive, though 15 GB of it is free. I don’t recall what I was doing with it, nothing much, so 6 GB in use seems excessive.

The limit seems to be 2 PiB (2 × 1024^5 bytes) unless constrained, i.e. in practice for everyone:

static memsize_t max_total_memory = (memsize_t) 2 * 1024 * 1024 * 1024 * 1024 * 1024;

i.e. that value gets adjusted. I think you’re wrong on 90%; reading the code I see “* 0.8”, and also:

if (target_heap > max_total_memory && !thrashing) // Allow it to go over if we are thrashing if we die we die

80% for me would be 25.6 GB (and I’m under that even if the 250 MB is subtracted, which you didn’t take into account), so why? Likely because of the “!thrashing”, i.e. the computer was thrashing and the heap was limited, but just to a slightly smaller value, 66% of RAM. Note that memory is overcommitted on Linux, so it may not be a problem if it is not freed, or it is when I really feel the OOM…
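As a quick check of the arithmetic, the following C sketch evaluates the quoted initializer and the 80% figure. It uses `uint64_t` in place of `memsize_t` (an assumption about the width on a 64-bit build), and the 32 GB of RAM is inferred from 25.6 / 0.8 rather than stated anywhere in the thread. The function names are made up for this example.

```c
#include <stdint.h>

/* Same expression as the quoted line, with uint64_t standing in for
   memsize_t (assumed to be 64 bits wide). */
uint64_t max_total_memory_bytes(void)
{
    return (uint64_t)2 * 1024 * 1024 * 1024 * 1024 * 1024;
}

/* Convert bytes to pebibytes (1 PiB = 2^50 bytes). */
double bytes_to_pib(uint64_t bytes)
{
    return (double)bytes / (double)(UINT64_C(1) << 50);
}

/* Heap target as a fraction of RAM, e.g. 32 GB * 0.8. */
double heap_target_gb(double ram_gb, double fraction)
{
    return ram_gb * fraction;
}
```

The initializer evaluates to 2^51 bytes, which is 2 PiB, and 80% of an assumed 32 GB is 25.6 GB, so an observed 21 GB sits below that target.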

[I really look forward to the new revolutionary MemBalancer that was merged; the PR was then reverted for 1.10 because of a known bug in the implementation (I think the bug is not there in e.g. Google Chrome’s GC). I believe it’s still in 1.11, and the bug will be fixed there before release.]

Maybe I should always be using the heap-limit CLI option, or at least something with the same effect:

This amount may be constrained, e.g., by Linux control groups.

such as a global setting so I do not forget. I should also set such limits for web browsers; I never got around to looking into them, as I thought they might crash if I limited their memory… but they probably also take such config into account. Or all of that may be redundant now with MemBalancer, since it syncs GC across the browser, Julia, and all GCs that use the algorithm.

How do I make these pass by reference?