Non-Deterministic Memory Allocation?

After running a short script several times without making changes (using include("script.jl"); from the REPL), I noticed that the number and total size of memory allocations were not constant. How (and why) can this happen? I've attached the script and several samples of output for reference.

startup.jl:

module StartupUtils
export @showc

using Revise;
using LinearAlgebra;
using Plots;

# Compact show: like @show, but prints values with :compact => true
macro showc(exs...)
    blk = Expr(:block)
    for ex in exs
        push!(blk.args, :(println($(sprint(Base.show_unquoted,ex)*" = "),
        repr(begin local value = $(esc(ex)) end, context = :compact=>true))))
    end
    isempty(exs) || push!(blk.args, :value)
    return blk
end

end

using .StartupUtils;

script.jl:

module HW6
using Revise

using .Main: @showc

# Elementary rotation (coordinate transformation) about axis 1
function T1(θ)
    return [
        1       0      0
        0  cos(θ) sin(θ)
        0 -sin(θ) cos(θ)
    ]
end

# Elementary rotation (coordinate transformation) about axis 2
function T2(θ)
    return [
        cos(θ) 0 -sin(θ)
        0      1       0
        sin(θ) 0  cos(θ)
    ]
end

const IcB = [
    34  1  6
     1 15  3
     6  3 10
]  # kg m^2

# Euler angles
const ψ = deg2rad(5)
const θ = deg2rad(10)
const ϕ = deg2rad(-3)

# Euler angle rates
const ψdot = deg2rad(-1)
const θdot = deg2rad(1)
const ϕdot = deg2rad(4)

# Body angular velocity from the Euler-angle rates
const ω = T1(ϕ) * T2(θ) * [0, 0, ψdot] .+ T1(ϕ) * [0, θdot, 0] .+ [ϕdot, 0, 0]

function main()
    T_rot = 1/2 * ω' * IcB * ω  # rotational kinetic energy
    @showc T_rot
end

@time "Total" main()

end

Samples of output:

WARNING: replacing module HW6.
T_rot = 0.0873849
Total: 0.000079 seconds (19 allocations: 960 bytes)
WARNING: replacing module HW6.
T_rot = 0.0873849
Total: 0.000097 seconds (21 allocations: 1.797 KiB)
WARNING: replacing module HW6.
T_rot = 0.0873849
Total: 0.000050 seconds (19 allocations: 960 bytes)
1 Like

This looks like you're re-including the file over and over again? In which case, it's very possible that some runs include GC time, where things that weren't cleaned up in earlier runs get cleaned up now.

I’ve moved this topic to General Usage, since this is not really a question about the internals of the language or runtime.

1 Like

Memory allocations do fluctuate for complex function calls, and IMO it's a good question about internals. Here is the histogram for [@allocated Pkg.status() for _ in 1:50]:

[histogram image: allocation counts vary across the 50 identical calls]
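A minimal sketch to reproduce the measurement yourself (assuming Pkg is loaded):

julia> using Pkg

julia> allocs = [@allocated Pkg.status() for _ in 1:50];

julia> length(unique(allocs))  # typically greater than 1: identical calls allocate differently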

Which operation causes this variation? GC.enable(false) doesn’t change anything.

2 Likes

Putting everything into a module that gets reloaded each run is the only reliable way I’ve found to achieve deterministic program output in Julia. Otherwise, you get random crap based on whatever you last did at the REPL, since there’s no way to clear the workspace!

As a side note, if I tried to recommend Julia in its current state to a bunch of MATLAB junkies, I'd get laughed out of the room as soon as they asked whether it's possible to do something as trivial as clearing the workspace. I'm sure it's a difficult problem or whatever (just like Rust's inability to perform compile-time arithmetic on type parameters), but in both cases these are basic features that you really start to miss when they're taken away.

Regardless, the call to @time is only being given the main function, so I don’t understand why it would be tracking allocations unrelated to main.

Do you know about BenchmarkTools.jl? Most Julia users use that package for benchmarking. It takes care of all the finicky details: precompilation, statistics collection & aggregation, and, yes, also managing the GC to make the results as deterministic as possible.
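For example, a minimal sketch (assuming the main from your script is in scope):

julia> using BenchmarkTools

julia> @btime main();  # evaluates main() many times and reports the minimum time

julia> @benchmark main()  # full statistics, including a histogram of times and allocations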

There are a number of reasons why the allocations or timing of a benchmark can fluctuate; first among them compilation time, then how loaded your system otherwise is, how much garbage the GC has accumulated from previous values…

@time isn’t tracking anything actively. It’s an extremely naive macro, literally only reporting the differences in time & GC statistics during its invocation:

julia> @macroexpand @time 1+1
quote
    #= timing.jl:272 =#
    begin
        #= timing.jl:277 =#
        $(Expr(:meta, :force_compile))
        #= timing.jl:278 =#
        local var"#1#stats" = Base.gc_num()
        #= timing.jl:279 =#
        local var"#3#elapsedtime" = Base.time_ns()
        #= timing.jl:280 =#
        Base.cumulative_compile_timing(true)
        #= timing.jl:281 =#
        local var"#4#compile_elapsedtimes" = Base.cumulative_compile_time_ns()
        #= timing.jl:282 =#
        local var"#2#val" = $(Expr(:tryfinally, :(1 + 1), quote
    var"#3#elapsedtime" = Base.time_ns() - var"#3#elapsedtime"
    #= timing.jl:284 =#
    Base.cumulative_compile_timing(false)
    #= timing.jl:285 =#
    var"#4#compile_elapsedtimes" = Base.cumulative_compile_time_ns() .- var"#4#compile_elapsedtimes"
end))
        #= timing.jl:287 =#
        local var"#5#diff" = Base.GC_Diff(Base.gc_num(), var"#1#stats")
        #= timing.jl:288 =#
        local var"#6#_msg" = Base.nothing
        #= timing.jl:289 =#
        Base.time_print(Base.stdout, var"#3#elapsedtime", (var"#5#diff").allocd, (var"#5#diff").total_time, Base.gc_alloc_count(var"#5#diff"), Base.first(var"#4#compile_elapsedtimes"), Base.last(var"#4#compile_elapsedtimes"), true; msg = var"#6#_msg")
        #= timing.jl:290 =#
        var"#2#val"
    end
end

So whatever bookkeeping the GC happens to do during a collection that runs inside the code your @time invocation encompasses is recorded too.
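You can see this directly by forcing a collection inside the timed region (a contrived sketch):

julia> @time begin
           GC.gc()           # force a collection inside the timed region
           sum(rand(1000))   # plus a small allocation of our own
       end

The reported GC time and allocation counts then include the forced collection's work, not just the summation's.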

That’s also why its docstring recommends BenchmarkTools.jl for benchmarking:

help?> @time
  @time expr
  @time "description" expr

[...]

  │ Note
  │
  │  For more serious benchmarking, consider the @btime macro from the BenchmarkTools.jl package which among other
  │  things evaluates the function multiple times in order to reduce noise.
3 Likes

MATLAB’s clear/clearvars is made feasible by MATLAB’s atypical workspace system for storing variables. This links to a somewhat old answer, but it’s a more detailed description of workspaces and their unique caveats.

This just doesn’t translate to languages with modules holding their own global variables, so this feature will never be fully replicated, if at all. Let’s take a simple case where you fire up the REPL and work in Main. After polluting the namespace with a bunch of variables and data, you decide to remove them.

  • First problem: you want to keep the already compiled functions; okay, we’ll keep any variable assigned to an instance of Function.
  • Second problem: you don’t want to remove const variables, because they were inlined into your compiled functions and can’t be safely reassigned anymore (see the sketch below); fine, let’s keep them too.
  • Third problem: the functions relying on non-const global variables now don’t work, and you have to manually redefine them from memory.

And now you realize why MATLAB risks easier naming conflicts by throwing global variables into their own workspace: so that clear ... doesn’t easily cause such issues (though clear global ... and clear all do remove things from the global workspace, and both are discouraged even in clear’s documentation).
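Here is a sketch of the const problem (exact behavior varies by Julia version; on many versions the already compiled method keeps using the old value):

julia> const N = 3;

julia> f() = N + 1;

julia> f()
4

julia> const N = 4;  # warns about redefinition of a constant

julia> f()  # on many Julia versions this is still 4: N was inlined into f
4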

In most languages, the repeatable unit of code is a function, and that should be used instead of tweaking and rerunning scripts. This practice is recommended in MATLAB as well by the real junkies.

6 Likes

The last link you provided explains it all:

Advanced MATLAB users tend to write functions, not scripts.

The average MATLAB junkie doesn’t. The average MATLAB junkie has no idea what you’re talking about.

As a side note, I have tried working from the REPL several times. It’s never as good as running a script. I prefer deterministic programs – not ones that assume I’m going to type the same exact sequence of functions in perfectly every time. I seem to recall some program type that was really good for executing sequences of functions exactly the same each time…

Regardless, I’ve come up with a nuclear option that prevents any and all shenanigans. Not only does it wrap everything in a function as you recommend, it wraps those functions in a module that gets reloaded every time I source the file! Pretty sweet for when I would like to write deterministic software (which is always):

module ModuleName
using Revise

using .Main: @showp, @showc, nearly_equal
using Plots  # plot() isn't visible inside the module otherwise

# Const variables go here

function computation()
    # Computations go here
    # Return a named tuple containing the results e.g.
    return (a=a, b=b)
end

function plotting(results)
    # Plotting goes here
    # Plot members of the named tuple e.g.
    plot(results.a, results.b)
end

function main()
    results = @time "Computation" computation()
    @time "Plotting" plotting(results)
end

@time "Total" main()

end

(@showp, @showc, and nearly_equal are macros/functions that I define in my startup.jl)

Your way of using modules does not mesh with Revise:

  • Revise.jl doesn’t go into the module you want to develop; instead, it should be used from the REPL.
  • You should not put a function call like main() inside a module and then re-include that module for the sake of running main.

One way to correctly use Revise.jl is:

# put this into a file `main.jl`
module ModuleName

using .Main: @showp, @showc, nearly_equal
using Plots  # plot() isn't visible inside the module otherwise

# Const variables go here

function computation()
    # Computations go here
    # Return a named tuple containing the results e.g.
    return (a=a, b=b)
end

function plotting(results)
    # Plotting goes here
    # Plot members of the named tuple e.g.
    plot(results.a, results.b)
end

function main()
    results = @time "Computation" computation()
    @time "Plotting" plotting(results)
end

end

Then fire up a REPL and run

julia> using Revise

julia> includet("main.jl") # you do this once in a fresh REPL session; note the 't' at the end of includet!

julia> ModuleName.main()

Then edit any functions inside main.jl, go back to the REPL and run ModuleName.main(), rinse and repeat.

This should be deterministic enough!

See the Revise.jl docs for more info: Home · Revise.jl

3 Likes

I’d also note that if you wrap everything in functions, like you did there, wrapping everything in the module is not necessary. Using Revise, includet and just calling the functions with appropriate input is enough.

2 Likes

What if I am editing a function in another module that I’m using? There have been multiple instances where I’ve edited a function in another module only for the code to call the old definition of the function because I’ve changed the call signature or the types within the call signature.

It stopped happening when I started wrapping my scripts in a module.

Is there any reason not to do it the way that I’ve done it? I don’t really care about how it should be done; I just care that it works.

Re-including modules gives a WARNING, and those are usually added for a reason. But I don’t know what problems it can cause.

Also: using Revise in your script will have no effect; you can essentially uninstall it.

1 Like

I see. Thanks for the help!

Could I put using Revise in my startup.jl to avoid typing it every time I fire up a REPL?

3 Likes

There is nothing wrong in using a module, but then use revise like:

julia> using Revise # better be in startup.jl

julia> includet("file_that_contains_module.jl")

julia> using .MyModule

julia> MyModule.f() # run module function

then, if you modify the file that contains the module, and the definition of function f, it will be automatically updated. Do not include the file twice.

1 Like

If you really want to reset everything, you can also call exec (on Linux at least).

2 Likes

Revise in your startup is a very good idea. (I would also recommend adding BenchmarkTools and Test.)
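For reference, a minimal sketch of such a startup file (the try/catch guard follows the pattern the Revise docs suggest, so a missing package doesn't break the REPL):

# ~/.julia/config/startup.jl
try
    using Revise
catch e
    @warn "Error while loading Revise" exception = (e, catch_backtrace())
end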

2 Likes

Your const variables aren’t wrapped in a function, but if they have a fixed value for a good reason, like physical constants, so that you’ll never have to manually change them and remember what you changed in order to understand the program’s behavior, that’s fine.

It throws away a whole global scope and requires recompilation for relatively minor changes. That’s not to say it’s always a mistake; in fact, Pluto.jl does this to implement its live-coding environment. However, it uses up a lot of memory, and it is overkill for resolving the issues at hand:

Again, the typical solution to this is to 1) turn as many global variables as possible into local variables in functions, and 2) feed data into functions through arguments, not by accessing declared global variables (a sketch follows below). Local variables can’t pollute the global scope and are destroyed at the end of a local scope, like a let block or a function call; it’s the same deal with function workspaces in MATLAB, as the old MathWorks support answer says. Passing data as arguments keeps the behavior of a method consistent and under your control, rather than at the mercy of some global variables whose values you can’t quite keep track of. The MathWorks answer may have called this a practice of “advanced users”, but it’s actually fairly basic structured programming and doesn’t involve any fancy memory schemes like module replacement.
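Applied to the script above, a sketch of that refactor might look like this (rotational_ke is a hypothetical name; T1, T2, and @showc are as defined earlier in the thread):

# Pass everything in as arguments instead of reading const globals
function rotational_ke(IcB, θ, ϕ, ψdot, θdot, ϕdot)
    ω = T1(ϕ) * T2(θ) * [0, 0, ψdot] .+ T1(ϕ) * [0, θdot, 0] .+ [ϕdot, 0, 0]
    return 1/2 * ω' * IcB * ω
end

# main builds the inputs locally and passes them along
function main()
    IcB = [34 1 6; 1 15 3; 6 3 10]  # kg m^2
    T_rot = rotational_ke(IcB, deg2rad(10), deg2rad(-3),
                          deg2rad(-1), deg2rad(1), deg2rad(4))
    @showc T_rot
end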

If you really don’t feel like refactoring and want to stick with scripts that do a lot in the global scope, another way to avoid global scope pollution is to just restart the process each time: instead of repeating include("script.jl") in one REPL session, you run julia script.jl in a command prompt. Again, it’s inefficient, and in more ways than throwing away modules, but diverging from good practices isn’t ever going to be convenient.

Heap allocation performance depends on heap state, and allocations can happen without your explicit intention, depending on hidden state, e.g. vectors reallocating as they resize (see the sketch below). Refactoring your code into functions helps because you’re no longer making such drastic changes, but sometimes it’s not possible to micromanage your code to the point that the same number of allocations happens every time. Even if you do, garbage collection is out of your control and affects runtimes anyway.
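A sketch of such hidden state, using push! on a vector:

julia> v = Int[];

julia> @allocated push!(v, 1)  # nonzero: the vector's buffer has to grow

julia> @allocated push!(v, 2)  # often 0: spare capacity from the previous growth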

1 Like

Good catch! I’ll have to fix this. I was using const variables for things that could change as I adjust my algorithm, assuming it might be more efficient if it lets the compiler know they can go directly into the executable. These should definitely be in main.