How to make a function store data to avoid repeating computation

I have a function in a module that performs some heavy computation to produce a fixed output.
It is hard to predict if the function will be called, and we want to avoid computing it if unnecessary.
So, we would like the function to be lazy and to store its output the first time it is called.
Here’s one way to do that:

function f() 
     isdefined(@__MODULE__, :_output) && return _output 
     global _output = ... heavy calculations ... 
     return _output 
 end

Is it the best and/or most julionic way to do that?

1 Like

There is a package called Memoize (link: Memoize.jl) which does this with simple syntax.

Reusing other packages is considered Julianic.

6 Likes

I would define a global const Output as a struct that is partially initialized at startup.
When f is called, it can check whether the desired field of Output has been initialized and compute it if needed.

Making Output a functor may be a good implementation (so you can just call Output() to retrieve or calculate the desired data as needed). I have not thought through implementation details for this.

I tried something like this, here is my MWE:

julia> mutable struct b
       value
       b() = new()
       end

julia> a = b()
b(#undef)

julia> function (f::b)()
             if !isdefined(f,:value)
                 f.value=42
             end
             f.value
       end

julia> a()
42

Unfortunately, !isdefined(f,:value) always returns true: the functor doesn’t change the struct :confused:

I use functors. For instance for transformations where I need a buffer in order to avoid allocations
in inner loops.

Would this help?

2 Likes

You could redefine f. It looks a bit better and can be seen as a “Julia thing”.

julia> function f()
           global _output = (println("heavy calculations"); 12)
           @eval f() = _output
           return _output
       end
f (generic function with 1 method)

julia> f()
heavy calculations
12

julia> f()
12

More generally, it seems you want to emulate the behaviour of local static variables from C/C++. There are multiple requests for such feature (see this issue).

2 Likes

I got some suggestions regarding using structs. Does anybody have a MWE?

In the meantime, would making the side effect more explicit be good?
Here’s my proposal:

julia> _f_cache::Union{Nothing,Int} = nothing

julia> function f!(_output = _f_cache)
            _output != nothing && return _output
            global _output = ... heavy calculations ...
            return _output
        end

Have you tried Memoize.jl, as previously suggested?

julia> @memoize f() = sum(rand(1024, 1024)^2)
f (generic function with 1 method)

julia> @time f()
  0.014725 seconds (6 allocations: 16.000 MiB)
2.683692441081925e8

julia> @time f()
  0.000001 seconds
2.683692441081925e8
3 Likes

To fix my MWE on using a functor, I just needed to add the global keyword:

julia> mutable struct b
       value
       b() = new()
       end

julia> a = b()
b(#undef)

julia> function (f::b)()
             if !isdefined(f,:value)
                 global f.value=42
             end
             f.value
       end

julia> a()
42

I think that this is the cleanest approach.

1 Like

Yes!
But I’m curious about achieving it without external packages, since one might want to avoid adding a dependency if the problem has to be solved only in few isolated cases.

I understand the reluctance to add unnecessary dependencies, but Memoize is pretty lightweight:

julia> @time_imports using Memoize
     19.7 ms  MacroTools
      0.2 ms  Memoize
2 Likes

I think globals are bad. Is there even one example of code that requires a global?

4 Likes

It may be worth mentioning that it’s possible to adapt the redefinition strategy to avoid a global variable.

julia> function f()
           output = (println("heavy calculations"); 12)
           @eval f() = $output
           return output
       end
f (generic function with 1 method)

julia> f()
heavy calculations
12

julia> f()
12
2 Likes

I think this is a mistake. Memoize does exactly what you want and is debugged and can be applied to whatever you want and works with functions that have multiple arguments etc etc. It’s lightweight and should be considered kind of a core package.

3 Likes

Are you sure? It is working for me:

julia> mutable struct B                                                  
           value                                                         
           B() = new()                                                   
       end                                                               
                                                                         
julia>                                                                   
                                                                         
julia> b = B()                                                           
B(#undef)                                                                
                                                                         
julia>                                                                   
                                                                         
julia> function (b::B)()                                                 
           if !isdefined(b,:value)                                       
               b.value=42                                                
           end                                                           
           b.value                                                       
       end                                                               
                                                                         
julia>                                                                   
                                                                         
julia> @show b()                                                         
b() = 42                                                                 
42                                                                       
                                                                         
julia>                                                                   
                                                                         
julia> b.value = 1                                                       
1                                                                        
                                                                         
julia>                                                                   
                                                                         
julia> @show b()                                                         
b() = 1                                                                  
1                                                                        
                                                                         
julia> 
1 Like

I agree, but in our use case the functions that we want to memoize have no arguments: their aim is to build a fixed object once and for all, if called.
I confirm also, @PetrKryslUCSD that my favorite solution works, but people argued in private conversations that the struct approach is too verbose.
I let the thread unsolved atm, so that we can try to achieve some consensus.

To get some practice, I made a macro for this functor solution. It also supports return type specification on the functions. This might improve performance.

using MacroTools

macro oncefun(ex)

  @capture(shortdef(ex), (fname_() = body_) | (fname_()::T_ = body_)) ||
    error("Need function definition without arguments")

  isnothing(T) && (T = Any)
  sname = gensym()
  ex = quote
    mutable struct $sname
      value::Union{$T,Missing}
      $sname() = new(missing)
    end
    @inline function (ms::$sname)()
      if ismissing(ms.value)
        ms.value = $body
      end
      ms.value::$T
    end
    const $fname = $sname()
  end
  esc(ex)
end




@oncefun f() = (println("run"); 12)

@oncefun function g()::Int
  println("run")
  13
end
2 Likes

I think this is the issue. Memoize.jl is a great package and would be the clear choice if the point was to do memoization (with more instances, multiple arguments, etc). However, in this case the goal seems to be much simpler: to delay the initialization of a variable until its first use. Achieving this through @memoize feels like overkill, as it appears to build an entire (global) dictionary and perform a search everytime you want to access the “lazy variable”. While this should still be “fast enough”, this juggling is sure to confuse the compiler, preventing the use of the “lazy variable” in critical loops (I also suspect it would be a nightmare for compilation to GPUs). More concretly, the @memoize approach is very likely to prevent autovectorization.

I believe the best solution would be to have static variables, but, until then, I can see simple cases benefitting from the redefinition trick I proposed. It’s effectively a single extra line and subsequent accesses to the variable should be zero-cost.

In this simple benchmark, the redefinition is something like 7 orders of magnitude faster than memoization:

using Memoize
using BenchmarkTools

function f()
    val = (println("... heavy calculations ..."); 12)
    @eval f() = $val
    return val
end
@show f()
@show f()

@memoize g() = (println("... heavy calculations ..."); 13)
@show g()
@show g()

function usef()
    s = 0
    for i in 1:1_000_000
        s += i * f() - f()
    end
    return s
end

function useg()
    s = 0
    for i in 1:1_000_000
        s += i * g() - g()
    end
    return s
end

@btime usef()
@btime useg()

Output:

... heavy calculations ...
f() = 12
f() = 12
... heavy calculations ...
g() = 13
g() = 13
  1.490 ns (0 allocations: 0 bytes)
  16.334 ms (0 allocations: 0 bytes)

This is due to the compiler confusion I mentioned. The compiler understands what’s going on with usef() and optimize everythig away. See the native code:

        .text
        endbr64
        movabsq $5999994000000, %rax            # imm = 0x574FB82D280
        retq
        nop

While for useg() we get the monstrosity below.

        .text
        endbr64
        pushq   %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $72, %rsp
        movabsq $139940655471280, %rcx          # imm = 0x7F46790F1AB0
        vxorps  %xmm0, %xmm0, %xmm0
        vmovaps %xmm0, 32(%rsp)
        movq    $0, 48(%rsp)
        movq    %fs:0, %rax
        movq    -8(%rax), %rdx
        movq    $4, 32(%rsp)
        movq    (%rdx), %rax
        movq    %rax, 40(%rsp)
        leaq    32(%rsp), %rax
        movq    %rdx, 64(%rsp)
        movq    %rax, (%rdx)
        movl    $1, %ebp
        xorl    %ebx, %ebx
        leaq    1787265262(%rcx), %r13
        movabsq $jl_system_image_data, %r14
        movabsq $139942131169320, %r15          # imm = 0x7F46D1047828
        nopl    (%rax)
L128:
        movabsq $139940655471280, %rax          # imm = 0x7F46790F1AB0
        movq    (%rax), %rdi
        movq    %rdi, 48(%rsp)
        movq    %r14, %rsi
        movq    %r15, %rdx
        callq   *%r13
        cmpq    %r15, %rax
        jne     L277
        movabsq $139940626210368, %rax          # imm = 0x7F4677509E40
        movq    %rax, 8(%rsp)
        movabsq $jl_system_image_data, %rdi
        leaq    8(%rsp), %rsi
        movl    $1, %edx
        movabsq $139941898452352, %rax          # imm = 0x7F46C3257D80
        callq   *%rax
        movabsq $139940655471280, %rax          # imm = 0x7F46790F1AB0
        movq    %rax, 8(%rsp)
        movabsq $139942130905952, %r12          # imm = 0x7F46D1007360
        movq    %r12, 16(%rsp)
        movq    %r14, 24(%rsp)
        movabsq $jl_system_image_data, %rdi
        leaq    8(%rsp), %rsi
        movl    $3, %edx
        movabsq $139941899165808, %rax          # imm = 0x7F46C3306070
        callq   *%rax
        movq    %r12, %rax
L277:
        movq    -8(%rax), %rcx
        shrq    $4, %rcx
        movabsq $8746369302984, %rdx            # imm = 0x7F46C3C41C8
        cmpq    %rdx, %rcx
        jne     L570
        movq    (%rax), %r12
        imulq   %rbp, %r12
        movabsq $139940655471280, %rax          # imm = 0x7F46790F1AB0
        movq    (%rax), %rdi
        movq    %rdi, 48(%rsp)
        movq    %r14, %rsi
        movq    %r15, %rdx
        callq   *%r13
        cmpq    %r15, %rax
        jne     L494
        movabsq $139940626210368, %rax          # imm = 0x7F4677509E40
        movq    %rax, 8(%rsp)
        movabsq $jl_system_image_data, %rdi
        movq    %r13, %r15
        movq    %r14, %r13
        leaq    8(%rsp), %r14
        movq    %r14, %rsi
        movl    $1, %edx
        movabsq $139941898452352, %rax          # imm = 0x7F46C3257D80
        callq   *%rax
        movabsq $139940655471280, %rax          # imm = 0x7F46790F1AB0
        movq    %rax, 8(%rsp)
        movabsq $139942130905952, %rax          # imm = 0x7F46D1007360
        movq    %rax, 16(%rsp)
        movq    %r13, 24(%rsp)
        movabsq $jl_system_image_data, %rdi
        movq    %r14, %rsi
        movq    %r13, %r14
        movq    %r15, %r13
        movabsq $139942131169320, %r15          # imm = 0x7F46D1047828
        movl    $3, %edx
        movabsq $139941899165808, %rax          # imm = 0x7F46C3306070
        callq   *%rax
        movabsq $139942130905952, %rax          # imm = 0x7F46D1007360
L494:
        movq    -8(%rax), %rcx
        shrq    $4, %rcx
        movabsq $8746369302984, %rdx            # imm = 0x7F46C3C41C8
        cmpq    %rdx, %rcx
        jne     L605
        addq    %r12, %rbx
        subq    (%rax), %rbx
        incq    %rbp
        cmpq    $1000001, %rbp                  # imm = 0xF4241
        jne     L128
        movq    40(%rsp), %rax
        movq    64(%rsp), %rcx
        movq    %rax, (%rcx)
        movq    %rbx, %rax
        addq    $72, %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq
L570:
        movabsq $.rodata.str1.1, %rdi
        movabsq $jl_type_error, %rcx
        movabsq $jl_system_image_data, %rsi
        movq    %rax, %rdx
        callq   *%rcx
L605:
        movabsq $.rodata.str1.1, %rdi
        movabsq $jl_type_error, %rcx
        movabsq $jl_system_image_data, %rsi
        movq    %rax, %rdx
        callq   *%rcx
1 Like

The eval method only works if you call f() before you call usef(). Otherwise the newly eval-defined f is in a new world age and will not be seen by usef(). Resulting in heavy calculations twice in each loop iteration.

That’s true.