begin
    local is_computed = Ref(false)
    local result = Ref(0)
    function f()
        if !is_computed[]
            println("Heavy computations.")
            result[] = 1
            is_computed[] = true
        end
        return result[]
    end
end
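Calling it twice shows the effect: the heavy computation runs only on the first call.

julia> f()
Heavy computations.
1

julia> f()
1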
This also works without the Refs, but then it runs into the dreaded Issue #15726 (captured variables that are reassigned get boxed, which costs performance), which may or may not matter in practice depending on how you use the result.
Has anyone suggested using a closure in the following way?
(I see now that both @GunnarFarneback and @sgaure showed nice answers along these lines.)
julia> function lazify(f)
           val::Union{Nothing, Core.Compiler.return_type(f, Tuple{})} = nothing
           return () -> begin
               if isnothing(val)
                   val = f()
               end
               return val
           end
       end
lazify (generic function with 1 method)
julia> function big_calc()
           sleep(1)
           return 10
       end
big_calc (generic function with 1 method)
julia> @time big_calc()
1.002507 seconds (5 allocations: 144 bytes)
10
julia> lazy_big_calc = lazify(big_calc)
#3 (generic function with 1 method)
julia> @time lazy_big_calc()
1.023378 seconds (4.07 k allocations: 213.197 KiB, 2.06% compilation time)
10
julia> @time lazy_big_calc()
0.000015 seconds
10
lazify(f) returns a function that does the long calculation the first time it is called and returns the stored value on subsequent calls.
This works for parameter-less functions.
As a general rule, I find using @eval anywhere except for code generation at compile time a little worrisome.
That's a sound gut reaction. I would recommend the same for undocumented internal functions like Core.Compiler.return_type (which, as it happens, is also something the Memoize package makes use of).
Because the whole point of this exercise is that it's not known whether f will be called or not, and it takes a long time, so it should be avoided if not needed.
Agreed. Maybe Core.Compiler.return_type should be relegated to the default value of a type parameter, so its use is not hidden and can also be avoided.
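A minimal sketch of what that could look like (just illustrative; the signature is my own invention): the return type becomes an explicit argument, with the internal function only supplying the default, so callers can pass the type themselves and avoid Core.Compiler.return_type entirely.

function lazify(f, T::Type = Core.Compiler.return_type(f, Tuple{}))
    # Passing T explicitly, e.g. lazify(big_calc, Int), sidesteps the
    # undocumented internal function.
    val::Union{Nothing, T} = nothing
    return () -> begin
        if isnothing(val)
            val = f()
        end
        return val
    end
end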
Taking your comments into consideration, and adding a sentinel value to speed things up a bit more, here is a new version:
julia> function sentinel_lazify(sentinel, f, args...; kwargs...)
           val = sentinel
           return () -> begin
               if val === sentinel
                   val = f(args...; kwargs...)
               end
               return val
           end
       end
sentinel_lazify (generic function with 1 method)
julia> function big_calc_arg(x; double = false)
           sleep(1)
           return ifelse(double, 2x, x)
       end
big_calc_arg (generic function with 1 method)
julia> lazy_big_calc_arg = sentinel_lazify(NaN, big_calc_arg, 5.0; double = true)
#4 (generic function with 1 method)
julia> @time lazy_big_calc_arg()
1.021969 seconds (17.83 k allocations: 1015.760 KiB, 1.90% compilation time)
10.0
julia> @time lazy_big_calc_arg()
0.000007 seconds
10.0
It also accepts positional and keyword arguments, which should be important for big calculations.
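One caveat (my observation): the sentinel must be a value that f can never return, otherwise val === sentinel keeps holding and the computation is redone on every call. For example:

julia> g = sentinel_lazify(0, () -> (println("computing"); 0));

julia> g()
computing
0

julia> g()  # recomputed: the stored value is still === the sentinel
computing
0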
@ArthurW linked to such an example in the standard library in a former version of one of his comments, then edited it away since he didn't need the let statement there.
For reference, here's a tidier MWE of the approach (note that it reassigns the captured variables, so it is subject to the #15726 boxing mentioned above):
let isfirstcall = true, output = 0
    global function f()
        isfirstcall || return output
        isfirstcall = false
        println("heavy computation")
        output = 42
        return output
    end
end
(I made the previous MWE slightly more concise).
I found this 2017 JuliaCon talk by Jameson Nash that addresses memoization, showing a more general example that memoizes any input, and also a macro that looks essentially like what Memoize.jl is doing.
Unless the discussion eventually settles on a different consensus, I'd summarize the solutions for other users stumbling upon this thread:
For general memoization, it is recommended to check out Memoize.jl (a minimal usage sketch follows below).
For lazy construction of an object, this thread illustrates different approaches.
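For instance, a minimal Memoize.jl sketch (the function itself is just a toy example):

using Memoize

@memoize function slow_double(x)
    sleep(1)  # stand-in for a heavy computation
    return 2x
end

slow_double(5)  # runs the computation, takes about a second
slow_double(5)  # returns the cached result immediately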
julia> function f_eval()
           output = 42
           # redefine f_eval to return the computed constant directly
           @eval f_eval() = $output
           return output
       end
f_eval (generic function with 1 method)
julia> f_eval()
42
julia> @benchmark f_eval()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.809 ns … 3.450 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.831 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.831 ns ± 0.062 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  [histogram bars omitted]
  0.809 ns        Histogram: frequency by time        0.849 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
EDIT: In a previous version of this post I was using an older version of @ArthurW's @eval trick, which was still defining a global; I updated the post with the latest version.
I could not avoid them. I have data that needs to be loaded and processed at module start, and then the various module functions use that data; some produce output that other functions may need later.
The code runs for days/weeks/months, usually just waiting; every hour or so it checks whether something new has happened and then does things.
The alternative is passing all this data around as some gigantic dictionary. Or caching it on disk, but since Julia does not guarantee that dictionaries saved to a binary file can be loaded again, that is not safe (and unnecessary here). (I really wish Julia had a safe way to store binary versions of data structures, like R and Python do.)
I would love to do this without globals, but could not figure it out.
P.S. We used to run this code with cron, which avoids the globals, when it was Python, but Julia's startup time is so large that replicating cron in Julia is better; hence the need for globals.
Oops, the last time I checked, Serialization · The Julia Language warned that the format would change, but now they have removed the warning, so clearly it's stable. Mea culpa.
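For reference, a minimal sketch of disk caching with the Serialization stdlib (file name and data are made up):

using Serialization

data = Dict("results" => [1.0, 2.0])  # stand-in for the processed data
serialize("cache.jls", data)          # write it to disk
restored = deserialize("cache.jls")   # load it back on a later run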
Yeah, this works, and I have done that. But if one has several mybigcalc()-type functions, they may depend on output from other mybigcalc functions, and a number of further functions depend on those calculations (in our case reporting, feeding a dashboard, and sending an SMS to an admin on urgent errors). It stays doable, but messy.
The globals are in any case global only to the module, not visible outside it.
In Python classes one just has self.data, which is OK there. C++ has something similar.
So, if globals are not to be used, one needs some way to share data between many functions.
Either one struct that you pass to all the various functions (see the sketch below), or, if you're using multiple threads, maybe a Channel to send information.
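A rough sketch of the struct approach (all names are made up):

# All long-lived state lives in one struct that every function receives.
struct AppState
    data::Dict{String, Any}
    results::Vector{Float64}
end

function check_for_updates!(state::AppState)
    push!(state.results, 1.0)  # mutate the contents, not the binding
end

function report(state::AppState)
    println("have $(length(state.results)) results")
end

state = AppState(Dict{String, Any}(), Float64[])
check_for_updates!(state)
report(state)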
Yes, we were passing a struct with all the relevant data; it makes the code more complicated to always receive and pass the same struct. Why not just make it global and simplify the code?
Everything is linear, so no threads in this app (different for our HPC apps).
But to be honest, I don't get the dislike of globals. Yes, it's horrible to dump all variables into the same namespace, and I value strict scope. I love writing for (int i = 0; ...) in C; it has saved me many times. (I need to check whether Julia can do that.)
But for sharing data, having one struct, which can even be called THE_GLOBAL_DATA_STRUCT, why is that wrong?
You might think it "simplifies" code, but it can absolutely make code harder to understand. At random places throughout the code, the state can change, and you can therefore never know exactly what's going to happen at any given point. For example:
if GLOBAL[1] == 3
    something()
    @assert GLOBAL[1] == 3  # wrong... something() changed it
else
    somethingelse()
end
Also, accessing globals pretty much ensures that you'll never effectively multithread your code.
Finally, accessing globals is slow in Julia, so it can easily slow down your entire code base by a factor of 10.
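A small sketch of the usual mitigation (my example, not from the thread): keep the global in a const container so that accesses stay type-stable.

# Untyped global: every access involves dynamic type checks.
counter = 0
function bump_untyped()
    global counter += 1
    return counter
end

# const typed container: the binding never changes, only its contents,
# so the compiler knows the type and access is fast.
const COUNTER = Ref(0)
function bump_typed()
    COUNTER[] += 1
    return COUNTER[]
end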