I’m not sure whether the object created in a function persists (not GC’ed) when it is referred to in a closure:
function ref_i(ref)
    function inner(i)
        ref[i] * 10
    end
    return inner
end

function get_func(r)
    a = [r, 2r, 3r, 4r]   # <--
    func = ref_i(a)
    return func
end
f = get_func(pi)
g = get_func(-1.5)
@show f(3)
@show g(3)
Will the Vector objects created in get_func() stay available from f() and g()? The above code appears to work as intended, but if the Vectors were GC'ed, it would stop working . . . I would think.
The above code is of course a toy program. In my real problem, the object I want to keep alive is a file handle (fh = NCDataset( . . . )). Because the file can be big, I don't want to read all the data into memory at once. Instead, I want a function that reads a portion of the data, processes it, and returns the result.
But at some point during the execution of my program, the file seems to be closed and the file handle seems to become invalid. I'm trying to fix this problem.
You pass a as the argument of ref_i(a); it is captured by inner, returned as func, and eventually becomes f. So the name f still holds a connection to the a you created.
As long as the name f is not rebound to another object, the Vector object a will not be GC'ed (I think).
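If you want to check this directly, here is a quick sketch (reusing the f from your example; the field name ref comes from the fact that Julia lowers closures to structs whose fields are the captured variables):

    a_captured = f.ref                  # the Vector [r, 2r, 3r, 4r] created inside get_func
    @show a_captured                    # still alive, reachable through the closure
    @show f(3) == a_captured[3] * 10    # true: inner reads from the same object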
Here is a minimal, self-contained example that fails:
#--- file tmp_Debug.jl ---
module tmp_Debug

using NCDatasets

function ref_i(ref)
    @show ref["lat"][:]   # -> Values are printed.
    function inner(i)
        @show ref         # -> "Closed dataset".
        return ref["lat"][i]
    end
    return inner
end

function get_func(fnam)
    a = NCDataset(fnam, "r")
    return ref_i(a)
end

const f = get_func("http://apdrc.soest.hawaii.edu:80/dods/public_data/WOA/WOA18/5_deg/annual/temp")

println("In the module: f(1) = $(f(1))")

end
# --- another file try-debug.jl ---
push!(LOAD_PATH,pwd())
using tmp_Debug
val = tmp_Debug.f(1)
When f() is called outside the module, the file handle ref reports a "closed Dataset". I guess this is the message that the finalizer of the ref object left behind.
When function f() is called inside the module, it works as intended.
The fact that ref still refers to a usable object at all shows that the file handle was not GC'ed, just closed in some way. I evaluated the module directly and ran using .tmp_Debug instead, and it worked fine. The starkest difference is that I didn't precompile the module, while your use of LOAD_PATH did.
Precompilation is per-package AOT compilation: whatever you execute at the top level of the module actually happens then and is saved in precompile files to be loaded later at runtime. A file handle obviously can't remain open perpetually just because it's stored in a precompile file. From the docs:
Other known potential failure scenarios include:
…
3. Depending on compile-time side-effects persisting through load-time. Example include: modifying arrays or other variables in other Julia modules; maintaining handles to open files or devices; storing pointers to other system resources (including memory);
Precompilation can't detect side effects or runtime-initialize the file handle for you; you have to assign f in __init__ … somehow. A global const wouldn't work, and although eval into the same module is nominally allowed, it causes problems when it is triggered by the module being imported by another package that is itself being precompiled. A plain global f = ... would work, but you were probably trying to avoid that performance problem. Annotating the global with ::NCDatasets.NCDataset{Nothing, Missing} could help, but that still has overhead compared to const.
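A rough sketch of what that could look like for your module, using a plain untyped global for simplicity (the URL and accessor are the ones from your example):

    module tmp_Debug

    using NCDatasets

    function ref_i(ref)
        inner(i) = ref["lat"][i]
        return inner
    end

    f = nothing   # plain global; rebound when the module is loaded

    function __init__()
        # __init__ runs at load time in every session, so the dataset is opened
        # at runtime instead of during precompilation.
        global f = ref_i(NCDataset(
            "http://apdrc.soest.hawaii.edu:80/dods/public_data/WOA/WOA18/5_deg/annual/temp", "r"))
    end

    end # module

With that, tmp_Debug.f(1) should work the same whether the module was precompiled or not.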
Thank you for elucidating the problem!!! But, whose bug is this?
“Some way” . . . is the mystery. If it’s not the finalizer of the file-handle object, what closed the file?
Is modifying LOAD_PATH allowed to alter the semantics of the program? Or does the language specification allow the state of the object captured by const f to be destroyed by LOAD_PATH?
I thought precompilation was just an optimization, which shouldn't alter the semantics of the program.
Is the document you quote discussing a known “bug”?
The limits of precompilation are documented, not a bug, and they're rooted in fundamental limitations of what can be cached AOT. Just look at the still-evolving specification of C++'s constexpr; we have much easier lines to deal with here.
You're thinking of method-call compilation, not module precompilation, which does a lot more than just optimization. It's not really altering the language semantics; you're just losing features to AOT limitations, whether that shows up as a thrown error or as undefined behavior.
I don't know either. For all I know, maybe the GC, and thus the finalizer, does run during the precompilation process after the object is cached. Or maybe just loading the cached object without its external state is enough to manifest as a "closed" object. I'm guessing this could be found out if we removed the finalizer from NCDataset.
LOAD_PATH isn't the direct reason for this; it's precompilation. You could disable precompilation so the module is evaluated at runtime from scratch, and the issue would disappear.
It's also not recommended anymore to use implicit environments or to modify LOAD_PATH like this. There are still reasons for modifying LOAD_PATH, but that should be an edge case, not routine. If you want to work with precompilation, make an explicit package and dev/add it to an explicit environment with a project file. You're also free to just make a module that you'll only ever evaluate at runtime (manually include the file), never to be precompiled and distributed as part of any package.
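For reference, a sketch of that workflow (the names and paths here are placeholders):

    using Pkg
    Pkg.generate("tmp_Debug")          # creates tmp_Debug/Project.toml and tmp_Debug/src/tmp_Debug.jl
    Pkg.activate("MyAnalysis")         # an explicit environment with its own Project.toml
    Pkg.develop(path="./tmp_Debug")    # track the local package by path, instead of touching LOAD_PATH
    using tmp_Debug                    # precompiles and loads like any other package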
The fact that it’s documented doesn’t solve this problem in practice. What is the resolution of this problem? If I understand what you are saying correctly:
When your program depends on an object that doesn’t survive precompilation, you should disable precompilation on a per-module basis.
If this is the final resolution of the issue,
1. The programmer needs to know whether the object returned by a function will survive precompilation or not.
2. We need a switch to disable precompilation on a per-module basis.
How do you achieve 1?
How do you switch off precompilation on the module side? As you explain, I could use include on the “using” side, but the decision to disable precompilation should be on the module side because it’s an implementation detail.
Who should do what, in practice, to avoid a problem like the one I encountered? Should the designer of the NCDatasets package do something differently?
That is not among the things I said. I actually think that (specifically, __precompile__(false)) is generally a terrible idea, because 1) precompilation can save time even if your code doesn't serve as a dependency, and 2) it prevents your code from being a dependency of packages, which are generally precompiled. I said you should try to use __init__ to initialize things at runtime that fundamentally need to be, which doesn't obstruct precompilation. AFAIK __precompile__(false) is only justified if you intend a module to be strictly evaluated interactively with include (not a package), and in that case I wouldn't even bother, because storing the files in a clearly labeled scripts folder is enough. Note that by modifying LOAD_PATH (again, not recommended compared to dev/add into explicit environments), you're treating the module as a package, so it's not being evaluated interactively.
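For completeness, since you asked how to switch it off on the module side, this is the mechanism (a sketch; again, I wouldn't recommend it):

    module tmp_Debug
    __precompile__(false)   # opt this module out of precompilation; note that a package
                            # that is itself being precompiled cannot then depend on it

    # ... module body as before ...

    end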
Note that runtime initialization isn't just about objects surviving; it's about anything you need to do at runtime. For example, if you need a package to generate a random Int every session, doing it in global scope will fail because it only runs during precompilation. The Int survives the AOT cache just fine; you'll just keep getting the same cached value every session.
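A sketch of that example (the module name is made up):

    module RandDemo

    const cached_int = rand(Int)    # evaluated during precompilation; the same value every session

    const fresh_int = Ref{Int}(0)   # the container is cached, but its contents are set at load time
    function __init__()
        fresh_int[] = rand(Int)     # runs in every session, so this one actually changes
    end

    end # module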
The docs explain most of the principles, but I don't think they're comprehensive or obvious. Sometimes people really do just find out by running into objects that don't survive, like this one.
Before proceeding . . . our premise should be: The same module should work whether it’s precompiled or not, because precompilation is purely an optimization and shouldn’t change the semantics of the program.
From this premise, we can conclude that we should be writing:
module MyMod
# . . .
# const f = get_func()   # May not work.

f = nothing

function __init__()
    global f
    f = get_func()
end

end # module
if the object returned by get_func() may not survive precompilation.
So, this is the resolution of the problem.
Here, whether the module is treated as a package or just to be included is irrelevant, because the behavior of the module shouldn’t change between the two cases.
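One related variant (not from the discussion above, just a common idiom): if the performance of the untyped global matters, the object can be kept in a const Ref that __init__ fills in; get_func() here is the same placeholder as above:

    module MyMod
    # . . .
    const f_store = Ref{Any}()          # use Ref{SomeConcreteType}() if the type is known

    function __init__()
        f_store[] = get_func()          # still runs at load time, every session
    end

    f(args...) = f_store[](args...)     # thin wrapper so callers can keep writing MyMod.f( . . . )

    end # module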
By the way, I have the impression that the distinction between compile time and run time was initially vague in the history of Julia, because the behavior of the program (except for performance) didn't change regardless of when compilation happened. But with the introduction of precompilation, one needs to be aware of it.