In RigidBodyDynamics.jl, I was able to reduce allocations to zero in a few of the key algorithms. This took quite some effort, mostly because I started out just getting things to work without paying much attention to allocation, after which it was not easy to find exactly which code was allocating. I suppose this is partly because --track-allocation={user,all} does not report line numbers accurately (e.g. https://github.com/JuliaLang/julia/issues/11753, https://github.com/JuliaLang/julia/issues/19502), especially in inlined code on Julia 0.5.
Issues with track-allocation aside, it would have been very useful for me if there were a way to tell Julia to just throw an error whenever a dynamic allocation is about to be performed. That way you would just directly get a stack trace instead of having to dig through .mem files. The Eigen C++ library has an EIGEN_RUNTIME_NO_MALLOC preprocessor directive and a set_is_malloc_allowed(bool) function that implement this kind of functionality (bottom of page). Of course, Eigen has a much narrower scope than Julia. But if this kind of thing is at all feasible in Julia, I think it would be incredibly useful. Any thoughts on this?
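Until something like that exists, a coarse after-the-fact check is possible with `@allocated`. The following is only a minimal sketch: `assert_no_alloc` is a hypothetical helper name, not part of Julia or of RigidBodyDynamics.jl. It reports that a call allocated, not where, so it does not give the stack trace asked for above.

```julia
# Hypothetical helper (not part of Julia): call `f(args...)` once so that
# compilation is not measured, then measure a second call and throw if it
# allocated any heap memory.
# The `Vararg{Any,N}` signature forces specialization on the argument types,
# so the measurement is not polluted by dynamic dispatch inside the helper.
function assert_no_alloc(f::F, args::Vararg{Any,N}) where {F,N}
    f(args...)                          # warm-up call
    bytes = @allocated f(args...)       # bytes allocated by the second call
    bytes == 0 || error("call to $f allocated $bytes bytes")
    return nothing
end

# Example functions, made up for this sketch.
dot3(a, b) = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

assert_no_alloc(dot3, (1.0, 2.0, 3.0), (4.0, 5.0, 6.0))   # tuples: no heap allocation expected
# assert_no_alloc(n -> zeros(n), 3)                       # should throw, since `zeros` allocates an array
```

The exact byte count can differ between Julia versions, so a check like this is probably best kept in a test suite rather than in production code.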
I am also just finishing a pass through a code base that I wrote (while still ignorant of these issues), trying to remove all of the temporary variables and dynamic allocations. So far I have only managed to remove most of the big allocations; I still need more advice on the rest, but that is a separate topic.
I agree that the track-allocation problems make this difficult. I am hoping that 0.6 will resolve all of that and be released soon.
I think this kind of concept would be useful. I would propose it not be at a global level, but maybe something like a macro protecting a code block:
@noallocations begin
sensitive .= code .+ here
end
That way you can easily focus on one part of the code at a time (i.e. the bottleneck functions) and later, if required, move on to more peripheral code.
Alternatively, or perhaps as an additional feature, is there a way to redirect allocations occurring in that block to the stack (as automatic variables), hopefully saving on GC effort?
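To make the idea concrete, here is a rough sketch of what such a macro could look like using only `@allocated`. This is an assumption about one possible design, not an existing Julia feature: it runs the block, measures afterwards, and throws if anything was allocated, so it cannot point at the allocating line. The function `scale_into!` is just a made-up example.

```julia
# Hypothetical macro (not part of Julia): run the block, measure the bytes
# it allocated with Base.@allocated, and throw if the count is nonzero.
macro noallocations(block)
    quote
        local bytes = @allocated $(esc(block))
        bytes == 0 || error("@noallocations: block allocated $bytes bytes")
        nothing
    end
end

# Made-up example: a fused in-place broadcast, which is not expected to allocate.
function scale_into!(y, a, x)
    @noallocations begin
        y .= a .* x
    end
    return y
end

x = [1.0, 2.0, 3.0]
y = zeros(3)
scale_into!(y, 2.0, x)
```

Whether a given block measures as allocation-free can depend on the Julia version and on type stability, so an error from a check like this means "look here", not necessarily "this is a bug".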
Is there anyone who can comment on feasibility? It seems it should be doable given the existence of --track-allocation, but I don’t know enough about the lower-level code of Julia.
Yes, this is feasible to do. I know the GPU folks care about this as well. Maybe there’s some code that can be shared. cc @maleadt @vchuravy. I think this macro should be a no-op by default though. The reason is that different versions of Julia may be able to optimize memory allocations differently, so a macro like this would be extremely fragile. A command line switch that enables it for testing, while still letting the code run on Julia versions it was not specifically tested with, seems like the right trade-off.
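As a sketch of the "no-op by default" behaviour: a user-level macro cannot add a real command line switch, so the version below uses an environment variable as a stand-in. The variable name `NOALLOC_CHECK` and the macro itself are made up for illustration. The switch is read once, when the code is loaded, and when it is off the macro expands to the plain block with no measurement at all.

```julia
# Hypothetical switch: the environment variable name is an assumption of
# this sketch. It is read once, when this code is loaded.
const CHECK_ALLOCATIONS = get(ENV, "NOALLOC_CHECK", "0") == "1"

# Hypothetical macro (not part of Julia).
macro noallocations(block)
    if !CHECK_ALLOCATIONS
        # Default: expand to the block itself, with no checking overhead.
        return quote
            $(esc(block))
            nothing
        end
    end
    # Checking enabled: measure the block and throw if it allocated.
    return quote
        local bytes = @allocated $(esc(block))
        bytes == 0 || error("@noallocations: block allocated $bytes bytes")
        nothing
    end
end

# Made-up example function using the macro.
function append_sample!(v)
    @noallocations push!(v, rand(4))   # allocates; only an error when checking is on
    return v
end
```

With the variable unset, `append_sample!` runs normally on any Julia version; starting Julia with `NOALLOC_CHECK=1` in the environment would turn the same call into an error.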
I would be really excited to see something like this as well. I’m also often digging through trying to get rid of pesky small allocations and this sort of thing would be great. +1.
Tim and I spoke about this briefly yesterday. For the GPU compiler we added the low-level hooks to make this at least possible, but there is currently no way of doing this selectively on a function-by-function level.
The way we are doing this in CUDAnative is using _dump_function from base/reflection.jl with the right CodegenParams, so it should be possible to create an @assert_noalloc similar to @code_llvm that sets up the CodegenParams correctly and will error if allocations are used. As an example see the irgen function in CUDAnative https://github.com/JuliaGPU/CUDAnative.jl/blob/e2ada6c66336f36e382f2247a72b1ad777dbd6e5/src/jit.jl#L145 . We then take the generated LLVM IR and use LLVM.jl to compile it to CUDA code. So, to do anything fancier, one might use LLVM.jl to compile the dumped module and then use llvmcall to reinsert that module into the Julia compiler.
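A much cruder approximation of that idea, which does not touch CodegenParams at all, is to dump the optimized IR with `code_llvm` and search the text for allocation-related symbols. The sketch below assumes a recent Julia version; `assert_noalloc_ir` is a hypothetical name, and the regular expression is a guess at symbol names that vary between Julia versions. It also only sees the function's own IR, so allocations inside non-inlined callees go unnoticed.

```julia
using InteractiveUtils   # provides code_llvm

# Assumed, version-dependent pattern for allocation-related symbols in the IR
# (e.g. GC allocation calls and array/memory constructors).
const ALLOC_PATTERN = r"gc_\w*alloc|alloc_array|alloc_genericmemory"

# Hypothetical helper: dump the optimized LLVM IR of `f` for the given
# argument types and throw if the text mentions an allocation symbol.
function assert_noalloc_ir(f, types)
    ir = sprint(io -> code_llvm(io, f, types; debuginfo=:none))
    m = match(ALLOC_PATTERN, ir)
    m === nothing || error("IR of $f mentions `$(m.match)`")
    return nothing
end

# Made-up examples: integer addition should not allocate, while returning a
# freshly constructed mutable struct requires a heap allocation.
add(a, b) = a + b
mutable struct Cell
    x::Int
end
make_cell(x) = Cell(x)

assert_noalloc_ir(add, (Int, Int))       # passes silently
# assert_noalloc_ir(make_cell, (Int,))   # should throw
```

Unlike a runtime measurement, this checks the compiled code without executing it, which is closer in spirit to what the GPU compiler needs, but a text search is far less reliable than a check built into code generation.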