I believe I may have found a bug in the Julia source code, but I can’t pinpoint the exact line (see this topic: “There is a bug in this function and I can't figure out what it is”). What I want to do is modify the Distributed.jl module in the Julia source so that whenever I write using Distributed (or whenever a package calls using Distributed), my edited code is used.
This is what I’ve done so far. I’ve copied the actual source files from the Julia source tree into a new folder. Then I include Distributed.jl from that folder and call using .Distributed:
using Revise
includet("Distributed.jl")
using .Distributed
using ClusterManagers
a = addprocs(SlurmManager(2))
However, this doesn’t work, since ClusterManagers also calls using Distributed, so my edited Distributed != ClusterManagers.Distributed.
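To illustrate the mismatch, here is a minimal sketch. It assumes Julia 1.6 or later (for the `import ... as` syntax), and `Sandbox` is a hypothetical wrapper that stands in for the locally included copy. A module that merely shares the name Distributed is a different module object from the stdlib one that packages such as ClusterManagers load:

```julia
# The stdlib module, as any package that says `using Distributed` would see it.
import Distributed as StdlibDistributed

# Hypothetical stand-in for a locally included copy of Distributed.jl.
module Sandbox
    module Distributed
        myid() = -1   # placeholder body, not the real implementation
    end
end

# Same name, different module objects.
same_name   = nameof(Sandbox.Distributed) == nameof(StdlibDistributed)
same_module = Sandbox.Distributed === StdlibDistributed

println("same name:   ", same_name)
println("same module: ", same_module)
```

Methods and globals defined in the local copy are therefore invisible to code that was loaded against the stdlib module.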
Is editing the Julia source code and recompiling the only way?
Just a few clarifying questions, though. I would need to build the system image on my cluster, where I don’t have root access, so everything would have to be done in my home directory. Is this possible?
I also don’t want to mess with the existing Julia binary installation or its packages (it’s a cluster with 18 nodes using InfiniBand, so the same binary is installed on all 18 nodes).
Is there a way to rebuild Base into a library and use the existing Julia binary installed systemwide (clusterwide) to call that shared library?
I tried that, but there are functions in Distributed that use global variables defined in the Distributed module. Every time I redefine the function, it complains that the global variable is not found. I suppose that when I redefine the function after import Distributed.message_handler_loop, the new method is defined in Main, where the global variable is not defined.
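A minimal sketch of that scoping behaviour, using a toy module instead of Distributed (the names `Toy`, `counter`, and `bump` are hypothetical and exist only for illustration). A method written in Main looks up globals in Main, while the same definition evaluated inside the module finds the module's own global:

```julia
# Toy stand-in for a module whose functions rely on module-level globals.
module Toy
const counter = Ref(0)        # global that lives in Toy, not in Main
bump() = (counter[] += 1)
end

import .Toy: bump             # lets Main add methods to Toy.bump

# This method is written in Main, so `counter` is looked up in Main.
bump(x::Int) = (counter[] += x)

main_result = try
    bump(1)
catch err
    err                       # the error raised because Main has no `counter`
end

# Evaluating the same definition inside Toy resolves `counter` in Toy.
@eval Toy bump(x::Int) = (counter[] += x)
toy_result = bump(5)

println(typeof(main_result))
println(toy_result)
```

The first call fails because Main has no `counter`; after the `@eval Toy ...` definition replaces that method, the call succeeds against `Toy.counter`.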
I also realized that Distributed is not in Base but rather in stdlib. I wonder if the process of rebuilding stdlib is the same as rebuilding Base.
Ok, then use eval to directly evaluate code inside a module:
julia> @eval Distributed message_handler_loop(r_stream::IO, w_stream::IO, incoming::Bool) = println("This version does nothing")
message_handler_loop (generic function with 1 method)
julia> Distributed.message_handler_loop(stdout, stdout, true)
This version does nothing
Again, make sure to do this on all nodes with @everywhere.
Side note: I can’t use @everywhere, since the bug prevents the system from spawning and connecting to the workers properly. It is addprocs() that bugs out.