I am trying to implement some “fixed stochasticity” in individual functions of my model (each called once): I have a global seed, and the RNG is re-seeded in each function that needs it, based on the name of the function and the global seed, so that different functions are uncorrelated but the same function gives the same outcomes at each call, without passing RNGs around.
Anyhow, my issue is that to achieve this behaviour I need to use a non-const global (at module level) in the macro, and I wonder whether this has a performance effect on the function calling the macro, or whether the macro acts as a barrier.
This is a snippet of what I am trying to achieve:
cd(@__DIR__)
module Foo
import Random
export @random_seed!
global_random_seed = 123
macro random_seed!()
    return quote
        # Walk the stack to find the first frame that is the calling function
        st = stacktrace(backtrace())
        myf = ""
        for frm in st
            funcname = frm.func
            if frm.func != :backtrace && frm.func != Symbol("macro expansion")
                myf = frm.func
                break
            end
        end
        m1 = $("$(__module__)")
        s = m1 * "$(myf)"
        Random.seed!(hash("$s", UInt64(global_random_seed)))
        @info "Random seeded with hash of \"$s\" and $(global_random_seed)"
    end
end
function init()
    Foo.global_random_seed = parse(Int64, readline("test_seed.txt")) # just 125 in test_seed.txt
end
module FooFoo
import ..Foo
import ..Foo: @random_seed!, global_random_seed
function foo()
    @random_seed!()
    println(rand())
    println(rand())
end
function goo()
    @random_seed!()
    println(rand())
    println(rand())
end
end
end
Foo.init()
Foo.FooFoo.foo() # Julia v1.11: 0.008282138701719233 0.9042476148934772
Foo.FooFoo.foo() # Julia v1.11: 0.008282138701719233 0.9042476148934772
Foo.FooFoo.goo() # Julia v1.11: 0.4430332457748505 0.7426054431450675
Foo.FooFoo.goo() # Julia v1.11: 0.4430332457748505 0.7426054431450675
Foo.FooFoo.foo() # Julia v1.11: 0.008282138701719233 0.9042476148934772
The macro has no arguments or computation besides instantiating the return expression, so it’s just pasting that with generated local variables into macro call sites. global_random_seed is just a symbol in the expression, so it’s going to work like any global variable in a macro-less method. The bad type inference isn’t saved by the UInt64 call alone because the language doesn’t mandate type constructors return their own type; it’s saved by the only hash(::String, ...) method returning UInt.
Could do global_random_seed::Int for inherent type stability (or why not ::UInt64 if that’s all you’ll use it for), though non-const global variables require assignment checks and are thus implemented as a reference to a reference, the last time I checked. Indexing and mutating const global_random_seed = Ref(123) would halve the work, though there is the risk of reading garbage values “assigned” to an uninitialized Ref{Int}(). The performance gain was actually dubious then, not sure how CPUs handled things.
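A minimal sketch of the two declarations mentioned above, assuming Julia 1.8 or later (typed globals) and a 64-bit machine. The names typed_seed, ref_seed and the helper functions are hypothetical and only serve the illustration:

```julia
# Option 1: typed, non-const global. Reads are inferred as Int.
typed_seed::Int = 123

# Option 2: const binding to a mutable container. The binding never changes,
# only the content of the Ref does.
const ref_seed = Ref(123)

function set_typed_seed!(s::Integer)
    global typed_seed = s   # converted to Int on assignment
    return nothing
end

function set_ref_seed!(s::Integer)
    ref_seed[] = s
    return nothing
end

# Both helpers read a value of known type, so the hash call is type-stable.
seed_from_typed(name::String) = hash(name, UInt64(typed_seed))
seed_from_ref(name::String) = hash(name, UInt64(ref_seed[]))
```

Either form can be updated at runtime (for example from an init function) without making the code that reads it type-unstable.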
Thank you. I am not concerned by the performances of the macro, but of the function where the macro is called.
These are themselves high-level functions that are called only once, but may contain for loops and run for days. My understanding is that compiling time is then negligible, right?
I don’t know what you mean by “compiling time” if you’re concerned about runtime performance. Annotating the global variable with a type shouldn’t stress the compiler any more than typical type stability practices, and whether it significantly affects runtime depends on how much of the runtime handles global_random_seed::Any. If it’s just the one UInt64 call that restores type stability in a days-long run, it’s negligible. If it’s making many short calls in your hot loop type-unstable, then it’s worth doing global_random_seed::Int.
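To illustrate the point about where the untyped global is handled, here is a hedged sketch of reading it once and passing it through a function barrier. It assumes a 64-bit machine; untyped_seed, accumulate_draws and run_model are hypothetical names, and the loop body is only a stand-in for real work:

```julia
untyped_seed = 123   # untyped, non-const global: inferred as Any where it is read

# Hot loop in its own function: it is compiled for the concrete argument
# types it receives, regardless of how the caller obtained them.
function accumulate_draws(seed::UInt64, n::Int)
    total = zero(UInt64)
    for i in 1:n
        total += hash(i, seed)   # stand-in for per-iteration work
    end
    return total
end

function run_model(n::Int)
    seed = UInt64(untyped_seed)::UInt64   # read the global once, assert the type
    return accumulate_draws(seed, n)      # function barrier
end
```

With this layout the cost of the untyped global is paid once per call of run_model, not once per loop iteration.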
To the program as a whole. The gensym uses a global counter which is typically used in macros to generate unique symbols. Anywhere in packages which are imported anywhere. But the seeds will be fixed in any julia session when the module Foo has been imported.
It all depends on how stable you want your seeds to be. In Julia 1.13, the hashes will change (Julia v1.13 Release Notes · The Julia Language), so then the seeds will change anyway. And some versions ago, the default RNG changed from Mersenne Twister to Xoshiro. And if you for some reason change your function name with the current setup, the seed will change.
It’s possible to replace the gensym with your own counter, but you may still get changes if you e.g. change the order of the function definitions in your module:
const count = Ref(0)
counter() = (count[] += 1)
macro ...
    h = hash(counter())
    quote
        ...
    end
end
I think I would have made something like this, to make it reasonably fast and stable:
import SHA
import Random
const global_random_seed = Ref("")
function setseed(s)
    global_random_seed[] = string(s)
    nothing
end
macro random_seed!(id)
    quote
        hash = SHA.sha1(string($id, global_random_seed[]))
        seed = reinterpret(UInt32, hash)
        Random.seed!(seed)
    end
end
function foo()
    @random_seed!("foo")
    println(rand())
    println(rand())
end
function goo()
    @random_seed!("goo")
    println(rand())
    println(rand())
end
The SHA hash is standardized, so it will be stable, but I’ve used an argument to the @random_seed! macro, a literal string, to avoid the stack trace and save some time. SHA isn’t very fast compared to hash, so it’s a trade-off. It’s possible to do the SHA hashing somewhat faster by setting up a SHA.SHA1_CTX and calling SHA.update! and SHA.digest! manually, instead of the prepacked SHA.sha1 which requires a single string.
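A rough sketch of the manual SHA.SHA1_CTX route, assuming a recent SHA standard library in which SHA.update! accepts any byte array (such as the result of codeunits). The helper name sha1_seed_words is hypothetical:

```julia
import SHA

# SHA-1 of `id` followed by `seed`, fed to the context in two steps so the
# two strings never have to be concatenated first.
function sha1_seed_words(id::String, seed::String)
    ctx = SHA.SHA1_CTX()
    SHA.update!(ctx, codeunits(id))
    SHA.update!(ctx, codeunits(seed))
    digest = SHA.digest!(ctx)                      # 20 bytes
    return collect(reinterpret(UInt32, digest))    # 5 UInt32 words
end
```

The result is the same digest that SHA.sha1(string(id, seed)) would give, returned as a plain Vector{UInt32} that can be passed to Random.seed!.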
You can also insert a @__MODULE__ in the string() inside the quote, to distinguish functions with the same id in different modules.
It’s also possible to get rid of the hashing altogether, since a slightly different seed will give a very different random sequence. Then you avoid allocations and hashing, apart from those inside Random.seed!:
import Random
const global_random_seed = Ref(UInt8[])
function setseed(s)
    global_random_seed[] = codeunits(string(s))
    nothing
end
macro random_seed!(id)
    # The id bytes are computed once, at macro expansion time
    bytes = Vector(codeunits(id))
    idlen = length(bytes)
    quote
        seedcodes = global_random_seed[]
        totlen = $idlen + length(seedcodes)
        pad = (4 - totlen % 4) % 4 # pad to reach multiple of 4 bytes
        resize!($bytes, totlen+pad)
        $bytes[$idlen+1:totlen] .= seedcodes
        $bytes[end-pad+1:end] .= 0xaa # padding
        seed = reinterpret(UInt32, $bytes)
        Random.seed!(seed)
    end
end
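The padding arithmetic in the macro above can be checked in isolation. This is a small sketch with hypothetical helper names (padlen, padded_seed_words) that mirrors the macro body without the in-place buffer reuse:

```julia
# Number of filler bytes needed to round `n` up to a multiple of 4.
padlen(n::Integer) = (4 - n % 4) % 4

# id bytes, then seed bytes, then 0xaa filler, so that the buffer can be
# reinterpreted as UInt32 words.
function padded_seed_words(id::String, seed::String)
    bytes = vcat(Vector{UInt8}(codeunits(id)), Vector{UInt8}(codeunits(seed)))
    append!(bytes, fill(0xaa, padlen(length(bytes))))
    return collect(reinterpret(UInt32, bytes))
end
```

For the 6-byte input "foo" plus "125", two filler bytes are appended, giving two UInt32 words. An input whose length is already a multiple of 4 gets no padding.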
If you really need fast setting of the seed (a couple of nanoseconds instead of a few hundred), you can use the internal Random.getstate and Random.setstate!. They are not public, and may disappear or change in minor versions:
import Random
const global_random_seed = Ref(UInt8[])
const seedgeneration = Ref(0)
function setseed(s)
    global_random_seed[] = codeunits(string(s))
    seedgeneration[] += 1
    nothing
end
# Cached RNG state for one call site, tagged with the seed generation it belongs to
mutable struct RandomState{T}
    state::T
    generation::Int
end
macro random_seed!(id)
    state = RandomState(Random.getstate(Random.default_rng()), -1)
    bytes = Vector(codeunits(id))
    idlen = length(bytes)
    quote
        if seedgeneration[] ≠ $state.generation
            # Global seed changed since the last call: reseed and cache the new state
            seedcodes = global_random_seed[]
            totlen = $idlen + length(seedcodes)
            pad = (4 - totlen % 4) % 4 # pad to reach multiple of 4 bytes
            resize!($bytes, totlen+pad)
            $bytes[$idlen+1:totlen] .= seedcodes
            $bytes[end-pad+1:end] .= 0xaa
            seed = reinterpret(UInt32, $bytes)
            Random.seed!(seed)
            $state.generation = seedgeneration[]
            $state.state = Random.getstate(Random.default_rng())
        else
            # Same global seed: restore the cached state directly
            Random.setstate!(Random.default_rng(), $state.state)
        end
    end
end
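The same save-and-restore idea can also be sketched with copy and copy! on the default RNG instead of the internal getstate/setstate!. This is a different technique from the macro above, shown only as an illustration; it assumes Julia 1.7 or later, and it is not claimed to be as fast. The function name replay_demo is hypothetical:

```julia
import Random

function replay_demo()
    Random.seed!(125)
    saved = copy(Random.default_rng())    # snapshot of the task-local RNG state
    a = rand()
    copy!(Random.default_rng(), saved)    # restore the snapshot
    b = rand()                            # replays the same draw as `a`
    return a, b
end
```

Calling replay_demo() returns two identical numbers, because the second draw starts from the restored state.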