I need to write a single-argument function f(r::Number)
that returns a different random number for different values of r
, but always the same number for the same r
. Same r
is meant in the sense of isequal
or ==
, it doesn’t matter much, so if r
is a Float64
, say, f(r)
will be “discontinuous at every point”. I can imagine hacky ways that convert r
to an integer based on its bit representation, and then use that as seed, but that’s far from efficient (because building/updating a generator for each call is expensive). I cannot seem to find a more performant/elegant way. Any ideas?
You could go with a caching approach perhaps:
julia> let cache = Dict()
global f(r::Number) = get!(randn, cache, r)
end
f (generic function with 1 method)
julia> f(1)
-0.4507461071970271
julia> f(1)
-0.4507461071970271
julia> f(2.5)
0.7257163824509684
julia> f(2.5)
0.7257163824509684
What’s the application? One alternative would be a smoothed random function like a smooth random walk, which could be implemented in ApproxFun.jl similar to how it works in chebfun.
@mike , thanks, that’s a great idea. It actually works better than I expected, but I worry that the cache grows too much after using the function for a while. What I actually had in mind is more computational, mapping information about r
, like its hash, to a computed random number, but I might eventually go with your approach.
@stevenj, this is for implementing a reproducible disorder realization on a lattice. The user designs a lattice and asks for a spatially uncorrelated potential to apply to it. If I understand correctly, you propose to somehow define some random spatial harmonics or coefficient and build the function by expanding it to a finite order in some function basis using those coefficients? In this sense I’d say @mike’s approach is better for my purposes, because you don’t need to worry about spatial correlations (due to too few harmonics), and coefficients are analogous to his cache, but of fixed size. Or maybe I misunderstood?
Yeah, agreed, it’s unlikely to be sustainable for a long running f
.
Could combine the hash
ing idea with an rng
that you just reseed based on the hash of the number:
julia> using Random
julia> let rng = MersenneTwister()
global f(r::Number) = (Random.seed!(rng, hash(r)); randn(rng))
end
f (generic function with 1 method)
julia> f(1)
-0.7338017765779451
julia> f(1)
-0.7338017765779451
julia> f(1.1)
-1.1040149708608498
julia> f(1.1)
-1.1040149708608498
That’s just what I tried first. However it is pretty slow
julia> @btime f(2.3)
10.648 μs (4 allocations: 152 bytes)
-0.5152781040212041
You may have more luck with some of the rngs from packages. RandomNumbers
has the following:
julia> let rng = RandomNumbers.Xorshifts.Xoroshiro128()
global g(r::Number) = (Random.seed!(rng, hash(r)); randn(rng))
end
g (generic function with 1 method)
julia> g(1)
1.1158401688820527
julia> g(1)
1.1158401688820527
julia> @benchmark f(1)
BenchmarkTools.Trial:
memory estimate: 152 bytes
allocs estimate: 4
--------------
minimum time: 9.860 μs (0.00% GC)
median time: 9.950 μs (0.00% GC)
mean time: 10.096 μs (0.00% GC)
maximum time: 26.180 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark g(1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.708 ns (0.00% GC)
median time: 7.768 ns (0.00% GC)
mean time: 7.889 ns (0.00% GC)
maximum time: 24.985 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
Though I’m not too familiar with the particulars of that generator…
I just came up with this. Is hash
statistically equivalent to rand
?
julia> f(r::Number) = hash(r)/typemax(UInt)
julia> @btime f(2.3)
0.033 ns (0 allocations: 0 bytes)
0.664561538429725
Oh! Your Xoroshiro128 is very nice! And you can use whatever distribution you need.
That timing means that your computation got constant folded and no work was actually done at runtime. You can do
julia> @btime f($(Ref(2.3))[])
3.355 ns (0 allocations: 0 bytes)
0.664561538429725
Also hash
es can have collisions.
Hi @kristoffer.carlsson, yes, I’m aware about the timing thing. About the collisions, is hash() any worse than rand() in regards to getting the same solution twice? And do you know if hashes are uniformly distributed in the whole UInt
range? In any case, I really like @mike’s last solution, I think I’ll mark that as the answer.
It seems that, ideally, yes.
But “Note that this criterion only requires the value to be uniformly distributed , not random in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true.”
Sounds like a hash function is what you need. There are lots around, so maybe the built-in is not perfect but others should be good. I think the cryptographic hash function address your concern above.
"a small change to a message should change the hash value so extensively that a new hash value appears uncorrelated with the old hash value "
hash
provides no statistical guarantees.
Yes, hashing a seed seems to be the way to go, especially as you can prescribe the marginal distribution (Uniform, Normal, other) this way and get the statistical properties. You can even split the hash into some bits for the seed and some bits to discard the first n
samples of rand
if you distrust the statistical properties of the hash.