I understand that Julia produces PRNs (pseudorandom numbers). The website www.random.org advertises itself as producing true random numbers. I can understand that PRNs may be desired for testing purposes, but once testing is completed, wouldn’t it be preferable to have true random numbers? Might there be some way in which Julia could be tied into this website?
Yes, pseudo by default (see below for non-default true randomness with Julia). True randomness is not needed in the vast majority of cases (I recall one paper claiming that true randomness from a quantum process was better for machine learning… I didn’t believe it).
If you want to attempt quantum suicide, then Julia isn’t up to the job with its default rand:
[I do NOT recommend any kind of suicide, and I argue the quantum version doesn’t work, i.e. that it can be ruled out, along with the multiverse picture it relies on, as a good theory. It’s actually non-scientific, i.e. non-falsifiable in Popper’s sense, thus religion-like, so please don’t believe in that kind of immortality. The same applies to all other interpretations of QM… assuming they really are just interpretations. Some are more than that, so “interpretation” is a misleading/technically incorrect word in some of the cases below, since they lead to scientifically different conclusions.]
The existence of the wave function collapse is required in:
[but not this one] the Copenhagen interpretation
[most likely one of these] the objective collapse interpretations
[maybe] the transactional interpretation
[…]
[not believing in these] On the other hand, the collapse is considered a redundant or optional approximation in:
the consistent histories approach, self-dubbed “Copenhagen done right”
[…]
the many-worlds interpretation
[…]
[maybe…] the relational quantum mechanics interpretation
You can get true randomness, actually based on a quantum process, in Julia: most (non-ancient) Intel CPUs have an instruction for that, and I guess AMD does too; and ARM?
Julia’s default rand is NOT ok for cryptography, but Julia’s stdlib allows for that, if I recall; it’s just non-default, and on Linux it reads from the operating system’s entropy source, if I recall.
No. Generating true randomness is rarely necessary, but requires dedicated hardware. In any case, it’s not something you’d implement in a programming language. Compare:
julia> using Random
help?> Random.RandomDevice
RandomDevice()
Create a RandomDevice RNG object. Two such objects will always generate different streams of random numbers. The entropy is obtained from the operating system.
julia> rand(RandomDevice())
0.3682754503315735
“Two such objects will always generate different streams” is a really strong guarantee.
As for PRNGs, they can be difficult to tell apart from “truly random”; e.g. see JuliaRandom/RNGTest.jl on GitHub (code for testing of Julia’s random numbers).
If you’re doing cryptography, you’re going to have higher standards.
If you’re doing Monte Carlo, you’ll want PRNGs; as long as you don’t have overlapping streams between threads, being pseudo-random won’t impact accuracy, while the better speed will improve accuracy per second (i.e., you can sample more in the same amount of time, or sample for less time).
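A minimal Monte Carlo sketch of this point: one explicitly seeded Xoshiro stream per chunk of work, so streams never overlap between threads (the seed values here are arbitrary choices for illustration, not a recommendation):

```julia
using Random

# Estimate pi by sampling points in the unit square, with one
# independently seeded RNG per chunk so streams never overlap.
function mc_pi(nchunks, samples_per_chunk)
    hits = zeros(Int, nchunks)
    Threads.@threads for i in 1:nchunks
        rng = Xoshiro(1000 + i)  # distinct seed per chunk (illustrative)
        h = 0
        for _ in 1:samples_per_chunk
            x, y = rand(rng), rand(rng)
            h += (x^2 + y^2 <= 1)  # inside the quarter circle?
        end
        hits[i] = h
    end
    4 * sum(hits) / (nchunks * samples_per_chunk)
end
```

Because each chunk owns its own seeded generator, the result is also reproducible regardless of how the scheduler assigns chunks to threads.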
As for why Julia (like all programming languages I know of) doesn’t give “true” random numbers by default:
PRNs are awesome for reproducibility: you just need to record the seed, then you can reproduce the stream of random numbers.
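For example, reseeding with the same recorded seed reproduces the stream exactly (Xoshiro is Julia’s current default generator; the seed value is arbitrary):

```julia
using Random

rng1 = Xoshiro(2024)   # record the seed...
a = rand(rng1, 5)

rng2 = Xoshiro(2024)   # ...reseed later with the same value
b = rand(rng2, 5)

a == b                 # the two streams are identical
```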
“True” random numbers are expensive. The numbers produced by /dev/random consume entropy that’s gathered from the system (e.g. from the precise timing of some interrupts). You can get only so many random bytes per second, so reading from /dev/random can block until more entropy is gathered. That’s why even Random.RandomDevice doesn’t use /dev/random but a cryptographically secure PRNG seeded (periodically I think) by /dev/random. Hooking into random.org would also be expensive, require an Internet connection, and imply high latency and limited throughput.
In general, for scientific purposes true randomness is not needed. It’s sufficient that pseudorandom generators produce numbers which are independent of any process one models in the program. Most newer pseudorandom generators are ok, with some exceptions when it comes to parallel processing.
The big consumer of true random numbers is cryptography, which requires that the random numbers are neither predictable nor reproducible nor controllable. This rules out randomness from system state, and from the outside, like a web site or cosmic radiation.
To produce true random numbers you cannot use a purely digital computer; you must tap into some analog circuitry. Modern CPUs have the instructions RDRAND and RDSEED for this purpose. They are fed from on-chip analog circuitry: for AMD it’s ring oscillators (which utilize thermal noise and/or shot noise); for Intel I don’t know for certain, probably something similar. Professional equipment on FPGAs uses its own circuitry and does not trust CPUs with non-public designs. It can be various kinds of ring oscillators, TERO, or separate circuitry based on noise diodes (e.g. the Noisecom NC100/200/300/400 series chips and diodes). Historically, radioactive decay has been used, e.g. with a Co-60 source or similar.
It’s not entirely straightforward to ensure that the true random numbers are always random when attacked by an adversary with unlimited budget, as this paper shows: https://ujm.hal.science/ujm-00699618/document
We believe that, for practical purposes, P != NP and that CSPRNGs exist.
On modern systems, /dev/random and /dev/urandom both access the same random numbers and don’t consume entropy. They don’t consume entropy because entropy is only used to seed a CSPRNG.
The difference is that /dev/random will block until the kernel is happy with the seeding.
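For illustration, on Linux you can read raw bytes from the kernel CSPRNG directly; this is essentially the same source that RandomDevice reaches via the getrandom syscall (the device path assumes a Linux system):

```julia
# Read 16 raw bytes from the kernel CSPRNG (Linux-specific sketch).
bytes = open(io -> read(io, 16), "/dev/urandom")
length(bytes)  # 16
```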
That is incorrect. Random.RandomDevice() is a singleton. Each access incurs a syscall:
$ strace -o trace julia -e "import Random; for i=1:100_000 rand(Random.RandomDevice()) end;"
$ wc -l trace
101468 trace
This is very important. The reason is that the system running your Julia program might be a virtual machine. The virtual machine might get suspended, and then it might get woken up, from the same state, twice.
If you seeded a userland CSPRNG, or if there was any buffering, and you then used that “supposedly secure” randomness for e.g. a DSA signature, then you just published your private key to the internet (nonce reuse) – the most insidious failure mode of any crypto system.
On a proper setup, the OS kernel gets informed about that and asks the hypervisor for some entropy. Your userland code is not privileged in that way.
PS. That failure mode is also a reason to fold a crypto-hash of your message into the nonce for DSA – this fundamentally protects you from the “publish my private key” failure mode of nonce reuse.
A recentish high-profile example is the PlayStation 3: through creative nonce reuse, Sony effectively published the private signing keys to their firmware.
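To spell out why nonce reuse is so catastrophic (a standard derivation, in my own notation): in (EC)DSA over a group of order n, with private key d, nonce k, and signature component r derived from k, each signature satisfies s_i = k^{-1}(z_i + r\,d) \bmod n, where z_i is the message hash. Two signatures (r, s_1) and (r, s_2) made with the same k therefore give

s_1 - s_2 = k^{-1}(z_1 - z_2) \bmod n

so anyone can compute k = (z_1 - z_2)(s_1 - s_2)^{-1} \bmod n and then recover the private key as d = r^{-1}(s_1 k - z_1) \bmod n. That is essentially how the PS3 signing key was extracted.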
Can’t you just read the clock occasionally and mix into PRNG state? If the clocks in the virtual machine are messed up, then indeed you are in a malicious case, and then some other sources of randomness are needed.
I guess that’s a very recent development? In 2020 at least the Linux /dev/random was still consuming from a blocking pool (with a parent pool used both for the blocking pool and for seeding /dev/urandom). See page 19 of this paper.
Also my random manpage from section 4 still says the following:
[/dev/random] will return random bytes only within the estimated number of bits of fresh noise in the entropy pool, blocking if necessary. […] When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.
Apparently that was changed in 2020 and the man page is out of date.
It’s not incorrect; I was talking about the system CSPRNG that RandomDevice uses through libuv, which calls getrandom. My point was that getrandom uses /dev/urandom rather than /dev/random, so it doesn’t deplete the blocking entropy pool. (But as you said, this point is moot now that both sources do the same thing after initialization.)
Anyway thanks for the interesting point regarding virtual machines, I wasn’t aware of this attack.
Thanks all for the links regarding the linux kernel dev!
I kinda mentally suppressed how recent that change is, due to how embarrassingly braindead the old behavior was.
(Entropy does not drain from a CSPRNG. 256 bits of entropy are enough to seed the PRNG needs of all of human history; maybe 512 bits if you’re concerned about a fantastical far future involving interstellar travel. The real problems are bootstrapping, snapshot-resume in VMs, and good APIs for applications to communicate either “give me the best random you can” vs. “this is super important, I’m going to keygen or DSA on this – give me good randomness if you can; otherwise stalling or crashing is preferable to a compromised high-value long-term private key”.)
I also find it satisfying not to use any randomness in nonce generation at all and to generate \rm{seed} = \rm{key}|\rm{message}|\rm{counter}, which is fed into a CSPRNG. (The counter allows regenerating the signature when r, s are not within the needed range; generally only needed for toy examples.)
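A minimal sketch of that deterministic scheme, assuming SHA-256 as the seed derivation (the hash choice, byte layout, and function name here are my own illustrative assumptions, not the Verificatum specification):

```julia
using SHA  # Julia stdlib

# Derive a deterministic nonce seed from key ‖ message ‖ counter.
# Same inputs always yield the same seed; bumping the counter
# yields a fresh one when r, s fall outside the needed range.
function nonce_seed(key::Vector{UInt8}, message::Vector{UInt8}, counter::UInt64)
    sha256(vcat(key, message, collect(reinterpret(UInt8, [counter]))))
end
```

This gives determinism (no randomness consumed at signing time) while still producing a distinct nonce per message and per retry.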
If someone is interested, I have implemented a CSPRNG according to the Verificatum verifiable-shuffle specification in CryptoGroups.jl, in Specs/primitives.jl. A CSPRNG is also specified in a FIPS standard, if someone is curious about implementing one.
I glanced at the list of examples on random.org. In some of them, true randomness is overkill, like generating a random colour or Jazz scales. For passwords and private key generation, it is appropriate. Still, it can be strengthened with a two-party protocol in situations where a random private key needs to be generated on a smart card. For games and lotteries, true randomness is insufficient: there, you need evidence that the author indeed generated a true random number and that the dice were not rolled multiple times until the desired outcome was obtained. This is where one would reach for solutions like the League of Entropy, which provides verifiability of the randomness.