I’m writing a package that creates a cryptographic key consisting of big prime numbers, and I need to write them to a file and read them back. Is there a preexisting library that does this, or do I have to write the code myself? Is a four-byte length big enough for a `BigInt` in general?
You could write the prime numbers, one per line, to an ordinary text file with `println`, then read them back with `readlines` and parse each line back to a `BigInt` with `parse(BigInt, line)`.
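A minimal sketch of that round trip; the temporary file path and the sample values are just for illustration:

```julia
# Write each prime as decimal text, one per line, then parse them back.
primes = [BigInt(2)^89 - 1, BigInt(2)^107 - 1]  # two Mersenne primes as examples

path = tempname()
open(path, "w") do f
    for p in primes
        println(f, p)   # one decimal number per line
    end
end

loaded = [parse(BigInt, line) for line in readlines(path)]
```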
No. Not all `BigInt`s fit in four bytes; they can be arbitrarily big.
A `BigInt` can have more than 4294967295 bytes? I seriously doubt I’ll need a prime number that big, but if `BigInt`s can be bigger than that, then the library should use an eight-byte length.
To elaborate, there might be something out there more specific to GMP (which `BigInt` is built on), but writing out the integer in text can apply to any format. Here’s an example:
```julia
open("twotohundred.txt", "w") do f
    print(f, BigInt(2)^100)
end
```
That converts the `BigInt` to a `String` with the function `string` in a deeper call, so it’ll look as you’d expect. If you use `string` directly, you can also use `write` instead of `print`. You can delimit the integers with more than newlines; you can `print` or `write` anything.
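For instance, a quick sketch using a comma as the delimiter instead of a newline (the delimiter choice and variable names here are arbitrary):

```julia
# join writes via print, so each BigInt is converted to decimal text.
io = IOBuffer()
ns = [BigInt(2)^100, BigInt(3)^200]
join(io, ns, ",")                 # comma-delimited decimal text
s = String(take!(io))
back = [parse(BigInt, t) for t in split(s, ",")]
```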
GMP’s integer data grows in word-size increments: 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. Converted to UTF-8 text in base 10, though, it takes 1 byte per decimal digit.
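The size difference is easy to check; a quick sketch, using 4096 bits purely as an example size:

```julia
# Compare the storage needed as decimal text vs. raw binary.
n = BigInt(2)^4096
decimal_bytes = ndigits(n)                  # one byte per base-10 digit as text
binary_bytes  = cld(ndigits(n, base=2), 8)  # raw bytes for the same value
# Decimal text is roughly 8 / log2(10) ≈ 2.4× larger than raw binary.
```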
Sure, if you have enough working memory for it.
No delimiter is necessary, because the number of bytes in the `BigInt` will be written before the bytes themselves.
The number will be written as raw bytes, 8 bits of information per byte, not as decimal digits, which carry only about 3.3 bits of information per byte.
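A hedged sketch of such a length-prefixed binary format in plain Julia. The names `write_bigint`/`read_bigint` are made up here, hex is used only as a simple (not fastest) byte conversion, and values are assumed nonnegative:

```julia
# Length-prefixed framing: an 8-byte length, then the magnitude bytes.
function write_bigint(io::IO, n::BigInt)
    s = string(n, base = 16)
    isodd(length(s)) && (s = "0" * s)   # hex2bytes needs an even-length string
    bytes = hex2bytes(s)
    write(io, UInt64(length(bytes)))    # eight-byte length prefix
    write(io, bytes)                    # then the bytes themselves
end

function read_bigint(io::IO)
    len = read(io, UInt64)
    bytes = read(io, Int(len))
    parse(BigInt, bytes2hex(bytes), base = 16)
end
```

Round-tripping through an `IOBuffer` (or a file) recovers the original value.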
I’ve never actually tried this, but the GMP library webpage https://gmplib.org states that the precision is only limited by available memory, so on a 64-bit system that could be more than 4 GB.
Definitely possible; GMP has functions for raw I/O like this, and even if it didn’t, we have access to the raw data via `BigInt`. It’s just a matter of figuring out how much to read from the pointer. But I don’t know `BigInt` or `ccall` well enough to do this.
Is this really worth the extra effort? For a 4096-bit cryptographic key, this is less than a kilobyte of disk space saved…
It says it writes four bytes of size. However:

```julia
julia> big(256)^444;

julia> big(256)^2147483647;

julia> big(256)^2147483648;

julia> big(256)^4294967295;

julia> big(256)^4294967296;

julia> big(256)^5294967296;

julia> big(256)^5294967296%7
Killed
```
Julia can hold a number bigger than that, but on this computer it can’t compute with it.
It’s more effort to convert it to decimal than to write the bytes as they are. Not that it matters much, since writing is I/O-bound. But most of the key files I’ve seen are either binary or base64-encoded, not decimal.
In that case, your computer is not going to find a prime number that needs more than 17 megabytes of storage either…
My god: 1 year, thousands of GPUs, and $2M just to find a 17 MB-ish `BigInt` we can instantiate in a few ms. I guess the eternal bragging rights are worth more than crypto or AI bot profits.
If you are not bothered by serialisation performance, you could use `parse(BigInt, bytes2hex(x), base=16)` and the reverse `hex2bytes(string(x; base=16))`. This, however, adds unnecessary computation and memory overhead, so you may benefit from the `int2octet` and `octet2int` methods as implemented in CryptoGroups.jl.
IIRC, GMP `BigInt` performance for parsing from a string is quite bad (in my experience with integers of a couple of MB).
That said, I believe there is a GMP method for serializing and deserializing based on its in-memory representation. The problem is that it is not exposed on the Julia side.
I think I played around with calling the C function directly; I could go back and find it if there’s interest (I think it’d be worth submitting a PR to Julia, too).
For serialising a `BigInt` to a buffer there is:

```julia
_, written = GMP.MPZ.export!(@view(buffer[end-needed_bytes+1:end]), n;
    order = 1,  # Big-endian
    endian = 0, # Native endian
    nails = 0   # Use all bits
)
```
However, there does not seem to be an inverse method; at least I haven’t managed to find one.
Yes, we need a wrapper of `mpz_import`. Shouldn’t be hard to write. Could maybe improve the performance of the `serialize` and `deserialize` functions too?
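A sketch of what such a wrapper might look like, going off the `mpz_import` signature in the GMP manual. The symbol name `__gmpz_import`, the `:libgmp` library name, and the function name `import_bigint!` are assumptions here, not an established Julia API:

```julia
# Hedged sketch of an mpz_import wrapper via ccall.
# GMP manual: void mpz_import(mpz_t rop, size_t count, int order,
#                             size_t size, int endian, size_t nails,
#                             const void *op)
function import_bigint!(n::BigInt, bytes::Vector{UInt8})
    ccall((:__gmpz_import, :libgmp), Cvoid,
          (Ref{BigInt}, Csize_t, Cint, Csize_t, Cint, Csize_t, Ptr{UInt8}),
          n, length(bytes),
          1,  # order: most significant word first
          1,  # size: treat each byte as one "word"
          0,  # endian: native (irrelevant for 1-byte words)
          0,  # nails: use all bits of each byte
          bytes)
    return n
end
```

With `size = 1` and `order = 1` this reads the bytes big-endian, so it would be the inverse of the `export!` call above.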
Serialization currently uses `str = string(n, base=62)` and `parse(BigInt, str, base=62)` to convert more efficiently to/from a buffer without going through decimal, so that’s also an option. (62 is the biggest base that you can represent with 1-byte `[0-9A-Za-z]` digits.)
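A quick sketch of that base-62 round trip (the example value is arbitrary):

```julia
# Base-62 text is more compact than decimal, while staying 1 byte per digit.
n = BigInt(2)^521 - 1                 # a Mersenne prime, as an example
str = string(n, base = 62)            # BigInt -> compact text
back = parse(BigInt, str, base = 62)  # text -> BigInt
```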
Okay, it looks like I’ll write my own code to do this.
Unless there is a need for a compact binary representation, text-editor-compatible output, as stevengj suggested, is best.
Even using `base = 60`, just as a nod to Babylonian times and to keep a couple of sentinel values for `begin`, `end`, and other special digits (possibly even prep for `BigFloat` serialization).