Reading and writing BigInt to file

I’m writing a package that creates a cryptographic key consisting of big prime numbers, and I need to write them to a file and read them back. Is there a preexisting library that does this, or do I have to write the code myself? Is a four-byte length big enough for BigInt in general?

You could write the prime numbers, one per line, to an ordinary text file with println, then read them back with readlines and parse back to BigInt with parse(BigInt, line).
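
That round trip looks like this (the file name is arbitrary, and the primes here are just small sample Mersenne primes standing in for real key material):

```julia
# Sample Mersenne primes stand in for real key material.
primes = [big(2)^89 - 1, big(2)^107 - 1]

open("primes.txt", "w") do f
    for p in primes
        println(f, p)          # one prime per line, decimal text
    end
end

back = [parse(BigInt, line) for line in readlines("primes.txt")]  # == primes
```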

No. Not all BigInts fit in four bytes. They can be arbitrarily big.

A BigInt can have more than 4294967295 bytes? I seriously doubt I’ll need a prime number that big, but if BigInts can be bigger than that, then the library should use an eight-byte length.

To elaborate, there might be something out there more specific to GMP (which BigInt is built on), but writing out the integer in text can apply to any format. Here’s an example:

open("twotohundred.txt","w") do f
    print(f, BigInt(2)^100)
end

That converts the BigInt to a String with the function string in a deeper call, so it’ll look just as you’d expect. If you call string directly yourself, you can also use write instead of print. And you’re not limited to newlines: you can delimit the integers however you like, since you can print or write anything.

GMP’s integer data grows in word-size increments: that’s 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. Converted to UTF-8 text in base 10, though, it takes 1 byte per decimal digit.
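
To put numbers on that, you can peek at the limb count through BigInt’s internal size field (an implementation detail of Base.GMP, so treat this as illustrative only):

```julia
n = big(2)^100

limb_bytes = abs(n.size) * sizeof(Base.GMP.Limb)  # raw GMP limb data
text_bytes = ndigits(n)                           # one byte per decimal digit
# For 2^100: 16 bytes of limb data vs 31 bytes of decimal text.
```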

Sure, if you have enough working memory for it.

No delimiter is necessary, because the number of bytes in the BigInt will be written before the bytes themselves.

The number will be written as raw bytes, 8 bits of information per byte, not as decimal digits, which carry only about 3.3 bits of information per byte.
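
A format like that, a byte-count prefix followed by the raw bytes, is only a few lines to implement. A sketch, with hypothetical helper names, 8-byte little-endian length, little-endian magnitude bytes, and sign handling omitted since primes are positive:

```julia
function write_bigint(io::IO, n::BigInt)
    bytes = digits(UInt8, n; base = 256)   # little-endian magnitude bytes
    write(io, htol(Int64(length(bytes))))  # 8-byte length prefix
    write(io, bytes)
end

function read_bigint(io::IO)
    len = ltoh(read(io, Int64))
    bytes = read(io, len)
    evalpoly(big(256), bytes)              # reassemble: sum(bytes[i] * 256^(i-1))
end

io = IOBuffer()
write_bigint(io, big(2)^521 - 1)
seekstart(io)
m = read_bigint(io)                        # == big(2)^521 - 1
```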

I’ve never actually tried this, but the GMP library webpage https://gmplib.org states that the precision is limited only by available memory, so on a 64-bit system a single BigInt could take more than 4 GB.

Definitely possible. GMP has functions for raw I/O like this, and even if it didn’t, we have access to the raw data via the BigInt struct; it’s just a matter of figuring out how much to read from the pointer. But I don’t know BigInt or ccall well enough to do this.

Is this really worth the extra effort? For a 4096-bit cryptographic key, this is less than a kilobyte of disk space saved…

It says it writes four bytes of size. However:

julia> big(256)^444;

julia> big(256)^2147483647;

julia> big(256)^2147483648;

julia> big(256)^4294967295;

julia> big(256)^4294967296;

julia> big(256)^5294967296;

julia> big(256)^5294967296%7
Killed

Julia can hold a number bigger than that, but this computer can’t compute with it.

It’s more effort to convert to decimal than to write the bytes as they are. Not that it matters much, since writing is I/O-bound. But most of the key files I’ve seen are either binary or base64-encoded, not decimal.
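
For the base64 style of key file, the stdlib Base64 can wrap whatever byte layout you pick; a sketch using little-endian magnitude bytes (one choice among many, not a standard key format):

```julia
using Base64

n = big(2)^127 - 1
bytes = digits(UInt8, n; base = 256)   # little-endian magnitude bytes
b64 = base64encode(bytes)              # printable, key-file style

decoded = base64decode(b64)
m = evalpoly(big(256), decoded)        # == n
```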

In that case, your computer is not going to find a prime number that needs more than 17 megabytes of storage either… :wink:

My god: 1 year, thousands of GPUs, and $2M just to find a 17 MB-ish BigInt we can instantiate in a few ms. I guess the eternal bragging rights are worth more than crypto or AI bot profits.

If you are not bothered by serialisation performance, you could use parse(BigInt, bytes2hex(x), base = 16) and, for the reverse, hex2bytes(string(x; base = 16)). This, however, adds unnecessary computation and memory overhead, so you may benefit from the int2octet and octet2int methods as implemented in CryptoGroups.jl.
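
One caveat with the hex route: hex2bytes requires an even number of hex digits, and string(x; base = 16) can produce an odd count, so a padding step is needed. A sketch (the helper names are my own, positive integers only):

```julia
function int_to_bytes(x::BigInt)
    s = string(x; base = 16)
    isodd(length(s)) && (s = "0" * s)  # hex2bytes needs an even digit count
    return hex2bytes(s)
end

bytes_to_int(bytes) = parse(BigInt, bytes2hex(bytes); base = 16)

x = big(2)^17                          # hex "20000" has 5 digits, so padding kicks in
y = bytes_to_int(int_to_bytes(x))      # == x
```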

iirc GMP BigInt performance for parsing from a string is quite bad (in my experience with integers of a couple MB).

That said, I believe there is a GMP method for serializing and deserializing based on their in-memory representation. The problem is that it is not exposed on the Julia side.

I think I played around with calling the C function directly, I could go back and find it if there’s interest (I think it’d be worth submitting a PR to Julia, too)

For serialising BigInt to the buffer there is:

_, written = GMP.MPZ.export!(@view(buffer[end-needed_bytes+1:end]), n;
                             order = 1,   # Big-endian
                             endian = 0,  # Native endian
                             nails = 0)   # Use all bits

However there does not seem to be an inverse method. At least I haven’t managed to find one.

Yes, we need a wrapper of mpz_import. Shouldn’t be hard to write. Could maybe improve the performance of the serialize and deserialize functions too?
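
A minimal wrapper might look like this. This is a sketch only, assuming GMP’s documented mpz_import signature and the :libgmp handle that Julia’s own ccalls already resolve; the function name mpz_import! is made up:

```julia
# void mpz_import(mpz_t rop, size_t count, int order, size_t size,
#                 int endian, size_t nails, const void *op)
function mpz_import!(rop::BigInt, bytes::Vector{UInt8};
                     order::Integer = 1,   # most significant byte first
                     endian::Integer = 0,  # native (irrelevant for 1-byte words)
                     nails::Integer = 0)   # use every bit of each byte
    ccall((:__gmpz_import, :libgmp), Cvoid,
          (Ref{BigInt}, Csize_t, Cint, Csize_t, Cint, Csize_t, Ptr{UInt8}),
          rop, length(bytes), order, 1, endian, nails, bytes)
    return rop
end

n = mpz_import!(BigInt(), UInt8[0x01, 0x00])   # == big(256)
```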

Serialization currently uses str = string(n, base = 62) and parse(BigInt, str, base = 62) to convert to/from a buffer more efficiently, without going through decimal, so that’s also an option. (62 is the biggest base you can represent with 1-byte [0-9A-Za-z] digits.)
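
For reference, that base-62 round trip in isolation, with a rough size comparison against decimal:

```julia
n = big(2)^521 - 1
str = string(n; base = 62)
m = parse(BigInt, str; base = 62)   # == n
# Base 62 is noticeably more compact than decimal: ~88 characters here vs 157.
```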

Okay, it looks like I’ll write my own code to do this.

Unless there is a need for a compact binary representation, text-editor-compatible output, as stevengj suggested, is best.
You could even use base = 60, just as a nod to Babylonian times and to keep a couple of sentinel values for begin, end, and other special digits (possibly even as prep for BigFloat serialization).