- The Julia code could look very similar or be as simple as `prod(1:Int(n))`. The overflow problem (`prod(1:100) == 0` in Julia) has nothing to do with the languages; it's the Mojo implementation explicitly converting the arbitrary precision `int` input from Python to Mojo's system word-sized `Int`, which like Julia's `Int` alias is 32 bits on 32-bit systems and 64 bits on 64-bit systems. Simply multiplying a sequence of such integers would probably compile to the same native code, hence the same performance and integer overflow. I don't know how Mojo would opt into arbitrary precision integer arithmetic, but in Julia it's just a matter of working with `BigInt` (see the sketch after this list).
- Assuming you're asking about `PythonCall` instead of `PyCall`, I'd assume it would just work because Python and Mojo are handling their integration. I haven't gotten into Mojo yet, so I wouldn't really know if there are any issues, though. The blog post seems to provide an MWE, so anyone with Mojo can give that and `PythonCall` a shot.
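As a minimal sketch of both points, assuming `PythonCall`'s `pyimport` and the `mojo_module` extension built in the blog post (which I haven't built or tested myself):

```julia
using PythonCall  # assumes a Python environment where the blog post's module is importable

# Fixed-size Int arithmetic silently wraps; a BigInt range opts into arbitrary precision.
prod(1:100) == 0                         # true on 64-bit systems
prod(1:big(100)) == factorial(big(100))  # true, the exact 158-digit value

# Untested: the Mojo-built extension should import and call like any other Python module.
mojo_module = pyimport("mojo_module")
mojo_module.factorial(10)
```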
The timings in that blog post aren't reliable. At first glance, it makes no sense that `mojo_module.factorial(10)` would take longer than `mojo_module.factorial(100)`, and the value recorded for `math.factorial(100)` is actually `math.factorial(125)`:
```python
>>> from math import factorial
>>> factorial(125)
188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
```
Including the `print` and calling the factorial function only once per benchmark run is the likely reason for the nonsensical timings. We can account for that with the recommended benchmarking tools. In Julia v1.11.5 with BenchmarkTools.jl:
```julia
julia> using BenchmarkTools

julia> @btime prod(1:$10);
  7.000 ns (0 allocations: 0 bytes)

julia> @btime prod(1:$100); # we know the output is wrong, but at least it's taking longer
  30.050 ns (0 allocations: 0 bytes)

julia> @btime prod(1:$(big(10))); # arbitrary precision
  4.457 μs (148 allocations: 3.18 KiB)

julia> @btime prod(1:$(big(100)));
  47.800 μs (1588 allocations: 34.19 KiB)

julia> @btime factorial($(big(100))) # much more efficient than prod!!
  318.696 ns (4 allocations: 104 bytes)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
```
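Note the roughly 150x gap between `prod(1:$(big(100)))` at 47.800 μs and `factorial($(big(100)))` at 318.696 ns; the takeaways below come back to that.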
There are likely better tools, but I really only use the `timeit` standard library in CPython. Output is in seconds, so you can divide by the million runs to get per-call estimates in microseconds (μs):
```python
>>> from math import factorial
>>> import timeit
>>> timeit.timeit('factorial(100)', number=1000000, globals=globals())
0.9695415999740362
>>> factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
```
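Dividing by the million runs gives roughly 0.97 μs per `math.factorial(100)` call, about 3x the 318.696 ns of the GMP-backed `factorial(big(100))` above.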
The takeaways here:
- Computing with fixed-size integer instances is much more efficient (~10^3x) than the same number of arbitrary precision instances, though the range of inputs for a factorial method is impractically small. In fact, Julia's `factorial` implementation for `Int` just looks up a precomputed table of the mere 20 values that don't overflow, and the table for `UInt128` only extends that to 34 values, the 34th being unrepresentable by the signed `Int128`. Those tables are only 160 and 544 bytes of data (see the snippet after this list for the lookup limit in action).
- Working with separately allocated and semantically immutable `BigInt` values contributes significant heap allocation overhead, and ditching any optimization for a simple `prod` doesn't help either. `factorial(::BigInt)` simply forwards to GMP's `fac_ui` routine (in C), which uses an optimized algorithm and minimizes allocations for greater efficiency (~10^2x).
- CPython's `math.factorial` (also in C) also uses an optimized algorithm (with some similarities to GMP's) for its own arbitrary precision `int`, and it's also more efficient than `prod(1:big(100))`, though it falls a bit short (~3x) of GMP's. A Mojo reimplementation is unlikely to do better than either CPython or GMP.
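To see that table lookup in action, a quick check in Julia (v1.11; the exact error message may vary across versions):

```julia
factorial(20)       # 2432902008176640000, the largest factorial representable in Int64
# factorial(21)     # throws OverflowError suggesting factorial(big(21)) instead
factorial(big(21))  # 51090942171709440000, no table limit once you're in BigInt
```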