What would the equivalent call out from Python to Julia with JuliaCall look like, and how would it perform in terms of both time and numerical accuracy (see the noted issue in the post)?
Is it also possible to now use PyCall from Julia to effectively get to Mojo (assuming there were some advantage to doing so beyond “because we can”)?
The Julia code could look very similar, or be as simple as prod(1:Int(n)). The overflow problem (prod(1:100) == 0 in Julia) has nothing to do with the languages; it's the Mojo implementation explicitly converting the arbitrary precision int input from Python to Mojo's system word-sized Int, which, like Julia's Int alias, is 32 bits on 32-bit systems and 64 bits on 64-bit systems. Simply multiplying a sequence of such integers would probably compile to the same native code, hence the same performance and integer overflow. I don't know how Mojo would opt into arbitrary precision integer arithmetic, but in Julia it's just a matter of working with BigInt.
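For concreteness, here are the overflow and the BigInt opt-in side by side in the Julia REPL (100! has 97 factors of 2, far more than 64, so the wrapped Int result is exactly 0):
julia> prod(1:100) # Int arithmetic silently wraps on overflow
0
julia> prod(1:big(100)) # a BigInt range makes every step arbitrary precision
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000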
Assuming you're asking about PythonCall rather than PyCall, I'd expect it to just work, because Python and Mojo are handling their integration. I haven't gotten into Mojo yet, though, so I wouldn't really know if there are any issues. The blogpost seems to provide an MWE, so anyone with Mojo can give that and PythonCall a shot.
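As a hedged sketch of that (I haven't run this): assuming the blogpost's compiled mojo_module is importable by the Python interpreter PythonCall is configured to use, calling it from Julia should look like importing any other Python module:
julia> using PythonCall
julia> mojo = pyimport("mojo_module");  # assumes the Mojo-built extension is on Python's import path
julia> pyconvert(Int, mojo.factorial(10))
3628800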
The timings in that blogpost aren't reliable. At first glance, it makes no sense that mojo_module.factorial(10) would take longer than mojo_module.factorial(100), and the value recorded for math.factorial(100) is actually that of math.factorial(125):
>>> from math import factorial
>>> factorial(125)
188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
Including the print and calling the factorial function only once in the benchmark are the likely reasons for the nonsensical timings. We can account for that with the recommended benchmarking tools. In Julia v1.11.5 and with BenchmarkTools.jl:
julia> using BenchmarkTools
julia> @btime prod(1:$10);
7.000 ns (0 allocations: 0 bytes)
julia> @btime prod(1:$100); # we know the output is wrong, but at least it's taking longer
30.050 ns (0 allocations: 0 bytes)
julia> @btime prod(1:$(big(10))); # arbitrary precision
4.457 μs (148 allocations: 3.18 KiB)
julia> @btime prod(1:$(big(100)));
47.800 μs (1588 allocations: 34.19 KiB)
julia> @btime factorial($(big(100))) # much more efficient than prod!!
318.696 ns (4 allocations: 104 bytes)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
There are likely better tools, but I really only use the timeit standard library in CPython. The output is the total time in seconds over all runs, so with one million runs the same number reads directly as microseconds (μs) per call:
>>> from math import factorial
>>> import timeit
>>> timeit.timeit('factorial(100)', number=1000000, globals=globals())
0.9695415999740362
>>> factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
The takeaways here:
Computing with fixed-size integer instances is much more efficient (~10^3x) than the same number of operations on arbitrary precision instances, though for a factorial method the range of inputs that don't overflow is impractically small. In fact, Julia's factorial implementation for Int just looks up a precomputed table of the mere 20 values that don't overflow, and the table for UInt128 only extends that to 34 values, the 34th being unrepresentable by the signed Int128. Those tables are only 160 and 544 bytes of data.
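For example, on a 64-bit system:
julia> factorial(20) # the largest factorial that fits in Int64
2432902008176640000
julia> factorial(21)
ERROR: OverflowError: 21 is too large to look up in the table; consider using `factorial(big(21))` instead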
Working with separately allocated and semantically immutable BigInt values contributes significant heap allocation overhead, and forgoing algorithmic optimizations with a simple prod doesn't help either. factorial(::BigInt) simply forwards to GMP's fac_ui routine (in C), which uses an optimized algorithm and minimizes allocations for greater efficiency (~10^2x).
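That forwarding can be illustrated directly, though only as a sketch poking at internals: the __gmpz_fac_ui symbol and the Base.GMP.libgmp constant aren't public API, and this leans on Julia's BigInt sharing GMP's mpz_t layout (Base performs essentially the same ccall):
julia> n = big(0);  # preallocated BigInt to receive the result
julia> ccall((:__gmpz_fac_ui, Base.GMP.libgmp), Cvoid, (Ref{BigInt}, Culong), n, 100);
julia> n == factorial(big(100))
true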
CPython's math.factorial (also in C) also uses an optimized algorithm (with some similarities to GMP's) for its own arbitrary precision int, and it's also more efficient than prod(1:big(100)), though it falls a bit short (~3x) of GMP's. A Mojo reimplementation is unlikely to do better than either CPython or GMP.