Help to get my slow Julia code to run as fast as Rust/Java/Lisp

If you plan to run the real computation for “around a minute”, then it should be possible to get decent results. Think about the time it takes to run the Rust or C compiler to build your executable; I think Julia's compilation time should be similar to those.

And if you put your code into a package, precompilation will occur when the package is installed, and what is actually compiled at usage time is less than that.
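A minimal sketch of what that looks like, assuming a hypothetical module `PhoneEncoderSketch` with a hypothetical `encode` function (not the code from this thread). The `precompile` call in the module body asks Julia to compile `encode` for `String` arguments ahead of the first call. When this file is the source of an installed package, that work happens during package precompilation and can be cached (how much is cached depends on the Julia version); when it is merely included in a script, as here, it still runs but nothing is kept across sessions.

```julia
module PhoneEncoderSketch

export encode

# Hypothetical stand-in for real work: turn each ASCII letter into a digit.
function encode(word::String)
    out = Char[]
    for c in lowercase(word)
        if 'a' <= c <= 'z'
            push!(out, '0' + (c - 'a') % 10)
        end
    end
    return join(out)
end

# Request compilation of encode(::String) before its first call. Inside an
# installed package this runs during package precompilation, so the result can
# be cached instead of being recompiled in every session.
precompile(encode, (String,))

end # module

using .PhoneEncoderSketch
println(encode("Julia"))
```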

I have just been able to confirm your result. The output of your printTranslations is equal to the output of the original one, and your code is almost 10 times faster than the original.

My test code can be found in

With these, anyone can easily try out the optimization. Let’s have fun!

The output of julia translate_phone_numbers.jl is the following:

numbers = randphone(1000000):
  0.469635 seconds (2.25 M allocations: 250.724 MiB, 11.10% gc time, 14.07% compilation time)

open(loadDictionary, "dictionary.txt"):
  0.890610 seconds (13.38 M allocations: 503.104 MiB, 27.57% gc time)

translate by printTranslations_original:
101.994933 seconds (1.35 G allocations: 21.675 GiB, 21.68% gc time, 0.19% compilation time)

translate by printTranslations of jonathanBieler:
 11.927243 seconds (49.74 M allocations: 1.678 GiB, 4.95% gc time, 0.51% compilation time)

result_original == result_jonathanBieler = true
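A minimal sketch of that kind of check, assuming both implementations can be made to print to an arbitrary `IO` instead of directly to `stdout` (the two functions below are hypothetical stand-ins, not the code from the thread). `sprint` captures everything a function prints into a `String`, so the outputs can be compared with `==`, and `@timed` returns the elapsed time alongside the value.

```julia
# Hypothetical stand-ins for two implementations that print their results to an IO.
function translations_original(io::IO, numbers)
    for n in numbers
        println(io, n, ": ", reverse(n))
    end
end

function translations_fast(io::IO, numbers)
    for n in numbers
        print(io, n, ": ", reverse(n), '\n')
    end
end

numbers = [string(rand(100_000:999_999)) for _ in 1:10_000]

# Warm up both functions so JIT compilation is not part of the measurement.
sprint(translations_original, numbers[1:1])
sprint(translations_fast, numbers[1:1])

# sprint(f, args...) calls f(io, args...) and returns what was printed as a String.
orig = @timed sprint(translations_original, numbers)
fast = @timed sprint(translations_fast, numbers)

result_original = orig.value
result_fast = fast.value

@show result_original == result_fast
println("original: ", orig.time, " s; fast: ", fast.time, " s")
```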

Unlike @dlakelan, I think that’s a perfectly fine use case for Julia, and it is definitely part of the future development plan:

So I’d say I disagree that people looking at artifacts are not part of the target audience :slight_smile: In the past 2-3 years, there was simply much lower-hanging fruit that got more attention, because the first focus is the REPL. That’s not to say that no one is compiling Julia code to apps (PackageCompiler.jl is one such project); it “just” lacks language support from a tooling perspective. There’s also an open position for a compiler engineer at Julia Computing, so my guess is that this part of Julia development will pick up steam in the future.

I’m personally looking forward to the exciting future of standalone julia apps, which I think will really hammer home that julia is here to stay.

These are all good points, but I still doubt that Julia will be widely used in the next decade for things like a script that is called from a loop in bash and, each time it’s called, outputs, say, 1024 bytes read from /dev/urandom in hex format. Why? Because such a program should be about 10 kB in size and run in less than 1 ms, while Julia will likely always be hundreds of megabytes in size, with a startup time of at least hundreds of ms just to read everything into RAM and initialize it. I could be wrong, but I do think these sorts of “tiny scripts” are a ways off at least. Maybe someday Julia will output just the subset of Julia actually used by the script and produce that 10 kB artifact. If so, good! But it would be a mistake to expect Julia to be able to do that at the moment.

You sure seem pessimistic about this future if not even core developers outlining their new roadmap and targets for the Julia compiler during this year’s JuliaCon can convince you of this :man_shrugging:

Calling “scripts” in a loop is also not the topic of this Discourse thread. Whether the code is compiled ahead of time doesn’t seem to matter to the OP, so I’d say it’s fair enough.


@renatoathaydes I’d be interested in what sort of timings you get with the revised version by @jonathanBieler. If the numbers are true, it should put Julia squarely into the realm of Java (a little better, even). Using PackageCompiler.jl would then get rid of most of the startup latency (though it won’t erase the memory footprint completely, since all of Julia is still loaded as far as I know; that’s an ongoing topic and part of the compiler roadmap I linked to in my last reply, though).

On the contrary, I’m optimistic that Julia will continue to target its core strengths and add some additional ones :wink: I just doubt that producing tiny 10 kB binaries will be the focus. I actually hope it WON’T be, because it would distract from other great stuff. I have to admit that I haven’t had the chance to watch the linked video yet. If they can strip out the LLVM stuff and ship a 1 MB sysimage that can’t do codegen, that’d go a long way toward making more script-like uses reasonable. But hey, they started the bit you linked to by saying that most of the items on last year’s roadmap were still there :wink:

Well, that was the point you made yesterday:

I guess what I’m saying is that if someone wants to use Julia to make something they can call from a bash script to do work that should take a few milliseconds to a second, that is not what Julia does today, and it probably won’t be next year or the year after either. Maybe in a couple of years, if they make good progress on the topics you linked in the video.

as the OP said

So this kind of benchmark will always show Julia is slow. But, if you ask it to do “a couple minutes” of work, it won’t seem at all slow.

@Sukera I created a ticket on my repo that links to a commit that makes the benchmark_runner runnable on any OS (not just Mac). Feel free to run it if you’re curious.

Before I go back to this problem, I am going to learn Julia a bit more and decide what’s the best way of running the program. It seems that PackageCompiler is the best option so far.

@dlakelan I am not sure why you’re dismissing what I am trying to do. Problems like the one in the program I am benchmarking are fairly common in the scientific world: you have a very large number of combinations of possibilities and you want to find all solutions efficiently. Do you think this is not the kind of problem Julia would be good at? If so, what exactly do you think Julia would be good at? Plotting pretty charts?

The new benchmark I used in the last parts of my Rust blog post runs for several minutes. It finds billions of solutions, but I am trying to reduce the space of valid solutions (without decreasing the sample space) so that printing the results does not become the bottleneck.
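When there are billions of solutions, one common way to keep printing from dominating in Julia is to accumulate output in an in-memory `IOBuffer` and write it out in large chunks, rather than issuing one small write per solution. A minimal sketch, assuming a hypothetical `print_solutions` helper, a hypothetical iterator of solution strings, and an arbitrary flush threshold:

```julia
# Write each solution into an in-memory buffer and hand it to `out` in large
# chunks, instead of issuing one small write per solution.
function print_solutions(out::IO, solutions; flush_bytes::Int = 64 * 1024)
    buf = IOBuffer()
    pending = 0
    for s in solutions
        println(buf, s)
        pending += ncodeunits(s) + 1        # +1 for the newline
        if pending >= flush_bytes
            write(out, take!(buf))           # take! also empties the buffer
            pending = 0
        end
    end
    write(out, take!(buf))                   # flush whatever is left
    return nothing
end

# Hypothetical solutions: every 3-digit string from "000" to "999".
solutions = (string(i; pad = 3) for i in 0:999)

sink = IOBuffer()                            # stand-in for stdout
print_solutions(sink, solutions; flush_bytes = 256)
text = String(take!(sink))
```

In a real program `sink` would simply be `stdout`; whether this helps noticeably depends on where the output goes (terminal, pipe, or file).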

I am also trying to think of better real-world problems I could use for benchmarking… perhaps some of the optimisation findings people are making on these toy benchmarks could be useful to speed up others’ real-world programs, who knows. Scientists may not be very good at optimisations, but performance buffs on the Internet can work miracles when they feel their favourite language is being treated unfairly :smiley: !

Just my two cents, but I bet it isn’t. From your description of the problem it seems that using Julia normally (that is, loading the package and running your problem) will be good enough. FYI, I myself never had to use PackageCompiler for anything, it is not something that is at the forefront of user experiences.

I’m not, because you say

and

Provided that the problem is “big enough” to make it worthwhile to use Julia, then it makes sense. It sounds like your problem is big enough, but maybe wasn’t in an earlier test version of the problem? I haven’t followed in detail all of what has gone on here. I just think it’s important to understand that

julia myscript.jl

has on the order of seconds of overhead built-in for nontrivial scripts.

Even just loading and exiting takes about a quarter of a second:

time julia -e "exit()";

real   0m0.234s
user   0m0.146s
sys    0m0.292s

so you should only be using Julia to do this stuff if myscript.jl takes … at least seconds to complete.
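The same fixed cost can be observed from inside Julia by timing a fresh process launched with `Base.julia_cmd()` (a documented Base helper that returns a `Cmd` for the currently running Julia binary) and comparing it with doing nothing in the current process. A rough sketch; the numbers depend on the machine and Julia version, so none are shown:

```julia
# Base.julia_cmd() returns a Cmd that launches the currently running Julia binary.
jl = Base.julia_cmd()

# Wall-clock time for a fresh process that starts up and immediately exits.
startup_s = @elapsed run(`$jl --startup-file=no -e "exit()"`)
println("fresh process running exit(): ", round(startup_s; digits = 3), " s")

# The same "do nothing" step inside the already-running process is essentially free.
inproc_s = @elapsed nothing
println("in-process no-op:             ", inproc_s, " s")
```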

To do this I would start julia once and then loop over all possibilities (maybe with parallel computing), but I would not start one julia process for each possibility.
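A minimal sketch of that pattern, assuming a hypothetical `simulate(a, b)` function that stands in for one possibility: the parameter grid is built in Julia, and `Threads.@threads` spreads the iterations over however many threads Julia was started with (for example `julia --threads=4`), all in a single process.

```julia
using Base.Threads

# Hypothetical per-parameter computation standing in for one "possibility".
simulate(a, b) = sum(sin(a * k) * cos(b * k) for k in 1:10_000)

params = [(a, b) for a in 0.1:0.1:1.0 for b in 0.5:0.5:2.0]
results = Vector{Float64}(undef, length(params))

# One Julia process, one loop; each iteration writes only its own slot,
# so no locking is needed.
@threads for i in eachindex(params)
    a, b = params[i]
    results[i] = simulate(a, b)
end

println(length(results), " parameter combinations evaluated on ", nthreads(), " thread(s)")
```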

Thanks for your comment, but if you are going to criticize, I would appreciate it if you at least had a look at what the program is doing; if you did, you would know that’s not what’s happening.

Sorry, I didn’t mean to criticise. I guess I misunderstood. It looks like your loop in the bash script starts julia once for each input file. This reminded me of when I started using julia and I had scientific problems where I wanted to simulate something for many different parameter choices: I used bash scripts to loop over these parameter choices (like I did with my old Fortran or C code). But after a while I stopped using bash scripts for these “outer loops” and put them directly into julia, to avoid the overhead mentioned by others above. In your context it would be like passing all files you want to evaluate at once to the julia script. But I know there are settings where this is not possible…
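A minimal sketch of passing all files to one Julia process, assuming a hypothetical script `process_all.jl` and a hypothetical per-file function `process_file` (here it just counts non-empty lines). Startup and compilation are paid once per invocation instead of once per file.

```julia
# Hypothetical per-file work: count the non-empty lines of one input file.
process_file(path::AbstractString) = count(!isempty, eachline(path))

# One Julia process handles every file given on the command line, e.g.
#   julia process_all.jl input1.txt input2.txt input3.txt
function main(paths)
    for path in paths
        println(path, ": ", process_file(path))
    end
end

main(ARGS)
```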

This is far from a requirement for small short-running scripts, as evidenced by a very wide use of python in such scenarios.
I don’t think even perl can start in 1 ms.

For the curious

It was just an example. For very very simple things, use something that produces a very simple artifact. For medium complexity stuff, use Julia but expect some effect of overhead, for high complexity long running things Julia will be outstanding… It’s a spectrum.

BTW, the example would be equivalent to a subset of the features of xxd, which is an 18 kB binary on my machine:

xxd -p -l 1024 /dev/urandom
ls -l /usr/bin/xxd
-rwxr-xr-x 1 root root 18552 Mar  1 18:58 /usr/bin/xxd
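For comparison, the Julia source for that example is only a few lines; the cost being discussed is the runtime that has to start up, not the code itself. A sketch, assuming a Unix-like system with `/dev/urandom`; unlike `xxd -p`, which wraps its output into lines of 60 hex characters, it prints one long line:

```julia
# Read n random bytes from /dev/urandom (Unix-like systems only) and
# return them as a single lowercase hex string.
function random_hex(n::Integer)
    bytes = open("/dev/urandom") do io
        read(io, n)
    end
    return bytes2hex(bytes)
end

println(random_hex(1024))
```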

As requested, I ran the script in question using the excellent hyperfine (which I would also recommend in place of custom CLI benchmarking tools because it is far more comprehensive!)

Edit: I also added in @jonathanBieler’s version as phone_encoder_opt.jl, minus the warmup timing.

> hyperfine 'julia1.7 src/julia/phone_encoder.jl' 'julia1.7 --compile=min -O0 src/julia/phone_encoder.jl' 'julia1.7 src/julia/phone_encoder_opt.jl' 'julia1.7 --compile=min -O0 src/julia/phone_encoder_opt.jl' 'julia1.7 -e "println()"' 'julia1.7 --compile=min -O0 -e "println()"' 'src/rust/phone_encoder/target/release/phone_encoder' --warmup 1
Benchmark #1: julia1.7 src/julia/phone_encoder.jl
  Time (mean ± σ):     598.5 ms ±   4.9 ms    [User: 750.6 ms, System: 294.8 ms]
  Range (min … max):   590.0 ms … 607.1 ms    10 runs
 
Benchmark #2: julia1.7 --compile=min -O0 src/julia/phone_encoder.jl
  Time (mean ± σ):     198.4 ms ±   2.9 ms    [User: 363.1 ms, System: 295.0 ms]
  Range (min … max):   193.4 ms … 203.5 ms    15 runs
 
Benchmark #3: julia1.7 src/julia/phone_encoder_opt.jl
  Time (mean ± σ):     523.3 ms ±   6.9 ms    [User: 681.0 ms, System: 282.8 ms]
  Range (min … max):   515.6 ms … 538.4 ms    10 runs
 
Benchmark #4: julia1.7 --compile=min -O0 src/julia/phone_encoder_opt.jl
  Time (mean ± σ):     196.5 ms ±   2.3 ms    [User: 366.1 ms, System: 274.7 ms]
  Range (min … max):   192.6 ms … 201.6 ms    15 runs
 
Benchmark #5: julia1.7 -e "println()"
  Time (mean ± σ):     118.7 ms ±   1.6 ms    [User: 124.1 ms, System: 85.1 ms]
  Range (min … max):   115.8 ms … 121.7 ms    24 runs
 
Benchmark #6: julia1.7 --compile=min -O0 -e "println()"
  Time (mean ± σ):     108.1 ms ±   2.2 ms    [User: 88.4 ms, System: 65.8 ms]
  Range (min … max):   105.2 ms … 115.0 ms    28 runs
 
Benchmark #7: src/rust/phone_encoder/target/release/phone_encoder
  Time (mean ± σ):       0.6 ms ±   0.2 ms    [User: 0.5 ms, System: 0.1 ms]
  Range (min … max):     0.4 ms …   6.4 ms    3210 runs
 
  Warning: Command took less than 5 ms to complete. Results might be inaccurate.
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'src/rust/phone_encoder/target/release/phone_encoder' ran
  185.07 ± 78.03 times faster than 'julia1.7 --compile=min -O0 -e "println()"'
  203.26 ± 85.64 times faster than 'julia1.7 -e "println()"'
  336.49 ± 141.75 times faster than 'julia1.7 --compile=min -O0 src/julia/phone_encoder_opt.jl'
  339.73 ± 143.15 times faster than 'julia1.7 --compile=min -O0 src/julia/phone_encoder.jl'
  896.14 ± 377.55 times faster than 'julia1.7 src/julia/phone_encoder_opt.jl'
 1024.84 ± 431.64 times faster than 'julia1.7 src/julia/phone_encoder.jl'

For those wondering how the debug rust build compares, there is essentially no difference between it and the release build.

A couple conclusions we can draw from this:

  1. This is a really trivial benchmark to compute, so startup time makes a big difference. Since the Rust implementation finishes a run in under a millisecond while each Julia run takes hundreds of milliseconds, any kind of runtime boot latency is heavily penalized.
  2. Whatever is slow, it’s probably not chartToDigit. Yes, optimizing that cut a decent amount off the execution time, but the biggest difference comes from not compiling aggressively at all and running the interpreter instead. I’m not entirely sure what the real bottleneck is, but my hunch is that it has something to do with IO.
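Point 1 can also be seen within a single session: the first call to a function includes JIT compilation, while later calls with the same argument types reuse the compiled code. A minimal sketch with a hypothetical `digitsum`/`total` workload (timings are machine-dependent, so none are shown):

```julia
# Hypothetical workload: sum of the digits in many phone-number-like strings.
digitsum(s::AbstractString) = sum((c - '0' for c in s if isdigit(c)); init = 0)
total(words) = sum(digitsum, words)

words = [string(rand(0:9_999_999); pad = 7) for _ in 1:100_000]

first_call  = @timed total(words)   # includes compiling total and digitsum
second_call = @timed total(words)   # same argument types: already compiled

println("first call:  ", first_call.time, " s")
println("second call: ", second_call.time, " s")
```

In a one-shot `julia script.jl` run, only the first kind of call ever happens, which is why options like `--compile=min -O0` can win on a workload this small.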

That perspective makes sense but can you change shells? If so, I have this neat shell called julia and the commands I run in it are first class functions. They have normal shell speed the first time you run them (a little delay), but then they’re super fast because it compiles them! It’s a pretty great shell language, you should check it out.

Follow up: for those who want to dig in, here are flamegraphs for both the optimized julia implementation and the rust one: prof_julia.svg · GitHub
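For anyone who wants to produce similar data locally, the `Profile` standard library collects samples and prints them as a text tree; rendering them as a flame graph needs an extra package (for example ProfileView.jl), which is not used here. A minimal sketch with a hypothetical workload:

```julia
using Profile

# Hypothetical workload to profile: build many strings and hash them.
function workload(n)
    acc = UInt(0)
    for i in 1:n
        acc ⊻= hash(string(i; pad = 8))
    end
    return acc
end

workload(10)                  # compile first so the profile shows steady-state work
Profile.clear()
@profile workload(2_000_000)
Profile.print(maxdepth = 12)  # text tree of where the samples landed
```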

Let’s be nice here. It’s his benchmark, he gets to decide what’s measured. People can disagree with the methodology but let’s keep it kind and helpful. Also he did not need to come to our forum and solicit advice on speeding up the Julia version. Let’s not make him regret it.

I do think that given that it’s a command line tool it makes sense to measure CLI startup time. However, it would then also make sense to use PackageCompiler and add the appropriate precompiling statements for the Julia program to make a fast-starting command line tool since that’s the goal here and that’s what one would do for a command line tool in Julia.
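For reference, PackageCompiler's `create_app` builds an executable from a package whose module defines a `julia_main()::Cint` entry point, and its `precompile_execution_file` keyword points at a script that exercises the typical workload during the build, which is where the precompile statements come from. A minimal sketch of the entry point, with a hypothetical module name and `real_main` body. The build step needs PackageCompiler installed, so it is only shown as a comment, and its paths are placeholders:

```julia
module PhoneEncoderApp   # hypothetical package name

# Entry point that PackageCompiler's create_app calls in the built executable.
# Returning a Cint lets the app report success (0) or failure (1) to the shell.
function julia_main()::Cint
    try
        real_main(ARGS)
    catch err
        showerror(stderr, err, catch_backtrace())
        println(stderr)
        return 1
    end
    return 0
end

# Hypothetical program logic; the real encoder would read its inputs here.
function real_main(args::Vector{String})
    println("received ", length(args), " argument(s)")
    return nothing
end

end # module

# Build step (run separately, with PackageCompiler installed), roughly:
#   using PackageCompiler
#   create_app("path/to/PhoneEncoderApp", "PhoneEncoderAppCompiled";
#              precompile_execution_file = "path/to/precompile_workload.jl")
```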
