I’m kind of disappointed in the reaction this has gotten from our community over the past few hours. I feel a little bit responsible for this, since I mentioned the benchmarking tooling first thing, and even though I followed that up with constructive feedback on what could be responsible for the bad performance (in spite of the high startup time), it seems like this thread took a sour turn. I’m very sorry about that. I honestly expected better here, and I really want to say that this isn’t (or SHOULDN’T be!) representative of how our community interacts with someone who’s clearly out of their depth when writing julia code. I hope we as a community can do better in the future and respect that we are no authority on how other people want to run their code. Sometimes this is due to external constraints, sometimes it’s simply because that’s how they prefer to do things. In either case, we shouldn’t harp on and on about how they’re doing it wrong when they’ve clearly stated why they’re doing it this way. If something has been mentioned and addressed, there’s no need to pile on. Instead, if you don’t have anything constructive to add, please stay quiet.
I was asleep the past few hours, so I only now got to try running this benchmark for myself. I installed a recent java version (`openjdk-16-jdk`), updated my rust install (from `1.34.0` to `1.54.0`) and took a slightly deeper look at the julia code.
I ran into a little snag as soon as I tried to benchmark the original version from your repo (this is the `julia-optimisations` branch), to establish a baseline:
:prechelt-phone-number-encoding $ cd src/rust/benchmark_runner/
:benchmark_runner $ cargo build --release
Compiling benchmark_runner v0.1.0 (/d/Documents/Projects/prechelt-phone-number-encoding/src/rust/benchmark_runner)
error[E0432]: unresolved import `libproc::libproc::pid_rusage`
--> src/main.rs:7:23
|
7 | use libproc::libproc::pid_rusage::{pidrusage, RUsageInfoV4};
| ^^^^^^^^^^ could not find `pid_rusage` in `libproc`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0432`.
error: could not compile `benchmark_runner`
To learn more, run the command again with --verbose.
So I guess I’m stuck with comparisons until I/we can fix this. I’m not a rusthead, mind you, so I’ll need a little help to resolve this.
Nonetheless, some more things I’ve noticed in the julia code that are probably not helpful in regards to performance:
- `join` can take an `io` as its first argument. If done, it doesn’t have to allocate a new `String` on invocation, just to immediately write it to the `io` given to `println`.
- `print(String(take!(::IOBuffer)))` is, in my eyes, an antipattern because it has the same problem as `join` above: allocating a new `String`, just to print it out. A better approach is `write(stdout, take!(::IOBuffer))`, since that skips allocating the new string and just directly writes out whatever the `IOBuffer` holds. Benchmarking shows this still allocates (less than creating the `String` though). Update: there is a way around that, use `write(stdout, seekstart(::IOBuffer))` instead.
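To make the two points above concrete, here is a minimal sketch of what I mean. The helper names `emit_joined` and `flush_buffer!` and the sample words are made up for illustration; they are not taken from the repo, and I haven’t benchmarked this exact snippet.

```julia
# Hypothetical helpers for illustration; not code from the benchmark repo.

# Writes the words separated by spaces straight into `io`.
# join(io, iter, delim) streams the pieces instead of building a String first.
function emit_joined(io::IO, words)
    join(io, words, " ")
    print(io, '\n')
    return nothing
end

# Copies everything buffered in `buf` to `out`, then empties `buf` for reuse.
# seekstart rewinds the buffer and returns it; write(out, buf) then copies
# from the read position to the end without creating an intermediate String.
function flush_buffer!(out::IO, buf::IOBuffer)
    write(out, seekstart(buf))
    truncate(buf, 0)
    return nothing
end

buf = IOBuffer()
emit_joined(buf, ["neu", "od", "5"])
emit_joined(buf, ["je", "bos", "5"])
flush_buffer!(stdout, buf)
```

Collecting several lines in the buffer and flushing them in one go also keeps the number of writes to `stdout` down, which I’d expect to matter more than the individual allocations for large inputs.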
I've timed one run with 100_000 phone numbers in julia with `@timev` and got the following result (rust took ~0.46s).
1.515357 seconds (4.79 M allocations: 175.866 MiB, 9.13% gc time, 0.55% compilation time)
elapsed time (ns): 1515357500
gc time (ns): 138356800
bytes allocated: 184409298
pool allocs: 2924124
non-pool GC allocs:158
malloc() calls: 785214
realloc() calls: 1077855
free() calls: 530567
GC pauses: 4
full collections: 1
This is running on:
julia> versioninfo()
Julia Version 1.7.0-beta3
Commit e76c9dad42 (2021-07-07 08:12 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:
JULIA_PKG_SERVER =
JULIA_NUM_THREADS = 4
After fixing the `join` thing, I get this:
1.509860 seconds (4.61 M allocations: 167.470 MiB, 9.10% gc time, 0.38% compilation time)
elapsed time (ns): 1509860100
gc time (ns): 137324100
bytes allocated: 175605512
pool allocs: 2744532
non-pool GC allocs:156
malloc() calls: 785214
realloc() calls: 1077855
free() calls: 562247
GC pauses: 4
full collections: 1
At this point, my computer is running rather hot and starts to throttle, but I’ve done some more investigation and I think the fix by @DNF doesn’t work because it only `resize!`s at the end of the function call, leading to wrong results (I checked with `diff` against the rust version). Using `pop!` after each recursive call works though.
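Here is a toy sketch of why the `pop!` has to happen after each recursive call. This is not the code from the repo: `splits!` and its arguments are hypothetical, and it assumes ASCII input so that byte indices are valid string indices. It enumerates every way to split a string into dictionary words while reusing one shared accumulator vector.

```julia
# Hypothetical example, not the repo's code. Assumes `s` is ASCII.
function splits!(out::Vector{String}, dict::Set{String}, s::String,
                 start::Int = 1, acc::Vector{String} = String[])
    if start > ncodeunits(s)
        push!(out, join(acc, " "))   # reached the end: record one full solution
        return out
    end
    for stop in start:ncodeunits(s)
        word = s[start:stop]
        if word in dict
            push!(acc, word)                        # choose this word
            splits!(out, dict, s, stop + 1, acc)    # explore the rest
            pop!(acc)                               # undo before the next candidate
        end
    end
    return out
end

println(splits!(String[], Set(["a", "b", "ab"]), "ab"))
```

If the accumulator were only trimmed once after the loop, the word pushed in one iteration would still be sitting in `acc` when the next candidate is tried, so later solutions would carry leftovers from earlier branches. As far as I can tell, that is the kind of wrong output I saw in the `diff`.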
timing from that last run, not representative though
vbogad@Taktikum:prechelt-phone-number-encoding $ diff <(sort out_jl_2.txt) <(sort out_rust.txt)
4516d4515
< 1.845223 seconds (8.10 M allocations: 280.115 MiB, 7.58% gc time, 0.21% compilation time)
29827,29835d29825
< bytes allocated: 293722264
< elapsed time (ns): 1845223400
< free() calls: 631601
< GC pauses: 6
< gc time (ns): 139949100
< malloc() calls: 785214
< non-pool GC allocs:156
< pool allocs: 6233025
< realloc() calls: 1077855
You can find the last code I ran in this gist. It includes other optimizations, like not recreating strings unnecessarily, and a call to `@timev` (which is where the timing output in that `diff` came from).
I’d recommend running on 1.7, since that alone seems to speed up your original code massively.