Hi,
I am comparing a simple recursive function for the Tower of Hanoi in Julia with Java.
This is the java code:
class hanoi
{
    static void towerOfHanoi(int n, char source, char target, char helper)
    {
        if (n == 1)
        {
            System.out.println("Move disk 1 from source " + source + " to rod " + target);
            return;
        }
        towerOfHanoi(n-1, source, helper, target);
        System.out.println("Move disk " + n + " from rod " + source + " to rod " + target);
        towerOfHanoi(n-1, helper, target, source);
    }

    public static void main(String args[])
    {
        int n = 15; // number of discs
        towerOfHanoi(n, 'A', 'C', 'B');
    }
}
And this is the one in Julia:
function towerOfHanoi(n::Int, source::String, target::String, helper::String)
    if (n == 1)
        println("Move disk 1 from source $source to target $target ")
        return
    end
    towerOfHanoi(n - 1, source, helper, target)
    println("Move disk $n from source $source to target $target ")
    towerOfHanoi(n - 1, helper, target, source)
end

n::Int = 15 # number of discs
towerOfHanoi(n, "A"::String, "C"::String, "B"::String)
The Julia code is much slower (depending on the number of discs and on the print commands): 0.64 s for Julia versus 0.38 s for Java, measured with a simple time call on a macOS system, using Julia 1.9.0.
Do you have any hints on how to improve the Julia code?
I saw in some other discussions that small recursive functions might not work too well in Julia, but maybe my current setup is really not good. And I have not tested it with 1.10 yet, which should be faster for this than 1.9.0, but I was thinking that there might be some other improvements!?
Thank you in advance!
Fab
How are you running the code and where is the output going? You’re not so much testing the performance of recursive functions as you are testing IO (the printlns) and compilation. Depending on how I measure the Julia code, I can get numbers anywhere from 0.013 s (looking at @time towerOfHanoi(...) with output going to a file) to 2.8 s (looking at time julia test.jl, outputting to a fancy terminal).
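For reference, a rough sketch of that file-redirect measurement (the file name out.txt is just an example, and towerOfHanoi is assumed to be the String version defined in the original post); both the printed moves and the @time report end up in the file:

open("out.txt", "w") do io          # send all stdout produced inside the block to a file
    redirect_stdout(io) do
        @time towerOfHanoi(15, "A", "C", "B")   # timed without terminal IO in the way
    end
end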
As an aside, all those type annotations in the Julia code are superfluous and don’t impact performance.
This measures Julia startup time (which has improved, but can still be long) and compilation time in addition to the actual run time.
Here you must have already compiled hanoi.java with javac. To make a fairer comparison with the Julia measurement, you need to time both javac and java.
Most Julia performance measurement is done using @time or @btime in the source code. @time on the first invocation will include compilation time; a second invocation will include only the run time. @btime does repeated invocations automatically, so that you can easily see the true run-time performance without compilation time.
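As a minimal sketch of that difference (the helper function f below is just an illustration, not code from this thread):

using BenchmarkTools

f(n) = sum(sqrt(i) for i in 1:n)

@time f(10^6)    # first call: includes compilation of f
@time f(10^6)    # second call: run time only
@btime f(10^6)   # many samples; reports the minimum time, excluding compilation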
Yes, good comment about the compilation; this is what I get as a quick test for javac:
time javac hanoi.java
javac hanoi.java 0.87s user 0.08s system 176% cpu 0.538 total
I thought using time would be enough for a quick test, as it is easier to compare with Java.
Below is a @btime output for n=35 and the code:
julia hanoi_java.jl
60.052 s (0 allocations: 0 bytes)
using BenchmarkTools

function towerOfHanoi(n::Int, source::String, target::String, helper::String)
    if (n == 1)
        # println("Move disk 1 from source $source to target $target ")
        return
    end
    towerOfHanoi(n - 1, source, helper, target)
    # println("Move disk $n from source $source to target $target ")
    towerOfHanoi(n - 1, helper, target, source)
end

n::Int = 35 # number of discs
@btime towerOfHanoi(n, "A"::String, "C"::String, "B"::String)
Probably there are better ways to benchmark this… Though I understand that, for this case, there is not much to improve in the actual code.
Alternatively try time java hanoi.java after removing the class files.
The point is that julia hanoi_java.jl includes a compilation step. If you would like to remove compilation from the benchmark, we can help you do that.
It’s perhaps also worth noting that without the prints, a sufficiently advanced compiler could choose to optimize everything away to simply return; in either language. Benchmarking is hard
I just noticed that you are using String in Julia and char in Java. Julia has Char and also has single quote syntax for them.
function towerOfHanoi(n::Int, source::Char, target::Char, helper::Char)
    if (n == 1)
        println("Move disk 1 from source $source to target $target ")
        return
    end
    towerOfHanoi(n - 1, source, helper, target)
    println("Move disk $n from source $source to target $target ")
    towerOfHanoi(n - 1, helper, target, source)
end

n::Int = 15 # number of discs
towerOfHanoi(n, 'A', 'C', 'B')
I’m not sure if that matters for performance or not yet.
To improve the accuracy of this benchmark, it’s a good idea to run Julia with julia --startup-file=no. This prevents Julia from running your config file, which may take substantial time.
In my case, for example:
0 [~] $ time julia -e "1+1"
real 0m0.299s
user 0m0.445s
sys 0m0.276s
0 [~] $ time julia --startup-file=no -e "1+1"
real 0m0.126s
user 0m0.077s
sys 0m0.055s
0 [~] $
You have to decide what you want to benchmark. Do you want to benchmark Julia startup time? Then time launching Julia how you normally do … your script doesn’t even need to do anything. Do you want to benchmark “recursive functions” in Julia? Then use BenchmarkTools.jl and put a @btime statement inside your script (so that it doesn’t include startup/compilation time).
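For example, a script-internal benchmark of a print-free Char variant could look roughly like this (a sketch; n = 25 is an arbitrary choice, the $ interpolation keeps the lookup of the non-constant global n out of the measurement, and keep in mind the caveat above that a sufficiently smart compiler could in principle optimize the print-free version away entirely):

using BenchmarkTools

function towerOfHanoi(n, source, target, helper)
    n == 1 && return
    towerOfHanoi(n - 1, source, helper, target)   # moves are not printed here
    towerOfHanoi(n - 1, helper, target, source)
end

n = 25
@btime towerOfHanoi($n, 'A', 'C', 'B')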
The reason that we normally don’t include startup/compilation time in benchmarks is that synthetic benchmarks are usually toy versions of real-world problems, where in the real problem you care about performance because you are running similar calculations many more times. In such cases, the startup time is irrelevant (it doesn’t scale with the size of the problem).
Given that the OP reported that for n=35, julia was taking over a minute to run, and Java was 30 seconds, I seriously doubt this is explained by julia’s slow startup times.
That’s an interesting case. About half of the instructions generated by Julia are push and pop. I don’t know what Java generates. This could be some heuristic in LLVM’s register allocation which is unfortunate in this case. The function without IO doesn’t really do anything except a subtraction and the two calls, so saving a bunch of registers in every call seems inefficient.
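For anyone who wants to reproduce this observation, a sketch of one way to look at the generated code is @code_native on a print-free version (the generated instructions don’t depend on the value of n, so a small one is fine):

using InteractiveUtils   # provides @code_native when running as a script

function towerOfHanoi(n, source, target, helper)
    n == 1 && return
    towerOfHanoi(n - 1, source, helper, target)
    towerOfHanoi(n - 1, helper, target, source)
end

@code_native debuginfo=:none towerOfHanoi(3, 'A', 'C', 'B')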