I’m using Julia 1.9.3 on a Windows machine.
Two lines in my file ParallelWatson.jl (which also loads SharedArrays):
N = 10^4
y = SharedArray{Float64}(N);
In spite of the semicolon, the full SharedArray is printed in the output file when I issue the command
julia -t 8 < ParallelWatson.jl > test4.out
I want to put N=10^9 and, well, no, I don’t want to see a billion zeros.
I see conflicting messages in the online discussions about semicolon suppressing output, some going back to 2014.
Can I suppress the output? If so, how? It’s got to be easy, and I must be doing something really dumb.
I should add that when I type these lines interactively in the REPL, the semicolon does indeed suppress the printing of the SharedArray.
Perhaps add nothing as the last line of your file?
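A sketch of what that would look like in ParallelWatson.jl:

using SharedArrays

N = 10^4
y = SharedArray{Float64}(N);
nothing  # the script’s last expression is nothing, so there is no value worth showing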
The problem is that you’re feeding the file’s contents through stdin using <. Try dropping the < so that the filename is passed as an argument instead:
julia -t 8 ParallelWatson.jl > test4.out
You shouldn’t need semicolons anywhere.
Thanks. That worked.
With my script runit.jl (attached), I ran
julia -t 8 runit.jl > runit.out
more runit.out
and got (in runit.out):
0.015890 seconds (11.28 k allocations: 724.107 KiB, 49.86% compilation time)
0.15000009536743164 0.15000009536743164
Now I am going to go look up how I am supposed to be timing things. The 0.015890 seconds is too short (the program did NOT return in a hundredth of a second), and I’m not even sure the 0.15 seconds is right; it still seems too short. But since I know nothing, I will go look, and if I run into trouble I shall open another issue.
Thanks!
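An aside on those numbers: the "49.86% compilation time" means the measurement includes the JIT compilation triggered by the first call. A minimal sketch of the effect, using a made-up function f:

f(n) = sum(sqrt(i) for i in 1:n)
@time f(10^7)   # first call: the report includes "% compilation time"
@time f(10^7)   # second call: already compiled, so closer to the true runtime

Timing a second call, or using the BenchmarkTools package, gives a truer number.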
This is a different topic, but:
Here’s how I would rewrite your script using threads instead of multiprocessing. Note that it’s important to put potentially performance-sensitive code in functions so it can be compiled, hence the function F!(y). Note also the function evalpoly for evaluating polynomials: no more writing out Horner’s method by hand.
function fx(x)
    rho = inv(sqrt(x))
    t1 = rho^2
    # coefficients of the series, lowest order first (evalpoly’s convention)
    coeffs = (
        0.177245385090551603e1,
        0.221556731363189504e0,
        -0.969310699713954079e-1,
        0.129818397283118850e0,
        -0.297987312763625727e0,
        0.975037584219099999e0,
        -0.415203310223772901e1,
        0.217970017974981910e2,
        -0.136102906830731073e3,
        0.985437438300633891e3,
    )
    return rho * evalpoly(t1, coeffs)
end

function F!(y)
    N = length(y)  # read the length locally rather than the untyped global N
    Threads.@threads for i in 1:N
        y[i] = fx(10.0 + 90.0 * (i - 1) / (N - 1))  # sample x on [10, 100]
    end
    return nothing
end

N = 10^9
y = Vector{Float64}(undef, N)
elapsed_time = @elapsed F!(y)
time_per_call = elapsed_time * 1.0e9 / N  # nanoseconds per element
@show (elapsed_time, time_per_call)
Output:
$ julia --threads=4 runit.jl
(elapsed_time, time_per_call) = (106.172722334, 106.172722334)
Thank you!! That’s wonderful, and I will try it on my machine.
Regarding evalpoly, I am happy to tell you that I did not write Horner’s method by hand. I generated the original expression in Maple (using a program I wrote for Watson’s lemma for evaluating an integral), asked Maple to convert it to Horner form, and then asked Maple to generate the Julia code. CodeGeneration[Julia] in Maple is pretty basic, but it did save me the effort of writing Horner’s method by hand (and counting brackets).
Thought it might be worth mentioning here that evalpoly uses metaprogramming under the hood to generate unrolled code, so it’s fully equivalent to the written-out Horner expression. That is to say, it does not use a loop, in case you were worried about the performance implications of that.
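Roughly the idea, as a sketch rather than Base’s actual implementation: recursing on the coefficient tuple lets the compiler specialize on the tuple’s length and inline the whole chain, so no runtime loop remains.

# Hypothetical illustration; Base.evalpoly is more elaborate.
horner(x, c::Tuple{Any}) = c[1]                                 # base case: one coefficient left
horner(x, c::Tuple) = muladd(x, horner(x, Base.tail(c)), c[1])  # c[1] + x * (rest)

# horner(2.0, (1.0, 2.0, 3.0)) == evalpoly(2.0, (1.0, 2.0, 3.0)) == 17.0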
I did worry about that, and checked, and found that the evalpoly code was about ten percent faster than the original, which made me blink! Pretty impressive.
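For anyone wanting to reproduce that kind of comparison, a sketch using the BenchmarkTools package (with placeholder coefficients):

using BenchmarkTools
coeffs = (2.0, 2.0, 2.0, 2.0, 2.0)
@btime evalpoly(x, $coeffs) setup=(x=rand())
@btime 2.0 + x * (2.0 + x * (2.0 + x * (2.0 + x * 2.0))) setup=(x=rand())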
The improvement is because evalpoly uses the muladd function instead of separate multiplication and addition operations. On capable CPUs, muladd compiles to the fma instruction, which performs x * y + z in a single instruction and improves both performance and precision.
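To make the precision point concrete, a small sketch (note muladd merely permits fusion, while fma guarantees the single rounding):

x = 1.0 + 2.0^-27             # exact square is 1 + 2^-26 + 2^-54
a = x * x - 1.0               # two roundings: the 2^-54 term is lost in x * x
b = fma(x, x, -1.0)           # one rounding: the 2^-54 term survives
@show a == 2.0^-26            # true
@show b == 2.0^-26 + 2.0^-54  # true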
You can use @code_typed to see the difference. It shows you what the function body looks like after lowering, inlining, and type inference:
julia> f_evalpoly(x) = evalpoly(x, (2.0, 2.0, 2.0, 2.0, 2.0))
f_evalpoly (generic function with 1 method)
julia> @code_typed f_evalpoly(1.0)
CodeInfo(
1 ─ %1 = Base.muladd_float(x, 2.0, 2.0)::Float64
│ %2 = Base.muladd_float(x, %1, 2.0)::Float64
│ %3 = Base.muladd_float(x, %2, 2.0)::Float64
│ %4 = Base.muladd_float(x, %3, 2.0)::Float64
└── return %4
) => Float64
julia> f_horner(x) = 2.0 + x * (2.0 + x * (2.0 + x * (2.0 + x * 2.0)))
f_horner (generic function with 1 method)
julia> @code_typed f_horner(1.0)
CodeInfo(
1 ─ %1 = Base.mul_float(x, 2.0)::Float64
│ %2 = Base.add_float(2.0, %1)::Float64
│ %3 = Base.mul_float(x, %2)::Float64
│ %4 = Base.add_float(2.0, %3)::Float64
│ %5 = Base.mul_float(x, %4)::Float64
│ %6 = Base.add_float(2.0, %5)::Float64
│ %7 = Base.mul_float(x, %6)::Float64
│ %8 = Base.add_float(2.0, %7)::Float64
└── return %8
) => Float64