I’m using Julia 1.9.3 on a Windows machine.

Two lines in my file ParallelWatson.jl:

```
N = 10^4
y = SharedArray{Float64}(N);
```

In *spite* of the semicolon, the full SharedArray is printed to the output file when I issue the command

```
julia -t 8 < ParallelWatson.jl > test4.out
```

I want to put N=10^9 and, well, no, I don’t want to see a billion zeros.

I see conflicting messages in the online discussions about semicolon suppressing output, some going back to 2014.

Can I suppress the output? If so, how? It’s got to be easy, and I must be doing something really dumb.

I should add that when I type these lines interactively in the REPL, the semicolon does indeed suppress the printing of the SharedArray.

Perhaps add `nothing` as the last line of your file?
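For example (a minimal sketch, not the actual ParallelWatson.jl):

```
using SharedArrays

N = 10^4
y = SharedArray{Float64}(N)
nothing  # the script's last evaluated value is `nothing`, so nothing is echoed
```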

The problem is that you're feeding the file's contents through stdin using `<`, which makes Julia behave like a (non-interactive) REPL and echo each result. Try dropping the `<` so that the filename is passed as an argument instead: `julia -t 8 ParallelWatson.jl > test4.out`. You shouldn't need semicolons anywhere.

Thanks. That worked.

runit.jl (754 Bytes)

```
julia -t 8 runit.jl > runit.out
more runit.out
```

got (in the file runit.out):

```
  0.015890 seconds (11.28 k allocations: 724.107 KiB, 49.86% compilation time)
0.15000009536743164 0.15000009536743164
```

Now I am going to go look up how I am *supposed* to be timing things. The 0.015890 seconds is too short (the program did NOT return in a hundredth of a second), and I'm not even sure the 0.15 seconds is right; it still seems too short. But since I know nothing, I will go look and find, and if I run into trouble I shall open another issue.
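(A common pattern, not from this thread: time a second call so you're not measuring compilation, or use the BenchmarkTools package, which handles warm-up for you. BenchmarkTools is an assumption here; it's a separate install.)

```
# Sketch of the usual timing pattern.
f(n) = sum(sqrt, 1:n)

f(10)           # first call compiles f
@time f(10^7)   # second call measures runtime, not compilation

# With the (assumed-installed) BenchmarkTools package:
# using BenchmarkTools
# @btime f(10^7)   # many samples; reports the minimum, ignores compilation
```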

Thanks!

This is a different topic, but:

Here’s how I would rewrite your script using threads instead of multiprocessing. Note that it’s important to put potentially performance-sensitive code in functions so it can be compiled, hence the function `F!(y)`. Note also the function `evalpoly` for evaluating polynomials: no more writing out Horner’s method by hand.

```
function fx(x)
    rho = inv(sqrt(x))
    t1 = rho^2
    coeffs = (
        0.177245385090551603e1,
        0.221556731363189504e0,
        -0.969310699713954079e-1,
        0.129818397283118850e0,
        -0.297987312763625727e0,
        0.975037584219099999e0,
        -0.415203310223772901e1,
        0.217970017974981910e2,
        -0.136102906830731073e3,
        0.985437438300633891e3,
    )
    return rho * evalpoly(t1, coeffs)
end

function F!(y)
    N = length(y)  # take N from the array: reading a non-const global in the hot loop would hurt performance
    Threads.@threads for i in 1:N
        y[i] = fx(10.0 + 90.0 * (i - 1) / (N - 1))
    end
    return nothing
end

N = 10^9
y = Vector{Float64}(undef, N)
elapsed_time = @elapsed F!(y)
time_per_call = elapsed_time * 1.0e9 / N  # nanoseconds per element
@show (elapsed_time, time_per_call)
```

Output:

```
$ julia --threads=4 runit.jl
(elapsed_time, time_per_call) = (106.172722334, 106.172722334)
```

Thank you!! That’s wonderful, and I will try it on my machine.

Regarding `evalpoly`, I am happy to tell you that I did *not* write Horner’s method by hand. I generated the original expression in Maple (using a program that I wrote for Watson’s lemma for evaluation of an integral), asked Maple to convert to Horner form, and then asked Maple to generate the Julia code. CodeGeneration[Julia] in Maple is pretty basic, but it did save me the effort of writing Horner’s method by hand (and counting brackets).


Thought it might be worth mentioning here that `evalpoly` uses metaprogramming under the hood to generate unrolled code, so it’s fully equivalent to the written-out Horner expression. That is to say, it does not use a loop, in case you might be worried about the performance implications of that.

I *did* worry about that, and checked, and found that the evalpoly code was about ten percent *faster* than the original, which made me blink! Pretty impressive.


The improvement is because `evalpoly` uses the `muladd` function instead of separate multiplication and addition operations. On capable CPUs, `muladd` compiles to the `fma` instruction, which performs `x * y + z` in a single instruction and improves both performance and precision.
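As a sketch of the precision side (illustrative values; whether `muladd` actually fuses depends on the CPU, so results are hardware-dependent):

```
x = 1.0 + 2.0^-27

two_step = x * x - 1.0         # multiply rounds, then subtract rounds again
one_step = muladd(x, x, -1.0)  # may compile to a single fma: one rounding

# On FMA-capable hardware, one_step keeps the low-order 2.0^-54 term
# that two_step loses to the intermediate rounding.
@show two_step one_step
```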

You can use `@code_typed` to see the difference. It shows you what the function body looks like after lowering, inlining, and type inference:

```
julia> f_evalpoly(x) = evalpoly(x, (2.0, 2.0, 2.0, 2.0, 2.0))
f_evalpoly (generic function with 1 method)

julia> @code_typed f_evalpoly(1.0)
CodeInfo(
1 ─ %1 = Base.muladd_float(x, 2.0, 2.0)::Float64
│   %2 = Base.muladd_float(x, %1, 2.0)::Float64
│   %3 = Base.muladd_float(x, %2, 2.0)::Float64
│   %4 = Base.muladd_float(x, %3, 2.0)::Float64
└──      return %4
) => Float64

julia> f_horner(x) = 2.0 + x * (2.0 + x * (2.0 + x * (2.0 + x * 2.0)))
f_horner (generic function with 1 method)

julia> @code_typed f_horner(1.0)
CodeInfo(
1 ─ %1 = Base.mul_float(x, 2.0)::Float64
│   %2 = Base.add_float(2.0, %1)::Float64
│   %3 = Base.mul_float(x, %2)::Float64
│   %4 = Base.add_float(2.0, %3)::Float64
│   %5 = Base.mul_float(x, %4)::Float64
│   %6 = Base.add_float(2.0, %5)::Float64
│   %7 = Base.mul_float(x, %6)::Float64
│   %8 = Base.add_float(2.0, %7)::Float64
└──      return %8
) => Float64
```