I can't suppress printing of a SharedArray; I must be doing something wrong. Help?

I’m using Julia 1.9.3 on a Windows machine.

Two lines in my file ParallelWatson.jl:

N = 10^4
y = SharedArray{Float64}(N);

In spite of the semicolon, the full SharedArray is printed in the output file when I issue the command

julia -t 8 < ParallelWatson.jl > test4.out

I want to put N=10^9 and, well, no, I don’t want to see a billion zeros.

I see conflicting messages in the online discussions about semicolon suppressing output, some going back to 2014.

Can I suppress the output? If so, how? It’s got to be easy, and I must be doing something really dumb.

I should add that, when typing interactively in the REPL, the semicolon does indeed suppress the printing of the SharedArray.

Perhaps add nothing as the last line of your file?

The problem is that you’re feeding the file’s contents through stdin using <. Try dropping the < so that the filename is passed as an argument instead: julia -t 8 ParallelWatson.jl > test4.out.

You shouldn’t need semicolons anywhere.
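
In case it helps, a minimal sketch of the two invocations (assuming the file also loads the SharedArrays stdlib, which isn’t shown in the two lines above):

# ParallelWatson.jl -- sketch of the two lines above
using SharedArrays            # SharedArray lives in the SharedArrays stdlib

N = 10^4
y = SharedArray{Float64}(N)   # no trailing semicolon needed when run as a script

# Observed above: fed through stdin, the array gets printed even with the semicolon
#   julia -t 8 < ParallelWatson.jl > test4.out
# With the filename as an argument, top-level values are not printed at all
#   julia -t 8 ParallelWatson.jl > test4.out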

Thanks. That worked.
runit.jl (754 Bytes)

julia -t 8 runit.jl > runit.out

more runit.out

and got (in the file runit.out):

0.015890 seconds (11.28 k allocations: 724.107 KiB, 49.86% compilation time)
0.15000009536743164 0.15000009536743164

Now I am going to go look up how I am supposed to be timing things. The 0.015890 seconds is too short (the program did NOT return in a hundredth of a second). I’m not even sure the .15 seconds is right. Still seems too short. But since I know nothing, I will go look and find; and if I run into trouble I shall open another issue.

Thanks!

This is a different topic, but:

  • When you want to assign the elapsed time to a variable, rather than print it, you can use @elapsed instead of @time, like this:
    elapsed_time = @elapsed begin
        # code
    end
    
  • @distributed runs asynchronously when used this way (see Distributed Computing · The Julia Language). This is probably the source of your unrealistic timings. You need to wrap @sync around the loop:
    @sync @distributed for ...
        # code
    end
    
  • @distributed does multiprocessing, not multithreading, so you should start julia with multiple processes, not multiple threads: julia --procs=8
    • Or you might want to use multithreading instead, by replacing @distributed with Threads.@threads and replacing the SharedArray with a regular Vector
  • If you stick to multiprocessing you’ll need to annotate your function definition with @everywhere for it to be available on all workers. That is, replace function Fx(...) with @everywhere function Fx(...). This is not necessary if you decide to use threads instead of Distributed. (A rough sketch putting these multiprocessing pieces together follows this list.)
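
To make the multiprocessing route concrete, here is a rough sketch of how those pieces fit together. The body of Fx is only a placeholder (the real one is in runit.jl), and the loop mirrors the threaded version below:

using Distributed
@everywhere using SharedArrays

@everywhere function Fx(x)      # placeholder body; substitute the real series from runit.jl
    return inv(sqrt(x))
end

N = 10^4
y = SharedArray{Float64}(N)

elapsed_time = @elapsed begin
    @sync @distributed for i in 1:N              # @sync waits for all workers to finish
        y[i] = Fx(10.0 + 90.0 * (i - 1) / (N - 1))
    end
end

@show elapsed_time

# start with, e.g.:  julia --procs=8 runit.jl > runit.out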

Here’s how I would rewrite your script using threads instead of multiprocessing. Note that it’s important to put potentially performance-sensitive code in functions so it can be compiled, hence the function F!(y). Note the function evalpoly for evaluating polynomials—no more writing out Horner’s method by hand.

function fx(x)
    rho = inv(sqrt(x))
    t1 = rho^2
    coeffs = (
        0.177245385090551603e1,
        0.221556731363189504e0,
        -0.969310699713954079e-1,
        0.129818397283118850e0,
        -0.297987312763625727e0,
        0.975037584219099999e0,
        -0.415203310223772901e1,
        0.217970017974981910e2,
        -0.136102906830731073e3,
        0.985437438300633891e3,
    )
    return rho * evalpoly(t1, coeffs)
end

function F!(y)
    N = length(y)   # use the array's own length instead of the non-const global N
    Threads.@threads for i in 1:N
        y[i] = fx(10.0 + 90.0 * (i - 1) / (N - 1))
    end
    return nothing
end

N = 10^9
y = Vector{Float64}(undef, N)

elapsed_time = @elapsed F!(y)

time_per_call = elapsed_time * 1.0e9 / N

@show (elapsed_time, time_per_call)

Output:

$ julia --threads=4 runit.jl
(elapsed_time, time_per_call) = (106.172722334, 106.172722334)

Thank you!! That’s wonderful, and I will try it on my machine.

Regarding evalpoly, I am happy to tell you that I did not write Horner’s method by hand; I generated the original expression in Maple (using a program that I wrote for Watson’s lemma for evaluation of an integral), asked Maple to convert to Horner form, and then asked Maple to generate the Julia code. CodeGeneration[Julia] in Maple is pretty basic, but it did save me the effort of writing Horner’s method by hand (and counting brackets) 🙂

Thought it might be worth mentioning here that evalpoly uses metaprogramming under the hood to generate unrolled code when the coefficients are given as a tuple (as they are here), so it’s fully equivalent to the written-out Horner expression. That is to say, it does not use a loop, in case you might be worried about the performance implications of that.

I did worry about that, and checked, and found that the evalpoly code was about ten percent faster than the original, which made me blink! Pretty impressive.
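
(In case anyone wants to reproduce that kind of comparison, one way is BenchmarkTools, assuming the package is installed; the two functions below are just illustrative stand-ins, not the code from runit.jl.)

using BenchmarkTools

poly_evalpoly(x) = evalpoly(x, (2.0, 2.0, 2.0, 2.0, 2.0))
poly_horner(x)   = 2.0 + x * (2.0 + x * (2.0 + x * (2.0 + x * 2.0)))

# Interpolating a Ref keeps the argument from being constant-folded away
@btime poly_evalpoly($(Ref(1.3))[])
@btime poly_horner($(Ref(1.3))[])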

The improvement is because evalpoly uses the muladd function instead of separate multiplication and addition operations. On capable CPUs, muladd compiles to the fma instruction, which performs x * y + z in a single instruction and improves both performance and precision.
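
A tiny illustration of the precision side, using fma directly (the numbers are just a made-up example):

julia> x = 1.0 + 2.0^-30; y = 1.0 - 2.0^-30;

julia> x * y - 1.0        # x*y is rounded to 1.0 first, so the small term is lost
0.0

julia> fma(x, y, -1.0)    # single rounding of x*y + z keeps the exact -2^-60
-8.673617379884035e-19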

You can use @code_typed to see the difference. It shows you what the function body looks like after lowering, inlining, and type inference:

julia> f_evalpoly(x) = evalpoly(x, (2.0, 2.0, 2.0, 2.0, 2.0))
f_evalpoly (generic function with 1 method)

julia> @code_typed f_evalpoly(1.0)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.muladd_float(x, 2.0, 2.0)::Float64
│   %2 = Base.muladd_float(x, %1, 2.0)::Float64
│   %3 = Base.muladd_float(x, %2, 2.0)::Float64
│   %4 = Base.muladd_float(x, %3, 2.0)::Float64
└──      return %4
) => Float64

julia> f_horner(x) = 2.0 + x * (2.0 + x * (2.0 + x * (2.0 + x * 2.0)))
f_horner (generic function with 1 method)

julia> @code_typed f_horner(1.0)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.mul_float(x, 2.0)::Float64
│   %2 = Base.add_float(2.0, %1)::Float64
│   %3 = Base.mul_float(x, %2)::Float64
│   %4 = Base.add_float(2.0, %3)::Float64
│   %5 = Base.mul_float(x, %4)::Float64
│   %6 = Base.add_float(2.0, %5)::Float64
│   %7 = Base.mul_float(x, %6)::Float64
│   %8 = Base.add_float(2.0, %7)::Float64
└──      return %8
) => Float64

Thank you! This is nice!