Understanding what -t does

fa-bien · January 14, 2021, 11:33am

I am running a single-threaded program, and it executes ~10% faster when I run it with -t 2 than when I run it with -t 1, i.e. when I allow two threads instead of one. The program does not explicitly refer to anything in Base.Threads. Looking at CPU usage, it seems that only one core is used in both cases. What could explain the 10% difference in performance? Does using “-t 2” mean that Julia uses a second core for other small tasks that I did not notice? It is important for me to restrict all computation to one core.

Henrique_Becker · January 14, 2021, 12:40pm

Hard to say, but maybe something you use (from Base or other libraries) looks at the number of threads and makes use of it. How long does the program usually take? (If it runs for very little time, the difference may just be noise.) Can you keep watching a fast-refreshing usage monitor for the whole run (or save it to a log), to make sure it really does not spawn a second thread?
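A quick first check is to print Julia's own thread count next to the size of the BLAS thread pool, since the two are configured separately. A minimal sketch, assuming Julia 1.6 or later (where `BLAS.get_num_threads` exists); it only reports the configured pool sizes, not what any library actually does at run time.

```julia
using LinearAlgebra

# Number of Julia threads, as set by `-t`/`--threads` or JULIA_NUM_THREADS
julia_threads = Threads.nthreads()

# Size of OpenBLAS's own thread pool, which is independent of `-t`
blas_threads = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)
```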

stevengj · January 14, 2021, 2:13pm

How are you measuring performance? Are you including the time to launch julia?

fa-bien · January 14, 2021, 2:38pm

I use CPUTime and the @CPUelapsed macro to measure only the call that I use as a benchmark, and that call is single-threaded. So launch time is not included. Compilation time is not an issue either: repeated calls in the same session all lead to the same observation.

jling · January 14, 2021, 3:08pm

Can you use @btime and friends (from BenchmarkTools) to measure the main function within your script?

fa-bien · January 14, 2021, 3:30pm

My understanding is that @btime measures wall time, not CPU time, and therefore does not really measure the CPU budget used by a program, but rather how long it took to run that program given the other programs competing with it for resources. Correct me if I am wrong. I want to measure the CPU budget used by a program (I do not understand why wall time would be used for benchmarking, but that is another discussion).

jling · January 14, 2021, 3:39pm

Because things like OpenBLAS can multithread internally even if your Julia is single-threaded (-t 1), which makes CPU time appear many times longer (it is summed across threads).
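To see this effect directly, one can time the same matrix product with a one-thread and a many-thread BLAS pool and compare wall time with process CPU time. A rough sketch, assuming a Linux or macOS system where the C library's `clock()` returns CPU time summed over all threads of the process and `CLOCKS_PER_SEC` is 1,000,000; the helper names `cpu_seconds` and `wall_and_cpu` are illustrative, not part of any package.

```julia
using LinearAlgebra

# Process CPU time in seconds, summed over all threads of the process.
# Assumes POSIX clock() with CLOCKS_PER_SEC == 1_000_000 (XSI requirement).
cpu_seconds() = ccall(:clock, Clong, ()) / 1_000_000

# Run f() once and return (wall seconds, CPU seconds) for that call.
function wall_and_cpu(f)
    c0 = cpu_seconds()
    w0 = time_ns()
    f()
    wall = (time_ns() - w0) / 1e9
    cpu = cpu_seconds() - c0
    return wall, cpu
end

A = rand(1000, 1000)
B = rand(1000, 1000)
A * B  # warm-up: compilation and BLAS initialization

for nt in (1, Sys.CPU_THREADS)
    BLAS.set_num_threads(nt)
    wall, cpu = wall_and_cpu(() -> A * B)
    println("BLAS threads = $nt: wall = $(round(wall, digits=3)) s, ",
            "CPU = $(round(cpu, digits=3)) s")
end
```

With several BLAS threads, the CPU figure can exceed the wall figure, which is exactly the inflation described above.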

fa-bien · January 14, 2021, 3:58pm

Thanks, that is the kind of thing I am wondering about. Looking at my program, I’m not sure where there could be any multithreading. Is there any documentation on how to track that? More generally, is there no way at all to tell Julia to stick to one thread?

If two cores are used at 100% for 1s then I want to measure 2s, I believe @CPUelapsed measures that correctly for me. Again feel free to correct me.

I used @btime as you suggested, 3 times for each setting. With one thread the times are 1.695, 1.697 and 1.691. With two threads the times are 1.676, 1.676 and 1.672. Not quite 10% (closer to 1%), but still a consistent difference.
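`@btime` reports the minimum time over many samples, which filters out most interference from other processes. The idea can be approximated without extra packages using Base's `@elapsed`; the name `min_elapsed` and the default of 5 samples are arbitrary choices for this sketch, and BenchmarkTools does considerably more (tuning evaluations per sample, GC handling, statistics).

```julia
# Minimum wall-clock time over `samples` runs of f(), after one warm-up
# call so that compilation time is excluded from the measurements.
function min_elapsed(f; samples::Int = 5)
    f()  # warm-up: compile and run once
    best = Inf
    for _ in 1:samples
        t = @elapsed f()
        best = min(best, t)
    end
    return best
end

x = rand(10^6)
t = min_elapsed(() -> sum(x))
println("minimum time: ", t, " s")
```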

giordano · January 14, 2021, 4:10pm

Yes, start Julia with a single thread. But external libraries may still spawn multiple threads, which is the case for OpenBLAS for example, as already said above. Without seeing the code, it’s hard to guess what’s happening.

fa-bien · January 14, 2021, 4:43pm

Thanks. Here is the code; it is a basic implementation of the Edmonds-Karp algorithm for solving the maximum flow problem. However, even if there are multiple threads, shouldn’t CPUTime report the same numbers, since the total CPU effort is the same? Actually, I would expect at least as much CPU time with two threads as with one.

using DataStructures  # provides Deque

function edmondskarp(C::MT, F::MT, n::Int, s::Int, t::Int) where MT <: Matrix
    totalflow = zero(Float64)
    moreflow = true
    Q = Deque{Int}()
    pred = fill(-1, n)
    # start from the zero flow
    for i ∈ 1:n, j ∈ 1:n
        F[i,j] = zero(Float64)
    end
    while moreflow
        # reset predecessors
        for i ∈ 1:n
            pred[i] = -1
        end
        # breadth-first search for a shortest augmenting path from s
        push!(Q, s)
        while ! isempty(Q)
            cur = popfirst!(Q)
            for j in 1:n
                j == cur && continue
                if pred[j] == -1 && j ≠ s && C[cur, j] > F[cur, j]
                    pred[j] = cur
                    push!(Q, j)
                end
            end
        end
        # did we find an augmenting path?
        if pred[t] ≠ -1
            # bottleneck: smallest residual capacity along the path
            df = typemax(Float64)
            i, j = pred[t], t
            while i ≠ -1
                if df > C[i,j] - F[i,j]
                    df = C[i,j] - F[i,j]
                end
                i, j = pred[i], i
            end
            # push df along the path; also update the reverse entry so that
            # later searches can cancel flow through the residual graph
            i, j = pred[t], t
            while i ≠ -1
                F[i,j] += df
                F[j,i] -= df
                i, j = pred[i], i
            end
            totalflow += df
        else
            moreflow = false
        end
    end
    totalflow
end

lmiq · January 14, 2021, 6:24pm

It would be good practice for a library not to do that, right? The library should take the number of threads Julia was started with and use at most that many. Of course, that is in the hands of the library developer and cannot be enforced (I guess) when foreign code is used.

giordano · January 14, 2021, 6:28pm

I’m not sure we’re talking about the same thing. I was referring to external shared binary libraries, like OpenBLAS, which are completely independent from Julia’s internal threading model.

mauro3 · January 14, 2021, 6:31pm

So, how does one set the number of threads used by OpenBLAS? And, in general, the total number of threads my Julia code uses? Like the OP, I sometimes run things that I need to make sure only run on n threads (say, on a shared server).

jling · January 14, 2021, 6:33pm

Specifically for OpenBLAS, I think:

LinearAlgebra.BLAS.set_num_threads(4)

?
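Putting these replies together, a sketch of keeping a session to a single compute thread, assuming OpenBLAS is the only multithreaded library involved and Julia 1.6 or later (for `BLAS.get_num_threads`). Julia's own thread count is fixed at startup (`julia -t 1`), so the code can only check it; the BLAS pool can be resized at run time. Setting the environment variable `OPENBLAS_NUM_THREADS=1` before launching Julia should have a similar effect for OpenBLAS. This limits thread counts only; pinning the process to a particular core is an OS-level matter (for example `taskset` on Linux).

```julia
using LinearAlgebra

# Julia's thread count is fixed when the process starts (`julia -t 1`);
# it cannot be lowered from inside a running session, only checked.
if Threads.nthreads() != 1
    @warn "Julia was started with $(Threads.nthreads()) threads; restart with `julia -t 1`"
end

# OpenBLAS keeps its own thread pool, which can be resized at run time.
BLAS.set_num_threads(1)
println("BLAS threads: ", BLAS.get_num_threads())
```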

lmiq · January 14, 2021, 6:49pm

Yes, yes. But there is a Julia front-end to it, which defines (in this case), or should be able to define, the number of threads used by OpenBLAS. In my opinion, that parameter should default to the number of Julia threads (in every package that defines such an interface).

giordano · January 14, 2021, 7:13pm

In general external libraries are free to do whatever they want and Julia has no control over them. In this specific case, OpenBLAS happens to let you control the number of threads to use and Julia has an interface to that.

lmiq · January 14, 2021, 7:38pm

Sure. What I am saying is that most of the time, if not always, multithreaded packages have some parameter that sets the number of threads to be used, and that Julia interfaces to those packages should be written so that this parameter defaults to the number of threads available to Julia. As I mentioned, you cannot enforce that from Julia, but it would be good practice to develop interfaces with that in mind.
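The proposal can be sketched as a wrapper module that, when loaded, sizes the external library's thread pool to match Julia's thread count. `MyLibWrapper` is a hypothetical name and OpenBLAS stands in for the wrapped library; this illustrates the suggested default, not a description of how any existing package behaves.

```julia
module MyLibWrapper  # hypothetical wrapper package

using LinearAlgebra

# Runs when the module is loaded: default the external library's
# thread pool to the number of threads Julia was started with.
function __init__()
    BLAS.set_num_threads(Threads.nthreads())
end

end # module
```

A user could still override the default afterwards with an explicit `BLAS.set_num_threads(n)`.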

fa-bien · January 15, 2021, 9:56am

Back to my original question, how can -t 2 lead to lower CPU time (as measured using CPUTime) than -t 1? Shouldn’t it be at least as much? I feel I am missing something here.

mikkoku · January 15, 2021, 11:43am

It would be helpful to provide an MWE (minimal working example). I tried your function with random matrices but could not observe anything.
