1.8 much slower than 1.6

@Impressium, you could also try the nightly 1.9-DEV. Of course we don’t want regressions, but if it’s fixed there, that’s good to know. You might want to bisect or narrow it down, e.g. test on 1.7 too (at least if it’s not fixed on master).

Yes, and no. It’s more designed for long-running (HPC-type) code, but as you point out you can use that package for scripts too. Or, simply, another option:

julia -O0

Have you tried that, and is 1.8 faster (than 1.6) that way? Of course we don’t want a regression for any code; I suppose 1.8 is faster for most code.

It’s the first case. The scripts run the whole day, with no restarts.

julia -O3
Does that mean the fastest code and the longest compile time?

Yes, julia -O3 should have the longest compilation time: a higher number means more optimization, thus slower to compile (and potentially faster at runtime). It’s worth trying that too, or a lower level like -O1 or -O0, though I was answering for scripts (short-running code where you want to minimize optimization time). Still, it’s good to try all the possibilities. And:

 (@v1.8) pkg> status --outdated

and make sure you have the same versions of your packages (and their dependencies). Julia tends to downgrade packages for me (or for people, like me, who misuse it…).

I assume you are multithreading? And maybe also calling LinearAlgebra?

Possibly your BLAS is using more threads than before, and this is interfering with your Julia threads? Try adjusting the number of BLAS threads with BLAS.set_num_threads.

Try setting it to some number less than the number of physical cores.
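
A minimal sketch of what that could look like (4 is only an example value; pick something below your physical core count):

julia> using LinearAlgebra

julia> BLAS.get_num_threads()  # check the current BLAS thread count

julia> BLAS.set_num_threads(4)  # example value; use fewer than your physical cores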

It seems to me you need to use ProfileView.jl on a small chunk of your problem and see where the time is being spent. You could also compare that with 1.6. That should point you towards a fix.
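
For example, a rough sketch (run_small_chunk is a placeholder for a reduced, representative piece of your workload):

julia> using ProfileView  # requires the ProfileView.jl package

julia> @profview run_small_chunk()  # opens a flame graph of where time is spent

Running the same thing under 1.6 and 1.8 and comparing the flame graphs should narrow down the regression.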

Anything else is speculation, as we can’t read your code and the context seems quite complicated.

I don’t use LinearAlgebra.jl.

How much slower is it?

It might be related, i.e. the same cause and solution:

I would try 1.9 as I mentioned (then you don’t need to compile your own 1.8.3). That said, 1.8.3 seems near: it’s getting very stable (one known regression), so the release might be close.

I’ve got one more observation.
I’m running multiple scripts on this server. One script runs just fine (1% CPU load). When I start a second one, both of them use 100% CPU, split 50/50. When I stop the second script, the first one goes back to 1%. I can do it again and the same thing happens.

I tried 1.9. No difference.

Maybe Julia 1.8 is somehow busy-waiting for some resource, while 1.6 handled it in a better way.

If it is the difference between 1% and 100%, then profiling should clearly show which function is eating all the CPU cycles.
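
A rough sketch with the built-in Profile stdlib, which also works on a headless server (workload() is a placeholder for a representative run):

julia> using Profile

julia> @profile workload()  # placeholder; run the hot part of the script

julia> Profile.print(mincount=100)  # show only frequently-sampled (hot) call paths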

Multiple scripts, all of this same script I assume (though it would be bad either way).

I assume you do:

$ julia -t auto

julia> Threads.nthreads()
16

so try halving:

$ julia -t 8

Then run just two scripts like this, or four, halving yet again, etc. The auto setting is there to use the highest number of practically usable threads for a single process. If one (or many) scripts together go over that edge, bad performance is to be expected.
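
For instance, two processes that together match 16 cores (the script names are placeholders):

$ julia -t 8 script1.jl &
$ julia -t 8 script2.jl &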

Though possibly performance shouldn’t fall off a cliff even then. Do you get better (or the same) overall performance with some such settings? Why are you running many scripts (of the same kind?) when you’re using threads anyway? If this is a regression in Julia (and not expected behavior), you could file an issue.

I have recently encountered a significant performance hit when running my scripts at the maximum thread count. Try reducing the nthreads value as suggested above by Palli.

Are you perhaps using GLM? I recently started a discussion where I found excessive CPU usage in Julia (I don’t know when it started, but it wasn’t there when I originally developed the script), and it looked like GLM was using too many threads for regression. I guess it uses LinearAlgebra under the covers. I fixed it by reducing the BLAS thread count.

You can see the discussion here: Huge CPU load when using GLM

I found the solution.
I know we don’t use Julia the right way. We run every script in the global environment :confused: (but we want to change that soon).

BUT, the solution was:
I put

import Pkg

Pkg.activate()  # with no arguments, this activates the default environment

at the very beginning of each script.

Thanks for all the help

Edit:
Not sure if the above was the solution.
I just found out that JULIA_NUM_THREADS=4 was also needed.
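
For reference, that environment variable can be set per invocation from the shell (the script name is a placeholder); it is equivalent to passing -t 4:

$ JULIA_NUM_THREADS=4 julia myscript.jl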

I find this solution quite disturbing as it seems to imply that packages may interact in unknown ways to slow down the run.

Probably just different versions of packages being used.

And would that explain a “much slower than”?

I guess those versions are slower than the other ones. It shouldn’t be too hard to verify: run the code with the same Project + Manifest on different Julia versions and see if there is a performance difference.
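
A sketch of such a comparison, assuming version-suffixed binaries like julia-1.6 on your PATH, and a project directory holding the script’s Project.toml and Manifest.toml (names and paths are placeholders):

$ julia-1.6 --project=/path/to/project -e 'using Pkg; Pkg.instantiate()'
$ julia-1.6 --project=/path/to/project myscript.jl
$ julia-1.8 --project=/path/to/project myscript.jl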

But I’m very much interested in why the change in my last edit works. I run 6 scripts in parallel and want to give them as much CPU as possible. Why do I need to restrict it to less than the system can offer?