How to use my CPUs

I instantiated an environment and watched for 533 seconds as it compiled over 100 packages. During this time julia's CPU usage maxed out at 12%, i.e., one of the 8 cores I have.

What is the best way to use more of my cores? I am running julia 1.8.1 under Linux and VSCode (I moved partly because it was similarly slow under MS-Windows). I set the option -p 4 for julia and restarted. This kinda worked, but I got this error (among many similar ones):

│    1-element ExceptionStack:
│    On worker 2:
│    ArgumentError: Package MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316] is required but does not seem to be installed:

I presume the problem is that the workers started at some point before the environment I use (which has the “missing” package) was activated. But I’m not sure how to fix that.

So

  1. Should I be using threads (-t) or processes (-p)? The manual’s introduction to parallel computing says threads are usually easiest on a single PC, which is my situation.
  2. Do I have to set those options when I invoke julia?
  3. How do I get my threads or processes to work in the same environment?

The manual is mostly oriented to programming (though command-line options do get a brief description), but here I’m looking to speed up stuff the system is already doing for me.

  1. Use threads
  2. You have to set them when you invoke julia via the command line. If you are using Windows, you need to add them to the shortcut. Otherwise use a terminal.
  3. With threads, everything should be using the same environment automatically. With processes, you can use @everywhere to activate an environment on all processes.
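
For example, a minimal sketch of option 3 (assuming julia was started with -p 4 from the project directory):

using Distributed   # loaded automatically when julia starts with -p; included here for clarity

@everywhere begin
    import Pkg
    Pkg.activate(".")   # activate the same project on every worker
end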

Also see the environment variables JULIA_NUM_THREADS and JULIA_NUM_PRECOMPILE_TASKS.

https://docs.julialang.org/en/v1/manual/environment-variables/#JULIA_NUM_THREADS

https://pkgdocs.julialang.org/v1/api/#Pkg.precompile
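
For example (the values here are only illustrative; match them to your core count):

# in the shell, before starting julia:
#   export JULIA_NUM_THREADS=8            # threads available to user code
#   export JULIA_NUM_PRECOMPILE_TASKS=8   # parallel precompile tasks used by Pkg
# or equivalently start julia with --threads=8

julia> Threads.nthreads()   # check how many threads this session actually has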

I tried those things, but mostly they didn’t seem to help.

To simplify, I switched to executing julia from bash, removing VSCode and its debugger/language server as possible culprits. I even tried quitting VSCode completely in case it was writing to or locking files.

I do start Revise in my startup file.

Good News

  1. Using threads produced no errors about unrecognized packages, though it also produced no sign that more than one thread was active.
  2. Using @everywhere (thanks for the tip) I was able to get all the processes in the same environment using
@everywhere import Pkg
@everywhere Pkg.activate(".")

Unchanged Bad News

  1. Cloning the project into a fresh directory and starting with julia -t4 didn't seem to induce much parallelism, any increase in CPU use beyond one core, or any speedup:
(@v1.8) pkg> activate .
  Activating project at `~/Documents/BP-2`

julia> import Pkg

julia> @time Pkg.precompile()
┌ Warning: The active manifest file is missing a julia version entry. Dependencies may have been resolved with a different julia version.
└ @ ~/Documents/BP-2/Manifest.toml:0
Precompiling project...
  1 dependency successfully precompiled in 408 seconds. 123 already precompiled.
410.341454 seconds (2.77 M allocations: 177.222 MiB, 0.04% gc time, 0.16% compilation time)

So one package was precompiled but it took 410 seconds!? Maybe the work is hidden in the threads?

Since packages are managed as a shared pool (I think, even if one is using a custom environment), it’s unsurprising that the work was mostly done–I had already used the packages on the same machine. But if the work really was all done, why did it take so long?

  1. Starting julia with threads produced no speedup and no increase in CPU use beyond 12% when include’ing my test file, which defines one type, one function, and calls it. The function generates 12 rows of random data, but takes over a minute:
@time include("src/maker.jl")
# small table output omitted
73.580062 seconds (33.69 M allocations: 2.036 GiB, 4.34% gc time, 42.84% compilation time: 22% of which was recompilation)

# but if I repeat the only action line that is not printout
@time data = maker()
  0.000539 seconds (153 allocations: 9.477 KiB)

While this is not the same as what happens when I debug in VSCode, the ~74s delay seems similar. And it happens every time I debug, even if I don't change the code. It also happens when I run the code without debugging in VSCode, just as it happens every time I start the REPL and include the file. The time clearly isn't going into actually executing the maker() function.

The results of @time with -t4 are about the same as when julia starts with no options.

It seems odd that julia is spending any time compiling code that has already been compiled, and it also raises the question of what the ~53% of the time that is neither compilation nor gc is going toward.

It’s hard to develop code when every debugging cycle takes > a minute to get started.

  1. Using julia -p4 and the @everywhere code above did produce activity in more than one CPU for a little while at the start. But then it seemed to go back to one core, and the whole thing was even slower than the version with threads:
125.444737 seconds (34.02 M allocations: 2.052 GiB, 2.46% gc time, 26.13% compilation time: 25% of which was recompilation)

I’m not sure if these times, like 125 seconds just above, are wall-clock times, sum of CPU time for all threads/processes, or something else–the help isn’t clear to me. But, as I sat before the terminal, they seemed plausible as wall-clock times.

Small code, big dependencies?

The list of packages explicitly added to the environment is not large:

(MSEP) pkg> st
Project MSEP v0.1.0
Status `~/Documents/BP-MSEP/Project.toml`
  [a93c6f00] DataFrames v1.3.5
  [31c24e10] Distributions v0.25.71
  [442a2c76] FastGaussQuadrature v0.4.9
  [c91e804a] Gadfly v1.3.4
  [ff71e718] MixedModels v4.7.1 `~/.julia/dev/MixedModels`
  [86f7a689] NamedArrays v0.9.6
  [1fd47b50] QuadGK v2.5.0
  [4c63d2b9] StatsFuns v1.0.1
  [37e2e46d] LinearAlgebra

However, there are a lot of packages, ~120, when dependencies are included.

The test file only uses some of them:

using DataFrames
using Distributions
using MixedModels
using StatsFuns

Compilation is not a process that is easily sped up by using more cores. There's still a "critical path" of compilation that must be done sequentially, which sets a floor on the total time. Also, scheduling work across the cores has overhead, and for tasks that don't make much use of parallelism this overhead can actually cause a slowdown.

But in general, multithreading is the way forward if you are on a single machine, since all "workers" (i.e. the cores) share the same memory: you need only load the packages once and they are available everywhere. With multiprocessing, each worker has to load all the packages it needs and compile its own functions, which uses more memory, and the communication overhead is much higher. As a side tip, you can add processes with addprocs(8, exeflags=["--project"]) to start workers with the current environment. Sometimes this may error later on when you use a package, but you just need to instantiate the environment first (Pkg.instantiate()) and try again.
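
A minimal sketch of that process-based setup, assuming it is run from the project directory and using MixedModels (from this thread) as the example package:

using Distributed, Pkg

Pkg.instantiate()                       # make sure the environment is fully installed first
addprocs(4; exeflags = ["--project"])   # new workers start with the current project active

@everywhere using MixedModels           # now loads on every worker without the "not installed" error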

You can set the number of threads by setting the environment variable JULIA_NUM_THREADS as suggested earlier; you can check the result by running Threads.nthreads().

Optimising your workflow can be a bit tricky, but the key is to keep a REPL open and do most of your work and debugging in there. Using the built-in REPL of the VS Code extension gives you Revise, so you can redefine your functions etc. and then run again to see if they work. If you need to debug, you can use @enter to start debugging. There is a panel to specify compiled modules when debugging; it is best to list your big packages there. For example I add "CUDA.", which has everything from the CUDA package compiled when I am debugging, which makes it usable. You should only need to do this once per project, and it is remembered.

Precompilation doesn't compile everything, but I believe that you can use sysimages to snapshot your current setup and avoid having to wait for all of the packages to load every time (Compiling Sysimages · Julia in VS Code). I have never had to use them myself, but from what I hear, this will almost completely eliminate that load time.
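
If you go the sysimage route outside VS Code, a sketch with PackageCompiler.jl might look like this (the package list and output path are only examples):

using PackageCompiler

# bake the heavy dependencies into a custom system image
create_sysimage(["DataFrames", "MixedModels", "Gadfly"];
                sysimage_path = "MSEPSysimage.so")

# then start julia with it:
#   julia --sysimage MSEPSysimage.so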

What is the one package that is taking so long to compile? Is it the package you are developing?

The precompilation task is distributed by package, so if a single package is taking so long to precompile no amount of multithreading will help.

Thus for the precompilation problem we need to figure out why that one package is taking so long.

Next, I see you are trying to include a src/maker.jl from the REPL. If you want to use Revise with a script, you need to use includet:
https://timholy.github.io/Revise.jl/stable/user_reference/#Revise.includet

Try this first. To be clear, the process is:

  1. includet your script.
  2. Try to execute your code
  3. Modify your script
  4. Try to execute your code

Notice that you do not need to include again to reload it.
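
A minimal sketch of that loop in the REPL:

julia> using Revise

julia> includet("src/maker.jl")   # track the file instead of plain include

julia> data = maker()             # run it

# ...edit src/maker.jl in your editor...

julia> data = maker()             # Revise has already picked up the changes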

The second question is why you are using a script rather than creating a package. The package is the basic unit of cached compilation. If there is no package, then no precompilation can be retained between Julia sessions.

Since you already have an environment, just add a name and UUID to your Project.toml. Rename the project and folder to BP2, then create a src/BP2.jl containing module BP2 ... end and include your code, maker.jl, within the module.

You can generate a UUID via

julia> using UUIDs

julia> uuid4()
UUID("c3824cc7-012c-4100-9d93-88c7b2cbf3f1")
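
Putting that together, a sketch of what src/BP2.jl could look like, after adding name = "BP2" and the UUID generated above to Project.toml (the export is just an assumption about what your test stub calls):

# src/BP2.jl
module BP2

include("maker.jl")   # pulls the existing definitions (and their `using`s) into the package
export maker

end # module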

Surely if I have > 100 packages, they can be compiled in parallel? Although, as @mkitti observes, if one of them takes a long time to compile, as appears to be the case, that will limit the gains.

That was one reason I reported on both the timing of Pkg.precompile() in point 1 earlier and the timing of include in point 2. The former appears to come with a promise of parallelism from the documentation, while the latter doesn’t AFAIK. I can still hope that if the latter triggers compilation of lots of packages, it will happen in parallel.

Also, my thinking about compilation isn't quite right for julia. When the interpreter encounters some code defining types or functions, it usually doesn't know what types they will be called with. This is probably why the messages refer to precompilation. The concrete types for which the code needs to be compiled usually only become available once the program starts running. I hope the results of compiling for specific types are cached, but I don't know that.
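
A tiny illustration of that type-driven compilation in the REPL (timing the same call twice shows whether the specialization is reused within the session):

julia> f(x) = 2x;

julia> @time f(1);     # first call with an Int: compiles a specialization for Int
julia> @time f(1);     # second call: much faster, the specialization is reused in this session
julia> @time f(1.0);   # a new concrete type (Float64) triggers compilation again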

Before I pursue the helpful suggestions from @jmair and @mkitti I want to report some results.

Removing using MixedModels from my test file produced a big speedup: time for the initial include fell from 74s to 33s. The test file isn't using anything from MixedModels yet (which also means, by the logic of my previous post, that there should be no need to compile MixedModels with any concrete types, and yet omitting it clearly matters).

There are 2 distinctive things about MixedModels: first, it is the only package that is dev’d; second, it has a lot of dependencies (I assume–unverified). I’m not sure which of those 2 is more important.

Second, if I execute include a second time within the same session there is very little overhead. My previous reports that later include's were just as slow referred to situations with a new julia session for each invocation.

# using MixedModels omitted from maker.jl
julia> @time include("src/maker.jl")
# output omitted
 32.965605 seconds (13.94 M allocations: 807.994 MiB, 3.85% gc time, 66.60% compilation time: 6% of which was recompilation)

julia> @time include("src/maker.jl")
 0.256567 seconds (210.12 k allocations: 11.082 MiB, 86.48% compilation time)

Yes, I have wandered from my original question about using the CPUs :) I remain interested in that too.

Summary: the fact that MixedModels was dev’d did not contribute to the compilation time, and I could use some help figuring out which packages are precompiled in the precompilation step.

The dev doesn’t seem to matter. After ] free MixedModels and another very long precompilation,

2 dependencies successfully precompiled in 507 seconds. 122 already precompiled.

the time to include in a new julia session remained 73-74s after restoring using MixedModels.

How do I find out? I don't see any mention of log files in the Pkg manual, nor do I find anything useful hunting around the disk. FWIW the free showed 2 packages precompiling, MSEP and MixedModels, and the amount of time it displayed each during the process seemed roughly equal. I think it showed MSEP during the previous example that said only one package was actively precompiled, but I'm not sure. I also wasn't sure if that was really what it was compiling, or if it was just displaying that as the package name.


I think I have pursued most of the suggestions from @jmair and @mkitti, but the slowness persists. In particular, when debugging in VSCode it takes 60-70 seconds from when I start debugging to when I arrive at the first breakpoint in the program. And if I start another debug session right after that, without touching any code, it takes just as long again.

All runs had no explicit specification of number of threads or processes, and seem limited to one core.

Put the code inside a Package

I already had a package and had just not put this code inside of it. I did so by moving the using directives to MSEP.jl and also putting include("maker.jl") in that file. The little test stub is outside of it:

using MSEP
data = maker()
print(data)

If I include that in the REPL (with the MSEP environment active) it takes ~36 seconds the first time and ~1s later times in the same session. So that may be a speedup from getting the definitions into a package. However, if I start a new julia session (in a terminal, not VSCode) the first load still takes ~36s.

For the VSCode debugger I put a breakpoint on the data=maker() line; all references to “time to first breakpoint” are referring to that line.

VSCode Compile Settings

I added all the packages that MSEP.jl said it was using to the Compiled list in the "Julia: Compiled Code" pane of the debugger. This was an odd process: I would hit the plus sign; a popup would take my input but it had no autocomplete. Also, when hovering on the upper right of the compiled code window a red dot became visible, with accompanying text "Julia: Enable Compiled Mode for the debugger". Whether this meant it was enabled or needed to be enabled was unclear. I tried clicking on the dot and/or executing the "Enable Compiled Mode" command from the command popup. Neither did anything as far as I could tell, and none of it seemed to affect the time it took to get to the first breakpoint.

Also, when I exited and restarted VSCode all the additions I had made to the Compiled Code list were gone. I thought they were supposed to persist.

Build Sysimage

I think I did this. My initial attempt failed with an error that RelocatableFolders could not be found. I added it to my environment, and got the same error. I added it to the 1.8 environment and was able to build. 25 minutes later (!) it finished. Then I checked the “use custom sysimage” box in settings.

The only difference I see is that I now get errors when starting julia in VSCode:

ERROR: LoadError: IOError: connect: no such file or directory (ENOENT)
Stacktrace:
 [1] wait_connected(x::Base.PipeEndpoint)
   @ Sockets /usr/local/julia-1.8.1/share/julia/stdlib/v1.8/Sockets/src/Sockets.jl:529
 [2] connect
   @ /usr/local/julia-1.8.1/share/julia/stdlib/v1.8/Sockets/src/Sockets.jl:564 [inlined]
 [3] connect
   @ /usr/local/julia-1.8.1/share/julia/stdlib/v1.8/Sockets/src/PipeServer.jl:103 [inlined]
 [4] serve(args::String; is_dev::Bool, crashreporting_pipename::String)
   @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.7.12/scripts/packages/VSCodeServer/src/VSCodeServer.jl:110
 [5] top-level scope
   @ ~/.vscode/extensions/julialang.language-julia-1.7.12/scripts/terminalserver/terminalserver.jl:39
in expression starting at /home/ross/.vscode/extensions/julialang.language-julia-1.7.12/scripts/terminalserver/terminalserver.jl:14

But I can still run and debug the code, at the same speed as before.

The option to use custom sysimages refers specifically to the REPL, and so perhaps is irrelevant when debugging code through the GUI. Or maybe they are not used for debugging at all.

Misc

The non-results from adding modules to the “compiled code” pane and from system images do not indicate those make no difference. The compiled code settings may speed up debugging when stepping through code. It’s just that I was analysing how long it took to load the debugger and get to the first line that did anything other than define functions or structs.

From when I hit “Run and debug” to when the terminal showed Connecting to debugger ... Done! was typically 15-20s. The 60-70s to get to the first breakpoint includes that time and waiting after.

The times I reported at the top were after applying all the optimization (attempts) above.

I tried includet a few times, but it seemed to perform the same and didn’t really fit my use case. I wasn’t looking to revise my test code, just reexecute it.

Use @time_imports.

It sounds like you would want to do

@time_imports using MSEP

Here's what I get when I run @time_imports on the packages in my environment:

julia> @time_imports using DataFrames, Distributions, FastGaussQuadrature, Gadfly, MixedModels, NamedArrays, QuadGK, StatsFuns, LinearAlgebra
      0.4 ms  Reexport
      0.5 ms  Compat
     15.6 ms  OrderedCollections
    112.8 ms  DataStructures
      0.5 ms  SortingAlgorithms
      1.6 ms  DataAPI
     30.0 ms  PooledArrays
     13.7 ms  Missings
      3.8 ms  InvertedIndices
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  TableTraits
      0.8 ms  Formatting
      0.1 ms  DataValueInterfaces
     22.6 ms  Tables
    110.8 ms  Crayons
    258.0 ms  PrettyTables
   2382.4 ms  DataFrames
      5.1 ms  DocStringExtensions 54.96% compilation time
    133.9 ms  ChainRulesCore
      1.0 ms  ChangesOfVariables
      1.1 ms  InverseFunctions
     10.7 ms  IrrationalConstants
      1.1 ms  LogExpFunctions
      0.4 ms  StatsAPI
     36.8 ms  StatsBase
     40.3 ms  PDMats
      0.2 ms  OpenLibm_jll
     30.0 ms  Preferences
      0.5 ms  JLLWrappers
      0.3 ms  CompilerSupportLibraries_jll
      4.0 ms  OpenSpecFun_jll 82.06% compilation time
     26.8 ms  SpecialFunctions
      0.6 ms  Rmath_jll
    258.8 ms  Rmath 39.58% compilation time
      0.5 ms  NaNMath
      3.2 ms  Calculus
     64.8 ms  DualNumbers
      1.4 ms  HypergeometricFunctions
      9.0 ms  StatsFuns
      4.4 ms  QuadGK
    356.4 ms  FillArrays
      2.2 ms  DensityInterface
    415.0 ms  Distributions
      4.2 ms  StaticArraysCore
   1736.3 ms  StaticArrays
      0.9 ms  FastGaussQuadrature
    223.3 ms  FixedPointNumbers
    154.2 ms  ColorTypes 5.07% compilation time
    947.5 ms  Colors
     59.7 ms  IterTools
     11.8 ms  Measures
      0.3 ms  Requires
    284.6 ms  Parsers 3.43% compilation time
     41.2 ms  JSON
     85.0 ms  Compose 22.12% compilation time (9% recompilation)
      0.2 ms  Showoff
      5.2 ms  IndirectArrays
   2709.2 ms  CategoricalArrays 91.78% compilation time (96% recompilation)
      4.9 ms  Hexagons
      3.0 ms  Contour
      8.8 ms  Distances
      1.8 ms  Loess
      3.4 ms  CoupledFields
     14.2 ms  WoodburyMatrices
     59.8 ms  Ratios 51.20% compilation time
      0.3 ms  AxisAlgorithms
      0.3 ms  Adapt
    272.2 ms  OffsetArrays
     79.0 ms  Interpolations 14.21% compilation time
     19.1 ms  AbstractFFTs
    474.6 ms  FFTW_jll 99.76% compilation time (100% recompilation)
    819.9 ms  FFTW 4.52% compilation time
      3.7 ms  KernelDensity
    570.2 ms  Gadfly 42.64% compilation time (16% recompilation)
    558.5 ms  SentinelArrays 19.96% compilation time
    376.6 ms  Lz4_jll 99.80% compilation time (100% recompilation)
      4.0 ms  TranscodingStreams
     15.1 ms  CodecLz4
      0.6 ms  Zstd_jll
      9.3 ms  CEnum
     12.9 ms  CodecZstd
      0.3 ms  Scratch
    251.3 ms  RecipesBase
     29.3 ms  InlineStrings
      0.4 ms  ExprTools
      2.1 ms  Mocking
    901.8 ms  TimeZones 52.72% compilation time
     59.9 ms  BitIntegers
      5.6 ms  ArrowTypes
    353.1 ms  Arrow 69.39% compilation time
      4.7 ms  ShiftedArrays
     18.7 ms  StatsModels
      8.4 ms  GLM
     21.6 ms  StructTypes
     19.4 ms  JSON3
      2.1 ms  MathProgBase
    460.4 ms  NLopt_jll 99.85% compilation time (100% recompilation)
    792.2 ms  MutableArithmetics
      6.7 ms  BenchmarkTools
      1.0 ms  DiffRules
      6.4 ms  DiffResults
    253.9 ms  MacroTools
      0.6 ms  CommonSubexpressions
    419.2 ms  ForwardDiff
      0.6 ms  Bzip2_jll
      1.8 ms  CodecBzip2
      0.2 ms  Zlib_jll
      3.1 ms  CodecZlib
   8328.0 ms  MathOptInterface
    333.9 ms  NLopt
      9.9 ms  ProgressMeter
   3115.1 ms  MixedModels
     13.7 ms  Combinatorics
    119.7 ms  NamedArrays 9.60% compilation time

The particularly long ones to load were the following:

  • 8328.0 ms MathOptInterface
  • 3115.1 ms MixedModels
  • 2709.2 ms CategoricalArrays 91.78% compilation time (96% recompilation)
  • 2382.4 ms DataFrames
  • 1736.3 ms StaticArrays

I debug from the REPL directly (launched from the REPL, but still stepping through the source as normal in VS Code). If you write @enter or @run (I believe; I don't use the second one), the debugger will launch directly on the code you give it. So

@run data=maker()

This will specify the entry point for debugging, with all of your modules already loaded. This is much faster than debugging in a separate process since it only loads your code once (see Debugging · Julia in VS Code, more details towards the bottom).
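
For example, in the Julia REPL started from VS Code (a sketch; it assumes the MSEP environment is active):

julia> using MSEP

julia> @enter maker()   # drops straight into the VS Code debugger at the start of maker()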

It's strange that the compiled modules go away. Are you opening a folder/workspace in VS Code, or just individual files? In my experience, I set the compiled modules once per folder/workspace and they have persisted.


@jmair thanks! @enter finally gets me a quick enough startup time, 3s, at least after the initial compilation.

What’s up with @run? It just seems to hang when I use it, and it even hangs if I go into help mode in the REPL and enter @run.

I opened the folder in VSCode originally, although now I just exit and restart VSCode and pick up where I was. I added DataFrames again to the “Compile” list (via GUI), exited VSCode and restarted. It was gone. I tried adding it both before and after executing “Enable Compiled Mode”; neither worked.

I’m not sure why @run gives you trouble, but I always tend to use @enter myself.

As for the compiled modules, I usually put something like "DataFrames.", since the trailing dot is important (I think) as it indicates that the entire module and its submodules should be compiled. There's not much info in the docs page on having this configuration persist, but it does persist for me. Maybe someone else knows why you're having this issue. As you mentioned, this whole compiled-modules system could probably be made a bit more user-friendly, for example by exposing the options in settings.json.