Investigating large latency on a constrained Windows environment

Rebooted and ran in VSC (as before):

using : 234.641995 seconds (4.61 M allocations: 283.895 MiB, 0.14% gc time, 0.09% compilation time: 76% of which was recompilation)
Proj  :   1.202910 seconds (22.04 k allocations: 1.244 MiB)
Makie : 102.903060 seconds (2.57 M allocations: 244.475 MiB, 0.29% gc time, 0.01% compilation time)
CSV   :   3.321988 seconds (2.85 M allocations: 193.021 MiB, 5.61% gc time, 314.71% compilation time: 10% of which was recompilation)

No other tasks running in parallel (browser, etc).

Will now reboot and run in terminal.

T-T I feel sad for you… (and kinda relate and empathise too)

Is this first-run-of-the-day thing a Windows issue? I never saw that, and I don't understand why it would happen, actually.

Here is the terminal output:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.3 (2024-04-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> print("using : ")
using :
julia> @time begin
           using DataFrames
           using CSV
           using Dates
           using ZipArchives
           using GeoStats
           using GeoIO
       end
166.836199 seconds (4.61 M allocations: 283.762 MiB, 0.14% gc time, 0.21% compilation time: 79% of which was recompilation)

julia> print("Proj  : ")
Proj  :
julia> @time import Proj
  1.079934 seconds (21.97 k allocations: 1.239 MiB)

julia> print("Makie : ")
Makie :
julia> @time import CairoMakie as Mke
 81.321381 seconds (2.57 M allocations: 244.600 MiB, 0.25% gc time, 0.01% compilation time)

julia> print("CSV   : ")
CSV   :
julia> @time postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)
  3.375755 seconds (2.56 M allocations: 172.527 MiB, 0.71% gc time, 565.43% compilation time: 5% of which was recompilation)
14660×3 DataFrame

Some variability in timings, as you can see, but this still took over 4 minutes to get to the CSV.read.

Will redo with timings for each package.


With full breakdown of packages:

print("DataFrames  : ")
@time using DataFrames
print("CSV         : ")
@time using CSV
print("Dates       : ")
@time using Dates
print("ZipArchives : ")
@time using ZipArchives
print("GeoStats    : ")
@time using GeoStats
print("GeoIO       : ")
@time using GeoIO

print("Proj        : ")
@time import Proj
print("Makie       : ")
@time import CairoMakie as Mke

print("CSV.read    : ")
@time postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)

Here are the results:

DataFrames  :  16.313810 seconds (602.77 k allocations: 45.622 MiB, 0.27% gc time, 0.08% compilation time)
CSV         :   4.234787 seconds (132.76 k allocations: 8.384 MiB)
Dates       :   0.001294 seconds (419 allocations: 31.453 KiB)
ZipArchives :   1.127855 seconds (28.95 k allocations: 1.904 MiB)
GeoStats    :  78.244798 seconds (1.39 M allocations: 78.456 MiB, 0.07% gc time, 0.09% compilation time: 52% of which was recompilation)
GeoIO       :  83.989450 seconds (2.46 M allocations: 149.515 MiB, 0.27% gc time, 0.17% compilation time: 97% of which was recompilation)
Proj        :   1.251061 seconds (22.04 k allocations: 1.244 MiB)
Makie       :  82.301510 seconds (2.57 M allocations: 244.475 MiB, 0.33% gc time, 0.01% compilation time)
CSV.read    :   3.110800 seconds (2.60 M allocations: 175.956 MiB, 0.67% gc time, 302.93% compilation time: 9% of which was recompilation)

For info, second (and subsequent) runs look like this:

DataFrames  :   1.459538 seconds (602.77 k allocations: 45.606 MiB, 3.22% gc time, 0.70% compilation time)
CSV         :   0.364448 seconds (132.77 k allocations: 8.384 MiB)
Dates       :   0.001306 seconds (419 allocations: 31.453 KiB)
ZipArchives :   0.080889 seconds (28.95 k allocations: 1.904 MiB)
GeoStats    :   3.951864 seconds (1.39 M allocations: 78.453 MiB, 1.21% gc time, 1.59% compilation time: 47% of which was recompilation)
GeoIO       :   4.910278 seconds (2.46 M allocations: 149.514 MiB, 4.15% gc time, 2.78% compilation time: 97% of which was recompilation)
Proj        :   0.090761 seconds (22.04 k allocations: 1.244 MiB)
Makie       :   5.359945 seconds (2.57 M allocations: 244.475 MiB, 4.69% gc time, 0.14% compilation time)
CSV.read    :   3.087068 seconds (3.10 M allocations: 210.798 MiB, 1.17% gc time, 308.95% compilation time: 9% of which was recompilation)
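
To make the comparison between the two tables concrete, here is a small sketch that divides each first-run load time by the corresponding later-run time. The numbers are copied from the tables above; the `slowdown` helper is just an illustrative name. Every package comes out roughly 11 to 20 times slower on the first run, even though the allocation counts are nearly identical.

```julia
# Load times in seconds, copied from the two tables above: (first run, later run).
timings = [
    "DataFrames"  => (16.313810, 1.459538),
    "CSV"         => (4.234787, 0.364448),
    "ZipArchives" => (1.127855, 0.080889),
    "GeoStats"    => (78.244798, 3.951864),
    "GeoIO"       => (83.989450, 4.910278),
    "Proj"        => (1.251061, 0.090761),
    "Makie"       => (82.301510, 5.359945),
]

# Illustrative helper: how many times slower the first run was.
slowdown(first_run, later_run) = first_run / later_run

for (pkg, (t1, t2)) in timings
    println(rpad(pkg, 12), ": ", round(slowdown(t1, t2); digits = 1), "x slower on the first run")
end
```

A fairly uniform factor across unrelated packages suggests the extra time is spent outside Julia's own work (for example, waiting on the disk or on a scanner), though the timings alone cannot prove that.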

Possible reason:

  • Windows updates in the background
  • virus scanner

Best way to avoid:

  • use Linux or Mac

Maybe, but I can't believe this is a widespread issue with Julia on Windows, or others would have reported it.

More likely something about my setup, I'd have thought.

I work from home, but use corporate IT infrastructure, so both domestic set-up and corporate to consider, and no option to switch OS.

And just starting up Julia in the terminal, how long does that take?

In any case, loading packages generally doesn't take THAT long, and your hardware is reasonably powerful. I'd consider a clean reinstall of Julia on Windows (look for or ask here how to do it), and then installing your packages one by one into a project, checking startup times as you go.
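
One way to check startup times package by package is to launch a fresh Julia process for each measurement, so that nothing is already loaded. The sketch below is only an illustration: `time_fresh_julia` is a made-up helper name, and the package list is a placeholder to be replaced with whatever is installed in the project being tested. Each number includes the startup time of Julia itself, which the first line measures on its own.

```julia
# Hypothetical helper: wall-clock seconds needed to start a fresh Julia
# process in `project` and run `code` in it.
function time_fresh_julia(code::AbstractString = "";
                          project::AbstractString = dirname(Base.active_project()))
    cmd = `$(Base.julia_cmd()) --startup-file=no --project=$project -e $code`
    return @elapsed run(cmd)
end

println(rpad("bare startup", 13), ": ", time_fresh_julia(), " s")

# Placeholder list: extend with the packages installed in your project.
for pkg in ["Dates"]
    println(rpad(pkg, 13), ": ", time_fresh_julia("using $pkg"), " s")
end
```

Running this once straight after a reboot and again a few minutes later would show whether the slow first load also appears outside the interactive session.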

For comparison: on my modest M1 Mac mini, starting my current package, which includes DataFrames and a number of other packages, took 0.6 s.

To be honest, that helps. I remember that until around 4 months ago, I naively had all my packages installed in the default Julia environment.

Decided to install Julia fresh. Got rid of the junk packages I had and finally resolved to use environments properly. It sped up, but still not as much as I expected. The first-run time issue is there for me too. But ig, that's all right. It's not as bad for me tbh. Takes no longer than 30 seconds.
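
For anyone who has not used environments yet, a minimal sketch of the idea follows. The temporary directory stands in for a real project folder, and the `Pkg.add` line is left commented out because it needs network access; the package names in it are only examples.

```julia
using Pkg

# Stand-in for a real project folder; any directory works.
project_dir = mktempdir()

# From here on, Pkg operations affect only this project.
Pkg.activate(project_dir)

# Uncomment to install just what this project needs (requires network access):
# Pkg.add(["CSV", "DataFrames"])

# Lists only the packages recorded for the active project.
Pkg.status()
```

Keeping each project's dependencies separate means fewer packages have to be resolved together, which is the main reason a per-project environment tends to behave better than one large default environment.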

What's interesting is that, judging from the number of allocations, the runs are doing the same thing, but one is taking much longer.

Is the hard drive encrypted?

I do feel this is strictly a consequence of background processes. Something running in the background - either a window of a browser/vs code itself or maybe some other software that is there all the time.
@TimG do check your background processes once. Also, I do have device encryption by default as @lmiq said. Looked it up - it does slow down the PC quite a lot.
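
If disk encryption or a scanner is suspected, one rough check is to time how long it takes simply to read the precompiled cache files from disk, with no package loading involved. The sketch below assumes the cache lives under the first entry of `DEPOT_PATH` in `compiled/v<major>.<minor>`, which is the usual layout; `time_cache_read` is a made-up helper name.

```julia
# Hypothetical helper: read every file under `dir` and report the elapsed
# wall-clock time and the amount of data read.
function time_cache_read(dir::AbstractString)
    nbytes = 0
    seconds = @elapsed begin
        for (root, _, files) in walkdir(dir)
            for f in files
                nbytes += length(read(joinpath(root, f)))
            end
        end
    end
    return (; seconds, mib = nbytes / 2^20)
end

# Usual location of the precompile cache (an assumption about the setup).
cache_dir = joinpath(first(DEPOT_PATH), "compiled", "v$(VERSION.major).$(VERSION.minor)")

if isdir(cache_dir)
    println(time_cache_read(cache_dir))
else
    println("No cache directory found at ", cache_dir)
end
```

If this plain read is also very slow right after a reboot and fast the second time, the delay is more likely in the file system layer than in Julia.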

You don't even need to switch OS to use Linux: you could just install WSL and then install Julia there. The Linux-Windows integration of apps and files is also much nicer with the latest WSL updates. You can even use graphical apps installed on WSL, and you can access any files inside WSL from Windows by just running explorer.exe <path>.

But if his laptop is administered by his company he might not be able to do that…


FWIW, this is what I get here, with a 5-year-old laptop:

julia> @time begin
           using DataFrames
           using CSV
           using Dates
           using ZipArchives
           using GeoStats
           using GeoIO
       end
  5.557891 seconds (4.45 M allocations: 272.867 MiB, 4.17% gc time, 3.66% compilation time: 96% of which was recompilation)

julia> @time import CairoMakie as Mke
  3.706602 seconds (2.53 M allocations: 247.502 MiB, 7.53% gc time, 0.46% compilation time)

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = vim


Use the Task Manager to see if some other process is running (heavily) at the same time, triggered by the Julia run.

(@v1.10) pkg> activate .
  Activating project at `E:\Desktop\Julia Package Speed Check`

julia> @time using CSV, DataFrames, Dates, ZipArchives, GeoStats, GeoIO
 13.545861 seconds (4.53 M allocations: 277.763 MiB, 3.66% gc time, 4.36% compilation time: 90% of which was recompilation)

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a550 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

I restarted my PC after installing the packages and running the program once.
I don't think there's any problem with Windows 11 as such… I mean my system config has no comparison with that of @TimG. Plus I'm only on a single thread. So… idk… probably some antivirus software or some other background app. (I'll now delete these packages)

I do want to point out the macro @time_imports which will help identify which dependencies are contributing to load time.

For example, the following would be of interest.

@time_imports begin
           using DataFrames
           using CSV
           using Dates
           using ZipArchives
           using GeoStats
           using GeoIO
       end

For additional diagnostics, do the following.

julia> ENV["JULIA_DEBUG"] = "loading"
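
The variable has to be set before the packages are loaded. One way to guarantee that is to start a fresh Julia process with it already in the environment. The sketch below only builds such a command; `debug_load_cmd` is a made-up helper name, and the commented-out example assumes CSV is installed in the environment the new process picks up.

```julia
# Hypothetical helper: build a command that loads `pkg` in a fresh Julia
# process with loading diagnostics switched on.
function debug_load_cmd(pkg::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e "using $pkg"`
    return addenv(cmd, "JULIA_DEBUG" => "loading")
end

# Example (assumes CSV is installed):
# run(debug_load_cmd("CSV"))
```

The debug messages should come from Julia's package-loading code, so they can help show whether time is going into finding and validating cache files rather than into the packages themselves.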

A majority of the Julia developers do not use Windows, so Windows support often lags slightly.

@mbauman I think this thread split around post 53:
https://discourse.julialang.org/t/the-problem-with-julia-that-makes-me-want-to-leave-first-run-times/114316/53

where a new problem involving Windows and the JuliaGeo ecosystem emerged, distinct from the original post. Could we split the thread there, so it might be easier for more specialized analysis of the latter issue?


Exactly. Although, tbh, I might not be motivated to do this even if I could. This problem costs me 5 minutes a day and I don't know Linux at all.