Investigating large latency on a constrained Windows environment

On first start-up in the morning in VSCode, my code takes fully 7 minutes before executing the first line of my actual script (to read a csv file).

using DataFrames
using CSV
using Dates
using ZipArchives
using GeoStats
using GeoIO

import Proj
import CairoMakie as Mke

postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)

I do not mean the time it takes to open and initialise VSCode itself. I mean how long I wait after I hit run before anything I asked for starts to happen.

This code is running in its own environment, not global.

I think this is related to Makie, as @lmiq suggests, but I’m not sure.

All subsequent runs are speedy - this is just the first run.

No doubt there’s an easy fix. I’m just not sure what!

True. To be honest, I tested Plots and it takes 2 seconds to plot a function on top of initializing. Makie, on the other hand, still takes a dozen seconds or so…

Can you share the output of

versioninfo()

?

I put your code, excluding the last line (I do not have the file TerminatedPostcodes.csv) in a file with the name init.jl.

Then I get:

julia> @time include("init.jl")
  4.588735 seconds (6.71 M allocations: 494.214 MiB, 13.74% gc time, 3.01% compilation time: 86% of which was recompilation)

Can you share the .csv file to see if it makes a difference?

UPDATE:
Just for a test I used this file: https://www.doogal.co.uk/files/postcodes.zip

Running your code, including reading of the .csv file now takes about 30s.

But if I start Julia with julia --project -t auto the time goes down to 11.3 seconds.
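To confirm that the `-t auto` flag (or `JULIA_NUM_THREADS`) actually took effect in a given session, a quick check along these lines should do. It uses only Base, and nothing here is specific to this thread's project:

```julia
# Report how many threads this Julia session was started with,
# next to the number of logical CPU threads the machine exposes.
println("Julia threads: ", Threads.nthreads())
println("CPU threads:   ", Sys.CPU_THREADS)
```

If the first number is 1, the session was started single-threaded and multithreaded CSV parsing cannot help.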

If you always need to load this file, converting it to .arrow format once and then loading the .arrow file each time can speed things up even more, as explained here: Failing to import (relatively) large CSV file with Julia and VSC - #17 by ufechner7
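A minimal sketch of that convert-once idea, assuming Arrow.jl, CSV.jl and DataFrames.jl are installed in the active environment. The helper name `read_cached` and the `.arrow` file name are made up for illustration and do not come from the linked post:

```julia
using CSV, DataFrames, Arrow

# Load a CSV through an Arrow cache file.
# The cache is (re)built only when it is missing or older than the CSV.
function read_cached(csv_path::AbstractString, arrow_path::AbstractString)
    if !isfile(arrow_path) || mtime(arrow_path) < mtime(csv_path)
        # CSV.File is a Tables.jl source, so Arrow.write can consume it directly
        Arrow.write(arrow_path, CSV.File(csv_path))
    end
    # Arrow.Table reads the file; DataFrame wraps its columns
    return DataFrame(Arrow.Table(arrow_path))
end
```

Usage would look like `postcodes = read_cached("TerminatedPostcodes.csv", "TerminatedPostcodes.arrow")`. This only targets the CSV reading step. It does nothing for the time spent in `using`.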

I tried GLMakie to test how fast it loads up with

julia> @time using GLMakie; plot([(i, sin(i)) for i in 0:0.001:pi/2])

With a dozen tabs opened in the background and a dozen vs code files open, it took around 15-16 seconds.

Without anything, however, it was just shy of 6.5 seconds! Huge difference!

(In my environment, I have GLMakie, Plots, GeometryBasics and DataStructures installed).
And yeah, I guess creating a system image can get these times even lower, as others have suggested. Will try it soon!
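For reference, a system image is usually built with PackageCompiler.jl. The following is only a sketch: it assumes PackageCompiler is installed, that the listed packages are in the active project, and that `init.jl` is a script exercising them (as in the test earlier in the thread). The output file name is hypothetical:

```julia
using PackageCompiler  # assumed to be installed

# Bake the heavy packages into a custom system image.
# Building can take a long time and the image must be rebuilt after package updates.
create_sysimage(
    ["DataFrames", "CSV", "GeoStats", "GeoIO", "CairoMakie"];
    sysimage_path = "ThreadSysimage.dll",       # hypothetical name; `.dll` on Windows
    precompile_execution_file = "init.jl",      # script whose calls get compiled in
)
```

Julia is then started with `julia --project --sysimage ThreadSysimage.dll`. Whether this helps with a slow first run of the day, as opposed to ordinary load time, is not established anywhere in this thread.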

I also regularly get very slow startup in VS Code. I’m not sure what’s going on, but I suspect it’s the plugin updating, recompiling or something like that. Sometimes you can run stuff in the REPL, but using shortcuts to execute a cell will only run minutes later.

I can’t share the CSV file because it contains private details. However, it is a simple file, 3 columns of 14,000 rows.

I tried again this morning with:

print("using : ")
@time begin
    using DataFrames
    using CSV
    using Dates
    using ZipArchives
    using GeoStats
    using GeoIO
end
print("Proj  : ")
@time import Proj
print("Makie : ")
@time import CairoMakie as Mke
print("CSV   : ")
@time postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)

Which produced the following timing:

using : 189.748529 seconds (4.61 M allocations: 283.892 MiB, 0.17% gc time, 0.11% compilation time: 75% of which was recompilation)
Proj  :   1.068109 seconds (22.04 k allocations: 1.244 MiB)
Makie :  78.394334 seconds (2.57 M allocations: 244.475 MiB, 0.35% gc time, 0.01% compilation time)
CSV   :   3.507138 seconds (2.60 M allocations: 175.795 MiB, 0.77% gc time, 311.58% compilation time: 9% of which was recompilation)

OK, so not 7 minutes today, but still nearly 5 minutes before it even starts to read my file.

As I say, this is an issue only with the first run of the day. Even if I close VSCode and reopen again, a second run gives the following timing:

using :  10.471567 seconds (4.61 M allocations: 283.954 MiB, 2.99% gc time, 2.06% compilation time: 81% of which was recompilation)
Proj  :   0.059018 seconds (22.04 k allocations: 1.244 MiB)
Makie :   5.758698 seconds (2.57 M allocations: 244.475 MiB, 4.43% gc time, 0.13% compilation time)
CSV   :   3.401152 seconds (2.60 M allocations: 175.793 MiB, 1.14% gc time, 308.47% compilation time: 9% of which was recompilation)

and within the same VSCode session, it’s slightly faster still:

using :   5.474012 seconds (4.61 M allocations: 283.907 MiB, 5.90% gc time, 3.72% compilation time: 79% of which was recompilation)
Proj  :   0.033830 seconds (22.04 k allocations: 1.244 MiB)
Makie :   3.343034 seconds (2.57 M allocations: 244.475 MiB, 8.44% gc time, 0.23% compilation time)
CSV   :   3.196636 seconds (2.61 M allocations: 175.897 MiB, 5.95% gc time, 307.54% compilation time: 9% of which was recompilation)

I don’t have any problem with the overhead in these last two examples. It is just the first run of the day that is uniquely painful.

I might try again tomorrow to break down the using timings a bit more…
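One way to get that breakdown is to time each `using` on its own. The helper below is a sketch with a made-up name (`time_each_import`). It builds each `using` statement programmatically and evaluates it in `Main`:

```julia
# Time each package load separately instead of as one block.
function time_each_import(pkgs::Vector{Symbol})
    timings = Pair{Symbol,Float64}[]
    for pkg in pkgs
        # Expr(:using, Expr(:., pkg)) is the parsed form of `using <pkg>`
        t = @elapsed Core.eval(Main, Expr(:using, Expr(:., pkg)))
        push!(timings, pkg => t)
        println(rpad(string(pkg), 12), ": ", round(t; digits = 2), " s")
    end
    return timings
end
```

For this script the call would be `time_each_import([:DataFrames, :CSV, :Dates, :ZipArchives, :GeoStats, :GeoIO])`. Packages loaded earlier in the list pull in shared dependencies, so the order affects where the time is attributed. On Julia 1.8 and later, `@time_imports using GeoStats` (from InteractiveUtils, which the REPL loads by default) gives a finer per-dependency breakdown.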

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a550 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 4 default, 0 interactive, 2 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

julia> 

I never use the official VSCode Julia console. In VSCode I always just launch a terminal, and from the terminal I launch Julia with

julia --project

or

julia --project -t auto

Did you try that?

Extra question: How much RAM do you have?

I launch Julia simply by typing julia in a new terminal and then say

] activate .

Same thing right?

Yes, that has the same effect.

We have the same configuration! Almost, except I only use a single thread lol.
And while I don’t load this many packages, it’s not THAT slow for me. But yeah, it takes around a minute, to say the least, to get everything started.

Perhaps you’re running things in the background (say your browser or maybe some other background app)? I genuinely found that to scale the times by a large amount!

Any way to set this up as the default project setting? In Python, for instance, that is the case if you set up an environment.
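One possible approach, offered as a sketch rather than the recommended way: Julia runs `~/.julia/config/startup.jl` at launch, so a few lines there can activate whatever project sits in the directory Julia was started from. The check for a project file is my own guard, not something suggested in this thread:

```julia
# Contents for ~/.julia/config/startup.jl
# Activate the project in the current directory, if there is one.
if isfile("Project.toml") || isfile("JuliaProject.toml")
    import Pkg
    Pkg.activate(".")
end
```

The other documented route is to set the `JULIA_PROJECT` environment variable to `@.`, which makes Julia look for a project file in the current directory or its parents, much like passing `--project`.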

@TimG, what are the startup times if you start Julia directly without VSCode?

And another question: what is in your global Julia environment? Activate it in the package manager and run the status command like this:

(ThisNThat) pkg> activate
  Activating project at `~/.julia/environments/v1.10`

(@v1.10) pkg> st
  [4c88cf16] Aqua v0.8.7
  [6e4b80f9] BenchmarkTools v1.5.0
...

BTW, I just found a few packages which do not belong there and deleted them.

Here is run time in julia directly (from a windows command prompt) using julia --project -t auto:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.3 (2024-04-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> print("using : ")
using :
julia> @time begin
           using DataFrames
           using CSV
           using Dates
           using ZipArchives
           using GeoStats
           using GeoIO
       end
 11.096777 seconds (4.61 M allocations: 283.760 MiB, 1.93% gc time, 2.30% compilation time: 80% of which was recompilation)

julia> print("Proj  : ")
Proj  :
julia> @time import Proj
  0.090549 seconds (21.97 k allocations: 1.239 MiB)

julia> print("Makie : ")
Makie :
julia> @time import CairoMakie as Mke
  6.088315 seconds (2.57 M allocations: 244.538 MiB, 3.36% gc time, 0.11% compilation time)

julia> print("CSV   : ")
CSV   :
julia> @time postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)
  3.392847 seconds (3.04 M allocations: 206.973 MiB, 0.80% gc time, 564.47% compilation time: 5% of which was recompilation)
14660×3 DataFrame

(I just pasted the code!)
Not a first run of the day, obviously.

Here is my global env:

(@v1.10) pkg> st
Status `C:\Users\TGebbels\.julia\environments\v1.10\Project.toml`
  [69666777] Arrow v2.7.2
  [336ed68f] CSV v0.10.14
⌃ [13f3f980] CairoMakie v0.11.10
⌃ [35d6a980] ColorSchemes v3.24.0
⌃ [5ae59095] Colors v0.12.10
  [a93c6f00] DataFrames v1.6.1
⌃ [f5a160d5] GeoIO v1.12.13
  [dcc97b0b] GeoStats v0.56.0
  [08abe8d2] PrettyTables v2.3.1
Info Packages marked with ⌃ have new versions available and may be upgradable.

(@v1.10) pkg> 

Like most beginners, I used to run everything in my global environment, so maybe still some tidying up to do.

Huh! Not as bad as VSCode after all. Perhaps it is the precompilation, downloads and extension loading!

I don’t think that’s true. It’s just not the first run of the day.

Here is running from terminal in VSCode:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.3 (2024-04-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> print("using : ")
using : 
julia> @time begin
           using DataFrames
           using CSV
           using Dates
           using ZipArchives
           using GeoStats
           using GeoIO
       end
  5.309556 seconds (4.61 M allocations: 283.746 MiB, 3.75% gc time, 4.40% compilation time: 80% of which was recompilation)

julia> print("Proj  : ")
Proj  : 
julia> @time import Proj
  0.037400 seconds (21.97 k allocations: 1.239 MiB)

julia> print("Makie : ")
Makie : 
julia> @time import CairoMakie as Mke
  3.164185 seconds (2.57 M allocations: 244.538 MiB, 6.02% gc time, 0.25% compilation time)

julia> print("CSV   : ")
CSV   : 
julia> @time postcodes = CSV.read("TerminatedPostcodes.csv", DataFrame)
  3.210526 seconds (2.56 M allocations: 172.459 MiB, 0.69% gc time, 567.95% compilation time: 5% of which was recompilation)
14660×3 DataFrame

In other words, the same (for second or subsequent run of the day).

Can only test one of these two options each day. I’ll try one tomorrow morning.

PS I have 16GB ram

Yeah well, I just opened up Plots for the first time today and it alone took about 10-15 seconds in the REPL… something just seems to slow things down regardless of the version lol. I don’t know. I guess we just deal with it.

Why? Does a first run after reboot behave differently to first run of a day?

Don’t know, actually. Will have a go now.
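One thing the timings above do show: on the slow run, compilation is only 0.11% of the 189 seconds and the allocation counts match the fast runs, so the extra time is not spent compiling. A possible explanation, and it is only an assumption on my part, is that reading the precompiled cache files from disk is slow when nothing is cached by the OS yet. The sketch below times a raw read of the compiled-cache folder using only Base. The function name is made up:

```julia
# Time a plain read of every file under `dir`, without loading any package.
function time_raw_read(dir::AbstractString)
    nfiles = 0
    nbytes = 0
    t0 = time_ns()
    for (root, _, files) in walkdir(dir)
        for f in files
            nbytes += length(read(joinpath(root, f)))  # read whole file as bytes
            nfiles += 1
        end
    end
    elapsed = (time_ns() - t0) / 1e9
    return (; nfiles, nbytes, elapsed)
end

# Assumed location of the precompile cache inside the first depot
cache_dir = joinpath(first(DEPOT_PATH), "compiled", "v$(VERSION.major).$(VERSION.minor)")
if isdir(cache_dir)
    r = time_raw_read(cache_dir)
    println(r.nfiles, " files, ", round(r.nbytes / 2^20; digits = 1), " MiB in ",
            round(r.elapsed; digits = 2), " s")
end
```

Running this as the very first thing after a reboot, and again later in the same day, would show whether plain file reads account for the difference. Note that running it also warms the file cache, so it changes the measurement of any `using` that follows.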