Julia 0.7.0-alpha strange package loading

On Julia 0.7.0-alpha I am testing the DataFrames.jl package on Windows 10. It should already be precompiled (this is not the first time I have loaded it). The strange behavior I get is the following. When I reach this state in the REPL:

$ /d/Julia7/bin/julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-alpha.0 (2018-05-31 00:07 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> @time using DataFrames
 19.202774 seconds (20.17 M allocations: 1.171 GiB, 2.11% gc time)

julia>

it hangs for ~1 minute (Julia is using 100% of one core for something), i.e. the REPL shows the prompt but is not responsive. Also, the package loading time is very long, as you can see.

Has anyone else encountered such problems, and what is the best way to investigate what is causing the lag?

You could try clearing out .julia/compiled to force a recompile from scratch.
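A minimal sketch of how that could be done from within Julia, assuming the cache lives in a `compiled` directory under the first entry of `Base.DEPOT_PATH` (the default layout; a customised depot may differ). The deletion itself is left commented out so nothing is removed by accident.

```julia
# Locate the precompile cache under the first depot (usually ~/.julia).
# Assumption: the default depot layout, where caches live in <depot>/compiled.
cache_dir = joinpath(first(Base.DEPOT_PATH), "compiled")

println("Precompile cache: ", cache_dir)
println("Exists: ", isdir(cache_dir))

# Uncomment to actually delete the cache; the next `using` then
# precompiles every package again from scratch.
# rm(cache_dir; recursive=true, force=true)
```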

Unfortunately this did not help. Do you have any hints on:

  1. why recompiling from scratch could have helped (apart from the basic reasoning that if all else fails this is something that should be tried).
  2. what could possibly cause such lags (so that I can track down the reasons and update DataFrames.jl with a proper PR to fix it).

Thanks!

I was thinking about https://github.com/JuliaLang/julia/issues/25900 (and https://github.com/JuliaPlots/Plots.jl/issues/1379).

I’ve encountered some incredibly slow package loading in 0.7. I seem to remember this only being on the initial pre-compile. I’ll try to post back here with specifics when I see it again.

I have dug into the dependencies of DataFrames to check what causes the biggest lags. On my machine those are WeakRefStrings and CategoricalArrays.
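For anyone who wants to repeat this kind of check, here is a rough sketch of one way to time each dependency separately. `load_time` is a hypothetical helper written for this post, and the candidate list is only an assumption based on the two packages named above. Run it in a fresh session, since a package that is already loaded imports almost instantly, and keep in mind that shared dependencies are charged to whichever package loads them first.

```julia
# Hypothetical helper: seconds taken to import one package.
# `import` has to run at top level, so the expression is evaluated in Main.
function load_time(pkg::Symbol)
    return @elapsed Core.eval(Main, :(import $pkg))
end

# Assumed candidates; replace with the dependencies you want to compare.
candidates = [:WeakRefStrings, :CategoricalArrays]

for pkg in candidates
    try
        t = load_time(pkg)
        println(rpad(string(pkg), 20), round(t; digits=2), " s")
    catch err
        # For example, the package is not installed in this environment.
        println(rpad(string(pkg), 20), "could not load (", typeof(err), ")")
    end
end
```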

CC @quinnj @nalimilan as you probably know those packages best (if you have time to have a look at the issue of course :smile:).

Thank you. Can you please check DataFrames on your machine?

A staggering 129s.

There’s a lot wrong though. Something is going on in StatsBase: it gets tons of method overwrite warnings. I know DataStreams badly needs to be tagged. Probably lots of things really need to be tagged.

Thanks! Unfortunately yes. I also get those warnings on the first compilation, but on subsequent loads I do not get any warning output.
But maybe those deprecations are causing all the lags. If that is the case, it is not so bad: we simply have to update all the packages and everything would be back to normal.

It’s reeealllyyy slow so I’m not quite that optimistic, but certainly it is time to update all the DataFrames dependencies regardless.

If you get method overwrite warnings when precompiling, it is likely that loading that package will invalidate a lot of already compiled code (which then needs to be recompiled), so the package will be very slow to load.
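A toy illustration of the mechanism, run at top level in a script or the REPL. The functions `area` and `report` are made up for this example and have nothing to do with StatsBase or DataFrames. Redefining a method with the same signature discards the compiled code of its callers, so they have to be compiled again on the next call.

```julia
# Hypothetical functions, for illustration only.
area(r) = r^2                 # first definition
report(r) = area(r) + 1       # caller, compiled against the first definition

before = report(2)            # 5; `report` is compiled here

area(r) = 2 * r^2             # same signature, so this overwrites the method

after = report(2)             # 9; `report` is recompiled before it runs

println("before = ", before, ", after = ", after)
```

In a package this happens at load time for every caller of the overwritten method, which is where the extra compilation cost would come from.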

The method overwrite warnings should definitely be investigated.

I think it’s worth waiting until we’ve fixed all deprecations, since they frequently cause large slowdowns.