Julia 0.7.0-alpha strange package loading

question

#1

On Julia 0.7.0-alpha I am testing the DataFrames.jl package on Windows 10. It should already be precompiled (this is not the first time I have loaded it). The strange behavior I get is the following. When I get to this state of the REPL:

$ /d/Julia7/bin/julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-alpha.0 (2018-05-31 00:07 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> using DataFrames
 19.202774 seconds (20.17 M allocations: 1.171 GiB, 2.11% gc time)

julia>

it hangs for ~1 minute (with Julia using 100% of one core for something), i.e. the REPL shows the prompt but is not responsive. Also, the package loading time is very long, as you can see.

Has anyone else encountered such problems, and what is the best way to investigate what is causing the lag?


#2

You could try clearing out .julia/compiled to force a recompile from scratch.
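
For reference, a minimal sketch of doing this from within Julia. It assumes the precompile cache lives in a `compiled` directory under the first entry of `DEPOT_PATH` (typically `~/.julia`); `clear_compiled_cache` is a hypothetical helper name, not part of Julia or Pkg.

```julia
# Hypothetical helper: delete the precompile cache under a depot directory.
# Assumption: the cache lives in `<depot>/compiled`; the default depot is the
# first entry of DEPOT_PATH (typically ~/.julia).
function clear_compiled_cache(depot::AbstractString = first(DEPOT_PATH))
    cache = joinpath(depot, "compiled")
    # force = true makes this a no-op when the directory does not exist
    rm(cache; recursive = true, force = true)
    return cache
end

# clear_compiled_cache()   # uncomment to actually delete the cache
```

After the cache is removed, the next `using` of each package has to precompile it again, so the first load will be slow by design.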


#3

Unfortunately this did not help. Do you have any hints on:

  1. why recompiling from scratch could have helped (apart from the basic reasoning that if all else fails, this is something that should be tried);
  2. what could possibly cause such lags (so that I can track down the reasons and update DataFrames.jl with a proper PR to fix it).

Thanks!


#4

I was thinking about https://github.com/JuliaLang/julia/issues/25900 (and https://github.com/JuliaPlots/Plots.jl/issues/1379).


#5

I’ve encountered some incredibly slow package loading in 0.7. I seem to remember this only being on the initial pre-compile. I’ll try to post back here with specifics when I see it again.


#6

I have dug into the dependencies of DataFrames to check what causes the biggest lags. On my machine those are WeakRefStrings and CategoricalArrays.

CC @quinnj @nalimilan as you probably know those packages best (if you have time to have a look at the issue of course :smile:).
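
A rough sketch of how such a per-dependency measurement can be made. `time_loads` is a hypothetical helper, and the package list in the usage comment assumes all three packages are installed; this is one way to get the numbers, not necessarily how they were obtained above.

```julia
# Hypothetical helper: time how long each package takes to load.
# Run it in a fresh session; a package that is already loaded returns
# almost immediately, so the order of the list matters.
function time_loads(pkgs)
    timings = Pair{Symbol,Float64}[]
    for pkg in pkgs
        # `using` is only allowed at top level, so evaluate it in Main
        t = @elapsed Core.eval(Main, :(using $pkg))
        push!(timings, pkg => t)
    end
    return timings
end

# Dependencies first, then DataFrames itself (assumes all are installed):
# for (pkg, t) in time_loads([:WeakRefStrings, :CategoricalArrays, :DataFrames])
#     println(rpad(string(pkg), 20), round(t, digits = 2), " s")
# end
```

Because dependencies are listed first, the time reported for DataFrames is roughly what remains after its dependencies are already loaded.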


#7

Thank you. Can you please check DataFrames on your machine?


#8

A staggering 129s.

There’s a lot wrong though. Something is going on in StatsBase: it gets tons of method overwrite warnings. I know DataStreams badly needs to be tagged. Probably lots of things really need to be tagged.


#9

Thanks! Unfortunately yes. I also get those warnings on the first compilation, but on subsequent loads I do not get any warning output.
But maybe those deprecations are causing all the lags. If that is the case, then it is not so bad: we simply have to update all the packages and everything will be back to normal.


#10

It’s reeealllyyy slow so I’m not quite that optimistic, but certainly it is time to update all the DataFrames dependencies regardless.


#11

If you get method overwrite warnings when precompiling, it is likely that loading that package will invalidate a lot of code (which then needs to be recompiled) and that the package will be very slow to load.

The method overwrite warnings should definitely be investigated.
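
A toy illustration of the mechanism, not taken from StatsBase or any of the packages above: when a method is redefined with the same signature, code that was compiled against the old definition has to be compiled again. The names `f` and `g` are made up for the example.

```julia
# Toy illustration: overwriting a method invalidates compiled code
# that depends on it.
f(x::Int) = 1
g() = f(1)          # g is compiled against the first definition of f

before = g()        # returns 1

f(x::Int) = 2       # same signature: this overwrites the method above

after = g()         # g has to be recompiled and now returns 2
```

When many widely used methods are overwritten while a package loads, this recompilation can add up to a noticeable delay.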


#12

I think it’s worth waiting until we’ve fixed all deprecations, since they frequently cause large slowdowns.