Julia is slow questions - how to request information


#1

As the Julia language grows, we will see more ‘Julia is slow when I do XYZ’ questions. For instance, I have seen several recent discussions about ‘slowness’ when reading in data. We cannot have hard and fast rules for dealing with every enquiry, and of course we must always be friendly.
However, on other forums it is common practice to at least ask for version info, so that responders can say ‘Aha! That is a known bug with that version’.
Perhaps we need a FAQ listing the minimal set of information to request. I will start the ball rolling:

To help the community answer your question, please provide the following information:

The output of running versioninfo() in Julia
How much RAM does your system have?
What type of storage are you reading from / writing to, if you know?

I know the last two questions are vague, but we deal with several OSes and the specific commands are different per OS.
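
For the first two items, a minimal sketch of an OS-independent way to collect them from inside Julia. The function name basic_report is made up for illustration; versioninfo and Sys.total_memory are standard. Storage type is not covered, since I am not aware of a portable way to query it from Julia.

```julia
using InteractiveUtils  # provides versioninfo when running outside the REPL

# Hypothetical helper: collect the short version info plus total RAM as one string.
function basic_report()
    io = IOBuffer()
    versioninfo(io)  # non-verbose form
    total_gib = round(Sys.total_memory() / 2^30, digits=2)
    println(io, "Total RAM: ", total_gib, " GiB")
    return String(take!(io))
end

print(basic_report())
```

The result is plain text, so it can be pasted straight into a post.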

Also, I think having a FAQ to point to stops the original poster from feeling victimised: we can say that everyone is asked for that information.


#2

As I see it, the memory info is included, but the disk speed is missing.

julia> versioninfo(verbose=true)
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Debian GNU/Linux 9.5 (stretch)
  uname: Linux 4.18.0-10-generic #11-Ubuntu SMP Thu Oct 11 15:13:55 UTC 2018 x86_64 unknown
  CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz: 
              speed         user         nice          sys         idle          irq
       #1  2199 MHz   47644956 s          5 s    4559271 s  128333570 s          0 s
       #2  2199 MHz   49273459 s          4 s    4838054 s  127906377 s          0 s
       #3  2199 MHz   48640666 s          9 s    4727385 s  128568939 s          0 s
       #4  2199 MHz   47967578 s          5 s    4632399 s  129552534 s          0 s
       #5  2199 MHz   48343371 s         71 s    4491562 s  129517216 s          0 s
       #6  2199 MHz   48008016 s          8 s    4507820 s  129947851 s          0 s
       #7  2199 MHz   47632402 s         48 s    4414191 s  130510462 s          0 s
       #8  2199 MHz   46595587 s         11 s    4417427 s  128362144 s          0 s
       
  Memory: 29.444866180419922 GB (26150.76171875 MB free)
  Uptime: 1.888686e6 sec
  Load Avg:  0.33935546875  0.1455078125  0.05078125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
  JULIA_PATH = /usr/local/julia
  JULIA_MAJOR = 1.0
  JULIA_VERSION = 1.0.1
  JULIA_SHA256 = 9ffbcf7f4a111e13415954caccdd1ce90b5c835cee9f62d6ac708f5b752c87dd
  JULIA_DIR = /usr/local/julia
  JULIA_PATH = /usr/local/julia
  GOPATH = /go
  HOME = /root
  TERM = xterm
  PATH = /go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


#3

I hang my head in shame. The verbose=true option seems perfect.
I guess we should start by asking for versioninfo(verbose=true)

Plus the output of pkg> status
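
For anyone who is not at the REPL, a small sketch of the function-call equivalent of the status command in Pkg mode, using the Pkg standard library:

```julia
using Pkg

# Prints the packages in the active environment, like `status` in the Pkg REPL mode.
Pkg.status()
```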


#4

versioninfo(verbose=true) does include personal info, though (at least through HOME; either @ImreSamu is unwise enough to always run as root, or wise enough to have noticed this pitfall)

I don’t like asking people to dox themselves when asking for performance tips. Revealing non-pseudonymous identity should be by choice, not by accident.
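
One possible workaround, as a rough sketch: capture the verbose output and drop everything from the "Environment:" line onwards before posting. The function name versioninfo_without_env is made up for illustration, and the sketch assumes the environment section is the last part of the output and starts with a line reading "Environment:", as in the output posted above. Other lines could still contain identifying details, so the result should be read through before it is pasted anywhere.

```julia
using InteractiveUtils  # provides versioninfo when running outside the REPL

# Hypothetical helper: verbose version info with the environment section cut off.
function versioninfo_without_env()
    io = IOBuffer()
    versioninfo(io; verbose=true)
    lines = split(String(take!(io)), '\n')
    # Assumption: the environment variables follow a line starting with "Environment:".
    cut = findfirst(l -> startswith(l, "Environment:"), lines)
    kept = cut === nothing ? lines : lines[1:cut-1]
    return join(kept, '\n')
end

println(versioninfo_without_env())
```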


#5

#1. For me, some command-line parameters are missing from the list,
and some of them are performance related.

julia -O0

 -C, --cpu-target <target> Limit usage of CPU features up to <target>; set to "help" to see the available options
 -O, --optimize={0,1,2,3}  Set the optimization level (default level is 2 if unspecified or 3 if used without a level)

imho: the command-line parameters can also be important.
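
A rough sketch of how a poster could report these settings from a running session. Note the assumption: Base.JLOptions() is an internal, undocumented struct, so the field names used here (opt_level, check_bounds) may change between Julia versions. Base.julia_cmd() returns the command, including flags, that would start a Julia process with the current settings.

```julia
opts = Base.JLOptions()  # internal struct; field names are not a stable API

opt_level = Int(opts.opt_level)        # value given to -O / --optimize
check_bounds = Int(opts.check_bounds)  # numeric code for the --check-bounds setting

println("optimization level: ", opt_level)
println("check-bounds code:  ", check_bounds)
println("launch command:     ", Base.julia_cmd())
```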

#2.
The julia/.github/ISSUE_TEMPLATE.md has another important piece of advice:

  • creating a minimal reproducible example
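
As an illustration only, here is a sketch of what a minimal reproducible example for a "Julia is slow" question might look like. The function sum_of_squares and the generated input are hypothetical stand-ins for the poster's real code and data. The points are that the snippet runs on its own, needs no external files, and times a second call so that compilation is not included in the measurement.

```julia
# Hypothetical stand-in for the code that feels slow.
function sum_of_squares(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x * x
    end
    return total
end

data = collect(1.0:1_000_000.0)  # generated input, so no data file is needed

sum_of_squares(data)  # first call includes compilation; run once to warm up
elapsed = @elapsed sum_of_squares(data)
println("elapsed: ", elapsed, " s")
```

For more careful measurements, the BenchmarkTools package is commonly used instead of a single @elapsed call.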

#6

:slight_smile:

I agree, privacy is important.

I am a big fan of containerized development with Docker,
and the official Julia Docker images are configured for the root user.

With Docker I can test and use multiple Julia versions, and it is easier to create a minimal reproducible example.

Julia 1.0.2

# docker run -it julia:1.0.2  bash -c 'id && julia'
uid=0(root) gid=0(root) groups=0(root)
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.2 (2018-11-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> 

Julia 0.7

# docker run -it julia:0.7.0-stretch  bash -c 'id && julia'
uid=0(root) gid=0(root) groups=0(root)
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0 (2018-08-08 06:46 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> 


#7

The third way :slight_smile:


#8

Going wildly off topic, my preferred containerisation framework is Singularity.
They have given a great deal of thought to security.
https://singularity.lbl.gov/docs-security
https://www.sylabs.io/guides/3.0/user-guide/security_options.html


#9

My two cents: I think we should focus on having people create a proper minimum reproducible example. That’s by far the most important IMO. I think this post covers it quite well already.

I actually think a long post, including all sorts of details that may or may not be relevant to the problem, can deter people from responding. And personally, I find forums that require me to submit all kinds of system details and logs before even looking at my question (when I know it’s irrelevant) quite frustrating.

As for “Julia is slow” questions, I rarely find that they’re stated in absolute terms, rather they tend to be compared to another implementation on the same system, in which case details such as CPU speed and RAM are often not relevant.

Here’s an example of a “Julia is slow” question which I think is fully sufficient.


#10

I agree completely that having a reproducible example is the most important, and that these are usually comparative anyway, making things like CPU and storage often irrelevant. I don’t see why people would feel “victimized” by being asked for them when they might be relevant. Requiring the a priori submission of all kinds of data that the user knows are irrelevant is more frustrating imo. For a FAQ, I would ask for:

  • a reproducible example
  • version info (small and probably relevant)
  • read the performance tips (but do not get annoyed if the poster missed some :slight_smile:)

#11

@peter.derijk I think you make a lot of sense here.
Indeed we should ask for a reproducible sample and the version information.