I have recently started moving to Julia as my primary language. I like it very much, except for one issue: I am not sure what the best model for developing in Julia would be.
Primarily I use Python with lots of iterations and checks: use a toy dataset, write a couple of lines, run it once to see if everything comes out OK, keep iterating, and then run the final program on the actual dataset, which will take a couple of days.
The problem I am facing with Julia is that I have no clue how long a function will take to run. Sometimes a simple function with only a couple of lines takes 10-15 seconds before the first run (JIT compilation, I presume?), followed by each iteration running really smoothly, and then suddenly another 15 seconds. It completely breaks the flow for me. So:
How does the community mostly write long Julia programs? Is there a preferred way of iterating on code while writing that ensures faster development time?
Is there a rule of thumb for which modifications to a function will cause a significant delay in compiling?
Is there some sort of development vs. production flag I can set to ensure faster compile times while writing and faster runtimes in production?
Sorry if this comes off as too trivial, but I thought it better to ask than to gripe about it. Such minor nuisances were keeping me from using the language more. Version 1.6 sorted out so many of these issues that I am now spending more time in Julia than in Python, but I still cannot switch full time.
Start writing your own packages as soon as possible. Instead of doing everything in a script, wrap your functionality in a package that you import in your scripts. Revise.jl will recompile parts of the package code as you change it, when needed.
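A minimal sketch of that workflow, assuming a hypothetical package MyAnalysis with a function fit_model (both names are placeholders, not from this thread):
# In the REPL: load Revise before the package so edits are tracked
using Revise
using MyAnalysis                        # hypothetical package under development
result = MyAnalysis.fit_model(toydata)  # hypothetical function and toy data
# edit MyAnalysis/src/*.jl in your editor, then simply call it again;
# Revise recompiles only the methods touched by the change
result = MyAnalysis.fit_model(toydata)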
But also, 15 seconds is a lot of compiling for Julia 1.6. I run pretty complicated models and I only get compile times like that when I'm using a GPU. Maybe reducing the complexity of your types, or making sure they are type stable, will help.
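If you want a quick way to check type stability, @code_warntype is the usual tool; a tiny sketch, where f and its argument are just placeholders:
using InteractiveUtils   # already available in the REPL; needed in scripts for @code_warntype
@code_warntype f(1.0)    # hypothetical function; Any/Union-typed slots in the output hint at type instability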
Welcome to Julia; it is great that you are considering using it more.
I use my editor in conjunction with Revise.jl to develop the program. Also, in order to test and document the code better, I create my own packages; it is very simple to do (I recommend PkgTemplates.jl).
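For reference, generating a package skeleton with PkgTemplates.jl looks roughly like this (the user and package names are placeholders):
using PkgTemplates
t = Template(; user = "yourgithubname")  # placeholder GitHub user name
t("MyAnalysis")                          # creates the package (by default under ~/.julia/dev) with Project.toml, src/ and test/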
Well, you can start Julia with -O0 to reduce compilation time, but the runtime will be worse.
I usually use DaemonMode.jl (disclaimer: I am the author) to run the program faster when the package is finished (or while I am developing/running scripts).
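As a rough sketch of how that looks from the shell (the script name is a placeholder):
# start the daemon once; it keeps packages loaded between runs
julia --startup-file=no -e 'using DaemonMode; serve()' &
# run a script against the daemon; repeated runs skip package loading
julia --startup-file=no -e 'using DaemonMode; runargs()' myscript.jl
# the -O0 route mentioned above is just a flag at startup:
julia -O0 myscript.jl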
I have checked in the REPL and in Pluto notebooks (after each cell the notebook shows the execution time), and for me almost everything timed about the same. That is why I assumed it to be normal.
(I was using it to work through the code from the book Statistical Rethinking.)
E.g.:
using Plots, StatsKit
# takes about 4 sec
using StatisticalRethinking
# 126 seconds!
begin
    pgrid = 0.0:0.01:1.0
    prior = ones(length(pgrid))
    likelihood1 = pdf.(Binomial.(3, pgrid), 3) .* prior
    likelihood2 = pdf.(Binomial.(4, pgrid), 3) .* prior
    likelihood3 = pdf.(Binomial.(7, pgrid), 5) .* prior
    likelihood4 = pdf.(Binomial.(7, pgrid), 5) .* prior
    likelihood1 = likelihood1 ./ sum(likelihood1)
    likelihood2 = likelihood2 ./ sum(likelihood2)
    likelihood3 = likelihood3 ./ sum(likelihood3)
    likelihood4 = likelihood4 ./ sum(likelihood4)
end
# 15 seconds
begin
    plot(pgrid, likelihood1)
    plot!(pgrid, likelihood2)
    plot!(pgrid, likelihood3)
    plot!(pgrid, likelihood4)
end
# 14 seconds
I also asked a similar question here first, in the hope of getting some clue on how to speed things up:
All of the above was running in the Pkg environment of StatisticalRethinkingTuring.jl (Link Here), which I initialized as per the instructions:
cd StatisticalRethinkingTuring.jl
julia
julia> ]
(@v1.6) pkg> activate .
(StatisticalRethinkingTuring) pkg> instantiate
julia> using Pluto
julia> Pluto.run()
Julia version 1.6
macOS Mojave, Intel Core i5 2.5 GHz (2012), 8 GB RAM
@Raf and @mike So Revise is like the Fortran/Make ecosystem?
That is, use make to compile selected parts, and try writing each function in a separate file.
@dmolina Thanks, DaemonMode indeed looks promising; thank you for the suggestion. I found your talk on YouTube as well; let me go through it and give it a go.
Welcome to the community. Besides the direct answers you receive in this thread, I also recommend using the search tool of this forum and looking for "workflow"; other people have posted similar questions in the past, and the answers given to them might also be helpful to you.
There is also a brief section in the manual with a few Workflow Tips that may be helpful.
I think there is a question that should be clarified before giving more detailed advice: when you write some code and run it to try it out, are you starting Julia anew for each trial? That might be a major reason for very slow runs, because Julia is definitely much slower at startup and gets faster once packages are loaded and functions are compiled. So it is normally advisable to start a single Julia session and then iterate without exiting for as long as possible.
The tips about wrapping your code in modules and/or using Revise are especially meant for that use case.
I use the @time macro a lot, in front of all my major functions. In 1.6 it will tell you how much of the time was spent compiling and running the garbage collector. My functions typically take a second or two to compile and, after running a bunch of times, spend a second or two on GC. Maybe that's what you're seeing.
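A small sketch of what that looks like; the function name is a placeholder and the reported numbers are purely illustrative:
@time run_model(df)   # hypothetical function
# On the first call, Julia 1.6 reports something like
#   2.31 seconds (… allocations: …, 4% gc time, 87% compilation time)
# On later calls the compilation share drops to (almost) zero, so you can tell
# JIT cost apart from the actual runtime of your code.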
Yes, maybe a bit like that, except automated, and the functions can be anywhere in the package; the files don't matter. If you make any changes to the code, other functions that the change touches will be recompiled before the next command is run. You don't have to do anything for that to happen.
But it won't run on your scripts unless you specifically tell Revise to track the file (I think; I don't really use that).
Edit: I was almost going to ask if you were using Turing. I’ve also found that those models can take a while to compile.
I've seen this advice before, and used some of it, but it's somehow hard to get the big picture of what people are after. (A live-coding video would likely help more than written text.)
Why do I need to create a package for part of my code that would otherwise be in a script, or another file? Is it because of Revise.jl?
How do you do it in practice in VS Code? Open two directories, one for the package and another for the script, making sure the REPL starts in the script directory? (I didn't find a way to set where the REPL starts from in VS Code.)
I've come to the conclusion that for me it's easiest to prototype new code in VS Code, selecting lines to execute. When the code has taken shape, I write a proper script or maybe a Pluto notebook.
Well, you do not need to create a package for the code, but as the code grows, it is a good idea, for testing, documentation, and dependencies, to consider putting it (or parts of it) into packages so you can reuse it.
No, Revise.jl can work with any file, not only with packages. You use Revise.includet("file") and then, when the file is changed (by the editor), the changes are automatically applied. That is assuming you are running the code in the REPL (or inside the IDE).
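A minimal sketch of that, where the file and function names are placeholders:
using Revise
includet("analysis.jl")   # hypothetical script defining run_analysis(); Revise tracks the file
run_analysis()
# edit analysis.jl in your editor, then just call the function again;
# the changes are picked up without re-including the file
run_analysis()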
If not, I suggest using DaemonMode.jl, because it loads the packages only once and can run the script faster when it is run several times (and it always runs the current version of the script).
I do not use VS Code, but you can also put the package and the script in the same directory without any problem.
using StatisticalRethinking takes "only" 12 seconds for me. That's reasonable considering that it loads heavy packages like StatsPlots and Turing, which are famously slow to load.
Your reported time for using StatisticalRethinking is similar to my precompilation time. Those 120 seconds should only occur when you install the package. Maybe something in your installation prevented the packages from precompiling when you installed them, so it was done later when you called using. Do you have to wait this long every time you restart Julia?
It's weird that your likelihood block takes 15 seconds. Mine takes only 0.3 seconds! Again, maybe something is wrong with precompilation/invalidation, which causes Distributions to recompile.
The plotting block taking time the first time it is executed after starting Julia is expected (the famous "time to first plot" problem). It's annoying and, as you said, already improved in Julia 1.6. I'm not sure there's much to do here except waiting on the first plots after you start the notebook.
Yes, sometimes you have to wait again later, for example if you make a new kind of plot that also takes long to compile. But waiting 15 seconds for an iteration when the previous iterations were fast... that's not normal. It sounds like this iteration is doing something different from the previous ones, requiring different code to be compiled.
Is there any package to do some sort of ptrace on the Julia JIT, to figure out where it is taking time? The whole StatisticalRethinkingTuring.jl is a DrWatson project, and I have a suspicion that might be making it go slow.
Do you have to wait this long every time you restart Julia?
Yes, the time varies between 30 seconds and 120 seconds. I also asked a similar question here.
Again, maybe something is wrong with precompilation/invalidation, which causes Distributions to recompile.
Any way to check that?
@StatisticalMouse I also did the same in Python, where VS Code line-by-line execution is basically a Jupyter notebook, so it works fine. But with Julia the problem was the same: these 5-15 second breaks were quite distracting for me. Another problem with a pure REPL-based approach was the whole global variables vs. loop variables issue. I am not totally comfortable with it yet, but I am getting the hang of it.
@Raf I wasn't aware Turing was famous for that. I was basically drawn to it by the seamless ecosystem of ML and Turing in Julia. TensorFlow Probability was quite unintuitive; I spent more time googling the syntax than actually thinking about my problem.
If the problem is that it takes a long time each time you start Julia, write a script that loads all the packages and runs all the functions you are using. Then create a sysimage with PackageCompiler. That's what I did and it helped a ton.
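A minimal sketch of that approach, where the package list, file names, and image path are just examples:
using PackageCompiler
# precompile_script.jl should load your packages and exercise the functions you actually use
create_sysimage([:Plots, :StatsKit, :StatisticalRethinking];
                sysimage_path = "sys_rethinking.so",
                precompile_execution_file = "precompile_script.jl")
# then start Julia with the custom image:
#   julia --sysimage sys_rethinking.so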