How to effectively develop in Julia?

amit1 · May 22, 2021, 9:03am

I have recently started moving to Julia as my primary language. I like it very much, except for one issue: I am not sure what the best workflow for developing in Julia is.

Primarily I use Python with lots of iterations and checks: I use a toy dataset, write a couple of lines, and run them once to see if everything comes out OK. I keep iterating, and then run the final program on the actual dataset, which may take a couple of days.

The problem I am facing with Julia is that I have no clue how long a function will take to run. Sometimes a simple function of only a couple of lines takes 10-15 sec on its first run, each following iteration runs really smoothly, and then suddenly it takes 15 sec again (JIT, I presume?). Somehow it completely breaks the flow for me. So,

How does the community mostly write long Julia programs? Is there a preferred way of iterating on code while writing it that keeps development time short?
Is there any rule of thumb for which modifications to my function would cause a significant compilation delay?
Is there some sort of development vs production flag I can set to get faster compile times while writing and faster runtime in production?

Sorry if this comes off as too trivial, but I thought it better to ask than to gripe about it. Such minor nuisances were keeping me from using the language more. Version 1.6 sorted out so many of these issues that I am now spending more time in Julia than in Python. But I still cannot switch full time.

mike · May 22, 2021, 9:20am

In case you’ve not been introduced to Revise.jl, it’s probably the single biggest productivity gain you can get for development in Julia.

Raf · May 22, 2021, 9:34am

Start writing your own packages as soon as possible. Instead of doing everything in a script, wrap your functionality in a package that you import in your scripts. Revise.jl will recompile parts of the package code as you change it, when needed.
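As a rough illustration of what "wrap your functionality in a package" can look like, here is a minimal sketch using only the standard library. The package name MyAnalysis, the function normalize_weights and the temporary directory are all hypothetical, and the folder is made loadable through LOAD_PATH rather than through Pkg, which is a shortcut and not necessarily the setup Raf has in mind.

```julia
# Hypothetical package name and function, created in a temporary directory.
root = mktempdir()
srcdir = joinpath(root, "MyAnalysis", "src")
mkpath(srcdir)

# The conventional layout is <PackageName>/src/<PackageName>.jl
write(joinpath(srcdir, "MyAnalysis.jl"), """
module MyAnalysis

export normalize_weights

# Scale a vector of non-negative weights so that it sums to one.
normalize_weights(w) = w ./ sum(w)

end # module
""")

# Make the directory that contains MyAnalysis/ visible to the package loader.
push!(LOAD_PATH, root)

using MyAnalysis

weights = normalize_weights([1.0, 1.0, 2.0])
println(weights)
```

In an interactive session one would run `using Revise` before `using MyAnalysis`, so that edits to src/MyAnalysis.jl are picked up without restarting Julia. A package meant to be kept would normally also get a Project.toml, for example via PkgTemplates.jl.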

But also, 15 seconds is a lot of compiling for julia 1.6. I run pretty complicated models and I only get compile times like that when I’m using a GPU. Maybe reducing the complexity of your types, or making sure they are type stable will help.
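To make "type stable" concrete, here is a small sketch with two hypothetical functions. It only shows how to check what the compiler infers; whether type instability explains the compile times in this thread is not established.

```julia
using Test  # standard library; provides @inferred

# Type-unstable: returns an Int for negative input and a Float64 otherwise.
unstable_relu(x::Float64) = x < 0 ? 0 : x

# Type-stable: always returns a Float64.
stable_relu(x::Float64) = x < 0 ? zero(x) : x

# Base.return_types lists what the compiler infers for the given argument types.
unstable_types = Base.return_types(unstable_relu, (Float64,))
stable_types = Base.return_types(stable_relu, (Float64,))
println("unstable: ", unstable_types)
println("stable:   ", stable_types)

# @inferred throws an error if the inferred return type does not match
# the type of the value actually returned.
@inferred stable_relu(1.5)
```

At the REPL, `@code_warntype unstable_relu(1.5)` gives a more detailed view of the same information.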

dmolina · May 22, 2021, 9:40am

Welcome to Julia, and it is great you are considering using more Julia.

I use the editor in conjunction with Revise.jl to develop the program. Also, in order to test and document it better, I create my own packages; it is very simple to do (I recommend PkgTemplates.jl).

Well, you can use “-O0” to reduce the compilation time, but the runtime would be worse.
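The optimisation level is a command-line option of the julia binary (written -O0 or --optimize=0). The sketch below launches a child process with that option from inside Julia and asks it which level it is running at; the variable names are illustrative.

```julia
# Start a child Julia process with optimisation level 0 and let it report
# the level it is running at. Base.julia_cmd() points at the current binary.
child = `$(Base.julia_cmd()) -O0 --startup-file=no -e 'print(Base.JLOptions().opt_level)'`
child_level = read(child, String)

println("this session:  -O", Base.JLOptions().opt_level)
println("child process: -O", child_level)
```

For day-to-day use the equivalent is simply starting the session or script as `julia -O0 script.jl` while developing, and without the flag for the final run.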

I usually use DaemonMode.jl (disclaimer: I am the author) to run the program faster when the package is finished (or while I am developing/running scripts).

carstenbauer · May 22, 2021, 9:56am

Let me second this. 15 seconds for compilation sounds like a lot. Can you perhaps share a MWE?

amit1 · May 22, 2021, 10:15am

I have checked in the REPL and in Pluto notebooks (after each cell the notebook shows the execution time), and for me almost everything timed the same. That is why I assumed it to be normal.
(I was using it for trying the codes of the book Statistical Rethinking)
Eg.:

using Plots, StatsKit
# takes about 4 sec

using StatisticalRethinking
# 126 seconds!

begin
	pgrid = 0.0:0.01:1.0
	prior = ones(length(pgrid))
	likelihood1 = pdf.(Binomial.(3,pgrid),3).*prior
	likelihood2 = pdf.(Binomial.(4,pgrid),3).*prior
	likelihood3 = pdf.(Binomial.(7,pgrid),5).*prior
	likelihood4 = pdf.(Binomial.(7,pgrid),5).*prior
	likelihood1 = likelihood1./sum(likelihood1)
	likelihood2 = likelihood2./sum(likelihood2)
	likelihood3 = likelihood3./sum(likelihood3)
	likelihood4 = likelihood4./sum(likelihood4)
end
# 15 seconds

begin
	plot(pgrid, likelihood1)
	plot!(pgrid,likelihood2)
	plot!(pgrid,likelihood3)
	plot!(pgrid,likelihood4)
end
# 14 seconds

I also asked similar question here first in hopes to get some clue on how to speedup things:

All of the above was running in the Pkg environment StatisticalRethinkingTuring.jl (link here), which I initialized as per the instructions:

cd StatisticalRethinkingTuring.jl
julia
julia> ]
(@v1.6) pkg> activate .
(StatisticalRethinkingTuring) pkg> instantiate
julia> using Pluto
julia> Pluto.run()

Julia version 1.6
MacOS Mojave, Intel Corei5 2.5 GHz (2012), 8GB ram

amit1 · May 22, 2021, 10:19am

@Raf and @mike So Revise is like the Fortran/Make ecosystem?
That is, use make to compile selected parts and try writing each function as a separate file.

@dmolina Thanks, DaemonMode indeed looks promising. I found your talk on YouTube as well; let me go through it and give it a go.

heliosdrm · May 22, 2021, 10:30am

Welcome to the community. Besides the direct answers you receive in this thread, I also recommend that you use the search tool of this forum and look for “workflow”; other people have posted similar questions in the past, and the answers given to them might also be helpful.

There is also a brief section in the manual with a few Workflow Tips that may be helpful.

I think there is a question that should be clarified before giving more detailed advice. When you write some code and run it to try it, are you doing it starting Julia for each new trial? That might be a major reason for very slow runs, because Julia is definitely much slower at startup, and gets faster once packages are loaded and functions are compiled. So it’s normally advisable to start a single Julia session, and then iterate without exiting as long as possible.

The tips about wrapping your code in modules and/or using Revise are especially meant for that use case.

xiaodai · May 22, 2021, 10:47am

This might help https://www.youtube.com/watch?v=vVywlAgyedI

Ribeiro · May 22, 2021, 10:54am

I use the @time macro a lot. In front of all my major functions. In 1.6 it will tell you how much of the time was spent compiling and running the garbage collector. My functions typically take a second or two to compile and, after running a bunch of times, take a second or two on GC. Maybe that’s what you’re seeing.
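A small sketch of what this looks like in practice, using a hypothetical helper function and only Base. It also illustrates one rule of thumb for when compilation happens again: calling a function with argument types it has not seen before compiles a new specialisation.

```julia
# Hypothetical helper: normalise a vector so that it sums to one.
normalise(v) = v ./ sum(v)

x = rand(1_000)

first_call = @elapsed normalise(x)    # includes JIT compilation for Vector{Float64}
second_call = @elapsed normalise(x)   # same argument types: compiled code is reused

println("first call:   ", first_call, " s")
println("second call:  ", second_call, " s")

# A new argument type (Vector{Float32}) triggers compilation again.
third_call = @elapsed normalise(rand(Float32, 1_000))
println("Float32 call: ", third_call, " s")

# @time prints the timing plus allocations; on Julia 1.6 and later it also
# reports the share of time spent compiling, when there is any.
@time normalise(x)
```

Exact numbers depend on the machine, so none are shown here; the pattern to look for is a first call that is much slower than the second.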

Raf · May 22, 2021, 11:07am

Yes, maybe a bit like that, except automated, and the functions can be anywhere in the package; the files don’t matter. If you make any changes to the code, the functions that the change touches will be recompiled before the next command is run. You don’t have to do anything for that to happen.

But it won’t run on your scripts unless you specifically tell Revise to track the file (I think; I don’t really use that).

Edit: I was almost going to ask if you were using Turing. I’ve also found that those models can take a while to compile.

LaurentPlagne · May 22, 2021, 11:11am

You may find this short video useful (possible workflow with Revise.jl) Julia #0/2 : setup for beginner - YouTube

StatisticalMouse · May 22, 2021, 12:11pm

I’ve seen this advice before, and used some of it, but it’s somehow hard to get the big picture of what people are after. (Live-coding video would likely help more than writing text.)

Why do I need to create a package for part of my code that would otherwise be in a script, or another file? Is it because of Revise.jl?

How would you do it in practice in vscode? Open two directories, one for the package and another for the script, making sure the REPL starts in the script directory? (I didn’t find a way to set where the REPL starts from in vscode.)

I’ve come to the conclusion that for me it’s easiest to prototype new code in vscode, selecting lines to execute. When the code has taken shape, I write a proper script or maybe a Pluto notebook.

dmolina · May 22, 2021, 12:22pm

Well, you do not need to create a package for the code, but as the code grows it is a good idea to consider putting it (or parts of it) into packages, for testing, documentation, dependencies and reuse.

No, Revise.jl can work with any file, not only with packages. You use Revise.includet("file") and then, when the file is changed (by the editor), the changes are automatically applied. That is assuming you are running the code in the REPL (or inside the IDE).
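A minimal sketch of the includet workflow, assuming Revise.jl is installed in the active environment. The file name analysis.jl and the function scale are hypothetical, and the file is created in a temporary directory only to keep the example self-contained.

```julia
using Revise  # assumed to be installed in the active environment

# Create a small file to track; name and contents are hypothetical.
dir = mktempdir()
path = joinpath(dir, "analysis.jl")
write(path, """
scale(x) = 2 .* x
""")

# Like include, but Revise keeps watching the file for edits.
includet(path)

println(scale([1, 2, 3]))

# At the REPL: edit analysis.jl in the editor (for instance change 2 to 10),
# save, and call scale again. Revise applies the change before the next
# command runs, without restarting Julia.
```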

If not, I suggest using DaemonMode.jl, because it loads the packages only once, so a script that is run several times runs faster (and it always runs the current version of the script).

I do not use vscode, but you can also put the package and the script in the same directory without any problem.

sijo · May 22, 2021, 12:36pm

I tried the same on my PC (Linux, dual core i5-7200U CPU @ 2.50GHz, 16GB):

(@v1.6) pkg> activate .
  Activating environment at `~/stat/StatisticalRethinkingTuring.jl/Project.toml`

(StatisticalRethinkingTuring) pkg> instantiate
   Installed AbstractMCMC ────────── v2.3.0
   Installed MCMCChains ──────────── v4.7.2
   Installed SciMLBase ───────────── v1.8.4
   Installed DrWatson ────────────── v2.0.1
   Installed StatisticalRethinking ─ v3.3.2
Precompiling project...
  63 dependencies successfully precompiled in 128 seconds (185 already precompiled)

julia> @time using StatisticalRethinking
 12.434140 seconds (20.41 M allocations: 1.334 GiB, 4.91% gc time, 0.04% compilation time)

julia> @time begin
                      pgrid = 0.0:0.01:1.0
                      prior = ones(length(pgrid))
                      likelihood1 = pdf.(Binomial.(3,pgrid),3).*prior
                      likelihood2 = pdf.(Binomial.(4,pgrid),3).*prior
                      likelihood3 = pdf.(Binomial.(7,pgrid),5).*prior
                      likelihood4 = pdf.(Binomial.(7,pgrid),5).*prior
                      likelihood1 = likelihood1./sum(likelihood1)
                      likelihood2 = likelihood2./sum(likelihood2)
                      likelihood3 = likelihood3./sum(likelihood3)
                      likelihood4 = likelihood4./sum(likelihood4)
              end;
  0.255271 seconds (671.12 k allocations: 39.900 MiB, 99.14% compilation time)

julia> @time begin
                      plot(pgrid, likelihood1)
                      plot!(pgrid,likelihood2)
                      plot!(pgrid,likelihood3)
                      plot!(pgrid,likelihood4)
              end;
 17.728324 seconds (45.90 M allocations: 2.738 GiB, 5.28% gc time)

Remarks

using StatisticalRethinking takes “only” 12 seconds for me. That’s reasonable considering that it loads heavy packages like StatsPlots and Turing, which are famously slow to load.
Your reported time for using StatisticalRethinking is similar to my precompilation time. These 120 seconds should only occur when you install the package. Maybe something in your installation prevented the packages from precompiling when you installed them, so it was done later when you called using. Do you have to wait this long every time you restart Julia?
It’s weird that your likelihood block takes 15 seconds. Mine takes only 0.3 second! Again, maybe something wrong with precompilation/invalidation which causes Distributions to recompile.
The plotting block takes time the first time it is executed after starting Julia. That one is expected (the famous “time to first plot problem”). It’s annoying and, as you said, already improved in Julia 1.6. Not sure there’s much to do here except waiting for the first plots after you start the notebook.
Yes sometimes you have to wait again later, for example if you make a new kind of plot that also takes long to compile. But waiting 15s for an iteration when the previous iterations were fast… That’s not normal, that sounds like this iteration is doing something different from the previous ones, requiring different code to be compiled.

lmiq · May 22, 2021, 1:15pm

Here are some notes, with simple examples of what others have said:

https://m3g.github.io/JuliaNotes.jl/stable/workflow/

https://m3g.github.io/JuliaNotes.jl/stable/modules/

(Setting up a package is one alternative, but it is often a cannon to kill a mosquito.)

amit1 · May 22, 2021, 1:56pm

Is there any package to do some sort of ptrace on the Julia JIT, to figure out where it is spending time? The whole StatisticalRethinkingTuring.jl is a DrWatson project and I have a suspicion that this might be making it slow.

“Do you have to wait this much everytime you restart Julia?”

Yes. The time varies between 30 sec and 120 sec. I also asked a similar question here.

“Again, maybe something wrong with precompilation/invalidation which causes Distributions to recompile.”

Any way to check it?

@StatisticalMouse I did the same in Python, where vscode line-by-line execution is basically a Jupyter notebook, so it works fine. But with Julia the problem was the same: these 5-15 sec breaks were quite distracting for me. Another problem with a pure REPL-based approach was the whole global variables vs loop variables issue. I am not totally comfortable with it yet, but I am getting the hang of it.

@Raf I wasn’t aware Turing was famous for it. I was basically drawn to it for the seamless ecosystem of ML and Turing in Julia. Tensorflow Probability was quite unintuitive; I spent more time googling the syntax than actually thinking about my problem.

Ribeiro · May 23, 2021, 7:29am

If the problem is that it takes a long time each time you start Julia, write a script that loads all the packages and runs all the functions you are using. Then create a sysimage with PackageCompiler. That’s what I did and it helped a ton.
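A hedged sketch of that approach, assuming PackageCompiler.jl and Plots are installed in the active environment. The warm-up script contents, the function name build_sysimage and the output file name are illustrative, not Ribeiro's actual setup.

```julia
using PackageCompiler  # assumed to be installed in the active environment

# Warm-up script: loads the packages and runs typical calls so that their
# compiled code ends up in the image. Path and contents are illustrative.
warmup_path = joinpath(mktempdir(), "warmup.jl")
write(warmup_path, """
using Plots
p = plot(0.0:0.01:1.0, rand(101))
plot!(p, 0.0:0.01:1.0, rand(101))
""")

# Build a custom system image containing the listed packages.
# This can take several minutes, so it is wrapped in a function
# and not called automatically here.
function build_sysimage(; packages = [:Plots], output = "dev_sysimage.so")
    create_sysimage(packages;
                    sysimage_path = output,
                    precompile_execution_file = warmup_path)
    return output
end
```

After calling build_sysimage() once, later sessions can be started with `julia --sysimage dev_sysimage.so`, so the included packages load without recompiling. One trade-off to keep in mind: packages baked into the image stay at the version they had at build time until the image is rebuilt.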
