Will Julia ever fix its "using ..." latency problems?

Btw., a popular “misconception” is that Python is an interpreted rather than a compiled language, which gives the impression that there is no compilation at all. Python has a compiler, and it has to compile Python code to Python bytecode before interpreting it.
There is actually one situation where you really “feel” the compilation time: when installing packages via pip. If a package does not offer binary wheel distributions (where the sources are already compiled for your target system), the installation procedure has to compile every single file.

You can even see that there is compilation if you remove the compiled files, but of course the effect is not as dramatic as in Julia, which has a heavy-duty compilation chain under the hood. Btw., most of the slow compilation times in Julia come/came from invalidations. The trick is to be smart and not recompile things when it’s not needed, which many people have been working on for a very long time already.
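To make the bytecode step visible, here is a minimal sketch using only the standard library. The function `add` and the file name `example_mod.py` are made up for illustration; the point is just that `compile()` produces a code object holding bytecode and that `py_compile` writes the same kind of bytecode to a `.pyc` file, which is what gets regenerated after the compiled files are deleted.

```python
import dis
import py_compile
import tempfile
from pathlib import Path

# Hypothetical example source, not taken from any real package.
source = "def add(a, b):\n    return a + b\n"

# compile() turns source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")
namespace = {}
exec(code, namespace)
dis.dis(namespace["add"])  # prints the bytecode instructions of add()

# py_compile writes that bytecode to a cached .pyc file on disk.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example_mod.py"
    src.write_text(source)
    pyc = Path(py_compile.compile(str(src), doraise=True))
    pyc_exists = pyc.exists()

print(pyc.name, pyc_exists)
```

The exact instructions printed by `dis` differ between Python versions, so treat the listing as illustrative only.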

Here are some loading times after clearing the compiled files in a fresh virtual environment (find venv -name "*.pyc" -exec rm {} \+):

░ tamasgal@greybox.local:km3pipe  master py-3.8.6
░ 09:34:26 > ipython
Python 3.8.6 (default, Nov  6 2020, 18:54:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

>>> %time import matplotlib
CPU times: user 1.69 s, sys: 124 ms, total: 1.82 s
Wall time: 1.95 s

>>> %time import pandas
CPU times: user 1.41 s, sys: 482 ms, total: 1.89 s
Wall time: 2.43 s

Here is the time including launching Python itself (after clearing the compiled files again):

░ tamasgal@greybox.local:km3pipe  master km3pipe py-3.8.6
░ 09:39:41 > time python -c "import pandas"
python -c "import pandas"  2.65s user 0.80s system 89% cpu 3.679 total
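For anyone who wants to repeat this kind of measurement without the shell's `time`, a rough sketch could look like the following. The helper name `time_fresh_import` is made up, and `json` is only a standard-library stand-in so the snippet runs anywhere; swap in `pandas` or `matplotlib` to compare with the numbers above.

```python
import subprocess
import sys
import time


def time_fresh_import(module: str) -> float:
    """Return wall-clock seconds for a new interpreter to import `module`."""
    start = time.perf_counter()
    # A fresh process includes interpreter startup, like `time python -c ...`.
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


# "json" is a placeholder; its bytecode is normally cached already.
first = time_fresh_import("json")
second = time_fresh_import("json")
print(f"first: {first:.3f}s, second: {second:.3f}s")
```

For a standard-library module the two runs will probably look alike. A visible gap is only expected for a package whose `.pyc` files were removed beforehand, since the first run then has to recompile them.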

As you can see, Julia is not so far off. Given that such commands are executed only once per session (and Revise, for example, makes it mostly unnecessary to reload packages, in contrast to Python, where you have to reload your complete session every time you change your code), I think it’s quite ridiculous to state that

Regarding:

Julia is faster than Python. Show me one piece of code in pure Python which is faster than Julia, without measuring the compilation time, since that’s not what’s behind a claim of being faster than Python. No-one said Julia will compile faster than Python, as far as I know :wink:

14 Likes

For tasks that are short pieces of simple code that run quickly in Python, for example, Julia will probably never be faster the first time you do the operation. But for code that is numerically demanding, taking several minutes or more, Julia will very likely be faster, if the Julia code is written reasonably well. Julia is simply not the tool of choice for quick computations done interactively a single time.

Having said that, you can have a very responsive and quick interactive experience with Julia if you take some simple steps, like running the kinds of operations you will do during the day once in the morning, while you’re having a coffee or whatever. Put those sorts of commands in the startup.jl file, and then leave the terminal open for later use. For example, load Plots and do a simple dummy plot. The second call to the commands, later, with your real work, will be extremely fast.

1 Like

To be honest, I have to say that this differs dramatically from my experience, so maybe I’m doing something wrong.

When I use Python, more specifically IPython from the terminal, I never restart the session. If I’m working on a script, I can keep running it with %run [-i] [-n] myscript.py and all the entities defined in the script will be renewed. Every time a definition is evaluated, it overrides previous definitions. If I want to get rid of something, I can del or %xdel it. I can see what is defined in the interactive session with %who and %whos. If I want a clean workspace, I can do %reset or %reset_selective <regex> to delete a whole bunch of variables. If I’m working on a module, I can importlib.reload it.
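For readers without IPython at hand, the "every re-run overrides previous definitions" behaviour can be imitated in plain Python. This is only a rough analogue and not how IPython implements `%run`; the `workspace` dict and the contents of `myscript.py` are made up for illustration.

```python
import runpy
import tempfile
from pathlib import Path

workspace = {}  # hypothetical stand-in for the interactive namespace

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "myscript.py"

    # First version of the script: run it and merge its names in.
    script.write_text("x = 1\ndef f():\n    return 'old'\n")
    workspace.update(runpy.run_path(str(script)))
    first_result = workspace["f"]()

    # Edit the script and run it again: the definitions are overridden.
    script.write_text("x = 2\ndef f():\n    return 'new'\n")
    workspace.update(runpy.run_path(str(script)))
    second_result = workspace["f"]()

# Names can be removed one by one, much like `del` in the session.
del workspace["x"]
print(first_result, second_result, "x" in workspace)
```

Nothing needs to be restarted between the two runs, which is the flexibility described above.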

On the other hand, with Julia I have to continually restart the session, even with Revise. The first main reason is that Revise doesn’t work if you modify a structure. While developing, it’s super common to change type parameters, fields, field type annotations, inner constructors, etc. Every time you do so, a restart is required. The second reason is that you cannot use the interactive session with the same flexibility as in IPython, because what you define directly in the REPL to experiment is not subject to Revise, and you are forced to stick with it forever since there is no way to delete or undefine it. Moreover, when working on a package, every time ] test is run you pay the price of the interminable loading time.

Now, I may be doing something wrong here (and I’m interested to learn a better workflow), otherwise in my opinion there are a lot more situations where you need to restart a Julia session than a Python session.

5 Likes

In Python, the issue is that anything you import is not automatically updated when it changes. When you are executing scripts, this is not an issue. There is some IPython magic for auto-reloading imports, but I never got this working correctly in Jupyter.

Pluto.jl allows redefinition of structs inside a notebook, and notebook files are just plain Julia files. Furthermore, it works together with Revise.jl for external dependencies.

3 Likes

Pluto.jl is nice, but it has its own limitations (you cannot sequentially modify a variable like you would in a script) and is not suitable for developing a package. Here I was talking about the workflow of developing a module in the old-fashioned way: writing in a text editor and experimenting in an interactive terminal session. For that, Revise is not enough, and I keep having to restart the session whenever I touch a struct or a global constant. I don’t even know why they are treated differently from functions; it seems to me that it should be more difficult to update/delete functions. When working on a script, Revise.includet is pretty useless: it doesn’t redefine any data, it only updates functions, so you have to use include anyway.

1 Like

I don’t think that Julia will completely replace scripting languages, because those will always be faster if the loading time is the critical part of the script.

At the same time, maybe you know these details already and people have mentioned them, but sometimes the problems are mostly about finding a good workflow. I have recently posted this in another thread, it is basic stuff and might be useful:

Another tool you can try out is https://github.com/dmolina/DaemonMode.jl , which I think is like Pluto in that it runs code in a newly-created module each time, without re-starting Julia itself.

4 Likes

The “right” solution to that is to work in a module and constantly re-evaluate that. Pluto does that under the hood; VSCode and Juno make it very easy to do manually.
You can also just run your tests in the REPL process, although that admittedly isn’t trivial to get right.

3 Likes

Thank you, jupytext looks promising, I will look into it.

That is what I used to do. The problem is that, unlike script execution, Shift + Enter does not stop if a command has an error.

For structs I just search and replace the struct name by a new one with a version number, e.g. MyStruct1, MyStruct2, etc. Works well enough.

I can’t repro that with e.g.

println("hi")
println("what's 0//0?")
println(0//0)
println("that didn't go so well")

I confused Shift + Enter with Ctrl + Enter. The latter is what I used to do, and it has the problem of not stopping on errors and not showing the correct line number in stack traces. The Shift + Enter workflow works better, thank you for the tip.

46 posts were split to a new topic: Redefining structs without restart?

" No-one said Julia will compile faster than Python, as far as I know " - well …

"
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)"

I thought they have something for this too.

Yes, that’s true, one should mention that. However, I find it more difficult to work with compared to Revise, since Revise just fails fatally when something goes wrong, whereas the autoreload feature of IPython often leads to silent errors, which caused me headaches in the past; that’s why I am not using it at all. It does not really fit into the workflow where I would need such a feature anyway: quick debugging, where I have to rely on a predictable state.

Anyway, as said above, yes, there is a counterpart to Revise in Python :wink:

2 Likes

The statement was precise: compile faster than Python, not run faster than Python. The two are in opposition to one another: the way to make code run fast is to analyze it and implement optimizations (which is what compilation does), and that takes time. So making code perform better at runtime makes your compile-time worse.

The language is interactive and compiled, but the performance statements are about runtime.

I’d rather have it our way, because if you’re running a job that will take days, compile-performance is irrelevant. But of course we should continue pursuing every avenue we can to reduce the cost of compilation.

18 Likes

That one was specifically about using Plots.jl, this one is about using *.

This may be viable when working with scripts (it is still ugly, and the history in the REPL becomes a useless mess), but it’s definitely not an option while developing modules. I’m mostly concerned with that. I find that working on modules is very tiresome because of the constant restarts required. And testing incurs the loading penalty every time too. To be completely honest, my workflow seems to me slower than with a compiled language such as Rust.

3 Likes

In Python you can reload a specific module with importlib.reload(module), which gives you the updated module contents to work with. Note that it won’t update any existing objects based on the previous module definition, but it’s still usable for certain workflows.
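A small sketch of that caveat, assuming a throwaway module written to a temporary directory (`demo_mod` and `Thing` are made-up names): after the reload, new instances use the new code, while an object created earlier keeps its old class.

```python
import importlib
import sys
import tempfile
from pathlib import Path

V1 = "class Thing:\n    def hello(self):\n        return 'v1'\n"
# Deliberately a different length than V1, so the cached bytecode
# is recognised as stale even if both writes share a timestamp.
V2 = "class Thing:\n    def hello(self):\n        return 'version 2'\n"

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_mod.py"  # hypothetical module
    module_file.write_text(V1)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_mod = importlib.import_module("demo_mod")
        old_obj = demo_mod.Thing()  # instance of the original class

        module_file.write_text(V2)  # "edit" the module on disk
        importlib.reload(demo_mod)
        new_obj = demo_mod.Thing()  # instance of the reloaded class

        old_answer = old_obj.hello()
        new_answer = new_obj.hello()
        same_class = type(old_obj) is demo_mod.Thing
    finally:
        # Undo the changes made to the interpreter state.
        sys.path.remove(tmp)
        sys.modules.pop("demo_mod", None)

print(old_answer, new_answer, same_class)  # v1 version 2 False
```

So the reload itself works, but anything still holding a reference to the old class or its instances stays on the old definition until it is recreated.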

2 Likes