Ensure Julia is used to its full power

Obviously I usually store things in functions, but maybe I’m a little confused here, because even if I leave my files unchanged, calling Julia on them from the terminal always has to compile them again, right?

That’s what seems to happen when I call Julia from the terminal: the script takes as long on the first run as on the second run, so it must be compiling again even though nothing changed.

If you turn your set of functions into a package and install that package, then Julia stores a precompiled version of it, and that is what gets loaded with using. Thus, most of the functions will not be compiled again. There might still be some functions of the package that need to be recompiled if, for some reason, precompilation is not possible (see: Finding and fixing invalidations: now, everyone can help reduce time-to-first-plot).
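For concreteness, here is a minimal sketch of that idea (the package and function names are made up; see the Pkg documentation for the details):

julia> using Pkg
julia> Pkg.generate("MySimulations")        # creates MySimulations/Project.toml and src/MySimulations.jl
julia> Pkg.develop(path="MySimulations")    # makes it loadable with: using MySimulations

# MySimulations/src/MySimulations.jl then holds your functions:
module MySimulations

export run_simulation

run_simulation(n) = sum(rand(n))   # your existing functions go here

end # module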

But then essentially only the script itself is compiled each time, not all the functions of your package.

2 Likes

This was mentioned earlier in the thread, but DaemonMode.jl solves this exact problem.

3 Likes

Well, the content of the script is definitely compiled again, because that’s what you tell Julia to do when you type julia myfile.jl in the terminal. (IIRC) Julia won’t precompile scripts the way it does modules.

Creating a module for your simulation functions and then using DaemonMode.jl will probably make your experience much smoother. Feel free to report back with your experiences for other people with the same problem :slight_smile: and have a nice day
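In case it helps, a rough sketch of what that DaemonMode.jl workflow looks like (check the package README for the exact invocation; the flags here are just one reasonable setup):

# terminal 1: start a persistent Julia process that keeps compiled code in memory
julia --startup-file=no -e 'using DaemonMode; serve()'

# terminal 2: send scripts to that daemon instead of starting a fresh Julia each time
julia --startup-file=no -e 'using DaemonMode; runargs()' myscript.jl arg1 arg2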

There are many ways to avoid repeated compilation. One of the simplest (without creating your own package or using any additional packages) is a workflow like this:

OS prompt> julia
julia> include("path-to-script/myscript.jl") # Contains function definitions only, nothing executed at top level. May take a while to compile
julia> myfoo(x)
ERROR: UndefVarError: x not defined
julia> x=1.2
1.2
julia> myfoo(x) # on first execution, may take a few additional seconds to compile
3.4
julia> myfoo(2x) # this time it runs really fast
4.6

This reply is sort of unfortunate. The rest of the thread has lots of replies really trying to answer the OP’s question. The OP is curious and courteous. Your reply is also entirely courteous, but its content seems to say, “Don’t use Julia the way you need to; use it our way…”

Nearly all of the other replies show considerable sympathy with the need for easy startup (of Julia itself) and easy loading of other packages. These needs are well understood, the efforts to address them are prodigious (!), and, most importantly, they respect the OP’s goals. It doesn’t need to be repeated that Julia developers/maintainers are sincerely interested in reducing (some) obstacles to Julia adoption. The OP is sympathetic to Julia being early in its evolution (with remarkable achievements already) and less mature in some of its “creature comforts” than other solutions.

The thread in its entirety is really interesting and helpful. The posts that offer some concrete solution–even if only a partial solution, given the technical realities–probably helped the OP more.

2 Likes

Yes, this thread definitely blew up, and I feel kind of bad because I haven’t been able to keep up with all the information and extra links, as I’m still getting started on most of the concepts myself. Everything provided so far has been remarkably interesting, though, and these are things not discussed in the “Introduction to Julia (for programmers)” course on JuliaAcademy. Considering how differently Julia does things compared to the languages I know, everything helps.

I also understand that using different tools sometimes implies using a different workflow (for instance, when moving from Windows to Linux, installing things the “Windows way” will only make your life difficult, as there is a package manager for that). I’m just trying to understand what I can or can’t do with Julia. And seeing multiple testimonials online from people with the same use case as the one I’m planning, who are successful with it, if changing my workflow is a requirement then I wouldn’t mind.

2 Likes

I hope you didn’t find my initial “you’re holding it wrong” reply too brusque! The startup/compilation time is indeed painful compared to Python, and it’s a sore point for Julia advocates, all the more so because it’s much better than it used to be thanks to tons of effort by both core & package developers. But that doesn’t make the startup lag feel any better to you as a newcomer, since you don’t have any basis of comparison except the (fast-startup) language you’ve been using. Until we can provide a snappy, seamless out-of-the-box experience, we need to do a better job of onboarding & providing reasonable expectations for newcomers like yourself.

Thanks for taking the time to describe the workflow bumps you encountered–it’s important to get that sort of feedback.

3 Likes

I don’t mean to dispute your comment, but WRT concrete solutions DaemonMode.jl (mentioned in @stillyslalom’s post) seems to be quite relevant and an oft-quoted technique for running scripts?

On a meta note, I hope I’m not getting desensitized, because this feels like the most constructive, least defensive or confrontational popular (orange reply count) topic within the past couple of months. Goodness knows 70 posts without some conflict is not a high bar, but as you mentioned, the overall more empathetic and accessible approach is heartening.

No need to feel bad: posts about latency or language comparisons tend to hit a nerve (or a “nerd snipe” impulse), and you’ve done an admirable job keeping up with all the replies! There is plenty of (well-founded) enthusiasm in the community, so sometimes it’s hard to hit the brakes and give things time to breathe :slight_smile:

1 Like

Nevertheless, even if the startup/compilation/invalidation issues are solved and mitigated to a large extent, some lag will always remain compared to a scripting language — the compiler takes time.

IMO it is good to be aware of this: Julia can be used for scripting, but it will never be a “scripting language” like e.g. Perl or Bash. So I would agree with @stillyslalom: to make the most of Julia, adopting the right workflow is crucial. Otherwise it will be a somewhat frustrating experience.

1 Like

As a side note: in the course I am teaching, more or less when this thread came out, I decided to show the students the workflow using modules and Revise. I hadn’t before, because the programs were simple and reloading scripts seemed to be fine. The reaction of many of them was joy, and they felt that everything was much smoother after that. Next time I will probably explain that development workflow as the very first thing.
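For anyone landing here later, the workflow I show them is roughly the following (file and function names are made up for illustration):

# MyCourseCode.jl contains:  module MyCourseCode ... end  with the simulation functions
using Revise
includet("MyCourseCode.jl")     # Revise tracks the file, so edits are picked up automatically
using .MyCourseCode

MyCourseCode.run_experiment()   # edit the file, call this again, no restart needed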

6 Likes

I’ll add because it’s been so incredibly useful for my simulations: if your simulations involve differential equations, then take a look at Parallel Ensemble Simulations · DifferentialEquations.jl (ensembles to automatically run differently-parameterized simulations in parallel!)
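To give a flavour of what that looks like, a minimal sketch with made-up dynamics (see the DifferentialEquations.jl docs for the real API and options):

using DifferentialEquations

f(u, p, t) = p[1] * u                          # toy ODE: exponential growth/decay
prob = ODEProblem(f, 0.5, (0.0, 1.0), [1.0])

# each trajectory gets its own parameter value
prob_func(prob, i, repeat) = remake(prob, p = [0.1 * i])
ensemble_prob = EnsembleProblem(prob, prob_func = prob_func)

# run 100 differently-parameterized simulations across all available threads
sol = solve(ensemble_prob, Tsit5(), EnsembleThreads(), trajectories = 100)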

(For the generic case, all the above answers get to the basic message: don’t close Julia, and put your script into a function.)

Yes, this is the nature of a JIT-compiled language, even if some packages might be statically compiled in the future. My point was to encourage responses to the OP that might help meet his needs, even if imperfectly. I think most of the thread did so, even the person whose post I commented on (and we had a PM “making up”).

(oh dear, coming back from the bottom of this post, I really wanted to stay silent but could not resist getting it off my chest)

I always wonder about statements like this, and others like “Python has solved it with NumPy, Numba and JAX”. Another one is “modern Python in the worst case could be slow, but not more than 2-3 times”. What is “modern Python”? I think you are referring to “modern Python” as the full stack of all those libraries and technologies which try to fix the root of the problem: Python is not a scientific language designed for high-performance computing.

I think that the myth of “modern Python” in that sense is a lie. I am one of those who still try to believe it’s true, but I am starting to give up on that hope after about a decade of scientific HPC Python experience. I maintain a couple of low-level libraries which are used in astroparticle (neutrino) physics, where we deal with petabytes of data (MC simulation and detector data), and I can tell you that in the past 8 years or so we went through so many iterations, combinations and transitions of our tools, from Cython to PyPy to Numba, C++ interfaces, wrappers and whatnot, mixing things here and there. Fact is: everything is changing constantly, and we are chasing and trying to keep up with API changes and the new kids on the block, and nothing is working at its full potential. Code which worked with Numba vX.Y breaks without any good reason in vX.Y+1. Then we face compiler issues because some library maintainer decided to drop C++14 support but we need to link against another library which requires C++14; incompatible Boost Python bindings; the Python 2 vs 3 nightmare with str/unicode/bytes; Windows users who struggle to compile Cython; and I don’t even want to start on our ML stack with PyTorch and TensorFlow, oh dear…

No: NumPy is not a solution for complex, real-world problems; it’s part of a “solution” which has to be combined with many other things, like Numba or C++ interfaces to other low-level libraries, at least in the context of complex libraries. And no, Numba is not a solution either, and it’s not at all like Julia; yes, both use LLVM, but I guess that’s it. The type inference is different and the whole feature set is extremely limited. It works nicely with toy examples, but it gets extremely complicated and unusable/unmaintainable with complex structures. It only supports recursion in a very limited way, dictionary support is buggy and unpredictable, and the error reporting and debugging in general are horrible(!).

I am not impressed at all that Numba can speed up something as simple as the following (taken from Examples — Numba 0.52.0.dev0+274.g626b40e-py3.7-linux-x86_64.egg documentation):

from numba import jit

@jit(nopython=True)
def mandel(x, y, max_iters):
    """
    Given the real and imaginary parts of a complex number,
    determine if it is a candidate for membership in the Mandelbrot
    set given a fixed number of iterations.
    """
    i = 0
    c = complex(x,y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i

    return 255

Even if it manages to use complex numbers. We need to process millions of hits from different types of PMTs and apply timing and position/rotation calibrations to each of them, while they are housed in different optical modules which are in turn connected to different detection units and synchronised with each other via GPS clocks. Event triggering needs to be adapted to the constantly changing PMT rates, and the trigger algorithms are based on complex clustering and ML (deep learning, NN) algorithms. Numba’s documentation is awful, and the library is really not designed to be extensible. This has changed a bit over the past months, but it still makes it almost impossible to work with a well-structured class hierarchy.

So you end up with a combination of all those above-mentioned technologies, which is what “makes Python so good” for scientific computing. Of course, you have to maintain low-level API compatibility with dozens of pieces of software written in different languages with very dynamic development cycles (this is definitely true for many scientific Python packages), and whenever someone jumps in, they face a multitude of technologies and the project feels like a house of cards, even with >90% code coverage and as restrictive a dependency-change policy as you can imagine (which, by the way, slows down the development process immensely).

We have a somewhat working solution with a fairly complex piece of software combining technologies/libraries like Numba, pybind11 with C++ libs, numexpr, Cython and awkward arrays (jagged arrays, which are mandatory in particle physics, based on NumPy), with a lot of tweaks in each of them, many times with calls to unstable or private functions which break here and there.
I can tell you that it’s a nightmare to maintain such packages, and the worst thing is that whenever a user needs more than the in-house implementations/solutions and wants to try something new, they need to understand all those connections, tweaks and hacks, and also grasp what Numba, JAX or whatever fancy low-level library is capable of, without blowing up the runtime by orders of magnitude.
…and still, if someone needs to write high-performance code, they need to switch to our C++ libraries, because we need more than arrays of int32 and float64 etc. Our DAQ framework written in C++ has several hundred class definitions for detector components and signal types. So don’t say that Numba and NumPy will help implement algorithms which deal with those… That would be a nightmare.

I am trying to build a parallel world in our astroparticle community with Julia software and show how nice and transparent software can be: written in a single language, without the feeling that it’s a patchwork of multiple languages and approaches, and with performance and usability that are, in general, better than our Python-wrapped solutions (which are in fact only high-level interfaces to a multitude of non-Python code), for many reasons. The above-mentioned example with the hit processing can be nicely modelled with hierarchical types for PMTs, hits and calibrations, and the code is self-explanatory. It runs on macOS (our DAQ software does not), Linux and even Windows (no DAQ there either, as you have guessed). Of course it only covers part of the actual functionality, but you can do both low-level and high-level analysis. I know what I am talking about, since I have written Julia software on both ends: real-time neutrino event reconstruction and also several high-level analyses on computing farms and grids with multi-terabyte datasets based on the ROOT data format.
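Just to illustrate the kind of modelling I mean, here is a toy sketch (the type and field names are invented for this post, not our actual code):

abstract type AbstractPMT end

struct PMT <: AbstractPMT
    id::Int
    pos::NTuple{3,Float64}
    t0::Float64                     # per-PMT timing calibration offset
end

struct Hit
    pmt_id::Int
    t::Float64                      # raw hit time
    tot::Float64                    # time over threshold
end

# apply the timing calibration; dispatch keeps this generic over PMT types
calibrate(hit::Hit, pmt::AbstractPMT) = Hit(hit.pmt_id, hit.t + pmt.t0, hit.tot)

calibrate_all(hits, pmts::Dict{Int,<:AbstractPMT}) =
    [calibrate(h, pmts[h.pmt_id]) for h in hits]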

One of the main problems, in my opinion, is that it’s hard to convince people to switch from a popular language, where many things are kind of “OK” (even if, in fact, they are definitely not “solved”, just patched to death with whatever works), to a language which solves problems whose existence they more or less refuse to acknowledge. Most of the time people don’t even know how far they are from the actual performance of their hardware, and they simply accept that some things are barely possible. Python has momentum and we have solutions to make code run fast, but the reality is that only experts who understand those supplementary libraries are able to write performant code for complex operations. I constantly teach people how to use our software, even though it’s as user-friendly as we could design it: Python is a dynamic language and you can easily leave the performance zone without noticing it when you are not familiar with the underlying concepts. So many times students come to me and say: it worked with the small dataset, but now 5000 jobs crashed on our computing grid with memory errors (hello pandas.DataFrame concat) on all nodes, or because of a naive Python implementation, since they did not know better; it worked with 100000 events, so surely it will work with 10^9 events, it will just take longer…
In Julia, such surprises are less likely. Even though “the global scope” is one of the main pitfalls newcomers face, and it definitely has an impact on performance, naive code usually runs about as fast as the algorithm dictates. That being said, I am not talking about “bad” algorithms; those of course fail (to scale) in any language.
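To make the global-scope pitfall concrete, a minimal sketch along the lines of the performance tips in the manual (the function names are mine):

x = rand(10_000)

function sum_global()               # reads the untyped global x: type-unstable, slow
    s = 0.0
    for v in x
        s += v
    end
    return s
end

function sum_arg(a)                 # takes the data as an argument: type-stable, fast
    s = 0.0
    for v in a
        s += v
    end
    return s
end

# @time sum_global(); @time sum_arg(x)   # the second is typically much faster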

I still don’t know how to deal with this (Python, the great attractor). In my opinion it feels like a train that you cannot stop, with more and more people jumping on while it gets faster and faster. Whenever I talk to people who are really involved in heavy optimisation of complex libraries which are mainly used in a Python context, the general consensus is that it’s neither fun nor easy, and on top of that, the users struggle constantly and are often offered only a subset of the full feature set.
But we (including myself) are still doing it, because there is demand. We keep adding high-level functions for things that the end users could not manage to implement themselves with reasonable performance.

45 Likes

Interesting, although while studying physics I’ve never encountered anybody doing things like that. I’m guessing I’m not at such a computationally-focused university, and most of the focus is on analytical rather than numerical work. It’s still quite surprising to me how much physics is done numerically and how much we rely on programming.

Out of curiosity, considering you’re a physicist, did you learn Julia outside of university or was it something that was taught there?
I feel like all of the knowledge I have on the matter, not only Julia but also how computers work and the standard pitfalls (like the global variables you mentioned, which are covered in the performance tips in the Julia docs), isn’t taught at university (for me, a physicist, obviously). The things I’m familiar with are the product of googling around, which, although convenient, doesn’t introduce knowledge in a structured way. That makes me feel that I don’t know how to think about it, and that all I did was memorize a few things here and there.

If you’re using Julia in such an advanced way, are there any go-to books or guides on it? Because, as I said at the beginning of the thread, I met Julia “by accident”, and now as I’m looking around it feels like it could actually be something I might take advantage of in the future.

Also thanks for your feedback. I barely even use external libraries in Python, but I do admit that usually the first thing I try to do is google “packages to do X in Python”, and I had no idea that, in the long run, I might have ended up rewriting the entire code just because something I really needed couldn’t be added with an external Python library.

3 Likes

I don’t have a book to recommend, but there are a few general topics that can take you a pretty long way in high-performance computing. The two most important low-level topics for good performance are probably cache locality and vectorization/SIMD. The very TL;DR of this is that processors are much faster than memory, so storing your data efficiently matters, as does traversing it in order where possible (or, even better, traversing it less). At the language level it gets a little more complicated, but most of the time it all boils down to asking how you can do as little as possible at run time.
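To make the memory-order point concrete in Julia terms (a made-up example): Julia arrays are column-major, so iterating with the first index innermost walks through memory contiguously.

A = rand(4_000, 4_000)

function sum_colmajor(A)            # first index varies fastest: contiguous, cache-friendly
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

function sum_rowmajor(A)            # second index varies fastest: strided, more cache misses
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end

# @time sum_colmajor(A) is typically several times faster than @time sum_rowmajor(A)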

If you are looking for Julia-specific tips, I would highly recommend many of the videos from JuliaCon. You can also learn a lot by poking around the code of various Julia packages (or Base).

3 Likes

Hmmm…

“…Google breaks AI performance records in MLPerf with world’s fastest training supercomputer… We achieved these results with ML model implementations in TensorFlow, JAX, and Lingvo. Four of the eight models were trained from scratch in under 30 seconds. To put that in perspective, consider that in 2015, it took more than three weeks to train one of these models on the most advanced hardware accelerator available”.

See details: Google breaks AI performance, July 29, 2020

3 Likes

TensorFlow is written in C++ (61% C++ vs. 26% Python code, see https://github.com/tensorflow/tensorflow).
Python provides “just” the API for easier usage; none of the performance-critical parts are written in it.

6 Likes

I started programming almost three decades ago, so for me it was kind of normal to learn that by myself, and so I did with Julia and other languages. At our university, the coding courses for physicists were basically non-existent when I started studying in 2006. There was a “computational physics” course, but it was extremely basic (reading CSV files with C and doing some low-level calculations), so I skipped it completely. Luckily, the computer science department in Erlangen is at the same location, so I was able to attend some advanced courses on hardware architecture, algorithmics etc. But those were only theoretical courses, with examples in Java.

In 2018 we started to introduce a Python course for physics students, which is now mandatory. It’s a mix of NumPy/SciPy and Matplotlib. I already see the impact in, e.g., the electronics lab course which I supervise every year: students are working with Jupyter notebooks instead of Excel ;)
For next year I will try to squeeze in a Julia course.

Well, since I (and, I feel, the majority of the Julia community) know multiple languages and have already spent a lot of time studying computer science, the learning path for Julia is different.
I think you should grab a book about computer science fundamentals and then just work through a few Julia tutorials while studying the docs and reading a lot of code from others (like the packages you use). I highly recommend the @edit macro in the Julia REPL (cf. @edit atan(2)) to check how a specific method is implemented. It brings you right to the source code and you can look around.
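For example (the @which and @less macros are my addition here, but they live in the same toolbox and are available in the REPL by default):

julia> @edit atan(2)   # opens the source of the matching method in your editor
julia> @which atan(2)  # prints which method would be called for these arguments
julia> @less atan(2)   # shows the method's source in the REPL pager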

What I found really interesting is “Algorithms for Optimization” by Mykel J. Kochenderfer. It’s not a cheap book, but it’s a very nice collection of algorithms written in Julia. Other than that, David P. Sanders’ hands-on courses are also really neat, for example Introduction to Julia - Part 1 | SciPy 2014 | David Sanders, which is a bit outdated but still a well-structured introduction in my opinion. The https://www.youtube.com/user/JuliaLanguage YouTube channel is a great source if you like watching videos, and recently Grant Sanderson (3Blue1Brown on YouTube, very high-quality content!) started posting videos where he uses Julia and Pluto to teach mathematical and computer science concepts, like this one about the Discrete Fourier Transform: https://www.youtube.com/watch?v=g8RkArhtCc4

I hope this helps; you have to find your own path, so get inspired ;)

15 Likes

Very interesting post! It would be great if you could convert it into a blog post to share it with a wider audience :smiley:

2 Likes