Roadmap for a faster time-to-first-plot?

From my point of view solving the latency problem for non-package developers first and worrying about package developers later would be an entirely reasonable strategy.

I consistently run into the biggest problems with this latency issue when I bring new users to Julia. They come from R or Python, and they just don’t see why they should wait so long for their first plot. 95% of those users will never, ever develop a package. The ones who eventually do will by then have seen all the other benefits of Julia, and I think they will be OK with compromising on compile time. But I really strongly believe that the biggest problem right now is for “normal” users who just want to use Julia to get some science done.

Here is another way to think about this: for package devs, how would the experience look on other platforms? As far as I can tell, even with the compile latencies, Julia is one of the smoothest environments for developing packages. Heck, we are competing with other platforms where you have to run a C compiler if you want to produce a fast package. So from my point of view, some latency issues for package devs are really not the end of the world.

But for end users they are, because they can just do their plot and data analysis in R or Python, and it will most likely be faster for them.

And I think a clever combination of sysimages that are integrated into the package manager and operate at a per-environment level could solve roughly 90% of this problem for the “casual” user.

I think a reasonable approach could be to have in Project.toml (or the Manifest?) a list of packages that should be built into the system image. One could then selectively add packages to that list or remove them from it. Since the package manager knows when it performs an update, it could trigger a rebuild of the system image so that nothing gets out of sync. A package that is being dev-ed would automatically be removed from the compile list.
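To make the idea a bit more concrete, here is a minimal sketch of how such a compile list could be resolved. Everything in it is an assumption for illustration only: Pkg does not read a [sysimage] table, and compile_list is a made-up helper. It relies only on the TOML standard library (Julia 1.6 or later).

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical Project.toml fragment. The [sysimage] table is NOT understood
# by Pkg; it only illustrates the proposal above.
project_toml = """
[sysimage]
packages = ["Plots", "DataFrames", "Gtk"]
"""

# Return the packages that should be baked into the system image:
# everything listed under [sysimage], minus packages currently being dev-ed.
function compile_list(project::AbstractString, deved::Vector{String})
    cfg = TOML.parse(project)
    section = get(cfg, "sysimage", Dict{String,Any}())
    listed = get(section, "packages", String[])
    return [p for p in listed if p ∉ deved]
end

# Gtk is being dev-ed here, so it drops out of the list.
to_compile = compile_list(project_toml, ["Gtk"])
println(to_compile)
```

A real implementation would also have to decide when to rebuild (for example after every update of a listed package) and what to do with packages that fail to compile.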

My only concern (but maybe that can be sorted out) is that currently not all packages are compilable. At least I always run into trouble when trying out PackageCompiler (e.g. on Gtk.jl).

Sure, that is all completely fine and true. I’m in no way arguing that users shouldn’t have bigger precompiled system images to reduce latency for their needs. I’m just responding to post #7 by dlfivefifty in this thread, which specifically called out different kinds of latency.

AFAICT, all you need to do is put using Plots (etc.) in userimg.jl, and perhaps some tooling to make that even easier. That is of course a totally different kind of thing from making the compiler faster, and can be worked on in parallel.

When I was reading this post, this was not the conclusion I was expecting. Everything before this seemed to me to support tiered compilation. The thing is that even with fully pre-compiled packages and environments, it’s really hard (impossible, actually) to predict what a user is going to do, which means that you cannot realistically have it all compiled in advance. You could reasonably do that for an application that you can trace beforehand, but not for arbitrary user input. And especially a first-time user isn’t going to be in a position to run code and create a trace or even make an environment before wanting to plot something or load a data frame. As far as I can tell, the only strategy that will provide the experience you describe of interpreter-like latency for first-time users is using an interpreter and doing compilation in the background.

I might be wrong, of course. But my sense is that for packages like DataFrames.jl, various plotting packages, file loading packages etc., the package author can essentially specify which methods should be baked into a sysimage, and thereby cover a huge, huge number of use cases. Yes, it won’t work with arbitrary custom types, but heck, most users are loading CSV files with at most four different column types. Same for plotting packages etc. It won’t solve this problem completely, but I think it could go a very long way.
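As a rough sketch of what “specifying methods” could look like from the author’s side: Base’s precompile function takes a function and a tuple of argument types and compiles that method instance without running it. The summarize function and the list of signatures below are hypothetical stand-ins, not code from any real package, and whether the compiled code actually ends up in a cache or sysimage depends on the Julia version and the tooling used.

```julia
# Hypothetical package function: a small summary of a numeric column.
summarize(xs::AbstractVector{<:Real}) =
    (minimum(xs), sum(xs) / length(xs), maximum(xs))

# Concrete signatures the (hypothetical) author expects most users to hit,
# e.g. the column types that typically come out of a CSV file.
common_signatures = [
    (Vector{Float64},),
    (Vector{Int},),
]

# Ask the compiler to generate code for each signature ahead of time.
for sig in common_signatures
    precompile(summarize, sig)
end

# A later call with one of these types can reuse the code compiled above.
println(summarize([1.0, 2.0, 3.0]))
```

Calls with types outside the list (say, a custom number type) would still be compiled on first use, which is the limitation acknowledged above.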

For me it’s mostly PhD or Masters students that are the concern, not “users”. They pretty much need to use Julia and ApproxFun in their research, but some of their machines are extremely slow: several minutes instead of 20s. If PackageCompiler.jl was just a bit more refined, then I could help them set it up as a one-off.

Anytime they do something not precompiled it will still be slow, but that can be lived with.

Have you tried telling them to put using ApproxFun in userimg.jl and type make? Granted, it’s not the most elegant thing in the world, but I’m wondering if something about it doesn’t work.

This would require them being able to build Julia from source. These are maths students, not comp sci, so that’s kind of a big leap.

And since these clunky slow machines inevitably are running Windows, I also have no clue how to do it.

Though I’ll play around with whether ApproxFun is already PackageCompiler-able, which combined with Makie might suffice.

If they’re too scared, you can type make for them :) But OK, yes, I see that Windows makes it harder. I don’t think this requires a full source build though; the build_sysimg.jl script we used to have (and PackageCompiler) can work without one, though it still needs a C compiler. I hope PackageCompiler can be made to work.

Did putting using SomePkg into userimg.jl improve a lot in 1.1? Before I started writing PackageCompiler, I played around a lot with that and tried many different packages, and the improvement was usually insignificant…

Frankly, if someone is doing just this, there is no reason whatsoever for them to use Julia. They should just stick to R and be happy.

I think it is a misunderstanding to assume that

  1. everyone should be using Julia, and

  2. the core team should prioritize improving the Julia experience for people who may not have a good reason to use Julia at all (at the expense of other goals).

Julia has a lot of benefits, many of which are realized by emitting aggressively optimized code. This also entails various costs, such as compilation time and the need to learn new concepts (parametric types, multiple dispatch) and tools (profiling, benchmarking, investigating performance issues).

The user should weigh these carefully and decide whether Julia is the right tool for them (and it is fine to say no). The cost-benefit frontier will likely improve as the language matures, but will always be relevant.

Your comment is basically saying “Julia does not and should not solve the two language problem”, which seems to go against the mantra of Julia.

I assume JuliaComputing is prioritising whatever their paying customers are asking for, as they should be. But I really hope they still intend to work on truly solving the two language problem, by making Julia a viable replacement for Python/R/Matlab for interactive use as well as HPC.

I wonder if it is reasonable to devote (very expensive) developer time to improving latency on ancient machines (which would be replaced anyway at some point). I am using a 5-year-old laptop, track master and all packages daily, and I never have to wait more than 20s for the first startup. A 120GB SSD can be obtained for €25.

I don’t see how. I think you misunderstood something (perhaps you can explain).

This says to me “if you want interactive code, you should just continue to use a second language which is better at it, and Julia developers shouldn’t prioritise improving Julia for such use cases.”

Thanks for clarifying. I think you misunderstood: my point was that if one is doing something which just involves a few primitive types and some well-defined operations (simple dataframe manipulations and plotting), they should be fine with, say, R, and may not benefit much from Julia, which at the same time has a lot of implicit and explicit costs.

Conversely, Julia’s comparative advantage is not in competing with well-established tools in which most users just write simple scripts that glue together existing functionality (Stata, R, …). Making this fast is always nice, but given the limited resources (especially time of core developers), I would prefer different priorities. YMMV.

I fully support this statement. AFAIK there has been a huge leap in this direction with the new debugger (based on an interpreter, I guess); correct me if I’m wrong. You could, for example, always use the interpreter for the first evaluation of something, and start compiling it right away in the background. And/or let users completely disable compilation while they are working very interactively (the working pace of humans is way slower than that of an interpreter).

Python (running on CPython) is immensely “responsive” / interactive because it is fully interpreted. Try compiling parts of it using Numba or similar packages, and you i) wait a long time and ii) lose all of Python’s ease of programming. For me, waiting a second now and then (Julia) is not as bad as my code running extremely slowly (Python).

This, I don’t get.

What if the professor and other members in the group use Julia? Should naive users develop their own codebase in R, even if no-one else is using R, and they themselves don’t know it?

I’m not clamoring for the core devs to devote their time to this, but it seems strange to basically imply that Julia is only (and should only be) for ‘sophisticated’ users.

It’s probably easy, but I think the issue is, for many (and also for me): Where should I type make? I know, on the keyboard. But where do I put the letters on the computer screen?

This is not a real question on my own behalf. I can figure this out by reading some docs. (But at this moment, I would have no idea how to accomplish this.) I wanted to get across that for many users (even people who are quite familiar with programming in general, and Julia in particular), the statement “put something in userimg.jl and type make” hides a large amount of required knowledge and understanding.

That’s a good point. If used in the context of a course and time-to-first-whatever is an issue, someone (ideally a TA, or someone from IT, but practically at this point the instructor ;)) should set up a precompiled system image that makes everything fast(er).
