I don’t quite understand why the “scripting” cannot be done from within Julia rather than outside of it? I can see how running bash and firing up julia instances can be suboptimal, but I have successfully performed traditional scripting tasks in Julia. Much better than bash, imo.
That would require the users of the scripts to know about julia, and be able to write julia syntax to invoke things.
OK, but then why is this a problem with the language? It is not used optimally, but that is then given by an outside constraint.
I think it’s fair. I don’t think there will ever be a world where users won’t have to know a little bash. It’s sort of the lingua franca in programming. If you want a script to be as widely usable as possible, it should be callable from the shell
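On that note, a Julia file can itself be made directly shell-callable with a shebang line. A minimal sketch (the `-S` option requires a reasonably recent GNU `env`; file name is made up):

```julia
#!/usr/bin/env -S julia --startup-file=no --compile=min
# hello.jl — a hypothetical Julia script invoked straight from the shell,
# after `chmod +x hello.jl`, as `./hello.jl world`.
greeting(args) = string("Hello, ", isempty(args) ? "stranger" : args[1])
println(greeting(ARGS))
```

This doesn't remove the startup cost, but it does let shell users invoke the script without knowing any Julia syntax.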
OK, but then why is this a problem with the language?
It’s not. My OP asked if it were possible to use julia for this case now (which it sort of is, with `--compile=min`) and if this use case was in the target space for the language eventually (there is evidence that it is). If the answer had been “no, that’s not what julia is for” it would be fine of course, I would just have to look elsewhere for my needs.
BTW: my usual scripting tasks (prior to Julia) were of a nature where whether they ran for 10 seconds or 30 seconds did not make a difference. It usually took me much longer to come up with the correct script in bash, so Julia saved me tons of time anyway. Just my 2c.
(I myself use bash scripts frequently, so don’t take this as bashing bash).
But this is actually a slightly self-defeating argument.
* If your script takes long, then the startup of julia doesn’t matter.
* If your script is so quick (<50 ms), then your system should be idle the whole time and the few extra ms should not make a difference, unless you’re calling it very frequently.
* If you’re calling it very frequently, then you can consider doing the higher-level looping script in julia, and then again the startup doesn’t matter. Then you even win on launching bash. (You can still keep the julia code modular and reusable in separate files, while with bash functions that is not as easy, as far as I recall.)
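To illustrate that last bullet, a minimal sketch of a Julia driver replacing an outer bash loop (file layout and the helper function are made up):

```julia
# driver.jl — loop over all inputs inside one Julia process,
# so startup and compilation are paid only once instead of per file.
count_lines(path) = open(io -> count(_ -> true, eachline(io)), path)

for path in readdir("."; join=true)
    endswith(path, ".txt") || continue        # skip non-input files
    println(path, ": ", count_lines(path), " lines")
end
```

Each iteration here would be a separate `julia` launch if the loop lived in bash.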
I don’t take that as a criticism
Everybody can take from it what they personally think is right. I like Julia as a language, but only within Juno or the REPL, not for something you compile quickly and show someone on another machine without Julia there.
Sort of… If you do `using X` and it takes 4 minutes to precompile, and the script is going to be used once to do something like grab a bunch of data files off the web and create some very simple summaries, and bash + some awk or whatever could do it in 800 ms, then Julia obviously loses.
Julia only wins when the script is reused multiple times, like maybe it grabs those data files daily and calculates some updates, and only on the first day does it need the precompile.
Julia only wins when the script is reused multiple times, like maybe it grabs those data files daily and calculates some updates, and only on the first day does it need the precompile.
In my experience so far, without `--compile=min` I pay tens of seconds of compile time on every run of the script, not just the first run. If it were just the first run that was slow and then fast after that (next day, next week, etc.), that would be fine for almost all scenarios (except throwaway scripts, as you mention).
AFAICT, this thread seems to involve some misunderstanding between parties: it seems like there’s a group of people who often use Julia on local machines or the machines of their close collaborators and another group primarily interested in shipping what is essentially the dynamic language equivalent of an executable binary. Probably important to explicitly call out with each suggestion what usage scenario you’re optimizing for as the requirements are very different.
When I open a new Julia REPL and say `using Queryverse`, which is a package that has lots of dependencies which it loads in as well, it takes about 20 seconds. It does that even with `--compile=min`. This is not compilation time, it’s just loading the precompiled packages. It doesn’t give the “precompiling” message, as that was done the first time I loaded it… Probably that was minutes of compile time.
So, yes, 20 seconds of load time is quite a bit, but if I’m downloading a batch of files and generating a 100-page PDF report with graphs and charts nightly in cron, it’s not a big deal. But if I’m going to have a user invoke a script, and they’re going to sit in front of this in order to get a single-page PDF plot… then it’s pretty annoying. They just want to see that one graph… and it takes tens of seconds.
As far as I understand, just because the package is precompiled, that does not obviate the need for more compilation when a package is loaded. And potentially more when functions are called for the first time.
Actually… it probably is. More specifically, it’s probably recompilation time. Loading new packages can invalidate the optimizations that were used to first compile Julia itself (and the precompilation in other packages). This is a large part of the time-to-first-plot and lots of effort has gone into reducing invalidations in 1.5 and beyond.
Ok, thanks for the clarification. I guess what I’m saying is it’s not the same time that is spent when you first precompile the package. But then this explanation suggests that for people looking to write scripts using certain common packages, one should ship a full binary with those packages preinstalled using PackageCompiler, so that the assumptions made during compilation take into account all those preinstalled packages?
For an example, suppose you’re going to use some “scripts” to let people grab up to date data displays with continuously updated data. They can run
and it’s actually a julia script
and it downloads a bunch of data from some database and graphs 10 pages of visualizations. Let’s say the data is updated continuously (say it’s stock trading data or hospital resource usage data or logistics data on warehouse stocks and supply chains or something).
You want anyone to be able to run this thing and grab the latest visualizations, and it should run quickly. Suppose you want to use certain packages like Queryverse and DSP and GLM and whatnot to develop your data displays. For this usage, it seems reasonable to ship a julia system image that includes those packages using PackageCompiler. At that point, I would HOPE that the startup time would be less than 1 s before the julia code started executing the queries against the database and so forth, so that if the graph generation time was 3 seconds, there would be no 30-second recompile times involved every time you invoke the script. Is that correct?
Yes, that’s precisely what PackageCompiler will do. Even better, it can give you a relocatable bundle that includes all artifacts and dependencies and Julia itself with a single point of entry.
Were I putting together something to ship, I’d do it as a package. Note that packages have facilities to do even more precompilation than the default. If I really needed to cut down on startup time, I’d use PackageCompiler. Both of those are great as they will automatically install/bundle dependencies.
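For the PackageCompiler route, a sketch of building a custom sysimage (the package list and file names here are illustrative; check the PackageCompiler docs for the exact keyword arguments in your version):

```julia
# build_sysimage.jl — bake the heavy packages into a system image,
# so later script runs skip their load/compile cost.
using PackageCompiler

create_sysimage([:Queryverse, :GLM];
    sysimage_path = "sys_report.so",
    # a representative workload, so its methods get compiled in
    precompile_execution_file = "generate_report.jl")
```

The script would then be launched as `julia --sysimage sys_report.so generate_report.jl`.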
If you really want to stick to a script paradigm, you could look into using `Base.Experimental.@optlevel 0` (on 1.5) to do more finely tuned versions of -O0 that get included as a part of your script.
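For reference, a minimal sketch of how that macro is applied — it operates per module, so the script body is wrapped in one (Julia 1.5+; the script logic is made up):

```julia
# quickscript.jl — per-module optimization level (Julia 1.5+):
# level 0 trades runtime speed for lower compile latency.
module QuickScript

Base.Experimental.@optlevel 0

process(xs) = sum(abs, xs)        # some throwaway script logic
println(process([-1, 2, -3]))     # prints 6

end # module
```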
I should emphasize that lots of work has gone into improving this latency on 1.5 and it’s continuing to improve.
I get much faster for only VegaLite:
```
$ time ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia -O0 --compile=min --startup-file=no -e "using VegaLite"

real	0m1,928s
```
Queryverse is for Vega-Lite (and more, it’s a meta-package), and if you don’t need all the options, consider doing `using` separately for each of the dependencies of this meta-package you actually use (it seems such meta-packages are for convenience, more appropriate for interactive use, to include the “kitchen sink”).
I’m not up to speed on these *Builder and JLL packages, but I think they promise binary dependencies, as fast as in Python. I’m not sure if it’s used, or what to make of the 2.4 version number (higher than 2.1.3 for VegaLite.jl):
I took a look at the equivalent Altair Python API for Vega-Lite.
The overhead for Julia including startup need not be more than:
```
$ time ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O0 --startup-file=no -e "using PyCall"

real	0m1,047s
```

and then I did:

```julia
@time py"""
import altair as alt

# load a simple dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)
"""
  0.603054 seconds (225 allocations: 10.531 KiB)
```

[strangely, a bit faster than the same in Python 3, minus its startup overhead.]

```
$ time python3 vega.py

real	0m0,735s
```

before I did:

```julia
pyimport_conda("altair", "altair")
pyimport_conda("vega_datasets", "vega_datasets")
```
I like the Julia syntax better (see the same example):
I’m not saying we should give up and use Python for all your scripts, or use the two together (though in some cases that is an option). I hope Julia loading binary packages can (eventually) be as fast.
Since it’s not as fast currently, I’ve been thinking: could we compile Julia programs to e.g. Python, Perl, Go, or Java? The now-unmaintained Julia2C showed it was possible, and having a garbage collector in the target language would make it less difficult (reusing the one it has) and make source-to-source possible, while still being able to run as a script (I find it desirable that code needs not be a binary).
I’ve been thinking for a while that for interactive sessions, it would be nice if two threads were launched every time a method needed compilation: the first would execute it without compilation, for a responsive user interface, and the second would compile it in the background for use the second time the method was called. Does that make sense?
Much better than the module-level optional compilation coming out in 1.5, I think.
If you are running Julia on a unix system, either Linux or Mac, please consider DaemonMode when running the Julia program as a scripting language.
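A sketch of the DaemonMode workflow, following its README (`make_report.jl` is a hypothetical script):

```shell
# Start a long-lived Julia server once; it keeps packages
# loaded and compiled code warm across script runs.
julia --startup-file=no -e 'using DaemonMode; serve()' &

# Each subsequent "run" is a lightweight client that sends
# the script to the warm server instead of booting Julia fully.
julia --startup-file=no -q --compile=min \
    -e 'using DaemonMode; runargs()' make_report.jl
```

The trade-off is that all runs share one server process, so state and loaded packages persist between script invocations.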
One problem would be if the method in question takes a very long time to run. You may have to wait hours for the interpreted version to complete, even though the compiled version would only have taken minutes. (And you can’t simply kill the interpreted method and run the compiled version instead, because the method might have side-effects, such as writing to a database.)
A simple heuristic could be to run methods that contain no backward branches interpreted (until background-compilation finishes) but any function that has a loop still gets compiled before the first run.
Each time a compiled function calls a maybe-not-yet-compiled function there would have to be a branch to check if the compiled version is available, and that would slow things down a bit, but probably not as bad as doing a full dynamic dispatch…
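A toy sketch of that last idea (purely illustrative — this is not how Julia's runtime is structured; pretend `compiled` gets filled in by a background compile task):

```julia
# Toy model: call a slow "interpreted" version until a background
# task installs a fast "compiled" one, then switch over.
const compiled = Dict{Symbol,Function}()

function call_with_fallback(name::Symbol, slow, args...)
    f = get(compiled, name, nothing)   # the cheap per-call branch
    f === nothing ? slow(args...) : f(args...)
end

slow_double(x) = 2x                    # stand-in for the interpreted version
r1 = call_with_fallback(:double, slow_double, 21)   # slow path

compiled[:double] = x -> 2x            # background "compilation" finished
r2 = call_with_fallback(:double, slow_double, 21)   # fast path, same result
```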