Can Julia really be used as a scripting language? (Performance)

How hard would it be to just compile every package I have installed
locally into an image like that? Shouldn’t that ideally happen every time
I install a package?

Maybe somebody could distribute a “batteries included” Julia binary
that uses PackageCompiler (JuliaLang/PackageCompiler.jl) to allow
a number of the most popular packages to start up instantly. (This would
be aimed at people who simply want to run scripts that they get sent to
them, and who are not bothered by not having the very latest versions of
packages.)


The problem is that each Julia function can be called with an infinite number of different types. It is not possible to compile every combination ahead of time. You’d have to make a list of likely call signatures and compile for that.
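
A minimal sketch of what "make a list of likely call signatures and compile for that" could look like, using only Base's `precompile`. The function `total_length` and the list of signatures are hypothetical, chosen for illustration. Note that calling `precompile` inside a running session only helps that session; to survive across runs the result has to end up in a package's precompile cache or a system image.

```julia
# Hypothetical script function; untyped, so it accepts any argument types.
total_length(items) = sum(length, items)

# Signatures we expect the script to hit in practice (an assumed list).
likely_signatures = [
    (Vector{String},),
    (Vector{Vector{Int}},),
]

# `precompile` takes a function and a tuple of argument types and
# returns a Bool saying whether compilation for that signature succeeded.
results = [precompile(total_length, sig) for sig in likely_signatures]

println(results)
```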

But for every script you know all those types ahead of time (or at least have a very good guess of the types). Maybe there could be some functionality to compile the script.

Sure. PackageCompiler lets you compile into a stand-alone executable if you want.

Even the exe file is not as fast as a file compiled from C/C++. There are also startup delays with the compiled binary. I wouldn’t use Julia for shell scripts; Julia would look like a turtle next to bash, even with --compile=min. The fast benchmark numbers are only reached inside the REPL, from the second call onward, once compile time is no longer included.
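
The first-call versus second-call difference can be observed with a rough sketch like the following. The function `sum_of_squares` is a hypothetical stand-in; the exact numbers depend on the machine and Julia version, so none are claimed here.

```julia
# Hypothetical workload; any function that has not been called yet would do.
sum_of_squares(xs) = sum(abs2, xs)

data = rand(1_000)

# The first call includes JIT compilation for Vector{Float64}.
t_first = @elapsed sum_of_squares(data)

# The second call reuses the compiled code.
t_second = @elapsed sum_of_squares(data)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

A script that runs once and exits only ever pays the first-call cost, which is the situation being criticized above.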

True. I’ve sort of come around to the notion that it might be better to ship a package instead.

I don’t mean to say “you’re doing it wrong.” This is a thing that I really want julia to be good at. I’m a pretty hardcore julia fanboy, this is like the one thing that I think is better in other languages. It’s sad because it’s a pretty big barrier for a lot of bioinformatics people that I would otherwise be evangelizing to (well, I still evangelize, but I don’t really have a good answer to this criticism, and it’s a big one).

My understanding is that the primary obstacle to this is the size of the binary, no? I haven’t actually tried it myself.


One should definitely ship a package. If nothing else, then it will be able to set up the project exactly as the sender intended. No surprises at the receiver’s end. Scripts are so fragile!


If I ship a package there still needs to be a script that requires/runs it.

I would say it is rather like this: the package provides the script.


If your script takes no user input, yes. But if your script has any sort of interface, I think not really. Again, I’m only speaking from my experience, but I’m thinking a lot of stuff I’d write would be something like

$ my_script process some_file1.txt --output thing1.xyz
$ my_script plot thing1.xyz --figure-type bar

Not much harder to do

using MyPackage

process("some_file1.txt", output="thing1.xyz")
bar("thing1.xyz")
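
For the command-line style, one possible shape of "the package provides the script" is an entry point that the package exposes and a one-line launcher calls with `ARGS`. This is only a sketch: `process` and `bar` are stubbed here as stand-ins for the functions in the post above, and `option` and `main` are hypothetical names, not part of any existing package.

```julia
# Hypothetical stand-ins for the functions a package like MyPackage would export.
process(input; output) = "processed $input -> $output"
bar(input) = "bar plot of $input"

# Return the value following `flag` in `args`, or `default` if the flag is absent.
function option(args, flag, default)
    i = findfirst(==(flag), args)
    return (i === nothing || i == length(args)) ? default : args[i + 1]
end

# Entry point that a thin launcher script could call as `main(ARGS)`.
function main(args)
    length(args) < 2 && return "usage: my_script <process|plot> <file> [options]"
    cmd, file = args[1], args[2]
    if cmd == "process"
        return process(file; output = option(args, "--output", "out.xyz"))
    elseif cmd == "plot"
        figtype = option(args, "--figure-type", "bar")
        figtype == "bar" || return "unsupported figure type: $figtype"
        return bar(file)
    end
    return "unknown command: $cmd"
end

println(main(["process", "some_file1.txt", "--output", "thing1.xyz"]))
println(main(["plot", "thing1.xyz", "--figure-type", "bar"]))
```

The launcher then shrinks to `using MyPackage; MyPackage.main(ARGS)`, so both the REPL style and the shell style go through the same code.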

Haven’t actually tried it either, but as I understand it, yes, the binary will always include the entire Julia runtime. (It might still be a convenient way to let everyone that you share a file system with run your code.)

This is why I suggested that “somebody” should distribute a binary with a bunch of packages built-in. (An extended standard library, of sorts.) That would make it possible to distribute fast-running small scripts or packages with only infrequent updates of a large binary.

It’s not a bad idea, though I predict endless bike shedding about which packages are in the expanded set

Whoever builds the binary (and pays for the bandwidth) would get to decide. But if a package that I need is not included, it won’t prevent my script/package from working. (It will just be slightly slower.)

I think it would be best to make the complete build & CI framework for such a binary available in a repo, also automating the compilation of this “binary” (as a released asset). Then those who need extra packages in it could just fork and modify.


This sounds like a good thing to try.

Is this substantially different in speed from just having a script that first adds packages and then precompiles them which you execute once every so often when you want to update your packages?

I’m asking because I kind of want a batteries included data analysis, visualization, and modeling package set, but it seems really easy to just have a script that does Pkg.add(…); Pkg.precompile() that I run every so often, and I’m not in the situation where I’m shipping a “script” so running this “install script” every so often is fine for me. But if it’s going to be a lot faster when I want to use it to have a built binary, then I’d like to know.

Julia could follow an idea similar to Debian's popularity contest package. The Julia REPL could from time to time submit the frequently used packages to a central server, which could then collate the most popular ones for inclusion in the sysimage in a fair way. Or someone could create a website that accepts a list of packages and automatically builds a sysimage that can be downloaded.


Why would scripts be fragile? A script with a Project.toml/Manifest.toml and a call to Pkg.activate and Pkg.instantiate inside it, is as solid as a package, in terms of reproducibility, no? The only “problem” is calling it with the “wrong” Julia version, but there are probably workarounds for it.
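
A sketch of such a script preamble, assuming the script sits in the same directory as its Project.toml and Manifest.toml. `Pkg.activate` and `Pkg.instantiate` are the calls named above; the helpers `require_julia` and `setup_environment` are hypothetical names, and the version check is just one possible workaround for the "wrong" Julia version problem.

```julia
using Pkg

# Hypothetical helper: refuse to run on a Julia version older than the one
# the script was written against.
function require_julia(minimum::VersionNumber; current::VersionNumber = VERSION)
    current >= minimum || error("this script needs Julia $minimum or newer, found $current")
    return true
end

# Hypothetical helper: activate the environment stored in `dir` and install
# what its Project.toml/Manifest.toml describe.
function setup_environment(dir::AbstractString)
    Pkg.activate(dir)
    Pkg.instantiate()
end
```

At the top of the script this would be used as `require_julia(v"1.6")` followed by `setup_environment(@__DIR__)`, where the version number is only an example.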

Not if you either: (1) restrict all the function arguments to concrete types, so there is only one tuple of types the function can be called with; or (2) wrap every argument with @nospecialize. The second option is what I do for functions that I know will only be run by the script (they are defined inside it) and that, for convenience, I want to accept many different types. Those arguments get passed on to Base functions that already deal with the different types, so I do not need to write if isa(...) in my code.

Obviously this applies only to the methods you define for your scripts. For functions defined elsewhere, all you can do is avoid calling them with more argument types than necessary (like String vs Symbol, or different number types).
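
A small sketch of the two options, with hypothetical function names. `scale` has only concrete argument types, so exactly one signature can ever be compiled for it; `describe` uses `@nospecialize` so that Julia is hinted not to compile a separate specialization for every argument type it sees.

```julia
# (1) Concrete argument types: the only callable signature is (Float64, Int).
scale(x::Float64, k::Int) = x * k

# (2) @nospecialize: accepts any type, and asks the compiler not to
# specialize on it. The work is delegated to Base functions (`typeof`,
# `string`) that already handle every type.
function describe(@nospecialize(x))
    return string(typeof(x), ": ", x)
end

println(scale(2.0, 3))
println(describe("a"))
println(describe(:a))
```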


I was advocating for the same thing that you described here. A script alone (meaning without a defined environment) is fragile. But if the script comes with an environment it becomes much more robust. In other words, your comment is agreeing with me. :slight_smile: