A Julia DataAnalysis Sysimage from PackageCompiler: it's so easy you should do it too!

If you want to use Julia for the kind of thing where you might fire up R, read a couple of CSV files into some DataFrames, maybe grab some data from a SQLite file, manipulate the data a little, make a few plots, and be done, then it’d be nice to have a quick-loading sysimage so you don’t suffer too much “time to first plot”. It turns out that PackageCompiler has gotten to the point where this is fairly trivial. Here are the two scripts I’m using to build my own sysimage that can do all these things, including RCall.

Here is the script I’m using as the precompile_execution_file. This file does some “example” tasks to help PackageCompiler figure out what needs precompiling.

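```julia
using StatsPlots, CSV, DataFrames, DataFramesMeta, SQLite, GLM, Optim, Dates, RCall

# Toy data: two uniform columns plus dates within ±99 days of 2000-01-01
df = DataFrame(x=rand(100), y=rand(100), z=Date(2000,01,01) .+ Dates.Day.(rand(Int,100) .% 100))

# CSV round trip (reading back into a DataFrame exercises the DataFrame sink too)
CSV.write("testfile.csv", df)
df2 = CSV.read("testfile.csv", DataFrame)

# Line, scatter, histogram and density plots, each with an overlay
p = @df df plot(:x, :y)
@df df plot!(:y, :x)
display(p)

p2 = @df df scatter(:z, :y)
@df df scatter!(:z, :x)
display(p2)

h1 = @df df histogram(:x)
@df df histogram!(:y)
display(h1)

d1 = @df df density(:x)
@df df density!(:y)
display(d1)

# A simple linear regression
ols = lm(@formula(y ~ x), df)
display(ols)

# A small optimization so Optim gets exercised as well (Rosenbrock function)
optimize(x -> (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2, zeros(2))

# DataFramesMeta pipeline
@chain df begin
    @subset :x .> .5
    @subset :y .< .5
    @orderby :x
    @transform :p = 2 * :x
end

# Hand the DataFrame to R and plot it with ggplot2 (needs ggplot2 installed in R)
@rput df
R"library(ggplot2); p = ggplot(df) + geom_point(aes(x,y)); print(p)"

# Load into SQLite and query it back, with and without a bound parameter
db = SQLite.DB("foo.db")
SQLite.load!(df, db, "foo")
df3 = DBInterface.execute(db, "select * from foo where x > ?", (.25,)) |> DataFrame
df4 = DBInterface.execute(db, "select * from foo") |> DataFrame
```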
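As far as I understand it, PackageCompiler runs the `precompile_execution_file` in a separate Julia process with `--trace-compile` switched on and bakes the recorded `precompile(...)` statements into the image. Here is a minimal sketch of that idea using only Base, handy for checking whether a warm-up script actually hits the code you care about. The helper `trace_precompiles` and the toy script are made up for illustration; this is not PackageCompiler’s own code.

```julia
# Hypothetical helper: run a warm-up script in a fresh Julia process with
# --trace-compile and return the precompile statements it recorded.
function trace_precompiles(script::AbstractString)
    out = tempname() * ".jl"
    run(`$(Base.julia_cmd()) --startup-file=no --trace-compile=$out $script`)
    # Guard against the trace file not existing if nothing was compiled
    isfile(out) || return String[]
    return filter(l -> occursin("precompile(", l), readlines(out))
end

# Toy warm-up script standing in for dataanalys.jl
warmup = tempname() * ".jl"
write(warmup, """
double(x) = 2x
println(double(21))
""")

stmts = trace_precompiles(warmup)
println(length(stmts), " precompile statement(s) recorded")
```

Pointing `trace_precompiles` at the real warm-up script should list the specializations from StatsPlots, CSV and friends that end up in the image.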


And to build the sysimage:

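```julia
using PackageCompiler
using Pkg

# Setup-specific: point PyCall at a particular Python and rebuild it.
# Skip these two lines if PyCall isn't part of your environment.
ENV["PYTHON"] = "/home/dlakelan/miniconda3/bin/python"
Pkg.build("PyCall")

# Bake the listed packages into the image, using the warm-up script above
# (saved as dataanalys.jl) as the workload to precompile
create_sysimage([:StatsPlots, :CSV, :DataFrames, :DataFramesMeta, :SQLite, :GLM, :Optim, :RCall],
                sysimage_path="sys_dataanalys.so",
                precompile_execution_file="dataanalys.jl")
```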

After running all that, about a minute or two later I’ve got sys_dataanalys.so, so I can do:

```sh
julia --sysimage sys_dataanalys.so
```

and then doing data analysis is quick!
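To double-check that a session really came up with the custom image, Julia exposes the sysimage path through `Base.JLOptions()`. A small sketch; note that `Base.JLOptions` and `Base.loaded_modules` are internals rather than public API, and `in_sysimage` is just a name I made up:

```julia
# Path of the sysimage this session was started with
current_sysimage() = unsafe_string(Base.JLOptions().image_file)

# Hypothetical helper: was a package already loaded when the session started?
# Only meaningful in a fresh session, before you type any `using`.
in_sysimage(name::AbstractString) = any(id -> id.name == name, keys(Base.loaded_modules))

println("Sysimage: ", current_sysimage())
println("DataFrames baked in? ", in_sysimage("DataFrames"))
```

Started with `julia --sysimage sys_dataanalys.so`, the first line should point at sys_dataanalys.so and the second should print `true`.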

9 Likes

VS Code can also build a sysimage for you:

https://www.julia-vscode.org/docs/dev/userguide/compilesysimage/

1 Like

I have built a sysimage with VS Code; it’s dead easy and works fine, even with a lot of packages in the project. Does anyone know of a way to provide a custom precompile file to the VS Code image build process?

How do you deal with the fact that you can’t install or update anything after compilation? Also, do you type the long command with the sysimage path every time or do you make some alias to it so it’s more convenient?

If you make the sysimage with PackageCompiler, you can still save it in the folder with your Project.toml as JuliaSysimage.dll (on Windows) or JuliaSysimage.so (on Linux), and VS Code will detect it when launching its integrated REPL. Of course, that only works for the project(s) where you’ve saved the sysimage.
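If you build the image yourself and want VS Code to pick it up, you can aim `sysimage_path` at that file name. This sketch assumes the extension expects the platform’s shared-library extension (`Libdl.dlext`) and the folder holding the active Project.toml; `vscode_sysimage_path` is a hypothetical helper, not part of PackageCompiler or the extension.

```julia
using Libdl  # stdlib; Libdl.dlext is "dll" on Windows, "so" on Linux, "dylib" on macOS

# Hypothetical helper: where a JuliaSysimage file for this project would go,
# assuming it sits next to the active Project.toml.
vscode_sysimage_path(projdir::AbstractString = dirname(Base.active_project())) =
    joinpath(projdir, "JuliaSysimage." * Libdl.dlext)

println(vscode_sysimage_path())

# Then, with PackageCompiler (not run here):
# create_sysimage([:StatsPlots, :CSV, :DataFrames]; sysimage_path=vscode_sysimage_path(),
#                 precompile_execution_file="dataanalys.jl")
```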

I’m not sure if that’s possible at the moment. @davidanthoff?

I don’t think that’s right. You can absolutely `Pkg.add("Stuff")` and `using Stuff` just like with vanilla Julia; it just won’t be part of the precompiled image, so you pay the compilation time when you load and first use it.

If you decide you want some additional packages precompiled into the image, adjust the two scripts to include those packages and some code that exercises the functionality, and rebuild.
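One way to keep the two scripts from drifting apart is to read the package list straight out of the `using` lines of the warm-up script. A rough sketch with `Meta.parseall`; `packages_used` is a hypothetical helper, and it only looks at top-level `using` statements and skips relative imports such as `using .Foo`.

```julia
# Hypothetical helper: package names from the top-level `using` lines of a script
function packages_used(path::AbstractString)
    ex = Meta.parseall(read(path, String))
    pkgs = Symbol[]
    for arg in ex.args
        (arg isa Expr && arg.head === :using) || continue
        for imp in arg.args
            # `using A: f` wraps the module path in a `:` expression
            modpath = imp.head === :(:) ? imp.args[1] : imp
            first(modpath.args) === :. && continue  # skip relative `using .Foo`
            push!(pkgs, first(modpath.args))
        end
    end
    return unique(pkgs)
end

demo = tempname() * ".jl"
write(demo, "using StatsPlots, CSV, Dates\nusing GLM: lm\nx = 1\n")
pkgs = packages_used(demo)
println(pkgs)
```

In the build script, something like `create_sysimage(setdiff(packages_used("dataanalys.jl"), [:Dates]); ...)` would then pick up new packages automatically; `Dates` is dropped there only because the original list leaves that stdlib out.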

The only catch is that adding a package usually updates its dependencies unless you pass some flag. I guess I need to try it out :slight_smile:

This is what I thought too, and it was easy enough that I posted this to encourage people. I’ve been pretty frustrated with the time to first plot when you need to read/write some data and do some manipulation before getting to the plot, since you pay compilation time for CSV, DataFrames, SQLite, Plots, KernelDensity, GLM, etc. The thing is, at least for me, it’s often very similar kinds of quick analyses I want to do, so a precompile execution script that simulates that kind of workflow is all you need to get most things prebuilt.
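If you want to put a number on the time to first plot you’re saving, you can time a fresh Julia process with and without the custom image. A minimal sketch; `time_to_run` is a hypothetical helper, and wall-clock timings like this are noisy, so compare a few runs.

```julia
# Hypothetical helper: wall-clock time for a fresh Julia process to run `code`,
# optionally started with a custom sysimage.
function time_to_run(code::AbstractString; sysimage::Union{Nothing,AbstractString}=nothing)
    exe = joinpath(Sys.BINDIR, Base.julia_exename())
    flags = sysimage === nothing ? String[] : ["--sysimage=$sysimage"]
    cmd = `$exe --startup-file=no $flags -e $code`
    return @elapsed run(cmd)
end

t = time_to_run("println(sum(rand(10)) >= 0)")
println("default image: ", round(t; digits=2), " s")
```

Comparing `time_to_run("using CSV, DataFrames, StatsPlots; plot(rand(10))")` against the same call with `sysimage="sys_dataanalys.so"` should show the difference directly.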

I just use a script called “juliadata”:

```sh
#!/bin/sh
# exec replaces the shell; "$@" passes any extra arguments through to julia
exec julia --sysimage /home/..../mysysimage.so "$@"
```