I’ll reiterate that I suspect you might be focusing your efforts in the wrong direction. Pre-compilation only speeds up the first call (the start-up and compilation latency of the Julia image); subsequent executions will not be any faster. I.e., if you call that function many times, you won’t see much overall difference. You might be better off pursuing some alternatives - more on that at the bottom of this post.
Re. your error,
ERROR: package(s) LoopVectorization, TensorOperations not in project
This error probably relates to your environment management, i.e., which packages you load and from where (more HERE).
Assuming you want a project-specific environment (the healthier approach):
- You either need to start your Julia REPL with the --project=. flag to activate the environment defined in your folder (you’d see a Project.toml file there that lists the packages you use),
- or you need to start your code with
import Pkg; Pkg.activate(".");
- or you need to use
]activate .
in REPL.
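For example, a minimal sketch of the programmatic route (the second option above), assuming your script and Project.toml live in the current folder:
import Pkg
Pkg.activate(".")    # use the project-specific environment in this folder
Pkg.add(["LoopVectorization", "TensorOperations", "PackageCompiler"])   # only needed the first time
Pkg.status()         # check that the packages are now listed in Project.toml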
The best way is to first test that you can run your script without a problem; then you compile your project in the same way, e.g.:
Assume I have an environment with all the packages (i.e., a Project.toml file is present). If not, run julia --project=.
then ] add XYZ
I would have my script, let’s call it mwe.jl:
using LoopVectorization, TensorOperations
function test7(A)
    @tensor B[i,j,k,l] := 0.1*A[i,j,k,l] + 0.2*A[j,i,k,l]
    return B
end
# you also need to execute your function so that it gets compiled for the right argument types
A = ones(Float64, (30, 30, 30, 30))
test7(A);
You would then run Julia with compile tracing enabled and execute the above script:
julia --project=. --trace-compile=my_precompile.jl mwe.jl
As a check, you can open my_precompile.jl and see that many precompile statements were recorded for the calls that need to be compiled.
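Roughly speaking, the entries look like this (the exact methods and types recorded will differ on your machine):
precompile(Tuple{typeof(Main.test7), Array{Float64, 4}})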
Then I would start the Julia REPL and run PackageCompiler (remember that you need to add it as a package too!):
julia --project=.
using PackageCompiler
PackageCompiler.create_sysimage(["LoopVectorization", "TensorOperations"];
    sysimage_path="my-sysimage.so",
    precompile_statements_file="my_precompile.jl")
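As a variant, create_sysimage can also execute your script itself to collect the compilation workload, via the precompile_execution_file keyword; a minimal sketch, assuming the same file names as above:
using PackageCompiler
# alternative: let PackageCompiler run mwe.jl during the build and record what gets compiled
PackageCompiler.create_sysimage(["LoopVectorization", "TensorOperations"];
    sysimage_path="my-sysimage.so",
    precompile_execution_file="mwe.jl")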
Everything should be done, so you can test your system image now:
julia --project=. --sysimage=my-sysimage.so mwe.jl
It should start much faster, since it’s all precompiled.
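If you want to quantify the gain, a quick (rough) check is to time the package load and the first run of the script in a fresh session, once without and once with the --sysimage flag:
# run once in:  julia --project=.                               (baseline)
# and again in: julia --project=. --sysimage=my-sysimage.so     (precompiled)
@time using LoopVectorization, TensorOperations   # package load time
@time include("mwe.jl");                          # includes the first call of test7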
To use it in Python, you’d pass your my-sysimage.so file via the sysimage keyword:
from julia import Julia
jl = Julia(sysimage="PATH/TO/my-sysimage.so")
I’ve had problems with it in the past, so alternatively you can use the low-level API (see examples HERE). It’s also discussed on StackOverflow (see the StackOverflow issue).
Re. your overall strategy, I’ve benchmarked the code on my PC (M1 Pro running on Rosetta):
using BenchmarkTools
n = 30   # defined globally so the $n interpolations below work
@btime test7(A) setup=(n=30; A=rand(Float64,(n,n,n,n)));
# 367.375 μs (289 allocations: 6.21 MiB)
@btime A=rand(Float64,($n,$n,$n,$n));
# 722.583 μs (3 allocations: 6.18 MiB)
# creating a random array of that size takes ~2x as long as the function itself
@btime A=ones(Float64,($n,$n,$n,$n));
# 106.000 μs (3 allocations: 6.18 MiB)
# even creating a plain array of ones (no computation at all) takes ~1/3 of the function’s time
This tells me that your code is too fast compared to the memory operations (data movement). That’s probably why moving data to and from Python hides all the Julia benefits.
Therefore, I’d propose:
A) Move more of your code to Julia
B) Run your Python code from Julia (via PyCall) and leverage the Julia speed for your kernel
More on option B): I’ve added the PyCall package to my Julia project, which allows me to test the NumPy code on an equal footing and execute arbitrary Python code from the Julia side:
using BenchmarkTools
using PyCall
# bring your script
include("mwe.jl")
# define the python code (you can also import all your code as one script/one function)
py"""
import numpy as np
# this is an example python object
x = np.ones((30,30,30,30))
# this is an example python function
def my_func(A):
    return 0.1*A + 0.2*np.transpose(A, (1,0,2,3))
"""
# benchmark with a Julia array `A`
# notice the `$A`, you need to interpolate your Julia variables as arguments into Python functions
@btime py"my_func($A)" setup=(n=30; A=rand(Float64,(n,n,n,n)));
# 2.318 ms (41 allocations: 6.18 MiB)
# benchmark with a Python array `x`
# I had some doubts whether it would cache the output, but judging by the results it runs properly each time (i.e., no setup=... needed)
@btime py"my_func(x)";
# 2.517 ms (35 allocations: 6.18 MiB)
You can see the huge difference between running the Julia function and the Python function on equally-sized arrays (~370 μs vs. ~2300 μs).
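If you go with option B in your own code, a rough sketch of the hand-off (reusing the names from the examples above; PyCall converts NumPy arrays to Julia Arrays, making a copy in the process):
x_jl = py"x"       # the NumPy array comes back as a Julia Array{Float64,4}
B = test7(x_jl)    # run the fast Julia kernel on it (~370 μs instead of ~2.3 ms)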
Hope it helps.