Thank you so much.
julia --trace-compile=my_precompile_recipe.jl precompile_plots.jl
works. Then,
using PackageCompiler
create_sysimage(["LoopVectorization, TensorOperations"], sysimage_path="sys_image.so", precompile_execution_file="my_precompile_recipe.jl")
After I started Julia, the second step led to:
ERROR: package(s) LoopVectorization, TensorOperations not in project
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] check_packages_in_project(ctx::Pkg.Types.Context, packages::Vector{String})
@ PackageCompiler ~/.julia/packages/PackageCompiler/wpsGv/src/PackageCompiler.jl:108
[3] create_sysimage(packages::Vector{String}; sysimage_path::String, project::String, precompile_execution_file::String, precompile_statements_file::Vector{String}, incremental::Bool, filter_stdlibs::Bool, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, include_transitive_dependencies::Bool, base_sysimage::Nothing, julia_init_c_file::Nothing, version::Nothing, soname::Nothing, compat_level::String, extra_precompiles::String)
@ PackageCompiler ~/.julia/packages/PackageCompiler/wpsGv/src/PackageCompiler.jl:445
[4] top-level scope
@ REPL[2]:1
It's weird, because
using LoopVectorization, TensorOperations
is at the top of precompile_plots.jl. (I have commented out #BenchmarkTools and the benchmark line below.)
#@btime test7(A) setup=(n=30; A=rand(Float64,(n,n,n,n)))
- I profiled. snakeviz gives me a lot of information, and I also used time.time() to check my Python code. The conclusion is that the series of permutations and additions of arrays is the bottleneck (apart from some other possible improvements). I used transpose in numpy, but Julia's @tensor has some memory cache management that outperforms numpy by a factor of 2 to 3; see "Is there any way to optimize array additions and multiplications with transposes?". Some of the discussion is summarized in reply #6 of this question.
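For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the workflow I mean, not my actual code. The function permute_and_add is a hypothetical pure-Python stand-in for the real numpy permutations and additions, and the file name and PROFILE_PATH are made up for illustration. It uses only the standard library (cProfile, pstats, time) and writes a stats file that snakeviz can open.

```python
import cProfile
import os
import pstats
import tempfile
import time

# Hypothetical output location for the profile data (an assumption, not from my project).
PROFILE_PATH = os.path.join(tempfile.gettempdir(), "permute_and_add.prof")


def permute_and_add(n):
    """Stand-in workload: build an n x n nested list and add it to its transpose."""
    a = [[float(i * n + j) for j in range(n)] for i in range(n)]
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def profile_to_file(func, arg, path):
    """Run func(arg) under cProfile and dump the collected stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(arg)
    profiler.disable()
    profiler.dump_stats(path)
    return result


# Coarse wall-clock check with time.time(), alongside the detailed profile.
start = time.time()
result = profile_to_file(permute_and_add, 200, PROFILE_PATH)
elapsed = time.time() - start
print(f"wall-clock time (including profiler overhead): {elapsed:.4f} s")

# Text summary of the five most expensive entries by cumulative time.
pstats.Stats(PROFILE_PATH).sort_stats("cumulative").print_stats(5)
print("profile written to", PROFILE_PATH)
```

The written file can then be opened in the browser by running snakeviz followed by the printed path. The time.time() number only tells you the total, while the profile shows which calls inside the function are expensive.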
It would be great to start from Julia. However, the main Python code currently has many other calculations, so I think migrating to Julia is a bit complicated, and I would prefer to improve the bottleneck by interfacing with Julia.