Efficiency of calling Julia from Python versus running pure Julia

Thank you so much.

julia --trace-compile=my_precompile_recipe.jl precompile_plots.jl

works. Then,

using PackageCompiler
create_sysimage(["LoopVectorization, TensorOperations"], sysimage_path="sys_image.so", precompile_execution_file="my_precompile_recipe.jl")

After I started Julia, the second step failed with:

ERROR: package(s) LoopVectorization, TensorOperations not in project
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] check_packages_in_project(ctx::Pkg.Types.Context, packages::Vector{String})
   @ PackageCompiler ~/.julia/packages/PackageCompiler/wpsGv/src/PackageCompiler.jl:108
 [3] create_sysimage(packages::Vector{String}; sysimage_path::String, project::String, precompile_execution_file::String, precompile_statements_file::Vector{String}, incremental::Bool, filter_stdlibs::Bool, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, include_transitive_dependencies::Bool, base_sysimage::Nothing, julia_init_c_file::Nothing, version::Nothing, soname::Nothing, compat_level::String, extra_precompiles::String)
   @ PackageCompiler ~/.julia/packages/PackageCompiler/wpsGv/src/PackageCompiler.jl:445
 [4] top-level scope
   @ REPL[2]:1

This is weird, since using LoopVectorization, TensorOperations is at the top of precompile_plots.jl. (I have commented out BenchmarkTools and the line @btime test7(A) setup=(n=30; A=rand(Float64,(n,n,n,n))).)

  1. I profiled. snakeviz gives me a lot of information, and I also used time.time() to time parts of my Python code. The conclusion is that the series of permutations and additions of arrays is the bottleneck (apart from some other possible improvements). I used transpose in numpy, but Julia's @tensor has some memory cache management that outperforms numpy by a factor of 2 to 3; see "Is there any way to optimize array additions and multiplications with transposes?". Some of the discussion is summarized in reply #6 of that question.
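
For concreteness, the bottleneck pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the array shape, the three index permutations, and the function names permute_add_chained and permute_add_accumulate are hypothetical and not taken from the actual code. It sums a 4-index array with a few of its transposed views, once as a chained expression and once accumulated in place, and times both.

```python
import time

import numpy as np

# Hypothetical index permutations, chosen only for illustration.
PERMS = [(1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]


def permute_add_chained(A):
    """Sum A with its permuted views as one chained expression."""
    # transpose() returns strided views without copying; the additions
    # then read those views in a non-contiguous memory order.
    return A + A.transpose(PERMS[0]) + A.transpose(PERMS[1]) + A.transpose(PERMS[2])


def permute_add_accumulate(A):
    """Same sum, accumulated in place into a single output buffer."""
    out = A.copy()
    for perm in PERMS:
        np.add(out, A.transpose(perm), out=out)
    return out


if __name__ == "__main__":
    n = 30  # same size as in the commented-out @btime setup
    A = np.random.rand(n, n, n, n)
    for fn in (permute_add_chained, permute_add_accumulate):
        start = time.perf_counter()
        result = fn(A)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {elapsed:.4f} s")
```

Which variant is faster depends on the array size and the NumPy build, so both would need to be timed on the real data.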

It would be great to start from Julia. However, the main Python code has many other calculations, so I think migrating it to Julia is a bit complicated, and I would prefer to fix the bottleneck by interfacing with Julia.