Issues with Julia installation on Google TPU VM

Hi,

I’m running into some issues after installing Julia 1.6.2 on a Google TPU VM (just in case: a TPU VM is not the same as a TPU node). Has anyone run into this sort of issue before?

For instance, when I exit Julia, I get the following error:

julia> exit()
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at REPL[1]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fd3c4ba13ed)
unknown function (ip: 0x7fd3c4ba947b)
unknown function (ip: 0x7fd3c4ba96cb)
close_unit_1 at /workspace/srcdir/gcc-7.1.0/libgfortran/io/unit.c:778
close_units at /workspace/srcdir/gcc-7.1.0/libgfortran/io/unit.c:831
unknown function (ip: 0x7fd3c4f40f5a)
unknown function (ip: 0x7fd3c4b5aa26)
exit at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_exit at /buildworker/worker/package_linux64/build/src/jl_uv.c:633
exit at ./initdefs.jl:28 [inlined]
exit at ./initdefs.jl:29
jfptr_exit_29191.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:562
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:670
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:877
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:929
eval at ./boot.jl:360 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
#run_repl#42 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#874 at ./client.jl:387
jfptr_YY.874_23032.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_main_repl at ./client.jl:372
exec_options at ./client.jl:302
_start at ./client.jl:485
jfptr__start_34281.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:560
repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:702
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:51
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 2650 (Pool: 2639; Big: 11); GC: 0
Aborted (core dumped)

When I try to install packages like DataFrames, I get:

(@v1.6) pkg> add DataFrames
  Installing known registries into `~/.julia`
double free or corruption (out)

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fbcbf8683ed)
unknown function (ip: 0x7fbcbf87047b)
unknown function (ip: 0x7fbcbf87211f)
Curl_dedotdotify at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
parseurl at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
curl_url_set at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
Curl_connect at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
multi_runsingle at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
multi_socket at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
curl_multi_socket_action at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
curl_multi_socket_action at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibCURL/src/lC_curl_h.jl:230 [inlined]
curl_multi_socket_action at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/utils.jl:91 [inlined]
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/utils.jl:35 [inlined]
#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Multi.jl:147
lock at ./lock.jl:187
timer_callback at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Multi.jl:146
jfptr_timer_callback_18360.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jlcapi_timer_callback_18327.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
Curl_update_timer at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
curl_multi_add_handle at /home/ymh/julia-1.6.2/bin/../lib/julia/libcurl.so (unknown line)
curl_multi_add_handle at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibCURL/src/lC_curl_h.jl:194 [inlined]
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/utils.jl:35 [inlined]
#27 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Multi.jl:51
lock at ./lock.jl:187
add_handle at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Multi.jl:44 [inlined]
#9 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:345
with_handle at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/Curl.jl:64
#8 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:311 [inlined]
arg_write at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/ArgTools/src/ArgTools.jl:112
#7 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:310 [inlined]
arg_read at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/ArgTools/src/ArgTools.jl:61
jfptr_arg_read_19107.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#request#5 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:309
request##kw at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:293 [inlined]
#3 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:222
#open#317 at ./io.jl:330
jfptr_YY.openYY.317_31539.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
open##kw at ./io.jl:328
arg_write at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/ArgTools/src/ArgTools.jl:86
#download#2 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:221 [inlined]
download##kw at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Downloads.jl:221
jfptr_downloadYY.YY.kw_18584.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#download#12 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/PlatformEngines.jl:270
jfptr_YY.downloadYY.12_55315.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
download##kw at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/PlatformEngines.jl:247
pkg_server_registry_urls at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:994
#clone_default_registries#67 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:914
clone_default_registries at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:908 [inlined]
find_registered! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:1291
registry_resolve! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:814
#add#95 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:191
add##kw at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:155
jfptr_addYY.YY.kw_55781.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#add#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:80
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
add at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:78
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:408
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:386
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:550
jfptr_YY.24_52181.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
jfptr_run_interface_47967.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
#44 at ./task.jl:411
jfptr_YY.44_47312.clone_1 at /home/ymh/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 2649 (Pool: 2639; Big: 10); GC: 0
Aborted (core dumped)

I get these errors both with the standard way of installing Julia described on the installation page and with the Python helper package jill.

Info about the virtual machine, in case it is helpful:

ymh@t1v-n-c8437338-w-0:~$ uname --a
Linux t1v-n-c8437338-w-0 5.4.0-1043-gcp #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I’m new to Linux and TPU VMs, so let me know if there’s info I should add that I haven’t already.
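
For anyone hitting the same crash, a few more details beyond `uname` tend to be useful in a report. The following is only a sketch of what could be collected; `JULIA_BIN` and `collect_sysinfo` are names made up here for illustration, not anything provided by Julia or by the TPU VM image.

```shell
# Collect basic system details for a bug report.
# JULIA_BIN is a hypothetical variable; it defaults to whatever `julia` is on PATH.
JULIA_BIN="${JULIA_BIN:-julia}"

collect_sysinfo() {
    echo "== kernel =="
    uname -a
    echo "== distribution =="
    # /etc/os-release is present on most modern Linux distributions.
    if [ -r /etc/os-release ]; then
        cat /etc/os-release
    else
        echo "(no /etc/os-release)"
    fi
    echo "== julia =="
    if command -v "$JULIA_BIN" >/dev/null 2>&1; then
        "$JULIA_BIN" --version
    else
        echo "(julia not found on PATH)"
    fi
}

collect_sysinfo
```

Redirecting the output to a file (for example `collect_sysinfo > sysinfo.txt`) makes it easy to attach to a post or an issue.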

I just ran into what seems to be the same issue. Have you found a solution? I also posted in Slack and Zulip. One small thing I noticed while trying to check the glibc version: running libc.so.6 directly gives a segfault instead of displaying the usual glibc version info. Maybe Google is using their own glibc?

$ /lib/x86_64-linux-gnu/libc.so.6 
Segmentation fault (core dumped)
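
Since executing libc.so.6 directly segfaults here, the glibc version can usually be read another way. The sketch below uses `getconf GNU_LIBC_VERSION`, which is provided by glibc. It also prints any preloaded libraries. That second check is purely an assumption on my part: "munmap_chunk(): invalid pointer" and "double free or corruption" are glibc allocator aborts, and a library injected into every process is one thing worth ruling out, but nothing in this thread establishes it as the cause. The function names are made up for illustration.

```shell
# Report the glibc version without executing libc.so.6 directly.
glibc_version() {
    # getconf ships with glibc; on glibc systems this prints a string such as "glibc 2.31".
    getconf GNU_LIBC_VERSION 2>/dev/null || echo "unknown (getconf did not report a glibc version)"
}

# Show libraries that are forced into every process, if any (assumption: worth ruling out).
preloaded_libs() {
    echo "LD_PRELOAD=${LD_PRELOAD:-}"
    # /etc/ld.so.preload is the system-wide equivalent, if it exists.
    if [ -r /etc/ld.so.preload ]; then
        cat /etc/ld.so.preload
    fi
}

glibc_version
preloaded_libs
```

If `LD_PRELOAD` turns out to be non-empty, starting Julia once with it cleared (`env -u LD_PRELOAD julia`) would show whether it makes any difference.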

I basically just gave up on trying to use Julia on TPU VMs (even for non-TPU-involving tasks).


Did either of you try to build julia in this environment? Do you have a working curl inside the vm?


Likewise, do you have a working git in the vm?
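
A quick way to answer both questions is sketched below. `tool_works` is a hypothetical helper name. Note that it only exercises the system `curl` and `git` binaries. The backtrace earlier in the thread shows the crash inside the libcurl.so bundled with Julia, so a passing result here does not clear that bundled library.

```shell
# Return success if the named tool is on PATH and can print its version.
tool_works() {
    command -v "$1" >/dev/null 2>&1 && "$1" --version >/dev/null 2>&1
}

for tool in curl git; do
    if tool_works "$tool"; then
        echo "$tool: ok"
    else
        echo "$tool: missing or failing"
    fi
done
```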

I’m running make right now. All the usual default tools are there.

You may need to explicitly build libcurl and libgit2 from scratch. Consider setting the build variables USE_BINARYBUILDER=0 or maybe USE_SYSTEM_CURL=1.

https://github.com/conda-forge/julia-feedstock/blob/master/recipe/build.sh#L47

The heavily hacked-up conda-forge recipe linked above shows many of the options you can use.

For example, perhaps you want to set USE_BINARYBUILDER_LIBGIT2=0 if libgit2 specifically is having issues.

https://github.com/JuliaLang/julia/blob/1ad2396f05fa63a71e5842c814791cd7c7715100/deps/libgit2.mk#L2
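
One way to apply these variables is through a Make.user file in the root of the Julia source tree, which the build system reads for local overrides. The sketch below assumes `JULIA_SRC` (a name chosen here for illustration) points at your checkout and defaults to the current directory. It overwrites any existing Make.user, so back that up first if you have one.

```shell
# Assumption: JULIA_SRC is the root of a Julia source checkout (defaults to the current directory).
JULIA_SRC="${JULIA_SRC:-$PWD}"

# Write local build overrides; this replaces any existing Make.user.
cat > "$JULIA_SRC/Make.user" <<'EOF'
# Build all dependencies from source instead of downloading prebuilt binaries.
USE_BINARYBUILDER=0
# Alternatives mentioned in this thread (uncomment one instead of the line above):
# USE_SYSTEM_CURL=1
# USE_BINARYBUILDER_LIBGIT2=0
EOF

echo "Wrote $JULIA_SRC/Make.user; run 'make' from $JULIA_SRC to rebuild."
```

The same variables can also be passed on the command line for a one-off build, for example `make USE_BINARYBUILDER=0`.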


Failing that, I would highly recommend opening a GitHub issue for this kind of technical problem.

The build failed. The errors seem to be the same (double free in this case):

...
Stdlibs: ────  40.157499 seconds 59.2907%
    JULIA usr/lib/julia/sys-o.a
double free or corruption (out)

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)

signal (11): Segmentation fault
in expression starting at none:0
ERROR: Failed to precompile __PackagePrecompilationStatementModule [top-level] to /tmp/jl_CVLTX0/compiled/v1.9/jl_w4THTL.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1547
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1491
 [4] top-level scope
   @ none:3
double free or corruption (out)

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f345d9123ed)
unknown function (ip: 0x7f345d91a47b)
unknown function (ip: 0x7f345d91c11f)
close_unit_1 at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:742
close_units at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:800

signal (11): Segmentation fault
in expression starting at none:0
ERROR: LoadError: failed process: Process(`/home/rs/tools/julia/usr/bin/julia -O0 --sysimage /home/rs/tools/julia/usr/lib/julia/sys.ji --trace-compile=/tmp/jl_CVLTX0/jl_dSBZAYY4Cb --startup-file=no -Cnative -e 'pushfirst!(DEPOT_PATH, "/tmp/jl_CVLTX0");
Base.PRECOMPILE_TRACE_COMPILE[] = "/tmp/jl_CVLTX0/jl_A37jNCchRY";
Base.compilecache(Base.PkgId("__PackagePrecompilationStatementModule"), "/tmp/jl_CVLTX0/__PackagePrecompilationStatementModule/src/__PackagePrecompilationStatementModule.jl")
# NOTE: these were moved to the end of Base.jl. TODO: move back here.
# # Used by Revise & its dependencies
# while true  # force inference
# delete!(push!(Set{Module}(), Base), Main)
# m = first(methods(+))
# delete!(push!(Set{Method}(), m), m)
# empty!(Set())
# push!(push!(Set{Union{GlobalRef,Symbol}}(), :two), GlobalRef(Base, :two))
# (setindex!(Dict{String,Base.PkgId}(), Base.PkgId(Base), "file.jl"))["file.jl"]
# (setindex!(Dict{Symbol,Vector{Int}}(), [1], :two))[:two]
# (setindex!(Dict{Base.PkgId,String}(), "file.jl", Base.PkgId(Base)))[Base.PkgId(Base)]
# (setindex!(Dict{Union{GlobalRef,Symbol}, Vector{Int}}(), [1], :two))[:two]
# (setindex!(IdDict{Type, Union{Missing, Vector{Tuple{LineNumberNode, Expr}}}}(), missing, Int))[Int]
# Dict{Symbol, Union{Nothing, Bool, Symbol}}(:one => false)[:one]
# Dict(Base => [:(1+1)])[Base]
# Dict(:one => [1])[:one]
# Dict("abc" => Set())["abc"]
# pushfirst!([], sum)
# get(Base.pkgorigins, Base.PkgId(Base), nothing)
# sort!([1,2,3])
# unique!([1,2,3])
# cumsum([1,2,3])
# append!(Int[], BitSet())
# isempty(BitSet())
# delete!(BitSet([1,2]), 3)
# deleteat!(Int32[1,2,3], [1,3])
# deleteat!(Any[1,2,3], [1,3])
# Core.svec(1, 2) == Core.svec(3, 4)
# # copy(Core.Compiler.retrieve_code_info(Core.Compiler.specialize_method(which(+, (Int, Int)), [Int, Int], Core.svec())))
# any(t->t[1].line > 1, [(LineNumberNode(2,:none),:(1+1))])
# break   # end force inference
# end
using Artifacts, Base.BinaryPlatforms, Libdl
artifacts_toml = abspath(joinpath(Sys.STDLIB, "Artifacts", "test", "Artifacts.toml"))
artifact_hash("HelloWorldC", artifacts_toml)
oldpwd = pwd(); cd(dirname(artifacts_toml))
macroexpand(Main, :(@artifact_str("HelloWorldC")))
cd(oldpwd)
artifacts = Artifacts.load_artifacts_toml(artifacts_toml)
platforms = [Artifacts.unpack_platform(e, "HelloWorldC", artifacts_toml) for e in artifacts["HelloWorldC"]]
best_platform = select_platform(Dict(p => triplet(p) for p in platforms))
dlopen("libjulia", RTLD_LAZY | RTLD_DEEPBIND)

'`, ProcessSignaled(6)) [0]

Stacktrace:
  [1] pipeline_error
    @ ./process.jl:542 [inlined]
  [2] run(::Cmd; wait::Bool)
    @ Base ./process.jl:457
  [3] run
    @ ./process.jl:455 [inlined]
  [4] (::Main.anonymous.var"#1#5"{Set{String}, String})(prec_path::String)
    @ Main.anonymous ~/tools/julia/contrib/generate_precompile.jl:271
  [5] mktempdir(fn::Main.anonymous.var"#1#5"{Set{String}, String}, parent::String; prefix::String)
    @ Base.Filesystem ./file.jl:764
  [6] mktempdir (repeats 2 times)
    @ ./file.jl:762 [inlined]
  [7] generate_precompile_statements()
    @ Main.anonymous ~/tools/julia/contrib/generate_precompile.jl:253
  [8] top-level scope
    @ ~/tools/julia/contrib/generate_precompile.jl:431
  [9] eval(m::Module, e::Any)
    @ Core ./boot.jl:368
 [10] top-level scope
    @ ~/tools/julia/contrib/generate_precompile.jl:6
in expression starting at /home/rs/tools/julia/contrib/generate_precompile.jl:3
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [sysimage.mk:88: /home/rs/tools/julia/usr/lib/julia/sys-o.a] Error 1
make: *** [Makefile:88: julia-sysimg-release] Error 2

Thanks for the advice @mkitti! I don’t see an obvious reason for the failure, but I’ll try the flags you recommend and open a GitHub issue.

I posted an issue on GitHub.