Julia in Colab

Hmm…

using Pkg
Pkg.add("CUDA")
Pkg.add("BenchmarkTools")
using Plots, CUDA, BenchmarkTools

works just fine for me (well, aside from the error shown below, but running the using statements a second time works). Additionally, CUDA.versioninfo() works just fine and returns

CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 550.54.15

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.54.15

Julia packages: 
- CUDA: 5.6.1
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.10.8
- LLVM: 15.0.7

1 device:
  0: Tesla T4 (sm_75, 14.738 GiB / 15.000 GiB available)

Can you provide more details (e.g. the code) on exactly what you’re doing?

Error seen when calling using CUDA on a fresh T4 GPU runtime:

InitError: could not load library "/root/.julia/artifacts/ac4708c3ef40405014c1080c17818cfa7d017563/lib/libGL.so"
/root/.julia/artifacts/ac4708c3ef40405014c1080c17818cfa7d017563/lib/libGL.so: undefined symbol: _glapi_tls_Current
during initialization of module Libglvnd_jll


Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:117
  [2] dlopen(s::String, flags::UInt32)
    @ Base.Libc.Libdl ./libdl.jl:116
  [3] macro expansion
    @ ~/.julia/packages/JLLWrappers/GfYNv/src/products/library_generators.jl:63 [inlined]
  [4] __init__()
    @ Libglvnd_jll ~/.julia/packages/Libglvnd_jll/rKoF9/src/wrappers/x86_64-linux-gnu.jl:22
  [5] run_module_init(mod::Module, i::Int64)
    @ Base ./loading.jl:1193
  [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1181
  [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:1126
  [8] _tryrequire_from_serialized(modkey::Base.PkgId, path::String, ocachepath::String, sourcepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:1551
  [9] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
    @ Base ./loading.jl:1644
 [10] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:2008
 [11] __require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1882
 [12] #invoke_in_world#3
    @ ./essentials.jl:926 [inlined]
 [13] invoke_in_world
    @ ./essentials.jl:923 [inlined]
 [14] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1873
 [15] macro expansion
    @ ./loading.jl:1860 [inlined]
 [16] macro expansion
    @ ./lock.jl:267 [inlined]
 [17] __require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1823
 [18] #invoke_in_world#3
    @ ./essentials.jl:926 [inlined]
 [19] invoke_in_world
    @ ./essentials.jl:923 [inlined]
 [20] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1816

another reason why Interact.jl should be revived :slight_smile:

@metrizable I noticed that Colab comes with a pre-populated default environment. We discourage that for many reasons, partly because it leads to unfortunate interactions when users want to install their own packages.

The page "Admin: How to provide Julia to users?" has some more information on that topic (HPC admins also tend to start out wanting to provide a pre-populated default environment).

I noticed that precompilation times are quite bad. The machines seem to be beefy enough, so I wonder if the slowdown is some unfortunate interaction with the file system?

Ideally one would be able to have a transparent cache for JULIA_DEPOT_PATH (~/.julia), which is something we do on GitHub Actions, and support for Project.toml?
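
To make the Project.toml idea concrete, here is a minimal sketch of what a per-notebook environment could look like in a first cell. This is an illustration only, not something Colab provides out of the box: the project directory is a hypothetical temporary folder, and in a real notebook it would be a folder containing the course's Project.toml (and ideally Manifest.toml).

```julia
using Pkg

# Julia stores packages, artifacts, and precompile caches in the depot.
# The first entry of DEPOT_PATH is the user depot (usually ~/.julia),
# which is the directory a cache would need to preserve.
println("User depot: ", first(DEPOT_PATH))

# Hypothetical project directory for this notebook. A temporary directory
# is used here so the sketch does not touch any real files.
project_dir = mktempdir()

# Make that directory the active environment instead of the default one.
Pkg.activate(project_dir)

# If the directory ships a Project.toml, install what it records.
# (The temporary directory above is empty, so this step is skipped.)
if isfile(joinpath(project_dir, "Project.toml"))
    Pkg.instantiate()
end

println("Active project: ", Base.active_project())
```

With a cached depot, the Pkg.instantiate() step would mostly find packages and precompile caches already in place instead of rebuilding them on every fresh runtime.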

I very much understand where you’re coming from (and thanks for the link to juliahpc; I didn’t know about that), but I also think it really depends on the use case. For example, I teach intro physics and have been trying to think of streamlined ways to get Julia into my students’ hands so I don’t have to say

Okay go download juliaup, then install pluto, then install these packages oh and also this is what an environment is…

So when I heard this news that Colab supports Julia, I was elated because it’s really exactly what I wanted: a preconfigured IDE* (not to mention notebook front-ends are really nice and intuitive for people just starting out), and common packages that can get you really far e.g. Plots, all without downloading or installing anything. So now I can just say

Go to this website. Write using Plots; dom = 0:pi/4:2pi; plot(dom, sin.(dom)). Press Shift+Enter.

and a plot shows up. Essentially making the time-to-first-plot zero.

All that to say: while Colab can be used for almost anything, I think it really shines in education, so including these packages automatically is a game changer for the adoption of Julia among new computational scientists.

This is fantastic news, thank you very much !
Any plans to support the latest stable version of Julia ? (Currently 1.11.3).

I mostly agree with you, but my experience over the years has shown that pre-installed packages are more harm than good.

I would love to point a student towards a Project.toml + Notebook and have them run it on a prepared installation.

But that is different from the packages being in the default environment: over time they become stale, and as soon as a student wants to install a new package they will encounter issues.

This is also the reason why I prefer Pluto over Jupyter these days. The built-in dependency management gives me a fighting chance for the code to just work for students.

My stance on this is that the default environment ought to be empty, notebooks should have their own environments, and the Julia depot ought to be cached. I strongly believe that to be possible and that this gives everyone including students the best possible experience.

@vchuravy I totally understand the motivation, but colab is simply meant for different types of workflows.

Note that this is identical behaviour to Python, and has been for many years now. The default Python environment has a wide variety of packages, which you can see with pip list -v.

Of course, Python also has pyproject.toml for controlling environments. This choice for a default environment was made for Python, and has worked really well for the use cases people use colab for. I don’t think Julia has a specific technical reason that would necessitate different behavior.

I also don’t think having an empty base environment is good for the types of workflows people use colab for, like @NonDairyNeutrino’s use case. (And note you can always include some cells that remove packages from the default environment/install new ones)

my experience over the years has shown that pre-installed packages are more harm than good

That is also my experience. Julia + Quarto is another example where extra env settings on the Quarto side made things harder for Makie.jl users.

I feel that’s more an artifact of there being a multitude of Python package managers and no standard. Every Colab notebook I look at from Python people seems to have some convoluted way of installing additional packages.

There seems to be no integration between Colab and pyproject.toml? See, for example: poetry-and-colab/Using_python_poetry_in_Google_Colab.ipynb at main · elise-chin/poetry-and-colab · GitHub

For me, the question is: How can I ensure that Colab notebooks are reproducible? What happens when the base environment is changed underneath me?

In particular, for students, I want to avoid the frustration of things breaking because of implicit dependencies. For me, the switch to Pluto was liberating, since suddenly I started to receive notebooks from students that mostly worked and that kept working.

I gave this a try today in Google Colab but I am struggling with the authentication step. Is there a way to turn on debugging to see if colab receives my message?

This is so excellent. Thank you for doing this!

Saw that Makie was preinstalled, which is cool, but only Makie is preinstalled :slight_smile: So there is no backend; CairoMakie or GLMakie should probably be added in that case.

This will definitely boost the language’s popularity! Really good news!

As a regular colab user, I don’t agree with this sentiment—it’s about the workflow, not the package ecosystem. Keep in mind that Colab started as an internal dev tool at Google, which technically does have a single standard “package manager.” So it doesn’t really apply.

Moreover, while Python’s open-source package managers aren’t perfect, I view that as an orthogonal discussion to having pre-built environments.

Indeed. Despite Python having an equivalent to Project.toml, and the ability to permit per-project environments, there is still a pre-built environment.

Thus, I think any argument for Julia having an empty base environment would also make the case for Python having an empty base environment. And that will never happen (users would riot), because it’s simply not how people use colab. Colab is not a replacement for GCP, it’s meant for quickly spinning up a Jupyter notebook with absolutely everything ready to go, and easy GPU access. (In some ways, devs like yourself might not even be the target audience, if this is not what you want)

For reproducibility, note that the version of Julia/Python is out of your control; there’s a single global version set by the Google team. So 100% reproducibility in perpetuity is impossible. But aside from that, it’s probably best to just write code that purges the environment and builds it up with specific package versions.
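
As a rough sketch of the "build it up with specific package versions" idea: instead of literally purging the default environment, the cell below swaps in a fresh temporary environment and pins an exact version there. The package name and version (Example 0.5.3) are placeholders for illustration, not a recommendation, and the cell needs network access to run.

```julia
using Pkg

# Start from an empty, throwaway environment rather than the
# pre-populated default one. Note that this is a substitute for
# "purging": the default environment itself is left untouched.
Pkg.activate(temp=true)

# Pin an exact version. "Example" and "0.5.3" are placeholders;
# substitute the packages and versions the notebook really needs.
Pkg.add(name="Example", version="0.5.3")

# Show what the notebook will actually load.
Pkg.status()
```

Putting a cell like this at the top of a notebook makes the package versions explicit, although the Julia version itself stays whatever Colab provides.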

Again, though, that’s not a normal use case of colab. It’s fine to do that, but I don’t think this is what a standard user (of colab) wants.

I am less concerned about myself, I am more concerned about my students and people’s first experience with Julia. I don’t find it helpful to “refer to the standard user of Colab”, but it is totally fair to talk about your expectations for Colab.

To concretize things, this is a notebook I have been playing around with in Google Colab that I would love my students to take and explore.

Currently, it is a pain to use, since it requires installing CUDA and some of the DiffEq stack, which is painfully slow on Colab. So how do we fix that?

I think it is unreasonable for the Colab team to go make the decision “this is a standard package of Julia” or install a plethora of Julia packages.
The Julia package manager is rather easy to access from Colab (in contrast to Python). Just because something “works” for Python does not mean it is what we should encourage in Julia.

I would rather have a robust story around caching, and maybe figure out why precompilation is so much slower on Colab than it is natively. I don’t care as much about the “default environment” since I can opt-out easily, but I wouldn’t want “just put more stuff in the default environment” to be an answer to a real usability problem.

Maybe we can find an argument where Julia and Python are treated separately. We should not assume that Python is an upper bound, or a hard constraint on the Julia experience in Colab.

I agree with this statement. It is very limiting from the perspective of a Julia user, and very ad-hoc from the perspective of a software maintainer. If Colab had a narrower scope like “Machine Learning in the Cloud”, then it would make more sense to pre-install a set of packages. Even in that case, it is a difficult decision.

+1

Again, I just don’t see any arguments for why Julia should have a different Colab experience than Python. So far all statements would also apply to Python, but Python has a prebuilt env. It’s just how people use it. There’s plenty of other tools out there; Colab is one with prebuilt environments. That’s part of the user experience!

I’m afraid I don’t follow. Here’s julia:

using Pkg; Pkg.add("MyPkg")

and here’s Python:

%pip install mypkg

It’s the normal way to use colab notebooks.

I think it’s totally fair to make a case for an empty base environment, but I think you’d also be making a case for an empty Python base env. And clearly the user base prefers a predefined base env. It’s just how people use it.

@MilesCranmer I feel like we are talking past each other. I am saying two distinct things:

  1. In my experience of working with students, setups that have a populated default environment have caused confusion and support issues.
  2. Currently, the experience of installing packages in Colab for Julia is subpar. I spent close to 10 minutes in one notebook installing the necessary dependencies.

We can disagree on 1 and hold different opinions, and that is fine, but the crux for me is 2, because that leads to the question: Can I use Colab for teaching? Can I recommend Colab to beginners of Julia?

And clearly the user base prefers a predefined base env. It’s just how people use it.

Is that clear? Isn’t it more that Colab does not provide a different option, and thus people use it in the only way possible?

And shouldn’t we ask ourselves: What is a better user-experience?

By the way, in Colab you can use:

] add MyPkg

to talk directly to Julia’s package manager.

You are not synchronizing the kernels, so I believe you are measuring only the kernel launch. You need to wrap the CUDA call in CUDA.@sync.

using Pkg
Pkg.add("CUDA")
Pkg.add("BenchmarkTools")
using Plots, CUDA, BenchmarkTools

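pMax = 8
powerVector = 1:pMax
# Float64 rather than Float16: timings of tens of nanoseconds lose almost
# all precision (or underflow to zero) in Float16, which breaks a log plot.
timeVectorCPU = Vector{Float64}(undef, pMax)
timeVectorGPU = Vector{Float64}(undef, pMax)

for p in powerVector
    n = 10^p
    xCPU, yCPU = (ones(n), ones(n))
    # Note: cu converts to Float32 by default, so the GPU adds Float32 arrays.
    xGPU, yGPU = (cu(xCPU), cu(yCPU))

    timeVectorCPU[p] = @belapsed $xCPU + $yCPU
    # CUDA.@sync waits for the GPU to finish, so the full kernel is timed.
    timeVectorGPU[p] = @belapsed CUDA.@sync $xGPU + $yGPU
end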

timeVectorCPU |> display
timeVectorGPU |> display

plot(
    10 .^ powerVector, 
    [timeVectorCPU timeVectorGPU], 
    label = ["CPU" "GPU"], 
    title = "CPU vs GPU",
    xscale = :log10,
    yscale = :log10,
    ylabel = "Elapsed time [s]",
    fmt = :png
)

Wouldn’t an easy solution be for the default env in a Colab notebook to remain prepopulated for those who like that, but as a local environment rather than the base env? Then any user could issue Pkg.activate("somethingelse") and be separated from it. If packages are not installed in the base env, they are no longer accessible at that point via the stacked load path, which I think is the main reproducibility hurdle people face.
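
For readers unfamiliar with the stacked load path, the following sketch shows how a user can get a similar separation today by removing the default environment from LOAD_PATH after activating their own. It assumes Julia's stock LOAD_PATH of "@", "@v#.#", and "@stdlib" (Colab may configure this differently), and the environment directory is a hypothetical temporary one.

```julia
using Pkg

# Stock Julia stacks three entries: "@" (the active project),
# "@v#.#" (the default environment), and "@stdlib".
println("Before: ", LOAD_PATH)

# Activate a separate environment. The directory is hypothetical and
# lives in a temporary folder so nothing real is modified.
Pkg.activate(joinpath(mktempdir(), "somethingelse"))

# Remove the default environment from the stack. Packages installed
# only there can then no longer be loaded implicitly with `using`.
filter!(!=("@v#.#"), LOAD_PATH)

println("After:  ", LOAD_PATH)
println("Expanded: ", Base.load_path())
```

The change lasts only for the current session, so it would need to be in the first cell of a notebook that wants this isolation.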
