Julia and self-hosted GitLab: state of the art?

I have been working with Julia since last year, and I have developed several packages for internal use in my company. Up to now these developments were my own and did not require much input from other developers. I version them using git repos on a NAS, and I created a private registry on that same NAS using LocalRegistry.jl (many thanks @GunnarFarneback).

Since these packages tend to be used and developed by a growing number of my colleagues, I would like to transfer the repos to a self-hosted GitLab server in order to benefit from all its functionality (it did not seem worth the trouble at the beginning of my Julian life).

I tried to find information on how to best take advantage of GitLab for Julia, but either I do not have the correct keywords or there is not much out there at the moment. The only resource that seems relevant and detailed is GitLab-examples / julia · GitLab.

So, if this has not already been answered elsewhere, could people share their experience with the difficulties of GitLab CI/CD for Julia, e.g. with:

  • Migration from a local storage
  • Registry setup
  • Test pipeline
  • Documentation pipeline
4 Likes

Invenia has just about everything set up on GitLab for our internal stuff.
From autoscaling CI, to documentation hosting, to code coverage hosting.
Automatic tagging and releases when you merge.

I will poke our devs in the direction of this thread.
I know some of the parts involved are TagBotGitlab and LocalRegistry.jl.

4 Likes

Recent thread:

1 Like

Thanks! It seems that PkgTemplates defines a lot of stuff to help create new packages with all the configuration baked in. However, I have the feeling that this requires configuration at a level higher than the package (e.g. the registry …), and that seems to be out of scope for PkgTemplates, isn't it?

1 Like

Hey, I’m a dev at Invenia (co-worker of @oxinabox), and this is an overview of our CI setup.

  • Test pipeline

Every Julia package we have lives in its own GitLab repo. Each has a .gitlab-ci.yml file which defines the pipeline for that repo.
The one you linked to is a good starting point: .gitlab-ci.yml · master · GitLab-examples / julia · GitLab.
The basic idea for the test job is to use a julia docker image, clone the repo and run Pkg.test.
Like in that example, you can have one job per julia version you want to support.
If you have many packages, as we do at Invenia, you may want to have a central YAML file somewhere (ours lives in its own separate repo) which defines a standard pipeline, and then have all package repos include that file.
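
A minimal version of such a test job could look like this (a sketch in the spirit of that linked example; the Julia version tag is a placeholder):

test:1.6:
  image: julia:1.6
  script:
    - julia --project=@. -e 'using Pkg; Pkg.build(); Pkg.test(coverage=true)'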

We run tests on all packages every night to see if updated dependencies break our code (they get updated according to the semver bounds we specify in our Project.toml files).
We do this with a scheduled pipeline on each repo.
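
If some job should only run in those scheduled pipelines, the predefined $CI_PIPELINE_SOURCE variable can gate it (a sketch, not necessarily our exact config):

nightly tests:
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - julia --project=@. -e 'using Pkg; Pkg.update(); Pkg.test()'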

Now, to get the tests to actually run you need to set up some machines with the GitLab Runner installed. We do this on a Mac mini which we have set up in our office, and also on AWS, where we have an AutoScalingGroup that scales the number of machines up and down based on CPU usage. We test on different architectures, and do that with the use of GitLab CI tags. I can provide more details if needed.
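
For illustration, pinning a job to a particular kind of runner via tags looks like this (the tag name is hypothetical):

test macos:
  tags:
    - macos
  script:
    - julia --project=@. -e 'using Pkg; Pkg.test()'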

  • Documentation pipeline

Documentation is simply another job in the standard pipeline.
Actually, unlike the example you pointed to, we split it into two jobs.
“Documentation” builds the documentation (for every run of the pipeline, including for merge requests, so that we can review the doc changes before we approve and merge things), and “pages” publishes the docs (only on the default branch).
Here’s roughly what the .gitlab-ci.yml looks like for those jobs (untested, copy-pasted and simplified from different files, as our setup has grown more complex over the years):

"Documentation":
  artifacts:
    paths:
      - documentation/
  script:
    julia --project=docs/ -e "using Pkg; Pkg.instantiate()"
    julia --project=docs/ docs/make.jl
    # Move the rendered documentation to a folder called "documentation" in the root of
    # the repo which will be saved in an artifact.
    mkdir documentation
    mv docs/build/* documentation/

# Use the special job name "pages" to actually trigger deployment of the documentation on master.
# https://docs.gitlab.com/ee/user/project/pages/getting_started_part_four.html#job
pages:
  only:
    variables:
      # Only deploy docs on the default branch (eg master)
      - $CI_DEFAULT_BRANCH == $CI_COMMIT_REF_NAME
  dependencies:
    - Documentation
  artifacts:
    # As documentation is re-deployed every night the expiry just ensures that old documentation
    # is eventually cleaned up.
    expire_in: 1 week
    paths:
      - public/  # Note: Required to be called public for GitLab Pages
  script:
    - mkdir public
    - '[ -d documentation ] && mv documentation/* public/'

  • Registry setup

We have a PackageRegistry repo, which is a private equivalent of the Julia General registry, with just our private packages in it.
(I am not sure how this was set up initially.)
Users should install this registry on their local machine in addition to the General one (replace with your URL / name):

git clone <your_registry_repo> ~/.julia/registries/<company_name>
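
Alternatively, the same thing can be done through Pkg itself (this is equivalent to the clone above):

using Pkg
Pkg.Registry.add(RegistrySpec(url = "<your_registry_repo>"))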

All Julia package repos have a “Register” job which runs LocalRegistry.jl’s register on them (again untested, copy-pasted from different places):

"Register":
  rules:
    # Attempt registration when Project.toml changes in the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - "Project.toml"
  script:
    # Set git config so the commits to the registry are from a given GitLab user
    - git config --global user.email "<email>"
    - git config --global user.name "<username>"
    # Register
    - julia -e 'using Pkg; Pkg.add(name="LocalRegistry", version="0.5.2")'
    - julia --project -e "using LocalRegistry; register(<local_package_path>, registry = \"<your_registry_repo>\", repo = \"$CI_PROJECT_URL\", create_gitlab_mr = true)"

(Special thanks to @GunnarFarneback for implementing the create_gitlab_mr functionality after we raised an issue!)

This will create a merge request in the PackageRegistry to add/update the julia package.
The PackageRegistry repo has its own .gitlab-ci.yml pipeline which runs the RegistryCI tests:

"Registry Tests":
  script:
    - julia --project=.ci/ -e 'using Pkg; Pkg.instantiate()'
    - julia --project=.ci/ --color=yes -e 'using RegistryCI; RegistryCI.test(registry_deps=["https://github.com/JuliaRegistries/General.git"])'

Once the tests pass, you can merge that MR, and from that point the new package/version will be available to users.
Note that we automate the merge using GitHub - invenia/TagBotGitLab: Julia TagBot for GitLab, which we run in an AWS Lambda (this also creates the GitLab git tags and release notes for each package). There might be simpler ways to automate the merge.

  • Migration from a local storage

We haven’t done this so I can’t attest to it. But I think it is mostly a matter of creating projects in GitLab (1 project = 1 repo), then pushing your existing git repos (including the private registry) after changing their remotes, roughly as sketched below.
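
Something like this per repo (untested; the group and URL are hypothetical, and the old NAS remote is kept around under another name):

cd /path/to/MyPackage
git remote rename origin nas
git remote add origin git@gitlab.example.com:mygroup/MyPackage.git
git push -u origin --all
git push origin --tags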

So that’s the basics of it. A CI setup can be as complex as you need it to be (and ours has definitely become more complex over the years, to account for our many use cases). Happy to share more or answer specific questions. We’re thinking of putting out some blog posts on this topic, but it might take us a while to do that.

30 Likes

The setup at my company is less complex and correspondingly less ambitious in most, but not all, aspects.

  • Packages

Normally one per GitLab project (i.e. repository). Some projects contain dual Julia and Python packages, making use of the Pkg subdir functionality to point to the julia subdirectory. One project contains three closely related Julia packages.
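
As an aside, installing such a subdirectory package directly, without a registry, looks like this (the URL is hypothetical):

using Pkg
Pkg.add(url = "git@gitlab.example.com:mygroup/MyProject.git", subdir = "julia")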

If you have existing packages in git repositories elsewhere they are easily imported to GitLab. All git services have excellent instructions for how to import git repositories from anywhere else.

  • Testing

We run tests for all new commits (including merge requests and after merging to master) but no periodical tests of unchanged code. All tests are run in docker containers, either built from a Dockerfile in the repository or using a stock docker image from a common CI project. The typical test job is

test julia:
  stage: test
  script:
    - julia --project=julia -e 'using Pkg; Pkg.test()'

(This is an example from a dual package, so the Julia package is in the julia directory. The docker image is implicit from a default directive.)

  • Documentation

Most packages are documented by the README and additional markdown files in the repository. Some projects use an in-house documentation system which converts markdown files and various data sources to PDF files. This is just another CI job and the built PDFs are exported as CI artifacts.

  • Registry

The registry is just another repository. If you already have one on disk you can import it to GitLab like any other repository. You should only need to update the repo field in the Package.toml for each package.
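
A registry entry’s Package.toml would then look something like this (name, UUID, and URL are hypothetical):

name = "MyPackage"
uuid = "12345678-abcd-4ef0-9876-0123456789ab"
repo = "git@gitlab.example.com:mygroup/MyPackage.git"
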
CI runs for all commits and basically consists of calling RegistryCI.test.

  • Registration

All Julia packages use a common registration job, included from the common CI project, like this:

stages:
  - julia registration

include:
  project: $ci_project
  file:
    - templates/julia_package_registration.yml

variables:
  julia_package_dir: julia

The template is somewhat complex:

variables:
  Registry: https://project_71_bot:$JULIA_REGISTRATOR_TOKEN@$URL_TO_REGISTRY
  julia_package_dir: .
  package_repo_git_url: "git@$GITLAB_URL:$CI_PROJECT_PATH.git"

julia package registration:
  stage: julia registration
  only:
    - master
  script:
    - git config --global user.name "$GITLAB_USER_NAME"
    - git config --global user.email "$GITLAB_USER_EMAIL"
    - julia -e 'using Pkg; Pkg.add("LocalRegistry")'
    - cd $julia_package_dir
    - julia -e "using LocalRegistry; register(registry = \"$Registry\", repo = \"$package_repo_git_url\", create_gitlab_mr = true, ignore_reregistration = true)"

This uses an access token (the project_71_bot:$JULIA_REGISTRATOR_TOKEN stuff) to obtain credentials for the registry project and is run on every commit to master (but not on merge requests). If the version number has not been bumped, the LocalRegistry.register call won’t do anything. Merge requests for the registry are created by the create_gitlab_mr option and as soon as the registry CI job has passed they are automatically merged and immediately available.

  • Package Server

Packages are distributed using an in-house package server driven by the LocalPackageServer package. This means that users have to set the environment variable JULIA_PKG_SERVER to point to this package server, but they don’t have to add the company registry by URL, and they don’t need to have or bother with credentials for the GitLab server.
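
On the client side that amounts to something like this, e.g. in ~/.bashrc or in the CI environment (the URL is hypothetical):

export JULIA_PKG_SERVER=https://julia-packages.example.com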

12 Likes

Thanks @ArnaudHenry and @GunnarFarneback for all the valuable info.

So to sum up:

  • At the registry level

    • The registry can be created as for any other local registry, using LocalRegistry.create_registry on the dedicated bare repo (see the sketch after this list).
    • The registry should have a CI test job using RegistryCI.test in order to verify its integrity any time a package or a new version of a package is registered.
  • At the package level

    • Preferably, though not mandatorily, Julia packages should each have their own separate repo.
    • A package should have several CI jobs
      1. A test job
      2. A registration job
      3. An optional documentation job
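
For the first point, the creation could look like this (a minimal untested sketch; the name and URL are hypothetical):

using LocalRegistry

# Create the registry locally and push it to an empty GitLab project
create_registry("CompanyRegistry",
                "git@gitlab.example.com:mygroup/CompanyRegistry.git";
                description = "Internal Julia package registry",
                push = true)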

Some points are still unclear to me, though your answers already cover a lot:

  • @GunnarFarneback Is a package server required? So far I’m just installing the registry on my PC and everything works fine.
  • @ArnaudHenry Are all the CI jobs run with a Julia docker image? How do you configure that part? (Sorry if it’s too much to ask.)
  • @ArnaudHenry You developed TagBotGitlab in Python; is that because GitHub - JuliaComputing/GitLab.jl is not actively developed? Wouldn’t it be more straightforward to use that (if it were more advanced)?

Anyway, thanks again for all these tips. Now I have to digest all this and test it on my setup!

1 Like

It’s a sound thing to do, but to be honest I’ve never seen it fail. Then again, should I happen to introduce a bad enough bug in LocalRegistry, you may save time by detecting it early.

Not at all, it’s just a quality-of-life improvement if your organization struggles with GitLab credentials or has less technically inclined users who find adding a registry URL with Julia’s package manager an obscure thing to do.

Additionally, it lets you cache packages and artifacts inside your internal network, which may or may not be an interesting feature.

1 Like

Assuming the registry is public (inside my organization), could the package server bypass the GitLab permissions? E.g. if a user does not have access to a package repo, could they install it through the package server? That looks like some sort of backdoor if that’s the case ^^.

If you have people inside your organization who shouldn’t be allowed read-only access to the Julia package code, you either shouldn’t have a package server or should set it up with some kind of credentials for access as well.

If all people can be trusted with that access and your security model is fine with exposing code openly on the internal network, it’s rather a feature not having to deal with keys or tokens when you install Julia packages (whether it’s for users or in CI scripts).

1 Like

My solution was to make a JuliaPkgs group where every user has at least read access. All registered packages are in this group.

Thanks @ArnaudHenry for the detailed explanation and your work at Invenia. I’ve been using PkgTemplates.jl at my company recently and it has proven to be very useful.

Actually, I think this explanation, which is very good, could almost be published “as is” on the Julia Forem, don’t you think?

Providing all the relevant information about how to implement a proper Julia CI/CD setup in one place would surely help spread its use in companies. I’ve found myself struggling a bit to gather all the relevant bits to do so.

BTW, thanks also @GunnarFarneback for LocalRegistry.jl, which was very useful too.

4 Likes

Sorry for the late reply. Note I wasn’t there when all this was set up, so I’m doing a bit of digging / learning myself :slight_smile:

Are all the CI jobs run with a Julia docker image? How do you configure that part?

Yes, we try to use docker jobs as much as possible.
Our docker jobs run on an image we build ourselves, based on amazonlinux:2, in which we install Julia and other utilities like the AWS CLI, and into which we also git clone our private PackageRegistry.

You developed TagBotGitlab in Python; is that because GitHub - JuliaComputing/GitLab.jl is not actively developed? Wouldn’t it be more straightforward to use that (if it were more advanced)?

I think TagBotGitlab was meant as a clone of GitHub - JuliaRegistries/TagBot: Creates tags, releases, and changelogs for your Julia packages when they're registered (written in Python), but for GitLab. We also used to have an internal web server running GitHub - JuliaRegistries/Registrator.jl: Julia package registration bot, before we switched to using LocalRegistry.jl instead.

I think TagBot and Registrator.jl are well-suited for the open source GitHub community and the General Julia registry, however probably a bit too complex for an internal setup (notably having to deploy them as separate lambda / web server).

GitLab.jl doesn’t seem maintained, however GitHub - JuliaWeb/GitForge.jl: Unified interface for interacting with Git forges would likely be a good alternative.

My recollection of the history there is that at some point (possibly still now) this ran inside an AWS Lambda, which at the time it was written didn’t support Julia.

Hi, do you have any comments on how to deal with possible Dependency Confusion attacks?

Until better tooling arrives in Pkg, my only advice is not to expose your internal package UUIDs to the external world.

If your registry is public, see Dependency confusion between internal registries and General · Issue #2393 · JuliaLang/Pkg.jl · GitHub.

2 Likes

@ArnaudHenry @GunnarFarneback I managed to put your great advice to use! Each package has been moved to a separate repo on the GitLab server, as well as the registry. However, I am not sure what to do about the artifacts. They may be of different kinds (binaries that I do not compile myself, raw data, …) and may be used by one or more packages.

At the moment I am thinking about two options:

  1. Create an artifact repo using Git-LFS

    • This would allow adding / modifying artifacts using standard git procedures
    • I could lightly document the artifacts using the README.md
    • I am not sure though how the artifact hashes would work, since I cannot create the artifact in the repo directly (see the sketch after this list)
  2. Put each artifact in the wiki of the first package that uses it

    • Easy, but very manual
    • Not centralized; if the package is dropped at some point, the artifact is lost
    • No tracking of modifications (at least not in an obvious manner to me)
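
For option 1, I imagine the hashes could be computed outside the repo: build the artifact locally, archive it to a tarball, commit the tarball to the LFS repo, and bind the resulting hashes in the consuming package’s Artifacts.toml. An untested sketch using Pkg’s Artifacts API (file names and URL are hypothetical, and one may need to check how GitLab serves LFS objects over plain HTTPS):

using Pkg.Artifacts

# Build the artifact tree locally; this returns its git-tree-sha1
tree_hash = create_artifact() do dir
    cp("raw_data.csv", joinpath(dir, "raw_data.csv"))
end

# Archive the tree into a tarball (to be committed to the Git-LFS repo);
# archive_artifact returns the tarball's sha256
tarball_sha256 = archive_artifact(tree_hash, "raw_data.tar.gz")

# Record both hashes in the consuming package's Artifacts.toml, pointing at
# the URL where GitLab serves the file
bind_artifact!("Artifacts.toml", "raw_data", tree_hash;
               download_info = [("https://gitlab.example.com/mygroup/artifacts/-/raw/main/raw_data.tar.gz", tarball_sha256)],
               force = true)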

Would you have any advice on how best to do that?

I haven’t had a need for local artifacts so far, or at least not a big enough need to actually implement something, but I’m somewhat doubtful I would use GitLab for that at all. How to handle them would depend a lot on their nature: how big they are, where they come from, etc.

Do you mean that you do not use artifacts at all, or that you simply put them with the rest of your packages?

At the moment I’m working on a NAS, and this works just fine, except for user rights, which require an admin to grant access. With GitLab, I figured one could have more flexibility in that regard, but I may be mistaken.

Obviously we use artifacts with packages from the General registry but we haven’t needed artifacts for our own packages. For what it’s worth many of them were developed before artifacts even existed.