Good practices for package development in the Julia ecosystem

question

#1

Sorry if these questions are very basic. I am a social scientist with no formal exposure to software development techniques, especially on how to integrate into an ecosystem of packages, so my knowledge about this is pretty haphazard.

I have a few packages which are maturing and I plan to register them (possibly after we get Pkg3). This means that while I could just push commits to master previously since I was pretty much the only user, now I have to think about tagging versions, managing dependencies, etc., doing my best to make life convenient for the users of these packages.

I have read up on semantic versioning, but I wonder what the best practices are for getting from tagged version to tagged version. Should I just keep pushing into master, and tag releases occasionally? Or should I branch, e.g. into dev, and merge into master when I feel ready, then tag?

How to manage dependencies? Should packages depend on specific versions, or version ranges? If A.jl depends on B.jl and both are under my control, should I tag a new version of B.jl first, then make A.jl depend on it and tag that, too?

I am happy to do some reading if you can recommend good sources (preferably not long books, what I am looking for must be intro material). Writeup of how you do things would also be useful.


#2

This is up to you. Having a separate dev branch is called GitFlow.

http://datasift.github.io/gitflow/IntroducingGitFlow.html

This is normally used for deployment systems where master branch should always be working so that hotfix branches can be merged and put directly to deployment. I don’t think that matches many Julia package structures, so it can be unnecessary overhead. Packages which used to use it like Plots.jl no longer do so because it was unnecessary.

Instead, I prefer a modified version of GitFlow, thinking about tags as deployment and master as my working development branch. Here are some tenets I follow:

  1. You should keep master always working and master should always have passing tests.
  2. Small changes where tests pass, if it’s a lonely repo, can be pushed directly to master.
  3. Anything that breaks tests should be a PR. It’s funny at first to make PRs to yourself, but that makes them easy to merge in later and keeps things tidy rather than having just a branch (you will easily forget what it’s for if it waits a year :smile:)
  4. Tag early and often. Since master is always working, if it’s sitting around with some changes that are tested then you might as well release it. It’s easier on METADATA maintainers to check and merge a small diff frequently than it is to check large diffs infrequently (that’s my opinion at least, and I am one of these METADATA maintainers. You’ll see that if it’s a small diff and both tests pass then I’ll quickly review and merge those, but leave the larger ones because they take more work!).
  5. Try to keep users off of master. Always release after fixing a user-reported bug: it’s just a friendly way to keep the release version as your best face. Master is for package development only.

And just one point for me:

  1. Don’t spend too much time working on compatibility. This is controversial, but I find that people stay on old versions of Julia because they have old code that they don’t want to change right now, and the worst thing you can do is start changing their packages. This has led to me showing people how to pin packages more often than not. So if it’s easy, keep compatibility, otherwise I wouldn’t worry too much about supporting v0.5 next year.

The package resolver is finicky. Package resolution in general is a very hard problem, and you should avoid making it harder unless necessary. Set bounds when known, but don’t chuck them on there willy-nilly just because those are the versions you’re using locally / on Travis. Only put upper bounds when you know there’s an incompatibility. Generally these will be set by other packages directly in METADATA.jl when they update and JuliaCIBot finds breaks, or when they know they will break downstream packages (you can PR to METADATA to add upper bounds to previous versions of packages, which is when/how this is done). Lower bounds come from fixing problems that put an upper bound, or when you find incompatible versions because some user pinned an ancient ForwardDiff version and you want a better error.
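To make the idea of bounds concrete: a lower bound plus an optional upper bound amounts to a version interval. Here is a minimal sketch using Julia's built-in `VersionNumber`, assuming Julia 1.x. `in_bounds` is a hypothetical helper written only for illustration (it is not part of Pkg or the METADATA tooling), and treating the upper bound as exclusive is an assumption of this sketch.

```julia
# Hypothetical helper: does version `v` fall inside [lower, upper)?
# `upper === nothing` means "no upper bound is known", which is the
# default suggested above unless an incompatibility has been found.
function in_bounds(v::VersionNumber, lower::VersionNumber,
                   upper::Union{VersionNumber,Nothing}=nothing)
    return v >= lower && (upper === nothing || v < upper)
end

in_bounds(v"0.5.2", v"0.5.0", v"0.6.0")   # inside the interval
in_bounds(v"0.6.0", v"0.5.0", v"0.6.0")   # excluded by the upper bound
in_bounds(v"1.0.0", v"0.5.0")             # lower bound only
```

The more intervals like this a resolver has to satisfy at once, the fewer solutions remain, which is one way to see why unnecessary upper bounds cause trouble.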


#3

Thanks for writing this up, this seems very reasonable.

About dependencies: since CI (e.g. Travis) just gets the latest tagged versions unless indicated otherwise in REQUIRE, is it a reasonable starting point to work with unrestricted versions, and restrict only when

  1. CI fails, or
  2. a bug is reported that needs it?

#4

Yes, and (3) where someone else adds an upper bound via METADATA, you should probably propagate that back. These are the kinds of upper bounds which are placed by others directly onto METADATA when versioning problems are found post-release.


#5

This sounds like a good strategy, but I’m not sure if I quite follow.
Are you submitting pull requests from a dev branch to your own master? Do you rebase / squash commits?
Or is it a pull-request for the tag, and then use that as an excuse to write some notes / summarise changes?


#6

It’s a pull-request to master from a “feature branch”. A feature branch is a branch which implements a single discrete feature. Other work can go on without it and merge into master just fine, and the feature branch will need to be rebased every once in a while to keep up. As a PR, it makes the current status of this feature pretty clear to users and yourself, so I usually make a note of what’s wrong (i.e. why I didn’t just merge it to master).

Other PRs can be built off of that feature branch to chain a whole fix together. One example of when this is useful is when you’re waiting on a tag for another library. Then once it drops you can just merge all of those dependent PRs. Also, as a PR, you can check the CI tests individually before merging the features which is quite nice.

Never pull-request to a tag. Those should be immutable.


#7

Ah, OK. So your heuristic is:

  • Small, non-breaking, change --> Master
  • Consider tagging Master often with any notable updates
  • Any breaking change / new feature --> Feature branch; Pull-request into Master as and when it makes sense

#8

Yup, that’s a good summary.


#9

Amen. Compat.jl is great, but unless you really want to do bugfixes and feature releases for old Julia versions, it’s not worth the effort for most packages IMHO.


#10

Another, related question: semantic versioning relies on the concept of an API, but it is not clear to me what the convention is for that in Julia.

Is the API more or less the exported names and their semantics? That is to say, if I change the semantics of an unexported function/type/…, that does not merit a “just broke the API” version number increment?

Also, is it a reasonable strategy to triage features which will be part of the library and are otherwise well-tested, but have WIP semantics, by adding them to master but not exporting until the API for them stabilizes? (Base seems to do this sometimes.)


#11

You decide. I tend to export my API, but ForwardDiff.jl is a good example of a package that doesn’t. I would say, if it’s not in the documentation then it’s not in the API, that’s clear. There is a grey area though. Numbers are cheap so I wouldn’t worry about incrementing them.
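As a rough illustration of the exported-and-documented convention, here is a minimal sketch assuming Julia 1.x. The module name `MyPackage` and both functions are hypothetical, made up for this example.

```julia
module MyPackage   # hypothetical package name

export solve

"""
    solve(x)

Documented and exported, so users can reasonably treat it as public API.
"""
solve(x) = 2x

# Unexported and undocumented: still callable as MyPackage.experimental_solve,
# but by the convention above it is not yet part of the API.
experimental_solve(x) = 2x + 1

end # module

using .MyPackage

solve(3)                          # available directly because it is exported
MyPackage.experimental_solve(3)   # must be qualified with the module name
```

Nothing in the language enforces this boundary; it is purely a convention between the package author and users.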

Hell, I’ll even release with some undocumented stuff and call it not part of the API yet. There are some integration algorithms with tests passing that I am not done with yet so they’re undocumented, but they can be useful for answering new users (and getting tested by them if they’re looking for it!). As long as it doesn’t affect precompilation (i.e. if it’s not breaking anything) I don’t see it as very harmful (unless it’s adding tons of compilation time or something like that). You can leave this stuff in a feature branch / PR though, that’s also sensible.

But that’s why it’s nice to always have master passing. If someone gives you a bug report, you can fix the bug and immediately release without hesitation. If there’s undocumented stuff in there, at least it’s passing tests so there’s not much to worry about, though I would make sure there’s an issue open for each thing that’s in need of docs.
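For reference, the semantic-versioning increments being discussed can be sketched with Julia's built-in `VersionNumber`. This is only an illustration, assuming Julia 1.x: `bump` is a hypothetical helper, not something provided by Pkg or any tagging tool, and it ignores the looser conventions often applied to 0.x versions.

```julia
# Hypothetical helper: compute the next version for a given kind of release.
#   :patch  bug fix only (e.g. releasing right after fixing a reported bug)
#   :minor  new, backwards-compatible functionality
#   :major  a change that breaks the documented API
function bump(v::VersionNumber, kind::Symbol)
    if kind == :major
        return VersionNumber(v.major + 1, 0, 0)
    elseif kind == :minor
        return VersionNumber(v.major, v.minor + 1, 0)
    elseif kind == :patch
        return VersionNumber(v.major, v.minor, v.patch + 1)
    else
        throw(ArgumentError("unknown release kind: $kind"))
    end
end

bump(v"1.2.3", :patch)   # bug-fix release
bump(v"1.2.3", :minor)   # feature release
bump(v"1.2.3", :major)   # breaking release
```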


#12

Thumbs up on that. Working on “feature”/“topic” branches and merging them back into master through pull requests is a very good way to make sense of what you have done. Think of the topic branch and subsequent PR as a group of commits focused on 1 feature or 1 bug fix. Then when you look back over history your individual commits are essentially labelled by the topic branch name and grouped and ordered by date.

Tagged releases also help with internal management in the same way and are, I think, a good way to leverage a versioning system that is independent from the programming language ecosystem. In all my day jobs we merge to master from a topic branch, and then “tag” the new release on master (tags can vary, but it’s best to do things like “0.1”, “0.2”, …). If you use GitHub, it has a very usable UI to do that.

Then, internally, different teams can be using different tag versions of master (0.1 or 0.2)… and can move to new versions on their own time.


#13

Well, with Julia, if you set up attobot then tagging a release also puts a METADATA PR out there, so it’s an easy way to keep track of things internally, but it’ll also get your code updated in the package ecosystem.