Newcomer contributor in JuliaGeo and co. - Help me get started!

Thanks everyone for your swift and detailed replies, happy to be here!

Before I reply to each one individually I want to justify putting packages into an organization. From my perspective there are numerous advantages in having the repos in the organization:

  1. Inspires more trust.
  2. Increases the pool of people that are likely to review a PR.
  3. Newcomers have a specific collection to search for packages.
  4. Invites more contributions by non-members (why? because the repo is detached from a single person’s name).
  5. It is much easier to find! Once you find one package from JuliaGeo, you immediately see that it is from JuliaGeo, so you click through and land on the org’s page.

I urge you all to consider putting your packages in an organization (probably JuliaGeo, since JuliaAtmosOcean is actually just 1 package, which could join JuliaGeo). At the moment there is a lot of scattered material, e.g. GeoStats, ClimateTools, ClimateMaps and NCDatasets all have different owners (in fact, I only became aware of GeoStats and ClimateMaps after your posts, which would not have been the case if they were part of JuliaGeo).

Transferring is quite trivial thanks to GitHub; this is how I’ve done it for people joining JuliaDynamics or JuliaMusic:

  1. Invite the owner of the repo to the org (and all important contributors).
  2. The owner transfers the repo to the org via the settings.
  3. Create a Team on the org with admin (owner-level) access to the transferred repo. Make the original owner (and all important contributors) part of this team.
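For anyone who prefers scripting this over clicking through the settings, the three steps map onto GitHub REST API endpoints. The sketch below only assembles the requests and does not send them. The org, repo, team and user names are placeholders, and it assumes that "owner level access" corresponds to GitHub's `admin` repository permission. Actually sending them would need an authenticated client (e.g. `gh api`), with the transfer request made by the original owner.

```julia
# Hypothetical sketch: the three transfer steps written out as GitHub REST API
# requests. Org, repo, team and user names are placeholders; nothing is sent,
# the requests are only assembled so the plan can be inspected.

struct GitHubRequest
    method::String
    path::String                 # relative to https://api.github.com
    body::Dict{String,Any}
end

function transfer_plan(org, owner, repo, team_slug, contributors)
    people = unique(vcat(owner, contributors))
    reqs = GitHubRequest[]
    # 1. Invite the owner and the important contributors to the org.
    for user in people
        push!(reqs, GitHubRequest("PUT", "/orgs/$org/memberships/$user",
                                  Dict("role" => "member")))
    end
    # 2. The owner transfers the repo (must be sent with the owner's credentials).
    push!(reqs, GitHubRequest("POST", "/repos/$owner/$repo/transfer",
                              Dict("new_owner" => org)))
    # 3. Create a team, give it admin access to the transferred repo, add the people.
    #    Assumes the team name is already a valid slug (lowercase, hyphenated).
    push!(reqs, GitHubRequest("POST", "/orgs/$org/teams", Dict("name" => team_slug)))
    push!(reqs, GitHubRequest("PUT", "/orgs/$org/teams/$team_slug/repos/$org/$repo",
                              Dict("permission" => "admin")))
    for user in people
        push!(reqs, GitHubRequest("PUT", "/orgs/$org/teams/$team_slug/memberships/$user",
                                  Dict("role" => "maintainer")))
    end
    return reqs
end

# Example with made-up names: print the plan.
for r in transfer_plan("MyOrg", "alice", "Example.jl", "example-maintainers", ["bob"])
    println(rpad(r.method, 5), r.path, "  ", r.body)
end
```

Keeping the plan as plain data makes it easy to review who gets which rights before anything is changed on GitHub.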

The last step ensures that the person who initially owned the repo keeps all privileges on it without getting full privileges over the entire org. JuliaClimate is an alternative if JuliaGeo is not a good fit, but then the scopes of the two orgs should be stated clearly.


@juliohm, GeoStats.jl will certainly be useful, so I’ll add it to the list of “packages to study”. Of the beginner issues you cite, the ones on missing values and Unitful.jl are probably those I could tackle at the moment. Could you please describe them in more detail on the GitHub issue page, so that they are more approachable from a beginner’s point of view? I’ll also keep an eye on the unified interface for data.


@Balinus ,

Cool, I can help with that for sure. This is what we did with Agents.jl recently when we ported it to JuliaDynamics. I think after porting to version 2.0 we have effectively reduced the complexity of the package by half (which is amazing!!!).

I couldn’t agree more. Separating plotting functionality from the actual scientific computing is extremely important. Not only does it speed everything up and reduce file sizes, it also makes it much easier to run stuff on the cluster. Perhaps you can open an issue outlining in detail what should be done, and I’ll have a look? (Btw, we also separated plotting from Agents.jl when we ported it to JuliaDynamics.)

I also agree fully with that and it (massively) helps newcomers. As Stefan Karpinski once said “there should be one package that does that one thing, and it should be the best package”. (Same goes for the 2 NC reader packages imho).


@visr thanks for pointing to this comment; now I can see the different scopes there. What you propose is definitely useful, i.e. reducing code by adding inter-dependencies. But the docs/READMEs should also make clearer which package to use for what reason (i.e. what their actual target goals are).

Although you have a point, I am not as concerned as you about this. At the end of the day, if there is enough material that a new organization becomes necessary, one can just transfer the repos with 2 clicks. At the moment maybe it is worth considering JuliaClimate as an org, but of course I am a newcomer and I shouldn’t be the one judging that.

In my eyes an organization is more about thematic connection between packages, and ease of finding them, and not so much for functionality connection.


@joa-quim thank you for the suggestion of GMT.jl, seems that it is useful to easily plot an ncdataset. But on the other hand yet one more way to load netCDF data just makes things more complicated for me. I’ll stick with NCDatasets.jl for now.

@fabiangans Cool, thanks! Fortunately I will be working with a small dataset (~1 GB) at the start of my project, but as I move along I’ll keep this in mind!

I also like your approach of how one “maps” their analysis onto the datasets. It will certainly come in handy for me.

Please do, the number is 408!


Off-topic comment

From a user’s perspective I agree with that, but from a package owner’s perspective it feels like I lose my package upon transfer. Sure, if one digs into the contribution stats, it’s easy to find out whose package it really is. But my visibility is somewhat lost.

It would be nice if somehow the two opposing interests could be more aligned within GitHub’s setup.


I see your point, but this argument goes the other way as well: If I am a user and I want to contribute to someone else’s package then “my effort will be hidden”, as they will take “all the credit” being the “one name”. That’s also unfair, isn’t it? A good solution would be to have a list of “main developers” on the upper part of the README maybe?

In general, I think these things just happen when the packages are ready. In the early days of Julia, a lot of repos ended up in organizations, and now a lot of organizations have a bunch of effectively abandoned packages.

I am not sure I understand the reasoning here. From my perspective, clean code, good documentation, and responsive maintainers invite contributions. These are pretty much orthogonal to repos being in an organization.

If anything, I find it easier to contribute to a package that has a single main contributor with a vision of where the package should be going. A lot of PRs just sit around for a long time because no one is willing to make major decisions (which includes just saying no to a PR, but quickly).


Thank you @Datseris, I will add more detail to the issues on GitHub, and will ping you there if that is ok.

Regarding the move to the organization, I think it makes sense to move a package to an org when there is a shared vision for a project. Right now GeoStats.jl is pretty much my own vision for what spatial statistics should look like. In the future, if I perceive that other people with a similar background are joining the effort and contributing to this vision in non-trivial ways, then it makes sense to start an org. I think I agree with what @Tamas_Papp said: it is better to have a single owner who makes major decisions quickly with a clear vision in mind than a community of people touching the code freely. We have examples of projects in the spatial statistics literature that were touched by multiple people and turned into spaghetti code. I’d like to avoid this.


Discoverability is indeed important for the whole ecosystem (larger than one org) to thrive, and to avoid unnecessary duplicate efforts. But having a large list of packages in various states under one org may not be the best either. We know that not everybody wants to transfer their packages in, for a variety of reasons. To help discoverability we can expand on https://juliageo.org/, which lists packages in the whole ecosystem, not just JuliaGeo. And of course tagging packages can help in finding them on https://pkg.julialang.org/docs/.


Sure, this seems like a good middle-ground solution.


@Datseris

Here’s some info about what needs to be done to separate the mapping/plotting functionality into a new package.

Cheers!


seems that it is useful to easily plot an ncdataset. But on the other hand yet one more way to load netCDF data just makes things more complicated for me.

@Datseris It looks like I passed on a slightly misleading message. GMT stands for Generic Mapping Tools and is known mostly for the quality of its maps, but the data used in it can come from whatever source the user wants. I provided that example to show an easy way of loading data from an nc file, but how the data is obtained is irrelevant. True, it needs to be put into a GMTgrid structure, and this is where the ongoing effort to create a grid abstraction model can be useful. I’ll keep an eye on it and try to integrate it into the GMT.jl workflow.

But GMT has been developed for 30 years and is much more than just mapping. For example, you mentioned somewhere above an interest in a regridding utility. In GMT.jl one can downsample the example grid to 0.2 degrees (it was ~0.08) with

Gresamp = grdresample(G, increment=0.2);

But although this would be fine for mapping, it is not the correct thing to do, because the downsampling introduces aliasing. So it is best to low-pass filter the grid first to avoid aliasing. That would be done with the grdfilter module:

Gresamp = grdfilter(G, increment=0.2, filter="g20", distflag=4);

The above applies a Gaussian filter with a filter width of 20 km, where at each node the 20 km are measured as distances on the sphere (distflag=4 selects spherical distance calculation).
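To make the aliasing argument concrete, here is a small pure-Julia sketch. It is not GMT.jl code and not how grdfilter is implemented. It uses made-up numbers (0.05° samples along the equator, subsampled every 4th point to 0.2°, and a 40 km Gaussian chosen so the effect is easy to see) and assumes that the filter "width" is the full width with σ = width/6, which as far as I know is the convention GMT documents for its Gaussian filter. Distances are computed with the haversine formula as a stand-in for "knowing that we are on the sphere".

```julia
# Hypothetical 1D illustration (not GMT.jl code) of "low-pass filter, then subsample".
# Assumed setup: samples every 0.05° along the equator, subsampled to 0.2° (every 4th
# sample), and a Gaussian whose "width" is the full filter width with σ = width / 6.

const R_EARTH_KM = 6371.0

# Great-circle (haversine) distance in km between two (lon, lat) points given in degrees.
function haversine_km(lon1, lat1, lon2, lat2)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ, Δλ = φ2 - φ1, deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2 * R_EARTH_KM * asin(sqrt(min(a, 1.0)))
end

# Gaussian-weighted mean around each sample along latitude `lat`, using spherical
# distances and only neighbours within half the filter width (no wrap-around).
function gaussian_filter(vals, lons, lat, width_km)
    σ, half = width_km / 6, width_km / 2
    out = zeros(length(vals))
    for i in eachindex(vals)
        wsum = vsum = 0.0
        for j in eachindex(vals)
            d = haversine_km(lons[i], lat, lons[j], lat)
            d > half && continue
            w = exp(-0.5 * (d / σ)^2)
            wsum += w
            vsum += w * vals[j]
        end
        out[i] = vsum / wsum
    end
    return out
end

lons = collect(0:199) .* 0.05           # 0.05° spacing along the equator
slow = sin.(2π .* lons ./ 5)            # the signal we want to keep (5° wavelength)
fast = [(-1.0)^k for k in 0:199]        # energy at the fine grid's Nyquist wavelength (0.1°)
vals = slow .+ fast

factor = 4                                                      # 0.05° -> 0.2°
naive    = vals[1:factor:end]                                   # subsample only
filtered = gaussian_filter(vals, lons, 0.0, 40.0)[1:factor:end] # filter first, then subsample
```

In `naive`, every retained sample of the short-wavelength component equals +1, so it reappears as a spurious constant offset of about 1 on the coarse grid: that is aliasing. In `filtered`, the short-wavelength component is almost entirely removed before subsampling, and the result stays within about 0.01 of `slow`, except at the very first sample, where the filter window is truncated by the edge of the data.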

Hope this provides a better idea of what the GMT.jl package is all about.


@Datseris you may also want to look at GeoData.jl if you are working with raster datasets in models. It generalises load/save and indexing for quite a few geospatial file types, including large multi-file datasets. It also supports indexing by lat/long/time etc. in arbitrary dimension order, using DimensionalData.jl. And it does plotting too.

It was really made for modelling: to avoid hard-coding your models to specific formats and file storage structures, especially for larger-than-memory datasets. That’s what I use it for the most. But it’s also great to have easily plottable model inputs and outputs, as the spatial information propagates through most Base/Statistics methods you apply to the array.

It should be released in the next month or two, but a lot of my modelling packages already use it so it’s relatively stable.


It seems that GitHub could help with these issues [i.e. dilution of credit for authorship] by maintaining a provenance record for packages as they move from private ownership into organizations and through subsequent reorganizations, and by keeping a CV on personal GitHub accounts detailing contributions to and participation in organizations. Maybe they already do some of this? Documented package history and authorship should also help against the inevitable malware and typo-squatting attacks.

Let’s try to keep the conversation on topic, please. I merely suggested that there could be benefits to having repos in orgs, but the complications of doing so are honestly unrelated to the topic, which is: a newcomer to meteorology-related work would like to contribute to Julia packages. Anything beyond this should be discussed in a separate topic.


Thanks @joa-quim, now I see that GMT is something useful. I am trying to collect all the packages that can plot a spatial field over the earth, and so far I have GMT and ClimateTools (which will be ClimateMaps in the future). Is there something else?

Not that I know. Of course if you are dealing with a small part of the earth and don’t need to take the earth’s shape into account you can use standard Plots.jl / Makie / etcetera.

Would be nice to eventually have pure Julia support for this, for instance in GeoMakie.jl (MakieOrg/GeoMakie.jl on GitHub: geographical plotting utilities for Makie.jl).

@NHDaly just shared this on Julia Slack: https://github.com/NHDaly/Maps.jl which I guess counts as one more!


Yeah, Maps.jl was sort of a half-day sprint over the holidays to see if I could quickly get a cool mapping package up, built on Makie. I ended up spending a good deal of time just learning Makie in the process.

If it makes more sense to merge that bit of work into GeoMakie, I’m 100% down with that!


Because in this post there was some discussion regarding organizations and moving packages to organizations, I want to cross link the following two posts: 1 and 2 from a similar discussion in a different thread. They “outline” in some sense the story of JuliaDynamics, which some users believe to be a successful case of an organization around a scientific discipline.

I don’t want to derail this topic into a discussion about GitHub orgs and moving packages. But I do think such a discussion would be helpful, as there are numerous benefits, and it has started here. Maybe someone is willing to open a new thread under Domains/Geo? As I currently own 0 packages related to climate and geo, I don’t think I can open it.


FYI, GMT.jl already belongs to an organization, just not a Julian one.

(Copying some of my responses in this thread over here.)

The process for getting involved in JuliaGeo development is described here: JuliaGeo/meta Wiki (GitHub).

The JuliaGeo organization is loosely tied (in name and focus) to https://www.osgeo.org/. I agree with Martijn’s comment: membership in an organization/team is a way of managing permission settings for packages that you co-develop with people, and is best used for that purpose. I’m sorry about the ambiguity in the name; here is the thread where the discussion about it happened: https://groups.google.com/d/msg/julia-geo/oPz2_gz9K-g/TbULPveFXdAJ.

For packages that are oriented along scientific domains, sometimes it might be worthwhile to have github organizations (that might be cross-language efforts) around it. If there is a collection of packages that works well together for oceanography (that doesn’t otherwise depend on JuliaGeo), it might make sense to create a separate github organization for it: that might give you more freedom to take it in a direction that facilitates development for those packages you are interested in.


FYI, Gael Forget has started JuliaOcean, an organization to host packages around ocean stuff. It’s pretty empty for now, but hopefully it will fill up, help new Julia users discover ocean-related packages and serve as a community of sorts :)