Replacing CITATION.bib with a standard metadata format

I believe that the decision was to keep Pkg.jl limited to building the environment (fetching, resolving dependencies and versions, installing code). Package metadata would be handled by something parallel.

2 Likes

This seems like a classic “perfect is the enemy of the good” situation: CodeMeta is the perfect here, CITATION.bib is the good. The CITATION.bib convention is truly the dumbest, simplest thing that could possibly work, but it’s extremely pragmatic. The CodeMeta thing is better, smarter, more general… and something that few people seem to know or want to do. To that end, I’d like to propose the following:

  • Those who don’t want to deal with CodeMeta should just write CITATION.bib files by hand;
  • Authors are encouraged to use CodeMeta as the source of truth and generate CITATION.bib files to check into packages.

It may be possible to write tooling to reverse generate CodeMeta data from the CITATION.bib format, which should be used as a one-time utility for those who want to transition, after which CodeMeta should be considered the source of truth. This way people using LaTeX for publication only ever need to look at CITATION.bib and don’t need to install any tooling to parse and format CodeMeta files just to cite something.

It would help to encourage the use of CodeMeta if someone could decipher the standard sufficiently to figure out how to write citations in packages. I have not been able to glean basic things like:

  • what the file name should be
  • what the relevant field names are
  • what tools one should install to parse and transform CodeMeta
  • what command invocations are necessary to generate BibTeX from CodeMeta

Without that basic information distilled somewhere, I don’t think CodeMeta has a snowball’s chance in hell of taking off in the Julia ecosystem and this thread merely serves to cast FUD on the already-fairly-successful CITATION.bib convention.

11 Likes

I’ve recently fallen into some familiarity with schema after searching high and low for various neuroscience related databases. If someone is truly motivated to become familiar with CodeMeta and translate it they can take a look here.

For what it’s worth, I’m starting to think CodeMeta is a bit of over-engineering. Most people don’t care about all the stuff it’s trying to do, and those who do care can get that information elsewhere (e.g., Project files).

3 Likes

Just an FYI: I was led to https://github.com/adamslc/Citations.jl as a way to put citations in docstrings. It needed some love, so I made a very basic pull request that gets it loading again. However, if this is something we intend to use to support bib files, someone who knows more about text parsing and the Documenter system should probably take a look at it.

1 Like

  • codemeta.json is the standard name (codemeta/codemeta#81)
  • the possible fields are listed here: The CodeMeta Project
  • it’s a JSON file.
  • Essentially doi2bib(JSON.parse(String(read("codemeta.json")))["citation"]), using the doi2bib function I posted above, at least in the minimal case of a single citation given by a DOI. Technically the citation field is a URL, so you’d first need to split out the part after http://dx.doi.org/ … you could also support other kinds of URLs, as I mentioned above. (The citation field can also be an array.)
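To make the last point concrete, here is a minimal sketch of the "split out the part after http://dx.doi.org/" step. The helper name citation_dois, the DOI_PREFIX pattern, and the sample DOI are all hypothetical, and the Dict stands in for what JSON.parse would return; it handles both a single citation URL and an array, and skips anything that isn't a DOI URL.

```julia
# Hypothetical helper (not part of any package): extract bare DOIs from the
# `citation` field of already-parsed codemeta.json data.
const DOI_PREFIX = r"^https?://(dx\.)?doi\.org/"

function citation_dois(meta::AbstractDict)
    citation = get(meta, "citation", String[])
    # The field may hold a single URL or an array of entries.
    entries = citation isa AbstractString ? [citation] : citation
    # Keep only string entries that look like DOI URLs, then strip the prefix.
    return [replace(e, DOI_PREFIX => "")
            for e in entries
            if e isa AbstractString && occursin(DOI_PREFIX, e)]
end

# Stand-in for JSON.parse(read("codemeta.json", String)); the DOI is a placeholder.
meta = Dict("name" => "Example.jl",
            "citation" => "http://dx.doi.org/10.1234/example")
dois = citation_dois(meta)
```

Each element of dois could then be handed to a DOI-to-BibTeX function such as the doi2bib mentioned above.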

I agree that their documentation is abysmal, though.

7 Likes

I added an example to one of my packages: Econometrics.jl. I used the Software Heritage tool to build it.

GitHub now uses CITATION.cff (since 5 days ago)
https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files#about-citation-files

Announced today https://twitter.com/natfriedman/status/1420122675813441540

9 Likes

Ideally we provide some barebones support for this and bib. It’s likely to have momentum with a big player like GitHub backing it, but it also is hard for me (and perhaps others) to see why GitHub is the entity that decides the best citation format.

Frankly, if the majority of software authors happened to converge on any reasonably sane format, I would not care much about who made the decision, whether it is the “best” format, or which one it actually is.

FWIW CFF seems OK, GitHub is a major player, and they invested a bit in the tooling… so hopefully everyone will follow their move and we can finally cite software conveniently (and be cited, which a lot of Julia package authors care about).

4 Likes

Zenodo followed suit:

5 Likes

I just think it’s odd that, after billions of dollars in academic publishing fees, GitHub is the one to decide on this. Certainly citation is important in other areas of work, but that’s where citations → income. I guess I anticipated developers being more bothered that Microsoft is making the calls here, but I don’t personally care all that much. Ultimately, I agree with what you said:

1 Like

In the format showdown between BibTeX code and YAML files, I’m not sure which is worse, but I agree that it’s good for someone to just pick something, and it doesn’t really matter who as long as there’s some kind of agreement. So CFF seems fine.

8 Likes

I am looking forward to the day when papers have a “software used” section (should not be mixed with the bibliography IMO), that one can just autogenerate from the Project.toml of the code written for the paper.
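As a rough sketch of the first step of that idea, the snippet below lists the direct dependencies recorded in a Project.toml using the TOML standard library. The function name direct_dependencies, the package names, and the UUIDs are placeholders invented for illustration; turning each name into an actual citation is left out.

```julia
using TOML  # standard library since Julia 1.6

# Placeholder Project.toml contents; package names and UUIDs are made up.
project_toml = """
name = "PaperCode"

[deps]
PackageB = "00000000-0000-0000-0000-000000000002"
PackageA = "00000000-0000-0000-0000-000000000001"
"""

# Hypothetical helper: sorted names of the direct dependencies in a Project.toml string.
function direct_dependencies(toml::AbstractString)
    project = TOML.parse(toml)
    deps = get(project, "deps", Dict{String,Any}())
    return sort!(collect(keys(deps)))
end

deps = direct_dependencies(project_toml)
println("Software used: ", join(deps, ", "))
```

For a real project one would read the file with read("Project.toml", String) and then look up a citation for each dependency.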

13 Likes

Huh? Students, grants, endowment and donations → income… citations don’t create any money whatsoever…

I assume it is taking the long view.
Get more citations, get promoted, get paid more, etc.

1 Like

Ah, you mean more income for the academic. Generally, getting promoted or getting raises is based on things that bring in more income to the uni: grants. Highly cited papers might help build a case for promotion, but that’s not the main thing, and it’s not clear whether highly cited software is recognised in the same way.

2 Likes

You mean like:

https://github.com/SebastianM-C/PkgCite.jl

9 Likes

Yes, exactly. Thanks, I did not know about that package!