Replacing CITATION.bib with a standard metadata format

Many Julia packages are starting to use a CITATION.bib file containing BibTeX citations for the package, as suggested on the Julia Research page.

I think it is a mistake to use a Julia-specific format here — providing code metadata is not a Julia-specific problem, and there is a huge community interested in promoting better metadata formats. BibTeX is very convenient for those of us in mathematical sciences where LaTeX is popular, but it is pretty inconvenient outside of a TeX environment (e.g. its use of TeX-specific escapes for diacritical marks rather than Unicode, among other quirks).
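
To make the escaping issue concrete, here is a minimal sketch of what a non-TeX consumer has to do before it can display a BibTeX field. The names `TEX_TO_UNICODE` and `detex` are hypothetical, and the table covers only a handful of escapes; real BibTeX data uses many more, so treat this as an illustration rather than a converter.

```julia
# Illustrative only: a tiny, incomplete table of TeX escapes and their
# Unicode equivalents. Braced forms come first so they are replaced whole.
const TEX_TO_UNICODE = [
    "{\\textendash}" => "–",
    "\\textendash"   => "–",
    "{\\\"o}"        => "ö",
    "{\\'e}"         => "é",
    "{\\~n}"         => "ñ",
]

# Apply each replacement in order (hypothetical helper, not a full TeX parser).
detex(s::AbstractString) =
    foldl((acc, rule) -> replace(acc, rule), TEX_TO_UNICODE; init = String(s))

println(detex("Poincar{\\'e} {\\textendash} G{\\\"o}del"))
```

Anything not in the table passes through unchanged, which is exactly the failure mode a Unicode-based metadata format avoids.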

It seems like there are two major (i.e. reasonably popular) proposals in this area:

  • CodeMeta
  • Citation File Format (CFF)

Of the two formats, my sense is that many people are coalescing around CodeMeta. I see a lot of tools popping up around CodeMeta for (e.g.) R and Python, more and more people putting a codemeta.json file in the root of their GitHub repositories, working groups like RDA and Force11 focusing on CodeMeta, et cetera. On the other hand, CFF is closer to BibTeX in that it is specifically intended to represent citations, and there are tools to convert CFF to BibTeX.

I’m honestly not thrilled about either of them, but it seems crazy to have a Julia-specific format here. There are also plenty of tools to convert a DOI into BibTeX, so a CodeMeta file with a DOI URL for the citation field would be pretty easy to automatically convert to BibTeX, and I’m hopeful that we’d have a Julia package-to-BibTeX tool relatively quickly.

I am a bit confused: the page talks about BibTeX, but the actual format seems to be a mapping of BibTeX to YAML. That is indeed a bit unusual.

That’s just for the www.julialang.org/_publications site generation. The CITATION.bib file that people are starting to put in their package repositories (e.g. here) is indeed just BibTeX.

It was easier than I thought. Here is a Julia function to convert a DOI into BibTeX:

using HTTP

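# Fetch a BibTeX entry for a DOI via HTTP content negotiation:
# the Accept header asks the server for BibTeX instead of an HTML page.
doi2bib(doi::AbstractString) =
    String(HTTP.get("http://data.crossref.org/$doi",
                    ["Accept" => "application/x-bibtex"]).body)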

For example,

julia> print(doi2bib("10.5334/jors.151"))

@article{Rackauckas_2017,
	doi = {10.5334/jors.151},
	url = {https://doi.org/10.5334%2Fjors.151},
	year = 2017,
	month = {may},
	publisher = {Ubiquity Press, Ltd.},
	volume = {5},
	author = {Christopher Rackauckas and Qing Nie},
	title = {{DifferentialEquations}.jl {\textendash} A Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia},
	journal = {Journal of Open Research Software}
}

Note that there are a variety of ways to get a DOI for software besides a traditional publication, e.g. via Zenodo, and doing so is becoming increasingly popular.

Note also that, since the citation field of CodeMeta is just a URL, we could easily support various common citation URLs in addition to DOIs:

  1. for doi.org URLs (DOIs), use doi2bib
  2. for github.com URLs, generate a @misc bibitem citing the URL, perhaps using the github API to fetch top contributors as authors.
  3. for arxiv.org URLs, use the arXiv API
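
The dispatch described in the list above could be sketched roughly as follows. This is only an offline illustration: `citation_kind` and `github_misc` are hypothetical names, the set of recognized hosts is an assumption, and no network calls are made (so the GitHub entry has no authors, and the DOI and arXiv branches are only classified, not resolved).

```julia
# Hypothetical helper: classify a citation URL by host and return
# (kind, identifier), where identifier is the path after the host.
function citation_kind(url::AbstractString)
    m = match(r"^https?://(?:www\.)?([^/]+)/(.+)$", url)
    m === nothing && return (:unknown, url)
    host, path = lowercase(m.captures[1]), m.captures[2]
    host in ("doi.org", "dx.doi.org") && return (:doi, path)
    host == "github.com" && return (:github, path)
    host == "arxiv.org" && return (:arxiv, path)
    return (:unknown, url)
end

# Hypothetical helper: build a bare @misc entry from an "owner/repo" path.
# Authors are omitted; they would have to come from e.g. the GitHub API.
function github_misc(path::AbstractString)
    parts = split(path, '/'; keepempty = false)
    length(parts) >= 2 || throw(ArgumentError("expected owner/repo, got \"$path\""))
    owner, repo = parts[1], replace(parts[2], r"\.git$" => "")
    key = replace(repo, r"[^A-Za-z0-9]" => "_")
    return """
    @misc{$key,
      title = {$repo},
      howpublished = {\\url{https://github.com/$owner/$repo}},
      note = {GitHub repository}
    }
    """
end

kind, id = citation_kind("https://github.com/chakravala/VerTeX.jl")
kind == :github && print(github_misc(id))
```

For a `:doi` result, the identifier could be passed straight to the doi2bib function above.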

All discussions about the CITATION.bib file were stuck at the question “why not just use other existing formats?”. The BibTeX file is as easy as a copy-paste, but there are several quick ways to retrieve a BibTeX entry for a given paper, and we could also create some Julia tools to do that for the codemeta file in a package directory.

So I have been working on a package for managing databases of LaTeX notes and references:

https://github.com/chakravala/VerTeX.jl

It works by writing a LaTeX document in an editor, and then when you are finished Julia will parse the document into a bunch of YAML files and store them in a decentralized database. When you want to recall a document, the system can generate the LaTeX file from the database. Since it parses the info into smaller chunks, it works like a relational database and different permutations of documents can easily be generated by using some graph theory concepts.

It relies on some code from Pkg to have a nice REPL, and I hope to build the registry feature into it as well.

It is also designed to work with Julia package repositories, so that you can have a vtx folder in there to manage a local VerTeX database in a Julia package repository.

Although it is a Julia package, it is designed to be a general VerTeX format implementable in other langs.

I haven’t put much more work into it recently, because I wanted to focus on studying math… but I definitely want to do some more work on it… although developing so many cool packages for free is a lot of work.

BibTeX is not implemented in it yet, but that’s definitely a feature that will be supported eventually.

Like I said, doing all these different packages for free is not always sustainable.

Writing a tool to generate a BibTeX citation from codemeta.json with a citation URL is a lot simpler than writing a full document-management system.

In fact, it’s probably easier than parsing BibTeX itself (if you want to handle all the escapes, the name algorithm, and other complications).

Sure, but I am interested in writing and maintaining research papers and automatically managing the inter-dependency of mathematical and scientific knowledge, not just referencing a paper.

The topic of this thread is simply how to add citation information to Julia packages, not about document-management tools in general.

Yes, I understand… your functionality would be a subset of my functionality. If I finished my package you would have the feature you want and also a lot more.

In my opinion, the key argument for BibTeX is the widespread support for producing and consuming it. We should not think about tooling for Julia, but more about tooling for the people using the citations.

Virtually all citation managers support BibTeX import and export, and the same goes for export from journal websites and various indexes (Google Scholar, Web of Science…). I think it’s a mistake to add any barriers/intermediate tooling between

  1. the citation shown to users that they can import into their citation manager of choice
  2. package developers or contributors who need to create and add a citation in the first place

My two cents,

Software and projects (packages as well as applications) need proper metadata for various reasons. I had argued for including metadata in the Project.toml, but since that is being used exclusively for the package manager, I believe a separate metadata format would be the best solution, especially as it is language-agnostic.

Citations are just one of the many components of proper metadata. For example, if you want to use a project, an analysis of the licenses would be useful, but currently the metadata for licensing is not great. The same goes for information about maintainers and support when exploring new solutions. After some research, I believe the CodeMeta standard is the best solution out of those considered.

  • Comprehensive standard for metadata rather than just citations
  • Tools that support various integration services, some of which are under development
  • In my research it is quite useful to have OSS metadata for programmatic access
  • Seems like a standard that finally is experiencing general adoption across communities
  • It is trivial to obtain functionality for citation such as all_project_dependencies_cite() or all_manifest_dependencies_cite() to generate the string which can be written to a references.bib.
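
As a rough sketch of the last bullet, assuming each dependency ships a CITATION.bib in its source directory (a codemeta.json could be handled similarly), something like the following might work. `collect_citations` is a hypothetical helper, and the body of `all_project_dependencies_cite` is only one guess at what such a function could do; it relies on `Pkg.dependencies()` from the Pkg standard library.

```julia
using Pkg

# Concatenate every CITATION.bib found directly inside the given directories.
function collect_citations(dirs)
    entries = String[]
    for dir in dirs
        file = joinpath(dir, "CITATION.bib")
        isfile(file) && push!(entries, strip(read(file, String)))
    end
    return join(entries, "\n\n")
end

# Hypothetical implementation of the function named above: gather citations
# from the direct dependencies of the active project only.
function all_project_dependencies_cite()
    infos = [info for info in values(Pkg.dependencies()) if info.is_direct_dep]
    sort!(infos; by = info -> info.name)
    dirs = [info.source for info in infos if info.source !== nothing]
    return collect_citations(dirs)
end
```

The result could then be written out with `write("references.bib", all_project_dependencies_cite())`. A manifest-wide variant would simply drop the `is_direct_dep` filter.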

The CodeMeta thing looks promising. It might be possible to just write a package using metaprogramming to parse certain schemas from https://schema.org/, reducing maintenance burden as the standard is updated over time.

Many citation management tools, e.g. Mendeley, JabRef, Zotero, RefWorks, support BibTeX import.
None of the above support importing CFF even though they support dozens of other formats.
(EndNote is the only mainstream citation manager I am aware of that can’t import BibTeX?)

That is fair.
If one uses Biber + BibLaTeX, then one can use UTF-8 encoded .bib files,
but since most people don’t, it would be hard to push for people to use them.

CodeMeta could be good, but CFF seems kinda pointless.
I kinda think CodeMeta solves a different (more general) problem though.

How does CodeMeta handle the need for multiple things to be there?
E.g. sometimes you have multiple citations for the same package,
for different parts.

The VerTeX data format is a further abstraction from what you are all talking about; with further work, all of your favorite citation formats can be supported and transplanted into the VerTeX database as well.

Also, the VerTeX package already has special features for supporting multiplicity of references and sub-references within other documents. In fact, that’s the whole point of chunking and decentralizing the data.

The whole point of VerTeX is to be able to reference a sub-document within a much larger body of inter-dependent scientific and mathematical documents. Textbooks written in LaTeX have to be linear, while the information written in VerTeX is non-linear and generalized to arbitrary directed graphs of documents.

Any of the items can be a list, e.g. "citation": [ "http://doi.org/10.1109/JPROC.2004.840301", "https://doi.org/10.1145/989393.989457" ]
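
A consumer therefore has to accept both a single value and a list. Here is a minimal sketch of that normalization, assuming the citation entries are plain URL strings as in the example above (CodeMeta also allows richer objects, which this does not handle). `citation_urls` is a hypothetical name, and the `Dict` stands in for whatever a JSON parser would return for a codemeta.json file.

```julia
# Return the citation entries of a parsed CodeMeta record as a vector,
# whether the field holds a single URL, a list of URLs, or is absent.
function citation_urls(meta::AbstractDict)
    value = get(meta, "citation", nothing)
    value === nothing && return String[]
    value isa AbstractString && return [String(value)]
    return [String(v) for v in value]
end

# Stand-in for the result of parsing a codemeta.json file.
meta = Dict("name" => "Example.jl",
            "citation" => ["http://doi.org/10.1109/JPROC.2004.840301",
                           "https://doi.org/10.1145/989393.989457"])

for url in citation_urls(meta)
    println(url)
end
```

Each URL could then be converted separately, so a package with several citations simply yields several BibTeX entries.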

My central point is that we should use a standard metadata format if possible, not that we should go with something even more novel.

I agree with this, but by similar reasoning, I also understand why people are OK with using BibTeX (which, at this point, has been around for three decades) a bit longer while they see which standard emerges as the dominant one.

E.g. CodeMeta 1.0 was released in 2017. With all those major players behind it, I really hope it succeeds, but frankly, it’s fairly common for new “standards” to just appear and then die quietly in five years.

Was there ever a resolution to this? I know that @chakravala has put some great work forth and I think I remember seeing some form of LaTeX parsing out there at some point, but I still think it would be preferable to have something officially supported/incorporated into Pkg.jl.

Sorry for hijacking the discussion about selecting a format to build into Pkg. By the way, the VerTeX package I made does parse some LaTeX, and it is already tied in with Pkg.jl in some ways. It wasn’t intended to be ported into Pkg, but it is partially based on the Pkg.jl manager and is intended to eventually work very similarly to Pkg.jl, but for decentralized and collaborative LaTeX editing.

I think it would be unlikely that Julia Computing will want to officially collaborate on this with me, but all the ideas are there to make a Pkg.jl-like tool for LaTeX.

Again, apologies for hijacking the discussion. I understand that the Julia folks are mainly interested in citations and not full collaborative LaTeX.

No worries. I think it’s great that people are passionate enough about this to dig in and do something. I have opinions about what would work best but in the end I only care that I can:

  1. Access citations within my package programmatically somehow
  2. Do it once and not have to rewrite it every time a new standard comes along