[ANN] TypedJSON.jl - A Julia serialization library prioritizing type fidelity, human-readability, and long-term archival

Dear all,
this is an announcement for TypedJSON.jl, (yet another!) JSON serialization library focusing on type fidelity and long-term archival.

It came out of my specific needs to:

  • Easily serialize/deserialize from/to Julia structures, with limited additional boilerplate and a clear way to support new structures;
  • Be absolutely sure that even 10 years from now (i.e. when the structure definition has changed…) I can read back my data;
  • Be able to read the same data from Python, JavaScript, etc.

There is likely room for improvement, so comments / suggestions are appreciated!

Thanks for writing this! I frequently run into the issue of having to reconstruct composite types after they have changed, so this will be a nice addition to my toolkit.

I don’t mind having to write a reconstruction method; I think of this as a feature. I also like the human-readable output.

Do you have an example of what TypedJSON.jl can do that JSON.jl v1 cannot? I was not able to gather that from the README.

And some examples of the JSON it produces would be helpful too.

This is pretty vibe-coded, right?

If so, then I think we currently ask for disclosure in the Readme.md.

Some general notes:

  1. As far as I understand, you don’t round-trip element types for arrays, so you lose fidelity and performance.
  2. You decide to serialize “difficult” objects like Function as nothing. With your intended use for long-term storage (i.e. allow a human to use the data, long after all code + docs + first-hand experience handling it has been lost), I think some configurable warning would be appropriate whenever your serialization discards data. It would suck for future people to come in to pick up a long-discarded project or help a replication attempt, only to discover that all data is lost because it lived in a closure.
  3. I have not seen an obvious way to pop a shell on deserialization of malicious data. However, the general architecture is a recipe for insecure deserialization – maybe explicitly document that people shouldn’t reconstruct / deserialize untrusted input?
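
To make point 2 concrete, here is a minimal sketch of what a configurable warning could look like. Everything in it is hypothetical: `to_plain`, `lossy_placeholder`, and the `on_loss` keyword are names made up for illustration and are not part of TypedJSON.jl.

```julia
# Hypothetical helper (not TypedJSON.jl API): decide what happens when a value
# cannot be represented and would be written as `nothing`.
function lossy_placeholder(x; on_loss::Symbol = :warn)
    msg = "value of type $(typeof(x)) cannot be serialized; writing `nothing` instead"
    on_loss === :error && error(msg)   # strict mode: refuse to lose data
    on_loss === :warn && @warn msg     # default: lose data, but say so
    return nothing                     # any other setting: lose data silently
end

# Hypothetical conversion to JSON-friendly values, covering only a few cases.
to_plain(x::Union{Number,AbstractString,Nothing}; kwargs...) = x
to_plain(x::AbstractVector; kwargs...) = [to_plain(v; kwargs...) for v in x]
to_plain(x::Function; kwargs...) = lossy_placeholder(x; kwargs...)

# The function `sin` is dropped; with the default setting a warning is logged.
cleaned = to_plain(Any[1, "a", sin])
```

With `on_loss = :error` the same call would throw instead, which is probably the safer default for archival use.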

The general architecture that invites shell-popping: Types for reconstruction use a global registry, namely the dispatch table for specializations of TypedJSON.reconstruct(::Val{<:Symbol}, ::Any), where the symbol is attacker-controlled. Hence, each loaded package that defines reconstructions gives new gadgets. This is similar to deserialization gadgets in Java, where any unforeseen interaction between stuff in the classpath can make deserialization unsafe. The general consensus is that “control your classpath such that no gadget chain exists” is not viable (it breaks all modularity!), and Java deserialization must be considered unsafe in any environment.

With this general architecture, you will never achieve safe / secure deserialization. Which is totally fine if it is not your design goal, just be very upfront about it in your Readme.md.
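
A harmless sketch of the pattern I mean, so it is clear that no shell is involved here, only the mechanism. The functions `reconstruct` and `load` below are simplified stand-ins I made up, not the actual TypedJSON.jl code, and the `"type"` / `"data"` keys are assumed for illustration.

```julia
# Stand-in for the package's generic hook: one function, dispatched on a Val tag.
reconstruct(::Val{T}, data) where {T} = error("no reconstruction method for tag :$T")

# The tag is read from the input document and turned straight into a dispatch key,
# so whoever wrote the file chooses which method runs.
load(doc) = reconstruct(Val(Symbol(doc["type"])), doc["data"])

# Imagine this method comes from some unrelated package that happens to be loaded.
# It joins the same global method table and becomes reachable from any input file.
const SIDE_EFFECTS = String[]
function reconstruct(::Val{:LogEntry}, data)
    push!(SIDE_EFFECTS, string(data))  # harmless here; a real gadget might touch files
    return :side_effect_ran
end

# The "file" alone selects the code path; the caller of `load` never mentions LogEntry.
result = load(Dict("type" => "LogEntry", "data" => "chosen by the input file"))
```

The point is that the set of reachable methods is everything any loaded package has added to `reconstruct`, not just what the caller had in mind.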

Sorry, I overlooked this aspect…
I added a new section in the README: Compare with JSON.jl

Good point, thanks!
I added a new section: How do the “typed JSON” looks like?

The package code is not AI-generated, while the test suite is mainly provided by Gemini.

In some cases the types are not properly reconstructed; I added a note in the Caveats section.
Performance is not a goal here…

The only way to check for correct serialization and deserialization is to use the roundtrip function. I added a clear statement in the Caveats section.

This is a very interesting topic, thank you for raising it!
But it’s not clear to me how one can perform shell-popping here by just providing a malicious JSON file to deserialize. Can you provide an explicit example?

As far as I understand, you need both the JSON file and a malicious package to do the job, so the problem becomes warning the user not to install untrusted packages… Or am I missing something?