I’m pleased to share the registration of a new package, JSON3.jl, in the General registry, available immediately.
Let’s cut right to the chase and answer the elephant questions in the proverbial discourse room: why do we need another JSON package in Julia? what does it offer distinct from what JSON.jl, JSON2.jl, or LazyJSON.jl offer? why spend time and effort developing something that’s “already solved”?
JSON3.jl was born from the spark of three separate ideas, and a vision that they could come together to make the best possible JSON integration for Julia: performant and simple, yet powerful. It also exists as a way to “prove out” these ideas before potentially upstreaming improvements into a more canonically named package like JSON.jl. I fully believe the package is ready for full-time use and reliance, but, similar to JSON2.jl, it exists as a way to try out a different JSON integration API that could make things better, faster, and easier.
Semi-Lazy Native Parsing
Taking a lazy approach to JSON parsing is not a new concept (see LazyJSON.jl), but the current approach in LazyJSON.jl has two slight disadvantages: 1) the initial object wrapping has high overhead for small JSON objects, and 2) the completely lazy approach to accessing key-value pairs introduces overhead when iterating an entire object (see the performance comparisons below). JSON3.jl takes a semi-lazy parsing approach: objects, arrays, and strings are parsed lazily, while numbers, booleans, and null values are parsed immediately. In addition, each object, array, and string carries its own total length. This makes accessing individual key-value pairs, or iterating, slightly faster, because an entire object, array, or string can be skipped over in one step.
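The following small Base-only sketch makes the “carries its own length” idea concrete. It assumes a hypothetical flat “tape” layout; `Header`, `nextslot`, and `children` are illustrative names, not JSON3 internals. Scalars are stored eagerly in one slot each. Every container header records how many slots it spans, so a reader can jump past a nested array without walking it. Only arrays are shown; an object would additionally interleave key slots.

```julia
# Hypothetical tape layout for illustration only; JSON3's real internal format differs.
# Scalars (numbers, booleans, null) are parsed eagerly and occupy one slot each.
# A container is a Header followed by its children; `span` counts every slot the
# container occupies, including the header itself and all nested children.
struct Header
    kind::Symbol   # :array or :object
    span::Int
end

# Index of the first slot after the value that starts at slot `i`:
# scalars take one slot, containers are jumped over in O(1) using their span.
nextslot(tape, i) = tape[i] isa Header ? i + tape[i].span : i + 1

# Top-level elements of the container whose header sits at slot `start`.
# Nested containers are returned as their header and are never walked into.
function children(tape, start = 1)
    hdr = tape[start]::Header
    stop = start + hdr.span
    out = Any[]
    i = start + 1
    while i < stop
        push!(out, tape[i])
        i = nextslot(tape, i)
    end
    return out
end

# Tape for the JSON text [1, [2, 3, [4]], 5]
tape = Any[Header(:array, 8), 1, Header(:array, 5), 2, 3, Header(:array, 2), 4, 5]
```

Here `children(tape)` returns `1`, the nested array's header, and `5`. Of the nested array's five slots, only its header is read.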
Another powerful advantage of the semi-lazy approach in JSON3.jl is the ability to get a strongly-typed `JSON3.Array{T}` when parsing JSON arrays. (Note that `JSON3.Array{T}` is a lazy, JSON3-defined type, distinct from `Base.Array{T, N}`.) The semi-lazy parsing approach allows JSON3.jl to identify homogeneous arrays and “flag” the type with the concrete element type that was parsed. When the array is used later, this allows the Julia compiler to generate extremely efficient code (see the array iteration benchmarks below).
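The effect of “flagging” a concrete element type can be sketched with plain Julia vectors. The helper `flag_eltype` below is hypothetical, not JSON3's implementation. It widens an element type with `typejoin` while scanning the values, as a parser might, and then stores them in a vector of that type. `Base.return_types` then shows what the compiler can infer for a simple summing loop over each kind of vector.

```julia
# Hypothetical helper (not part of JSON3): widen the element type one value at a
# time, as a parser could while scanning a JSON array, then store the values in a
# vector with that element type.
function flag_eltype(vals::Vector{Any})
    isempty(vals) && return vals
    T = typeof(first(vals))
    for v in vals
        T = typejoin(T, typeof(v))   # stays concrete only if every value has the same type
    end
    return convert(Vector{T}, vals)
end

# A simple summing loop, used here only to inspect what inference can prove.
function sum_each(arr)
    x = 0
    for v in arr
        x += v
    end
    return x
end

typed = flag_eltype(Any[1, 2, 3])   # a Vector{Int}
mixed = flag_eltype(Any[1, 2.5])    # a Vector{Real}: Int and Float64 only join at Real
```

For a `Vector{Int}`, the loop's result is inferred as `Int`. For a `Vector{Any}`, it is only inferred as `Any`, which forces dynamic dispatch on every `+`. Running `@code_warntype sum_each(Any[1, 2, 3])` shows the same thing interactively.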
This technique is accessible in JSON3 via “native parsing”, by calling `JSON3.read(json_str)`. For strings, numbers, booleans, or null values, the value itself is returned. For objects and arrays, custom `JSON3.Object` and `JSON3.Array{T}` objects are returned, which employ the semi-lazy approach discussed above. `JSON3.Object` implements the `AbstractDict` interface (it acts like a `Dict`) and also allows accessing key-value pairs via `getproperty`, like JavaScript (i.e. you can write `obj.keyname`). `JSON3.Array{T}` implements the `AbstractArray` interface, so it supports normal iteration, `getindex`, etc.
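Here is a minimal sketch of how a type can act like a `Dict` while also supporting `obj.keyname` access. `PropDict` is a hypothetical type, not how `JSON3.Object` is implemented; it stores an eager `Dict` rather than reading lazily from the JSON. It implements the core `AbstractDict` methods and forwards `getproperty` to `getindex`.

```julia
# Hypothetical wrapper (not JSON3.Object): a Symbol-keyed dictionary that also
# supports `obj.key` access by forwarding getproperty to getindex.
struct PropDict <: AbstractDict{Symbol,Any}
    data::Dict{Symbol,Any}
end

# The wrapped Dict must be read with getfield, because getproperty is overridden below.
storage(d::PropDict) = getfield(d, :data)

# Core AbstractDict interface: length, iterate (yielding key => value pairs), get.
Base.length(d::PropDict) = length(storage(d))
Base.iterate(d::PropDict, state...) = iterate(storage(d), state...)
Base.get(d::PropDict, k, default) = get(storage(d), k, default)
Base.getindex(d::PropDict, k::Symbol) = storage(d)[k]

# JavaScript-style property access: d.name is the same as d[:name].
Base.getproperty(d::PropDict, k::Symbol) = d[k]
Base.propertynames(d::PropDict) = Tuple(keys(storage(d)))

obj = PropDict(Dict{Symbol,Any}(:a => 1, :b => 2, :c => 3))
```

Generic `AbstractDict` functionality such as `haskey`, `keys`, and `values` then works through `get` and `iterate` without further methods.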
Compiler-friendly Custom Struct Code Generation
In JSON2.jl, I wanted to provide really fast ways to read and write custom Julia structs as JSON. I took a bit of a “shotgun” approach, trying at least 3 ways that combined introspection, macros, and heavy use of `@generated` functions to generate struct-specific code. While the goal was achieved in simpler cases, the code was extremely complex, hard to maintain or edit, and inscrutable to those wishing to contribute. What’s more, in a few worst-case scenarios the code generation would get out of hand, leading to entire application pauses because JSON2.jl was over-compiling for some crazy struct.
In JSON3.jl, the custom struct support has been overhauled to be drastically simpler, achieve excellent performance, and avoid worst-case compilation scenarios. Techniques utilized include:
- relying on the compiler’s excellent capabilities to do struct introspection at compile time
- using techniques similar to those behind `Base.CartesianIndex` for simple, straightforward code generation with `Base.@nexprs` and `Base.@ncall`
- introducing code generation limits: structs with fewer than 32 fields get specialized code, with fallbacks to handle larger cases
The equivalent code is several hundred lines shorter, more performant, more understandable, and avoids any compiler danger zones.
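The three techniques above can be combined in a small Base-only sketch. The function `field_pairs` is hypothetical and is not JSON3's code; it turns any struct into a vector of `fieldname => value` pairs. For a concrete `T`, `fieldcount(T)` is a constant. `Base.Cartesian.@nexprs` unrolls 31 guarded copies of the per-field code, and the compiler can typically drop the copies whose guard `i <= N` is false. Structs with 32 or more fields take an ordinary loop instead, mirroring the specialization limit described above.

```julia
# Hypothetical sketch (not JSON3's implementation): collect fieldname => value pairs.
function field_pairs(x::T) where {T}
    N = fieldcount(T)   # a compile-time constant for a concrete T
    out = Vector{Pair{Symbol,Any}}(undef, N)
    if N < 32
        # Unrolled: @nexprs substitutes the literals 1, 2, ..., 31 for `i`.
        # Copies with i > N are dead code for this T.
        Base.Cartesian.@nexprs 31 i -> begin
            if i <= N
                out[i] = fieldname(T, i) => getfield(x, i)
            end
        end
    else
        # Fallback for very wide structs: an ordinary runtime loop, no unrolling.
        for i in 1:N
            out[i] = fieldname(T, i) => getfield(x, i)
        end
    end
    return out
end
```

Because the unrolled branch is plain code generated by a macro, no `@generated` function is involved. The amount of generated code per struct is also bounded by the fixed unroll count.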
A Novel, More Julian, Approach to Struct Mapping
The JSON3.jl approach to declaring how your struct should map to JSON begins with the assumption that every struct falls into one of two general categories: a “data” type or an “interface” type. “Data” types are basically a collection of properties that make up an object. The type exists to bundle related fields together so they can be operated on, and the fields generally carry some semantic value when bundled together. Their natural JSON representation is a JSON object where each field name is treated as a JSON key, and each field value as the corresponding JSON value. “Interface” types, on the other hand, have private, internal fields, and are mainly useful via the access patterns they define; many `Base` or library-provided structs are like this. For example, `Base.Dict` has several internal fields that are mostly cryptic when viewed on their own. Through powerful interface methods like `getindex`, `setindex!`, and iteration of key-value pairs, though, the `Dict` provides a meaningful implementation of the “hash table” data structure. To map these kinds of structs to JSON, we definitely don’t want to consider their internal fields. Instead, we want to map them to one of the existing JSON value types: object, array, string, number, boolean, or null.
JSON3.jl defines a trait-based approach to conveniently declare the “JSON struct type” of a custom struct, using one of the following traits:
```julia
# T stands for your own struct type; declare exactly one of these traits for it.
# data types
JSON3.StructType(::Type{T}) = JSON3.Struct()
JSON3.StructType(::Type{T}) = JSON3.Mutable()
# JSON types for interface types
JSON3.StructType(::Type{T}) = JSON3.ObjectType()
JSON3.StructType(::Type{T}) = JSON3.ArrayType()
JSON3.StructType(::Type{T}) = JSON3.StringType()
JSON3.StructType(::Type{T}) = JSON3.NumberType()
JSON3.StructType(::Type{T}) = JSON3.BoolType()
JSON3.StructType(::Type{T}) = JSON3.NullType()
# subtype dispatch for abstract types
JSON3.StructType(::Type{T}) = JSON3.AbstractType()
```
“Data” types will use one of the `Struct` or `Mutable` traits, while “interface” types will declare one of the JSON types and ensure they satisfy the required interface. The `JSON3.AbstractType` trait is for a specialized JSON reading scenario where the type of a JSON object is included as a key-value pair in the object itself. In that case a sort of “subtype dispatch” is needed to map the JSON to the correct Julia struct.
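Below is a self-contained sketch of the same trait pattern. It uses hypothetical names (`JSONKind`, `jsonkind`, `tojson`) rather than JSON3's real API, and it omits string escaping. A trait function maps each type to a kind, and the writer dispatches on that kind. A “data” type like `Person` therefore needs no declaration. An “interface” type like `Countdown` opts in to being written as a JSON array through its iteration interface.

```julia
# Hypothetical names throughout; this mirrors the idea of the StructType traits,
# not JSON3's actual API. No string escaping is performed.
abstract type JSONKind end
struct DataKind <: JSONKind end     # "data" type: each field becomes a JSON key
struct ArrayKind <: JSONKind end    # "interface" type written via its iteration
struct StringKind <: JSONKind end
struct NumberKind <: JSONKind end

# Trait function: by default every type is treated as a "data" type.
jsonkind(::Type) = DataKind()
jsonkind(::Type{<:AbstractString}) = StringKind()
jsonkind(::Type{<:Real}) = NumberKind()
jsonkind(::Type{<:AbstractVector}) = ArrayKind()

# Entry point: look up the trait, then dispatch on it.
tojson(x) = tojson(jsonkind(typeof(x)), x)
tojson(::StringKind, x) = string('"', x, '"')
tojson(::NumberKind, x) = string(x)
tojson(::ArrayKind, x) = string('[', join((tojson(v) for v in x), ','), ']')
function tojson(::DataKind, x::T) where {T}
    parts = [string('"', fieldname(T, i), "\":", tojson(getfield(x, i))) for i in 1:fieldcount(T)]
    return string('{', join(parts, ','), '}')
end

# A "data" type: needs no declaration at all.
struct Person
    name::String
    age::Int
end

# An "interface" type: its field is an implementation detail, but it iterates
# like a sequence (n, n-1, ..., 1), so it declares itself array-like.
struct Countdown
    from::Int
end
Base.iterate(c::Countdown, n = c.from) = n < 1 ? nothing : (n, n - 1)
Base.IteratorSize(::Type{Countdown}) = Base.SizeUnknown()
jsonkind(::Type{Countdown}) = ArrayKind()
```

With this pattern, adding a new type never requires touching `tojson` itself. Declaring one `jsonkind` method is enough, which is the spirit of declaring a single `JSON3.StructType` trait.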
Full documentation is provided for each `JSON3.StructType` trait, but please raise issues if something isn’t clear.
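For the `JSON3.AbstractType` scenario, the core of “subtype dispatch” can be sketched as follows. Everything here is an assumption for illustration. The `Vehicle` hierarchy, the `"type"` key name, and the `subtypekey`/`subtypes_of`/`construct` helpers are all hypothetical. The input is also an already-parsed `Dict` rather than raw JSON text.

```julia
# Hypothetical example: the JSON object names its own concrete type under a tag key.
abstract type Vehicle end
struct Car <: Vehicle
    wheels::Int
end
struct Boat <: Vehicle
    sails::Int
end

# Which key holds the tag, and which concrete type each tag value selects.
subtypekey(::Type{Vehicle}) = "type"
subtypes_of(::Type{Vehicle}) = Dict("car" => Car, "boat" => Boat)

# Pick the concrete subtype from the tag, then build it from the remaining
# key-value pairs in field order (values are assumed to convert to the field types).
function construct(::Type{A}, obj::AbstractDict{String}) where {A}
    T = subtypes_of(A)[obj[subtypekey(A)]]
    return T((obj[String(f)] for f in fieldnames(T))...)
end
```

A call such as `construct(Vehicle, Dict{String,Any}("type" => "car", "wheels" => 4))` then yields a `Car` even though only the abstract type was requested.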
Sorry for the diatribe here, but hopefully it’s useful to hear a little bit of context and background on why another JSON package is being registered and publicized.
Benchmarks
LazyJSON.jl vs. JSON3.jl
Parse a small object and iterate over each key-value pair:
```julia
julia> using BenchmarkTools, LazyJSON

julia> str = """{
"a": 1,
"b": 2,
"c": 3
}
"""
"{\n\"a\": 1,\n\"b\": 2,\n\"c\": 3\n}\n"

julia> @btime LazyJSON.value(str)
199.637 ns (1 allocation: 32 bytes)
LazyJSON.Object{Nothing,String} with 3 entries:
"a" => 1
"b" => 2
"c" => 3

julia> using JSON3

julia> @btime JSON3.read(str)
108.741 ns (3 allocations: 384 bytes)
JSON3.Object{Base.CodeUnits{UInt8,String}} with 3 entries:
:a => 1
:b => 2
:c => 3

julia> function access_each(obj)
           x = 0
           for (k, v) in obj
               x += v
           end
           return x
       end
access_each (generic function with 1 method)

julia> v = LazyJSON.value(str)
LazyJSON.Object{Nothing,String} with 3 entries:
"a" => 1
"b" => 2
"c" => 3

julia> @btime access_each(v)
1.723 μs (21 allocations: 672 bytes)
6

julia> v2 = JSON3.read(str)
JSON3.Object{Base.CodeUnits{UInt8,String}} with 3 entries:
:a => 1
:b => 2
:c => 3

julia> @btime access_each(v2)
1.402 μs (0 allocations: 0 bytes)
6
```
Sum elements of a number array:
```julia
julia> str = "[1,2,3,4,5,6,7,8,9,10]"
"[1,2,3,4,5,6,7,8,9,10]"

julia> a = LazyJSON.value(str)
10-element LazyJSON.Array{Nothing,String}:
1
2
3
4
5
6
7
8
9
10

julia> a2 = JSON3.read(str)
10-element JSON3.Array{Int64,Base.CodeUnits{UInt8,String}}:
1
2
3
4
5
6
7
8
9
10

julia> function access_each(arr)
           x = 0
           for v in arr
               x += v
           end
           return x
       end
access_each (generic function with 1 method)

julia> @btime access_each(a)
5.398 μs (50 allocations: 1.56 KiB)
55

julia> @btime access_each(a2)
26.463 ns (0 allocations: 0 bytes)
55
```