Thoughts on improvements to JSON parsing



I’ve been thinking a lot recently about how to improve both the performance of JSON parsing and the accuracy of the values it returns. JSON is fairly critical in infrastructure these days: it is used in REST interfaces, for configuration files, for large data sets, and even to store data in databases (sometimes in “binary” formats, such as in MongoDB and Postgres).

I’ve written fast JSON parsers in the past, and the problem fits in well with three areas I love in programming: performance, databases, and string handling.

The current JSON parser lets you choose which Dict type is used to store the data, and which type is used for integers.
Currently, it always returns JSON objects as whatever associative type you pass in with the dicttype keyword (otherwise Dict{String, Any}), and “integers” (numbers written with no ., e, or E; a value like 1e10 does not count, even though it is really still an integer) as Int, unless you pass a different type with the inttype keyword. JSON arrays are always returned as Vector{Any}, even when a much more performant type could be used.
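As a rough illustration of the integer rule described above, here is a small Python sketch (not JSON.jl code; the function name is_integer_literal is hypothetical) that classifies a JSON number literal purely by its spelling, the way the parser decides between the integer type and a float:

```python
import re

# JSON number grammar from RFC 8259: -? int frac? exp?
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_integer_literal(text):
    """True if the literal would be parsed as an integer (no '.', 'e' or 'E')."""
    if _JSON_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a JSON number: {text!r}")
    # Only the spelling matters, not the value: "1e10" is integer-valued but not an integer literal.
    return not any(c in text for c in ".eE")
```

So is_integer_literal("42") is True, while is_integer_literal("1e10") is False even though float("1e10").is_integer() is True.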

Since there is no guarantee of type stability anyway, I’d like it to pick the smallest built-in type that can hold each value (as long as there is no loss of information), and otherwise return something that preserves numbers exactly as they are written in the JSON file. This is important if you want to ingest a JSON file, process certain parts,
and then write out the processed file without changing values that could not be represented with Julia’s built-in types.
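To make the round-trip concern concrete, here is Python’s standard json module, used only because it exposes a parse_float hook that receives the literal’s text (JSON.jl’s own options may differ). By default, a float literal with more digits than a double can hold is silently rounded; keeping the literal as an exact decimal preserves it:

```python
import json
from decimal import Decimal

doc = '{"count": 123456789012345678901234567890, "ratio": 1.00000000000000000001}'

# Default decoding: the float literal becomes an IEEE double, and its trailing digits are lost.
lossy = json.loads(doc)

# parse_float receives the literal's text, so Decimal keeps every digit.
# (Python ints are already arbitrary precision, so "count" survives either way.)
exact = json.loads(doc, parse_float=Decimal)
```

Here lossy["ratio"] == 1.0, while str(exact["ratio"]) is still "1.00000000000000000001".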

For integers, that would be Int8, … Int128 for signed values, and UInt8, … UInt128 for unsigned values that won’t fit in the Int type of the same size. Within a JSON array the element types would be promoted, but possibly only up to Int64 or UInt64; beyond that, the array would use Union{Int64, BigInt} or Union{Int64, UInt64, BigInt}, to take advantage of the small-union type handling on master.
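A minimal sketch of that selection rule, written in Python for illustration. The helper names (smallest_fixed_type, scalar_type, array_type) and the exact ordering (smallest width first, signed preferred at each width, unsigned only for non-negative values that overflow the same-width signed type) are my assumptions about how the rule could work; the functions just return Julia type names as strings:

```python
# Hypothetical table of fixed-width Julia integer types, smallest first;
# at each width the signed type is tried before the unsigned one.
FIXED_TYPES = []
for bits in (8, 16, 32, 64, 128):
    FIXED_TYPES.append((f"Int{bits}", bits, -(1 << (bits - 1)), (1 << (bits - 1)) - 1))
    FIXED_TYPES.append((f"UInt{bits}", bits, 0, (1 << bits) - 1))


def smallest_fixed_type(values, max_bits):
    """Name of the first type (at most max_bits wide) that holds every value, or None."""
    for name, width, lo, hi in FIXED_TYPES:
        if width > max_bits:
            break
        if all(lo <= v <= hi for v in values):
            return name
    return None


def scalar_type(n):
    """Type for a lone JSON integer: any fixed width up to 128 bits, else BigInt."""
    return smallest_fixed_type([n], max_bits=128) or "BigInt"


def array_type(values):
    """Element type for a JSON array of integers, promoting only up to 64 bits."""
    t = smallest_fixed_type(values, max_bits=64)
    if t is not None:
        return f"Vector{{{t}}}"
    # Some element needs more than 64 bits, so fall back to a small Union.
    int64_max = (1 << 63) - 1
    uint64_max = (1 << 64) - 1
    if any(int64_max < v <= uint64_max for v in values):
        return "Vector{Union{Int64, UInt64, BigInt}}"
    return "Vector{Union{Int64, BigInt}}"
```

For example, scalar_type(200) gives "UInt8", scalar_type(-200) gives "Int16", and array_type([1, -5, 300]) gives "Vector{Int16}". Capping arrays at 64 bits keeps the fallback to a two- or three-member Union, which is what small-union handling is good at.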
Floating-point values are trickier. JSON allows exponents with any number of digits, so it might even be useful to make a JSNumber type, to handle cases where neither IEEE floats nor even BigFloat can store the number.
It would have convert functions to change a JSNumber into any of the normal floating-point types (binary or decimal), plus a keyword to say that you simply want some specific floating-point type. That conversion might return Inf or -Inf if the JSON number cannot otherwise be represented, but then you lose the ability to write the JSON file back out without losing information.
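One way the JSNumber idea could look, sketched in Python under my own assumptions: the field names (negative, digits, exponent) and the methods to_float and __str__ are hypothetical, and only conversion to a binary double is shown. The number is stored exactly as sign, significand digits, and an unbounded integer exponent, alongside its original text:

```python
import re

# JSON number grammar from RFC 8259: -? int frac? exp?
_JSON_NUMBER = re.compile(r"(-?)(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?")


class JSNumber:
    """Exact JSON number: (-1)**negative * int(digits) * 10**exponent.

    A sketch of the JSNumber idea, not an existing API.
    """

    def __init__(self, text):
        m = _JSON_NUMBER.fullmatch(text)
        if m is None:
            raise ValueError(f"not a JSON number: {text!r}")
        sign, int_part, frac_part, exp_part = m.groups()
        frac_part = frac_part or ""
        self.text = text                    # original spelling, for lossless output
        self.negative = sign == "-"
        self.digits = int_part + frac_part  # significand digits with the point removed
        # Python ints are arbitrary precision, so exponents far beyond
        # the range of any float are still stored exactly.
        self.exponent = int(exp_part or "0") - len(frac_part)

    def to_float(self):
        """Nearest IEEE double; overflows to inf or -inf when out of range."""
        return float(self.text)

    def __str__(self):
        # Writing the JSON back out reuses the original text, so nothing is lost.
        return self.text
```

Here JSNumber("12.50e-3") has digits "1250" and exponent -5, while JSNumber("1e400").to_float() is inf. Keeping the original text is what allows a processed file to be written back unchanged; to_float is the lossy path that a "give me this float type" keyword would select.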

I just found out about this comment by @samoconnor on lazy parsing of a JSON file: https://github.com/JuliaIO/JSON.jl/issues/235#issuecomment-362880624, and about JSON2, another great job by @quinnj!
(I really want to get both of their opinions on this!)