I noticed that @kristoffer.carlsson has been working on parsing performance in JSON.jl ("A few performance improvements", Pull Request #263, JuliaIO/JSON.jl on GitHub) … and it got me thinking.
When I first created LazyJSON.jl I targeted Julia 0.7-dev, at a time when there were a lot of string changes going on. Back then, performance comparisons with JSON.jl and JSON2.jl would vary wildly between Julia 0.7-dev nightly builds as various deprecation penalties came and went. Now that Julia 1.0 is out and JSON.jl and JSON2.jl have been updated for Julia 1.0, it seems like a good time to compare performance again.
These tests compare LazyJSON.jl to @kristoffer.carlsson’s `kc/opt` branch of JSON.jl and to JSON2.jl v0.2.3.
Test code is here: benchmark.jl at master in JuliaCloud/LazyJSON.jl on GitHub.
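LazyJSON.jl seems to take about the same time as JSON.jl to do flat non-lazy parsing to Julia Dict/Array etc. (sometimes a bit faster, sometimes a bit slower).
In lazy mode LazyJSON.jl is much faster when only part of the input data is used: roughly 8× faster than JSON.jl when the value is near the end of the input (test2) and roughly 2000× faster when it is near the start (test1).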
@fengyang.wang what are your thoughts on either adding lazy parsing as an option in JSON.jl or replacing the current parser with a lazy parser? (One thing that would have to be done is to add a validation option to do strict JSON syntax checking for use cases where that is required. As it stands, the lazy parser will ignore some JSON syntax errors due to laziness.)
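To make the laziness trade-off concrete, here is a toy sketch of a lazy top-level key lookup. It is not LazyJSON.jl's implementation: every function name below is hypothetical, keys are compared as raw text without unescaping, and only a top-level JSON object is handled. The point is that the scan stops as soon as the requested value ends, so anything after it, including malformed text, is never inspected.

```julia
# Toy illustration only: NOT LazyJSON.jl's implementation.
# All function names here are hypothetical.

isquote(c)  = c == UInt8('"')
isopener(c) = c == UInt8('{') || c == UInt8('[')
iscloser(c) = c == UInt8('}') || c == UInt8(']')
isblank(c)  = c == UInt8(' ') || c == UInt8('\t') || c == UInt8('\r') || c == UInt8('\n')

# Index just past the JSON string whose opening quote is at b[i].
function skip_string(b, i)
    i += 1
    while !isquote(b[i])
        i += b[i] == UInt8('\\') ? 2 : 1   # step over escaped characters
    end
    return i + 1
end

# Index just past the JSON value that starts at b[i].
function skip_value(b, i)
    if isquote(b[i])
        return skip_string(b, i)
    elseif isopener(b[i])
        depth = 0
        while true
            if isquote(b[i])
                i = skip_string(b, i)      # brackets inside strings do not count
                continue
            end
            depth += isopener(b[i])
            depth -= iscloser(b[i])
            i += 1
            depth == 0 && return i
        end
    else
        # number, true, false or null: runs until a delimiter
        while i <= length(b) && !isblank(b[i]) && !iscloser(b[i]) && b[i] != UInt8(',')
            i += 1
        end
        return i
    end
end

# Raw text of the value stored under `key` in a top-level JSON object,
# or `nothing` if the key is absent. Scanning stops at the end of that value.
function lazy_lookup(json::String, key::String)
    b = codeunits(json)
    i = findfirst(isequal(UInt8('{')), b) + 1
    while true
        while isblank(b[i]) || b[i] == UInt8(',')
            i += 1
        end
        b[i] == UInt8('}') && return nothing
        key_end = skip_string(b, i)
        found = String(b[i+1:key_end-2]) == key
        i = key_end
        while isblank(b[i]) || b[i] == UInt8(':')
            i += 1
        end
        value_end = skip_value(b, i)
        found && return String(b[i:value_end-1])
        i = value_end
    end
end

# The input is not valid JSON (trailing "oops", missing closing brace),
# but the lookup of "a" returns before reaching the broken part.
broken = """{"a": 1, "b": [2, 3] oops"""
println(lazy_lookup(broken, "a"))   # prints 1
```

A strict validation option would have to scan the whole input at least once, which is exactly the work the lazy lookup avoids.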
@quinnj Can you comment on the JSON2 results? I know that JSON2 is optimised for marshalling/unmarshalling, so perhaps my tests that result in JSON2 returning NamedTuples are a degenerate case. In test6 I tried to test JSON2’s direct-to-struct parsing in a way that seems to me to be “how JSON2 is intended to be used”, but it still seems a bit slow. Perhaps you can suggest a test case that would best demonstrate JSON2’s strengths.
@Nosferican I believe that you have been using LazyJSON.jl in NCEI.jl. Do you have any feedback from your use of the package?
test1
Reads `ec2-2016-11-15.normal.json` and extracts a single value: `operations.AcceptReservedInstancesExchangeQuote.input.shape`.
This value is close to the start of the input data.
Variants:
- Lazy: LazyJSON.jl `AbstractDict` interface.
- Lazy (B): LazyJSON.jl `getproperty` interface.
- Lazy (C): LazyJSON.jl `lazy=false` (parse whole input to Dicts etc. like JSON.jl does).
- JSON: JSON.jl `parse` interface.
- JSON2: JSON2.jl `read -> NamedTuple` interface.
results = 5×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼────────┼───────────┼───────────┼──────────┤
│ 1 │ test1 │ Lazy │ 54 │ 5184 │ 269 │ 0 │
│ 2 │ test1 │ Lazy (B) │ 51 │ 5408 │ 277 │ 0 │
│ 3 │ test1 │ Lazy (C) │ 105628 │ 51409504 │ 980424 │ 300 │
│ 4 │ test1 │ JSON │ 103870 │ 50429936 │ 491747 │ 510 │
│ 5 │ test1 │ JSON2 │ 609448 │ 147471280 │ 4162257 │ 890 │
Note: LazyJSON.jl is similar to JSON.jl in speed and memory use in non-lazy mode.
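The μs, bytes, poolalloc and bigalloc columns line up with the counters that Base's `@timed` macro reports, so the numbers were presumably collected along these lines. That is an assumption on my part; the `measure` helper below is hypothetical and the real measurement code is in benchmark.jl.

```julia
# Hypothetical helper: run `f` and collect the same kind of columns that
# appear in the result tables. Not taken from benchmark.jl.
function measure(f)
    f()                       # warm-up call so compilation is not timed
    t = @timed f()            # (value, seconds, bytes, gc seconds, GC counters)
    gc = t[5]                 # Base.GC_Diff with allocation counters
    return (μs = round(Int, t[2] * 1e6),
            bytes = t[3],
            poolalloc = gc.poolalloc,   # number of small (pooled) allocations
            bigalloc = gc.bigalloc)     # number of large allocations
end

# Example workload: build 1000 small strings.
m = measure(() -> [string(i) for i in 1:1000])
println(m)
```

A single timed run like this is noisy; repeating the call and keeping the minimum would give steadier numbers.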
test2
Reads `ec2-2016-11-15.normal.json` and extracts an array value: `shapes.scope.enum`.
This value is close to the end of the input data.
Variants:
- Lazy: LazyJSON.jl `AbstractDict` interface.
- Lazy (B): LazyJSON.jl `getproperty` interface.
- Lazy (C): LazyJSON.jl `lazy=false` (parse whole input to Dicts etc.).
- JSON: JSON.jl `parse` interface.
- JSON2: JSON2.jl `read -> NamedTuple` interface.
results = 5×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼────────┼───────────┼───────────┼──────────┤
│ 1 │ test2 │ Lazy │ 11035 │ 3296 │ 162 │ 0 │
│ 2 │ test2 │ Lazy (B) │ 11028 │ 3440 │ 168 │ 0 │
│ 3 │ test2 │ Lazy (C) │ 115045 │ 51409600 │ 980426 │ 300 │
│ 4 │ test2 │ JSON │ 91334 │ 50429936 │ 491747 │ 510 │
│ 5 │ test2 │ JSON2 │ 605269 │ 147471280 │ 4162257 │ 890 │
Note: LazyJSON.jl takes longer to access values near the end of the input (about 11 ms here versus about 0.05 ms in test1), but it is still roughly 8× faster than JSON.jl.
test3
Modifies `ec2-2016-11-15.normal.json` by replacing a value near the start of the file and two values near the end.
Variants:
- Lazy: LazyJSON.jl `getproperty` interface finds values and `LazyJSON.splice` modifies the JSON data in-place.
- JSON: JSON.jl `parse` to `Dict`, modify, then write new JSON text.
- JSON2: Parses to immutable `NamedTuples`. Modification not supported.
results = 2×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼────────┼───────────┼───────────┼──────────┤
│ 1 │ test3 │ Lazy │ 235024 │ 880768 │ 33622 │ 0 │
│ 2 │ test3 │ JSON │ 671735 │ 126950528 │ 1407838 │ 1021 │
test4
Reads a 1.2MB GeoJSON file and extracts a country name near the middle of the file.
Variants:
- Lazy: `LazyJSON.parse(j)["features"][15]["properties"]["formal_en"]`
- Lazy (B): `LazyJSON.parse(j; getproperty=true).features[15].properties.formal_en`
- Lazy (C): `LazyJSON.parse(j; lazy=false)["features"][15]["properties"]["formal_en"]`
- JSON: `JSON.parse(j)["features"][15]["properties"]["formal_en"]`
- JSON2: `JSON2.read(j).features[15].properties.formal_en`
results = 5×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼───────┼──────────┼───────────┼──────────┤
│ 1 │ test4 │ Lazy │ 310 │ 2288 │ 115 │ 0 │
│ 2 │ test4 │ Lazy (B) │ 312 │ 2432 │ 121 │ 0 │
│ 3 │ test4 │ Lazy (C) │ 40696 │ 13134624 │ 462247 │ 48 │
│ 4 │ test4 │ JSON │ 41609 │ 6336752 │ 135146 │ 100 │
│ 5 │ test4 │ JSON2 │ 84167 │ 22868160 │ 477011 │ 48 │
Note: LazyJSON.jl in non-lazy mode is slightly faster than JSON.jl for this input (about 40.7 ms versus 41.6 ms), although it allocates roughly twice as many bytes.
test5
Reads a 1.2MB GeoJSON file and checks that the outline polygon for
a single country is within an expected lat/lon range.
    r = r["features"][15]["geometry"]["coordinates"][6][1]
    @assert r[1][1] == 134.41651451900023
    for (x, y) in r
        @assert 134.2 < x < 134.5
        @assert 7.21 < y < 7.32
    end
results = 3×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼───────┼──────────┼───────────┼──────────┤
│ 1 │ test5 │ Lazy │ 399 │ 22992 │ 967 │ 0 │
│ 2 │ test5 │ JSON │ 40635 │ 6340592 │ 135296 │ 100 │
│ 3 │ test5 │ JSON2 │ 81213 │ 22872000 │ 477161 │ 48 │
test6
Defines `struct Operation`, `struct IOType` and `struct HTTP` with fields that match the API operations data in `ec2-2016-11-15.normal.json`.
It then does JSON2-style direct-to-struct parsing to read the JSON data into a Julia object `Dict{String,Operation}` (LazyJSON provides `@generated` `Base.convert` methods for this).
Variants:
- Lazy: LazyJSON.jl `AbstractDict` interface. `convert(Dict{String,Operation}, LazyJSON.parse(j))`
- JSON2: JSON2.jl `read -> NamedTuple` interface. `JSON2.read(j, Dict{String,Operation})`
results = 2×6 DataFrame
│ Row │ Test │ Variant │ μs │ bytes │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼───────┼─────────┼───────────┼──────────┤
│ 1 │ test6 │ Lazy │ 6866 │ 1125600 │ 39538 │ 16 │
│ 2 │ test6 │ JSON2 │ 13096 │ 3427888 │ 135789 │ 60 │
Note: For all of the above tests, the content of `ec2-2016-11-15.normal.json` has been duplicated 10 times into a top-level JSON array “[ , , , …]”. This results in an overall input data size of ~10MB.
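The duplication step can be sketched as follows. The helper name `repeat_json` and the small stand-in document are hypothetical, chosen only for illustration; the real benchmark reads the EC2 API file from disk.

```julia
# Build a JSON array containing `n` copies of one JSON document.
# `repeat_json` is a hypothetical helper name, not taken from benchmark.jl.
repeat_json(doc::AbstractString, n::Integer) = "[" * join(fill(doc, n), ",") * "]"

# Small stand-in for the content of the real input file.
doc = """{"operations": {"Example": {"input": {"shape": "ExampleRequest"}}}}"""
big = repeat_json(doc, 10)

println(sizeof(doc), " bytes -> ", sizeof(big), " bytes")
```

The result is one valid JSON array whose elements are identical copies of the original document, so each copy is reached through an array index.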