I noticed that @kristoffer.carlsson has been working on parsing performance in JSON.jl: https://github.com/JuliaIO/JSON.jl/pull/263/files … and it got me thinking.
When I first created LazyJSON.jl I targeted Julia 0.7-dev at a time when there were a lot of string changes going on. At the time performance comparisons with JSON.jl and JSON2.jl would vary wildly between Julia 0.7-dev nightly builds as various deprecation penalties came and went. Now that Julia 1.0 is out and JSON.jl and JSON2.jl have been updated for Julia 1.0, it seems like a good time to compare performance again.
These tests compare LazyJSON.jl to @kristoffer.carlsson's kc/opt branch of JSON.jl and to JSON2.jl v0.2.3.
Test code is here: https://github.com/samoconnor/LazyJSON.jl/blob/master/test/benchmark.jl
LazyJSON.jl seems to take about the same time as JSON.jl to do flat non-lazy parsing to Julia Dict/Array etc (sometimes a bit faster, sometimes a bit slower).
In lazy mode LazyJSON.jl is orders of magnitude faster when only part of the input data is used.
@fengyang.wang what are your thoughts on either adding lazy parsing as an option in JSON.jl or replacing the current parser with a lazy parser? (One thing that would have to be done is to add a validation option that does strict JSON syntax checking for use cases where that is required. As it stands, the lazy parser will ignore some JSON syntax errors due to laziness, because it never looks at parts of the input that are not accessed.)
@quinnj Can you comment on the JSON2 results? I know that JSON2 is optimised for marshalling/unmarshalling, so perhaps my tests that result in JSON2 returning NamedTuples are a degenerate case. In test6 I tried to test JSON2's direct-to-struct parsing in a way that seems to me to be "how JSON2 is intended to be used", but it still seems a bit slow. Perhaps you can suggest a test case that would best demonstrate JSON2's strengths.
@Nosferican I believe that you have been using LazyJSON.jl in NCEI.jl. Do you have any feedback from your use of the package?
test1
Reads ec2-2016-11-15.normal.json and extracts a single value: operations.AcceptReservedInstancesExchangeQuote.input.shape.
This value is close to the start of the input data.
Variants:
- Lazy: LazyJSON.jl AbstractDict interface.
- Lazy (B): LazyJSON.jl getproperty interface.
- Lazy (C): LazyJSON.jl lazy=false (parse whole input to Dicts etc like JSON.jl does).
- JSON: JSON.jl parse interface.
- JSON2: JSON2.jl read -> NamedTuple interface.
results = 5×6 DataFrame
│ Row │ Test  │ Variant  │ μs     │ bytes     │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼────────┼───────────┼───────────┼──────────┤
│ 1   │ test1 │ Lazy     │ 54     │ 5184      │ 269       │ 0        │
│ 2   │ test1 │ Lazy (B) │ 51     │ 5408      │ 277       │ 0        │
│ 3   │ test1 │ Lazy (C) │ 105628 │ 51409504  │ 980424    │ 300      │
│ 4   │ test1 │ JSON     │ 103870 │ 50429936  │ 491747    │ 510      │
│ 5   │ test1 │ JSON2    │ 609448 │ 147471280 │ 4162257   │ 890      │
Note: LazyJSON.jl is similar to JSON.jl in speed and memory use in non-lazy mode.
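For anyone who wants to reproduce numbers like these on their own data, here is a minimal sketch of how the same kind of columns can be collected with Julia's built-in `@timed` macro. The linked benchmark.jl is the authoritative version and may do this differently; the `measure` helper and its field names are made up for this illustration.

```julia
# Hypothetical helper: time one call of `f` and report elapsed microseconds,
# allocated bytes and the GC pool/big allocation counters.
function measure(f)
    f()                    # warm-up call so compilation is not included
    r = @timed f()         # value, seconds, bytes, GC seconds, GC counters
    stats = r[5]           # a Base.GC_Diff
    return (microseconds = round(Int, r[2] * 1e6),
            bytes        = r[3],
            poolalloc    = stats.poolalloc,
            bigalloc     = stats.bigalloc)
end

# Example workload: allocate and sum a large array.
m = measure(() -> sum(collect(1:1_000_000)))
println(m)
```

A single timed call is noisy, so for anything beyond a rough comparison it is worth repeating the measurement several times.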
test2
Reads ec2-2016-11-15.normal.json and extracts an array value: shapes.scope.enum.
This value is close to the end of the input data.
Variants:
- Lazy: LazyJSON.jl AbstractDict interface.
- Lazy (B): LazyJSON.jl getproperty interface.
- Lazy (C): LazyJSON.jl lazy=false (parse whole input to Dicts etc).
- JSON: JSON.jl parse interface.
- JSON2: JSON2.jl read -> NamedTuple interface.
results = 5×6 DataFrame
│ Row │ Test  │ Variant  │ μs     │ bytes     │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼────────┼───────────┼───────────┼──────────┤
│ 1   │ test2 │ Lazy     │ 11035  │ 3296      │ 162       │ 0        │
│ 2   │ test2 │ Lazy (B) │ 11028  │ 3440      │ 168       │ 0        │
│ 3   │ test2 │ Lazy (C) │ 115045 │ 51409600  │ 980426    │ 300      │
│ 4   │ test2 │ JSON     │ 91334  │ 50429936  │ 491747    │ 510      │
│ 5   │ test2 │ JSON2    │ 605269 │ 147471280 │ 4162257   │ 890      │
Note: It takes LazyJSON.jl a bit longer to access values near the end of
the input.
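A rough way to picture why position matters: a lazy lookup still has to scan past everything that precedes the requested value, it just skips that text without building Dicts or Arrays for it. The sketch below is a toy scanner, not LazyJSON.jl's actual implementation. The names `skip_space`, `skip_string`, `skip_value` and `lazy_get` are made up for illustration, and it assumes well-formed JSON with an object at the top level.

```julia
# Toy lazy lookup over raw JSON bytes (hypothetical helpers, illustration only).

is_space(c) = c in (UInt8(' '), UInt8('\t'), UInt8('\n'), UInt8('\r'))
is_delim(c) = is_space(c) || c in (UInt8(','), UInt8('}'), UInt8(']'))

# Index of the first non-whitespace byte at or after i.
function skip_space(s, i)
    while i <= length(s) && is_space(s[i])
        i += 1
    end
    return i
end

# Index just past the string whose opening quote is at s[i].
function skip_string(s, i)
    i += 1
    while s[i] != UInt8('"')
        i += s[i] == UInt8('\\') ? 2 : 1   # step over escaped characters
    end
    return i + 1
end

# Index just past the value that starts at s[i]; nothing is materialised.
function skip_value(s, i)
    if s[i] == UInt8('"')
        return skip_string(s, i)
    elseif s[i] == UInt8('{') || s[i] == UInt8('[')
        depth = 0
        while true
            c = s[i]
            if c == UInt8('"')
                i = skip_string(s, i)      # brackets inside strings don't count
                continue
            end
            depth += (c == UInt8('{') || c == UInt8('[')) ? 1 : 0
            depth -= (c == UInt8('}') || c == UInt8(']')) ? 1 : 0
            i += 1
            depth == 0 && return i
        end
    else                                   # number, true, false or null
        while i <= length(s) && !is_delim(s[i])
            i += 1
        end
        return i
    end
end

# Look up `key` in the top-level object. Returns the raw text of the value
# and the number of bytes scanned to reach the end of that value.
function lazy_get(json::String, key::String)
    s = codeunits(json)
    i = skip_space(s, 1)
    s[i] == UInt8('{') || error("expected a top-level object")
    i = skip_space(s, i + 1)
    while s[i] != UInt8('}')
        k_end = skip_string(s, i)
        k = String(s[i+1:k_end-2])         # key text without the quotes
        i = skip_space(s, k_end)           # now at ':'
        i = skip_space(s, i + 1)           # now at the start of the value
        v_end = skip_value(s, i)
        k == key && return (text = String(s[i:v_end-1]), scanned = v_end - 1)
        i = skip_space(s, v_end)
        s[i] == UInt8(',') && (i = skip_space(s, i + 1))
    end
    throw(KeyError(key))
end

# Made-up sample input: one key near the start, one near the end.
doc = """{"first": {"x": "a"}, "middle": [1, 2, 3], "last": {"y": "}"}}"""
first_hit = lazy_get(doc, "first")
last_hit  = lazy_get(doc, "last")
println(first_hit.scanned, " bytes scanned vs ", last_hit.scanned)
```

The lookup for the last key scans more bytes than the lookup for the first key, which is the same effect as the gap between the Lazy rows of test1 and test2, while neither lookup allocates anything proportional to the size of the input.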
test3
Modifies ec2-2016-11-15.normal.json by replacing a value near the start of the file and two values near the end.
Variants:
- Lazy: LazyJSON.jl getproperty interface finds values and LazyJSON.splice modifies the JSON data in-place.
- JSON: JSON.jl parse to Dict, modify, then write new JSON text.
- JSON2: Parses to immutable NamedTuples. Modification not supported.
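To make the difference between the two approaches concrete: editing at the text level only touches the bytes of the value being replaced, whereas the parse/modify/write route rebuilds the whole document. The sketch below shows the text-level idea in plain Julia. It is not LazyJSON.splice (whose signature is not shown in this post); `splice_text` is a made-up helper, and the byte range is found with `findfirst` here, where a lazy parser would already know where the value lives.

```julia
# Hypothetical helper: replace the text at `range` in `json` with `new_text`,
# leaving everything before and after it untouched.
function splice_text(json::String, range::UnitRange{Int}, new_text::String)
    before = json[1:prevind(json, first(range))]
    after  = json[nextind(json, last(range)):end]
    return before * new_text * after
end

doc = """{"a": "old", "b": [1, 2]}"""
r = findfirst("\"old\"", doc)            # range of the value, quotes included
out = splice_text(doc, r, "\"new\"")
println(out)
```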
results = 2×6 DataFrame
│ Row │ Test  │ Variant │ μs     │ bytes     │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼────────┼───────────┼───────────┼──────────┤
│ 1   │ test3 │ Lazy    │ 235024 │ 880768    │ 33622     │ 0        │
│ 2   │ test3 │ JSON    │ 671735 │ 126950528 │ 1407838   │ 1021     │
test4
Reads a 1.2MB GeoJSON file and extracts a country name near the middle of the file.
Variants:
- Lazy:
LazyJSON.parse(j)["features"][15]["properties"]["formal_en"]
- Lazy (B):
LazyJSON.parse(j; getproperty=true).features[15].properties.formal_en
- Lazy (C):
LazyJSON.parse(j; lazy=false)["features"][15]["properties"]["formal_en"]
- JSON:
JSON.parse(j)["features"][15]["properties"]["formal_en"]
- JSON2:
JSON2.read(j).features[15].properties.formal_en
results = 5×6 DataFrame
│ Row │ Test  │ Variant  │ μs    │ bytes    │ poolalloc │ bigalloc │
├─────┼───────┼──────────┼───────┼──────────┼───────────┼──────────┤
│ 1   │ test4 │ Lazy     │ 310   │ 2288     │ 115       │ 0        │
│ 2   │ test4 │ Lazy (B) │ 312   │ 2432     │ 121       │ 0        │
│ 3   │ test4 │ Lazy (C) │ 40696 │ 13134624 │ 462247    │ 48       │
│ 4   │ test4 │ JSON     │ 41609 │ 6336752  │ 135146    │ 100      │
│ 5   │ test4 │ JSON2    │ 84167 │ 22868160 │ 477011    │ 48       │
Note: LazyJSON.jl in non-lazy mode is a bit faster than JSON.jl for this
input.
test5
Reads a 1.2MB GeoJSON file and checks that the outline polygon for a single country is within an expected lat/lon range.
r = r["features"][15]["geometry"]["coordinates"][6][1]
@assert r[1][1] == 134.41651451900023
for (x, y) in r
    @assert 134.2 < x < 134.5
    @assert 7.21 < y < 7.32
end
results = 3×6 DataFrame
│ Row │ Test  │ Variant │ μs    │ bytes    │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼───────┼──────────┼───────────┼──────────┤
│ 1   │ test5 │ Lazy    │ 399   │ 22992    │ 967       │ 0        │
│ 2   │ test5 │ JSON    │ 40635 │ 6340592  │ 135296    │ 100      │
│ 3   │ test5 │ JSON2   │ 81213 │ 22872000 │ 477161    │ 48       │
test6
Defines struct Operation, struct IOType and struct HTTP with fields that match the API operations data in ec2-2016-11-15.normal.json.
It then does JSON2-style direct-to-struct parsing to read the JSON data into a Julia object Dict{String,Operation} (LazyJSON provides @generated Base.convert methods for this).
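For readers who have not seen direct-to-struct parsing before, here is a minimal sketch of the general idea: walk the fields of a struct type and fill each one from the matching key of a parsed object. This uses plain reflection over an ordinary Dict rather than the @generated convert methods mentioned above, and the struct definitions and field names (`HTTPInfo`, `Op`, `method`, `requestUri`) are invented for the example, not copied from the benchmark.

```julia
# Hypothetical structs whose field names match the keys of the input.
struct HTTPInfo
    method::String
    requestUri::String
end

struct Op
    name::String
    http::HTTPInfo
end

# Leaf values: convert directly to the field type.
from_dict(::Type{T}, x) where {T} = convert(T, x)

# Objects: build T by looking up each field name as a key, recursively.
function from_dict(::Type{T}, d::AbstractDict) where {T}
    args = (from_dict(fieldtype(T, f), d[String(f)]) for f in fieldnames(T))
    return T(args...)
end

# Stand-in for parsed JSON (made-up data).
parsed = Dict("name" => "DescribeThings",
              "http" => Dict("method" => "POST", "requestUri" => "/"))

op = from_dict(Op, parsed)
println(op.http.method)
```

Generating specialised code per struct type, as both JSON2 and the @generated methods aim to do, avoids the per-field reflection that this sketch performs at run time.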
Variants:
- Lazy: LazyJSON.jl AbstractDict interface.
convert(Dict{String,Operation}, LazyJSON.parse(j))
- JSON2: JSON2.jl read -> NamedTuple interface.
JSON2.read(j, Dict{String,Operation})
results = 2×6 DataFrame
│ Row │ Test  │ Variant │ μs    │ bytes   │ poolalloc │ bigalloc │
├─────┼───────┼─────────┼───────┼─────────┼───────────┼──────────┤
│ 1   │ test6 │ Lazy    │ 6866  │ 1125600 │ 39538     │ 16       │
│ 2   │ test6 │ JSON2   │ 13096 │ 3427888 │ 135789    │ 60       │
Note:
For all of the above tests, the content of ec2-2016-11-15.normal.json has been duplicated 10 times into a top level JSON array "[ , , , …]". This results in an overall input data size of ~10MB.
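If you want to build a similarly enlarged input from your own file, one simple way is to join copies of the document text inside square brackets. This is only a sketch of the idea; the benchmark script may construct its input differently, and `repeat_json` is a name made up here.

```julia
# Hypothetical helper: wrap n copies of a JSON document in one top-level
# array, giving the text "[doc,doc,...,doc]".
repeat_json(doc::String, n::Int) = "[" * join(fill(doc, n), ",") * "]"

small = """{"k": 1}"""
big = repeat_json(small, 10)
println(sizeof(big), " bytes")
```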