Parsing with a custom type in JSON3 makes performance worse

Robert_J · October 20, 2022, 6:38pm

Here is the sample JSON file:

"{\"topic\":\"trade.BTCUSDT\",\"data\":[{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"PlusTick\",\"price\":\"19431.00\",\"size\":0.2,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620275\",\"side\":\"Buy\",\"trade_id\":\"e6be9409-2886-5eb6-bec9-de01e1ec6bf6\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"MinusTick\",\"price\":\"19430.50\",\"size\":1.989,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620299\",\"side\":\"Sell\",\"trade_id\":\"bb706542-5d3b-5e34-8767-c05ab4df7556\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"ZeroMinusTick\",\"price\":\"19430.50\",\"size\":0.007,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620314\",\"side\":\"Sell\",\"trade_id\":\"a143da10-3409-5383-b557-b93ceeba4ca8\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"PlusTick\",\"price\":\"19431.00\",\"size\":0.001,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620327\",\"side\":\"Buy\",\"trade_id\":\"7bae9053-e42b-52bd-92c5-6be8a4283525\",\"is_block_trade\":\"false\"}]}"

I was under the impression that giving it a custom type would make parsing faster, but I was wrong. Here are the struct definitions:

struct Ticket
    symbol::String
    tick_direction::String
    price::String
    size::Float64
    timestamp::String
    trade_time_ms::String
    side::String
    trade_id::String
    is_block_trade::String
end

struct Tape
    topic::String
    data::Vector{Ticket}
end

StructTypes.StructType(::Type{Ticket}) = StructTypes.Struct()
StructTypes.StructType(::Type{Tape}) = StructTypes.Struct()

Now with a simple JSON3.read(Sample) I get:

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.897 μs …  1.257 ms  ┊ GC (min … max):  0.00% … 99.68%
 Time  (median):     3.300 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.230 μs ± 29.736 μs  ┊ GC (mean ± σ):  18.36% ±  2.63%

  ▃▃▄▇██▇▇▅▄▂▁ ▁▂▁                                           ▂
  █████████████████▅▃▄▃▆▆▆▅▆▅▆▆▅▆▇▇▇▇▇████▇▇▄▆▅▄▅▆▄▅▃▃▄▄▃▄▄▆ █
  2.9 μs       Histogram: log(frequency) by time     6.98 μs <

 Memory estimate: 4.38 KiB, allocs estimate: 7.

With the custom type defined JSON3.read(Sample, Tape) I get:

BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.813 μs … 763.641 μs  ┊ GC (min … max): 0.00% … 98.76%
 Time  (median):     6.977 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.479 μs ±  12.962 μs  ┊ GC (mean ± σ):  2.98% ±  1.71%

  ▇█▆▅▂▂▁             ▁▁                                      ▂
  ████████▇▇█▇█▇█▇▆▆▇████▇▇▅▇▅▇██▇▅▅▅▆▇▅▄▃▅▅▁▁▁▄▁▁▁▁▃▁▄▄▁▄▄▅▄ █
  6.81 μs      Histogram: log(frequency) by time      13.5 μs <

 Memory estimate: 3.42 KiB, allocs estimate: 48.

Shouldn’t giving hints about the structure of the JSON file make it more performant? Why is it regressing?
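
For anyone who wants to reproduce the comparison, a minimal sketch is given here. It assumes that StructTypes.StructType is defined for Ticket as well as Tape, shortens the sample to its first trade, and limits each benchmark to one second; the variable names `untyped_trial` and `typed_trial` are only illustrative. Timings will differ by machine and package version.

```julia
using JSON3, StructTypes, BenchmarkTools

struct Ticket
    symbol::String
    tick_direction::String
    price::String
    size::Float64
    timestamp::String
    trade_time_ms::String
    side::String
    trade_id::String
    is_block_trade::String
end

struct Tape
    topic::String
    data::Vector{Ticket}
end

# Assumption: both the outer and the nested type need a StructType.
StructTypes.StructType(::Type{Ticket}) = StructTypes.Struct()
StructTypes.StructType(::Type{Tape}) = StructTypes.Struct()

# The sample from the post, shortened to its first trade.
const Sample = """{"topic":"trade.BTCUSDT","data":[{"symbol":"BTCUSDT","tick_direction":"PlusTick","price":"19431.00","size":0.2,"timestamp":"2022-10-18T14:50:20.000Z","trade_time_ms":"1666104620275","side":"Buy","trade_id":"e6be9409-2886-5eb6-bec9-de01e1ec6bf6","is_block_trade":"false"}]}"""

# Sanity check that both paths parse the same data.
untyped = JSON3.read(Sample)
typed = JSON3.read(Sample, Tape)

# Interpolate the input with `\$` so global-variable access is not measured.
untyped_trial = @benchmark JSON3.read($Sample) seconds=1
typed_trial = @benchmark JSON3.read($Sample, Tape) seconds=1

display(untyped_trial)
display(typed_trial)
```

Comparing the median times and the allocation counts of the two trials is usually more informative than comparing the means, which are skewed by GC pauses.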

quinnj · October 20, 2022, 10:04pm

Hmmmm, I wouldn’t expect the typed parsing to be that slow, so maybe something has regressed there in terms of performance. But note that the JSON3.read(json) method is pretty heavily optimized and will perform well on nested json compared to a typed case. I’m not sure that completely explains the perf results here since it doesn’t seem that heavily nested, but it might be. If you use the StatProfilerHTML package, we could get flamegraph profiles of the two approaches and see if something seems obviously wrong in the typed case.

Robert_J · October 21, 2022, 9:12am

Here is the flamegraph for the simple JSON3.read:

[flamegraph image not preserved]

And here is the typed version:

[flamegraph image not preserved]

I wrapped each call in a function that repeats it many times so the profiler can pick it up.
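
The wrapping step might look something like the sketch below. It uses only the Profile standard library; `repeat_call` and `workload` are hypothetical names, and the workload is a stand-in for a closure such as `() -> JSON3.read(Sample)` or `() -> JSON3.read(Sample, Tape)`. The collected samples could then be rendered as a flamegraph with a package such as StatProfilerHTML.

```julia
using Profile

# Hypothetical wrapper: call `f` `n` times (at least once) so that a
# sampling profiler collects enough samples from a very short call.
function repeat_call(f, n::Integer)
    result = f()
    for _ in 2:n
        result = f()
    end
    return result
end

# Stand-in workload; replace with the parsing call you want to profile.
workload() = sum(sqrt, 1:1_000)

repeat_call(workload, 10)                  # warm up so compilation is not profiled
Profile.clear()                            # drop samples from earlier runs
@profile repeat_call(workload, 100_000)    # collect samples
Profile.print(maxdepth = 12)               # text report of where time was spent
```

Warming up first matters because otherwise most samples land in the compiler rather than in the code being studied.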

kristoffer.carlsson · October 21, 2022, 10:57am

Looks like it is just looking up a large number of symbols. A symbol is an interned string, and looking one up means searching the table of all symbols that exist in the Julia session. There are probably some quite easy optimizations that can be made, like having a local Dict{String, Symbol} or just never materializing the symbol at all.
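
As a rough illustration of those two ideas (not how JSON3 is implemented, and not measured here to be faster), a sketch could look like this. `SYMBOL_CACHE`, `cached_symbol`, `MiniTicket`, `MINI_TICKET_KEYS`, and `field_index` are all hypothetical names, and `MiniTicket` is a reduced stand-in for the Ticket struct from the first post.

```julia
# Option 1: a local cache, so each distinct key string is turned into a
# Symbol (and looked up in the global symbol table) only once.
const SYMBOL_CACHE = Dict{String,Symbol}()

cached_symbol(key::String) = get!(() -> Symbol(key), SYMBOL_CACHE, key)

# Option 2: never build a Symbol at all. Precompute the field names as
# strings once and match incoming JSON keys against them directly.
struct MiniTicket
    symbol::String
    price::String
    size::Float64
end

const MINI_TICKET_KEYS = [String(name) for name in fieldnames(MiniTicket)]

# Returns the index of the field named `key`, or `nothing` if the struct
# has no such field.
field_index(key::AbstractString) = findfirst(==(key), MINI_TICKET_KEYS)
```

For example, `cached_symbol("price")` returns `:price` and stores it for later calls, while `field_index("size")` finds the field position by string comparison only. Whether either approach beats the built-in symbol lookup would need to be benchmarked.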

quinnj · October 21, 2022, 5:44pm

Very helpful, thank you. Yes, as @kristoffer.carlsson mentioned, it looks like there’s room for optimization here. Would you mind opening an issue on the JSON3.jl repo, and I’ll try to take a look at improving things?

Robert_J · October 21, 2022, 8:04pm

Yeah no problem, I’ll open one.

Thanks for the reply. I am a little inexperienced, so sorry for the noob question. I searched and got more confused. How can I implement this? Is it something that should be done inside the JSON3 package? If you can point me to some sample code out there, I’ll be very thankful.

cjdoris · October 21, 2022, 8:56pm

Wait what, does Julia not have a Dict of Symbols? Why not?
