Parsing with a custom type in JSON3 makes performance worse

Here is the sample JSON string:

"{\"topic\":\"trade.BTCUSDT\",\"data\":[{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"PlusTick\",\"price\":\"19431.00\",\"size\":0.2,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620275\",\"side\":\"Buy\",\"trade_id\":\"e6be9409-2886-5eb6-bec9-de01e1ec6bf6\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"MinusTick\",\"price\":\"19430.50\",\"size\":1.989,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620299\",\"side\":\"Sell\",\"trade_id\":\"bb706542-5d3b-5e34-8767-c05ab4df7556\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"ZeroMinusTick\",\"price\":\"19430.50\",\"size\":0.007,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620314\",\"side\":\"Sell\",\"trade_id\":\"a143da10-3409-5383-b557-b93ceeba4ca8\",\"is_block_trade\":\"false\"},{\"symbol\":\"BTCUSDT\",\"tick_direction\":\"PlusTick\",\"price\":\"19431.00\",\"size\":0.001,\"timestamp\":\"2022-10-18T14:50:20.000Z\",\"trade_time_ms\":\"1666104620327\",\"side\":\"Buy\",\"trade_id\":\"7bae9053-e42b-52bd-92c5-6be8a4283525\",\"is_block_trade\":\"false\"}]}"

I was under the impression that giving it a custom type would make parsing faster, but I was wrong. Here is the defined structure:

using JSON3, StructTypes

struct Ticket
    symbol::String
    tick_direction::String
    price::String
    size::Float64
    timestamp::String
    trade_time_ms::String
    side::String
    trade_id::String
    is_block_trade::String
end

struct Tape
    topic::String
    data::Vector{Ticket}
end

# Both the outer and the nested type need a StructType for typed reading.
StructTypes.StructType(::Type{Ticket}) = StructTypes.Struct()
StructTypes.StructType(::Type{Tape}) = StructTypes.Struct()
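The timings below come from BenchmarkTools; a minimal sketch of the two benchmark calls, assuming the JSON string above is bound to Sample:

using BenchmarkTools

@benchmark JSON3.read($Sample)        # untyped parse
@benchmark JSON3.read($Sample, Tape)  # typed parse into the structs above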

Now with a simple JSON3.read(Sample) I get:

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.897 μs …  1.257 ms  ┊ GC (min … max):  0.00% … 99.68%
 Time  (median):     3.300 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.230 μs ± 29.736 μs  ┊ GC (mean ± σ):  18.36% ±  2.63%

  ▃▃▄▇██▇▇▅▄▂▁ ▁▂▁                                           ▂
  █████████████████▅▃▄▃▆▆▆▅▆▅▆▆▅▆▇▇▇▇▇████▇▇▄▆▅▄▅▆▄▅▃▃▄▄▃▄▄▆ █
  2.9 μs       Histogram: log(frequency) by time     6.98 μs <

 Memory estimate: 4.38 KiB, allocs estimate: 7.


With the custom type defined, JSON3.read(Sample, Tape) gives:

BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.813 μs … 763.641 μs  ┊ GC (min … max): 0.00% … 98.76%
 Time  (median):     6.977 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.479 μs ±  12.962 μs  ┊ GC (mean ± σ):  2.98% ±  1.71%

  ▇█▆▅▂▂▁             ▁▁                                      ▂
  ████████▇▇█▇█▇█▇▆▆▇████▇▇▅▇▅▇██▇▅▅▅▆▇▅▄▃▅▅▁▁▁▄▁▁▁▁▃▁▄▄▁▄▄▅▄ █
  6.81 μs      Histogram: log(frequency) by time      13.5 μs <

 Memory estimate: 3.42 KiB, allocs estimate: 48.


Shouldn’t giving hints about the structure of the JSON make parsing more performant? Why is it regressing?

Hmmm, I wouldn’t expect the typed parsing to be that slow, so maybe something has regressed there performance-wise. Note, though, that the JSON3.read(json) method is pretty heavily optimized and performs well on nested JSON compared to the typed case. I’m not sure that completely explains the results here, since this JSON doesn’t seem that heavily nested, but it might. If you use the StatProfilerHTML package, we could get flamegraph profiles of the two approaches and see whether something looks obviously wrong in the typed case.
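For reference, a minimal sketch of that profiling workflow, assuming the JSON string is bound to Sample; StatProfilerHTML exports the @profilehtml macro, which writes an HTML flamegraph report:

using JSON3, StatProfilerHTML

# A single parse is usually too quick for the sampling profiler to catch,
# so in practice the call is repeated in a loop (see the wrapper below).
@profilehtml JSON3.read(Sample)        # untyped
@profilehtml JSON3.read(Sample, Tape)  # typed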


Here is the flamegraph for the simple JSON3.read: [flamegraph screenshot]

And here is the typed version: [flamegraph screenshot]

I wrapped each call in a function that repeats it multiple times so the profiler could collect enough samples.
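The wrapper was not posted, but it was presumably something along these lines (the function name and repeat count here are made up for illustration):

function run_typed(s, n)
    for _ in 1:n
        JSON3.read(s, Tape)  # swap in JSON3.read(s) for the untyped profile
    end
    return nothing
end

@profilehtml run_typed(Sample, 100_000)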

Looks like it is just looking up a large number of symbols. A Symbol is an interned string, and when looking one up you have to search through all the symbols that exist in the Julia session. There are probably some quite easy optimizations to be made, like keeping a local Dict{String, Symbol} or never materializing the Symbol at all.
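To illustrate the Dict{String, Symbol} idea (a sketch of the caching pattern, not JSON3’s actual code):

# Intern each distinct key string at most once,
# instead of calling Symbol(...) for every occurrence in the input.
const SYMBOL_CACHE = Dict{String, Symbol}()

cached_symbol(s::String) = get!(SYMBOL_CACHE, s) do
    Symbol(s)
end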


Very helpful, thank you. Yes, as @kristoffer.carlsson mentioned, it looks like there’s room for optimization here. Would you mind opening an issue on the JSON3.jl repo? I’ll try to take a look at improving things.


Yeah no problem, I’ll open one.

Thanks for the reply. I am a little inexperienced, so sorry for the noob question: I searched and only got more confused. How can I implement this? Is it something that should be done inside the JSON3 package? If you can point me to some sample code, I’d be very thankful.

Wait, what? Does Julia not have a Dict of Symbols? Why not?