Fastest JSON parser for Julia

Hey Julianners,

I just came across a speed comparison here: https://www.reddit.com/r/programming/comments/3pojrz/the_fastest_json_parser_in_the_world/
https://github.com/kostya/benchmarks#json
where simdjson is one of the fastest JSON parsers.

I measured some parsing times myself and got:
in Julia JSON2: 0.052s
in C++ simdjson: 0.000665s
so simdjson is much faster.

I just read that it is possible to call C++ from Julia: GitHub - JuliaInterop/CxxWrap.jl: Package to make C++ libraries available in Julia

So I was wondering… is it possible to call this C++ JSON reader from Julia?
Because if so, then basically we don’t even need a new JSON parser for Julia, we just need a library that is able to call the C++ function and wrap the C++ class.
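
For context, Julia can already call plain C functions without any wrapper package. A minimal sketch, assuming a platform whose C runtime exposes `strlen`; simdjson itself is C++, so it would still need CxxWrap or a hand-written C shim (neither is shown here):

```julia
# Call the C standard library's strlen directly from Julia.
# @ccall needs Julia 1.5 or newer; nothing here touches simdjson.
n = @ccall strlen("hello"::Cstring)::Csize_t

println(n)  # string length as counted by the C function
```

The same mechanism only works for functions with C linkage, which is why a C++ class-based API needs an extra wrapping layer.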

Does anyone know how to make this work?

I’d try JSON3 first, which should be the fastest Julia native parser.


JSON2 is basically deprecated in favour of JSON3.
They are by the same author.

Also make sure you are not counting compile time.
(I recommend using BenchmarkTools.@btime if you are not already)
(or if you want to count compile time, make sure you count it for C++ also :stuck_out_tongue:)
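
A rough sketch of the difference, using only Base; `work` is a made-up stand-in for a parse call. The first timed call includes JIT compilation, later calls do not. BenchmarkTools.@btime sidesteps this by running the call many times and reporting the minimum.

```julia
# Hypothetical workload standing in for a JSON parse call.
work(n) = sum(sqrt(i) for i in 1:n)

t_first  = @elapsed work(10^6)  # first call: compilation + execution
t_second = @elapsed work(10^6)  # later calls: execution only

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```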


I am not measuring compilation time; I tested it with BenchmarkTools as well.
I tested JSON3, but there was barely any improvement.
About 25-30% or so in my case.

But you can also check the result of the benchmark site I linked: https://github.com/kostya/benchmarks#json
It also shows a 10x speed difference, which in my case is closer to 100x.

That is why I am asking: why don’t we just call the C++ simdjson and take advantage of the fact that someone has already optimised it?


Oh, just scanned it quickly and didn’t see json3 anywhere.
Interesting, I thought when I tested it, it was mainly on par.
But that was quite a while ago, either it regressed, simdjson got better, or the json used for benchmarking was favoring JSON3.
I don’t see why Julia can’t reach the same speed as simdjson, and if I remember correctly, that was one of the reasons why @quinnj wrote JSON3.
Btw, I’m just seeing that there is simdjson On-Demand, I’m guessing it’s the same as JSON3’s lazy mode. Are you sure you’re correctly comparing those different modes?

The reason I would be a bit scared to wrap a C++ library in Julia for JSON parsing is that a lot of the difficulty comes from performantly constructing the correct language types.
Then again, JSON3 comes with its own array and Dict types, and it works pretty well, so I guess one wouldn’t lose that much by just creating light wrappers around the types that simdjson creates.


Update: My implementation isn’t correct as it assumed ordered fields.

The Julia implementation in the JSON benchmark doesn’t look very favorable.
Compare it to the native facility of using StructTypes:

using JSON3
using StructTypes

struct Coordinate
    x::Float64
    y::Float64
    z::Float64
end

struct Coordinates
    arr::Vector{Coordinate}
end

StructTypes.StructType(::Type{Coordinate}) = StructTypes.Struct()
StructTypes.StructType(::Type{Coordinates}) = StructTypes.Struct()


function calc(text)
    jobj = JSON3.read(text)
    coordinates = jobj["coordinates"]
    len = length(coordinates)
    x = y = z = 0

    for coord in coordinates
        x += coord["x"]
        y += coord["y"]
        z += coord["z"]
    end

    Coordinate(x / len, y / len, z / len)
end

function calc_struct(text)
    coordinates = JSON3.read(text, Coordinates)

    len = length(coordinates.arr)
    x = y = z = 0

    for coord in coordinates.arr
        x += coord.x
        y += coord.y
        z += coord.z
    end

    Coordinate(x / len, y / len, z / len)
end

leads to:

julia> @btime calc(text)
  3.314 μs (20 allocations: 1.14 KiB)
Coordinate(2.0, 0.5, 0.25)

julia> @btime calc_struct(text)
  297.748 ns (11 allocations: 496 bytes)
Coordinate(2.0, 0.5, 0.25)

It’s certainly a good point, but they check for misordered arguments:

right = Coordinate(2.0, 0.5, 0.25)
for v in [
    """{"coordinates":[{"x":2.0,"y":0.5,"z":0.25}]}""",
    """{"coordinates":[{"y":0.5,"x":2.0,"z":0.25}]}""",
]
    left = calc(v)
    if left != right
        println(stderr, "$(left) != $(right)")
        exit(1)
    end
end

Also, simply ignoring the ordering isn’t an option, since the JSON they’re testing with is a little more complicated, which leads to this:

ERROR: LoadError: ArgumentError: invalid JSON at byte position 128 while parsing type Array{Coordinate,1}: ExpectedComma
.592123089136988,
      "name": "shxupy 2607",

Here’s a snippet from the generated JSON they want to parse. They want to make sure that unused fields don’t matter, so there’s no trickery with predetermined types we can/should do.

{
  "coordinates": [
    {
      "x": 0.4086372923486917,
      "y": 0.23240598050870964,
      "z": 0.592123089136988,
      "name": "shxupy 2607",
      "opts": {
        "1": [
          1,
          true
        ]
      }
    },
    {
      "x": 0.8395272223222269,
      "y": 0.4130736387990799,
      "z": 0.7705508546366981,
      "name": "jpkrve 7160",
      "opts": {
        "1": [
          1,
          true
        ]
      }
    },
.
.
.

I’ve set up their benchmarking suite locally, so if you want to test something you can ping me (or run it yourself, it’s pretty straightforward: clone the repo, cd <repo>/json, then do ../analyze.rb make run[test.jl]. Ruby required.).

Thanks, could you please benchmark the following snippet?

calc2(text) = funcbarrier(JSON3.read(text).coordinates)
function funcbarrier(coordinates)
    len = length(coordinates)
    x = y = z = 0.0
    
    for coord in coordinates
        x += coord.x::Float64
        y += coord.y::Float64
        z += coord.z::Float64
    end
    Coordinate(x / len, y / len, z / len)
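end

| Language               | Time, s     | Memory, MiB                  | Energy, J  |
|------------------------|-------------|------------------------------|------------|
| Julia (JSON3 - laborg) | 1.082±0.011 | 340.13±00.04 + 164.59±00.02  | 0.00±00.00 |
| Julia (JSON3 - base)   | 1.225±0.012 | 335.21±00.03 + 165.98±00.03  | 0.00±00.00 |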

Helps a little bit - I had to get rid of the type annotation because it encountered Int64 along the way.

The culprit seems to be type instabilities:

julia> data = open("/tmp/1.json", "r") do f
           JSON3.read(f).coordinates
       end;

julia> function funcbarrier(coordinates)
           len = length(coordinates)
           x = y = z = 0.0

           for coord in coordinates
               x += coord.x
               y += coord.y
               z += coord.z
           end
           (x / len, y / len, z / len)
       end
funcbarrier (generic function with 1 method)

julia> @code_warntype funcbarrier(data)
Variables
  #self#::Core.Compiler.Const(funcbarrier, false)
  coordinates::JSON3.Array{JSON3.Object,Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}
  len::Int64
  z::Any
  y::Any
  x::Any
  @_7::Union{Nothing, Tuple{JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}},Tuple{Int64,Int64}}}
  coord::JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}

Body::Tuple{Any,Any,Any}
1 ─       (len = Main.length(coordinates))
│   %2  = 0.0::Core.Compiler.Const(0.0, false)
│         (z = %2)
│         (y = %2)
│         (x = %2)
│   %6  = coordinates::JSON3.Array{JSON3.Object,Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}
│         (@_7 = Base.iterate(%6))
│   %8  = (@_7 === nothing)::Bool
│   %9  = Base.not_int(%8)::Bool
└──       goto #4 if not %9
2 ┄ %11 = @_7::Tuple{JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}},Tuple{Int64,Int64}}::Tuple{JSON3.Object{Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}},Tuple{Int64,Int64}}
│         (coord = Core.getfield(%11, 1))
│   %13 = Core.getfield(%11, 2)::Tuple{Int64,Int64}
│   %14 = x::Any
│   %15 = Base.getproperty(coord, :x)::Any
│         (x = %14 + %15)
│   %17 = y::Any
│   %18 = Base.getproperty(coord, :y)::Any
│         (y = %17 + %18)
│   %20 = z::Any
│   %21 = Base.getproperty(coord, :z)::Any
│         (z = %20 + %21)
│         (@_7 = Base.iterate(%6, %13))
│   %24 = (@_7 === nothing)::Bool
│   %25 = Base.not_int(%24)::Bool
└──       goto #4 if not %25
3 ─       goto #2
4 ┄ %28 = (x / len)::Any
│   %29 = (y / len)::Any
│   %30 = (z / len)::Any
│   %31 = Core.tuple(%28, %29, %30)::Tuple{Any,Any,Any}
└──       return %31

The detection of the eltype of coordinates seems to fail. @quinnj, you might be interested in this :slight_smile:
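
The same effect can be reproduced without JSON3. A minimal sketch, using a plain `Vector{Any}` as a stand-in for a container whose element values the compiler cannot type; `sum_any` and `sum_converted` are hypothetical names. Converting each value instead of asserting its type also copes with the `Int64` values mentioned above:

```julia
# Elements are typed Any, much like coord.x in the @code_warntype output.
function sum_any(values)
    total = 0.0
    for v in values
        total += v                    # Float64 + Any is inferred as Any
    end
    return total
end

# Convert each element, so the accumulator stays a Float64.
function sum_converted(values)
    total = 0.0
    for v in values
        total += Float64(v)::Float64  # accepts Int64 as well as Float64 input
    end
    return total
end

data = Any[0.5, 1, 2.5]               # mixed Float64 and Int64, as in the real data

println(first(Base.return_types(sum_any, (Vector{Any},))))        # Any
println(first(Base.return_types(sum_converted, (Vector{Any},))))  # Float64
println(sum_converted(data))                                      # 4.0
```

The conversion does not remove the dynamic lookup of each value, it only keeps the instability from spreading to the accumulators and the return type.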


Could you try this version of @laborg’s code from post #6 of this thread?

mutable struct Coordinate
    x::Float64
    y::Float64
    z::Float64
    Coordinate() = new()
end


StructTypes.StructType(::Type{Coordinate}) = StructTypes.Mutable()

That should fix the order issue, maybe at the cost of some performance.

That won’t work, because the data is not “clean” coordinates:

I think we’re not allowed to assume the exact form of the data.

According to the docs, it should be ok to have extra properties:

This flow has the nice properties of: allowing object construction success even if fields are missing in the input, and if “extra” fields exist in the input that aren’t apart of the Julia struct’s fields, they will automatically be ignored.


That doesn’t seem to work: the calc call expects to be able to return a Coordinate(x, y, z), but by defining the inner constructor Coordinate() = new() you’ve replaced the default constructor…
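
A small sketch of that behaviour, using only Base. `BareCoordinate` is a hypothetical name chosen so it does not clash with the structs above. Once any inner constructor is defined, Julia no longer generates the default field-wise one:

```julia
# Only a zero-argument inner constructor is defined.
mutable struct BareCoordinate
    x::Float64
    y::Float64
    z::Float64
    BareCoordinate() = new()   # fields are left uninitialized
end

c = BareCoordinate()           # works
c.x = 2.0                      # fields can be filled in afterwards

# The three-argument method was never generated, so this call fails.
err = try
    BareCoordinate(2.0, 0.5, 0.25)
catch e
    e
end

println(err isa MethodError)   # true
```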

I’ve taken a look at the data some more and it seems like this is a small version:

{
  "coordinates": [
    {
      "x": 0.4086372923486917,
      "y": 0.23240598050870964,
      "z": 0.592123089136988,
      "name": "shxupy 2607",
      "opts": {
        "1": [
          1,
          true
        ]
      }
    },
    {
      "x": 0.8395272223222269,
      "y": 0.4130736387990799,
      "z": 0.7705508546366981,
      "name": "jpkrve 7160",
      "opts": {
        "1": [
          1,
          true
        ]
      }
    }
  ],
  "info":"some info"
}

If your method doesn’t work on this, it probably won’t work on the bigger data set :slight_smile:

Ah, sorry, it works if you add back the constructor by defining it as

mutable struct Coordinate
    x::Float64
    y::Float64
    z::Float64
    Coordinate() = new()
    Coordinate(x,y,z) = new(x,y,z)
end

Since the struct is mutable now, I had to add a == as well.
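
For anyone following along, a sketch of what such a `==` could look like (the exact definition used in the benchmark run is not shown in this thread, so this is an assumption). Without it, `==` on a mutable struct falls back to `===`, which compares object identity:

```julia
mutable struct Coordinate
    x::Float64
    y::Float64
    z::Float64
    Coordinate() = new()
    Coordinate(x, y, z) = new(x, y, z)
end

# Field-wise equality; assumed definition, not taken from the benchmark repo.
Base.:(==)(a::Coordinate, b::Coordinate) =
    a.x == b.x && a.y == b.y && a.z == b.z

a = Coordinate(2.0, 0.5, 0.25)
b = Coordinate(2.0, 0.5, 0.25)

println(a == b)    # true: same field values
println(a === b)   # false: two distinct mutable objects
```

If such objects were ever used as Dict keys, a matching `Base.hash` method would be needed too; the benchmark check only needs the comparison.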

I get (ST = StructTypes, FB = Function Barrier):

| Language              | Time, s     | Memory, MiB                  | Energy, J  |
|-----------------------|-------------|------------------------------|------------|
| Julia (JSON3 - base)  | 1.225±0.012 | 335.21±00.03 + 165.98±00.03  | 0.00±00.00 |
| Julia (JSON3 - ST)    | 1.146±0.009 | 339.96±00.12 + 22.39±00.24   | 0.00±00.00 |
| Julia (JSON3 - ST+FB) | 1.145±0.013 | 340.05±00.04 + 22.63±00.27   | 0.00±00.00 |
| Julia (JSON3 - FB)    | 1.082±0.011 | 340.13±00.04 + 164.59±00.02  | 0.00±00.00 |

It’s interesting that the combination of your two techniques is not at all faster than the function barrier alone. That shouldn’t be too surprising though, since the FB-only version uses the immutable struct.
