I have JSON files that contain a list of square matrices. Let’s say each JSON has a single key "mats"
whose value is a list of lists of lists. Each element of the outer list (itself a list of lists) is a matrix in row-major order. That is, each innermost list is one row of a matrix.
(Is this a good way to pass matrices around? Probably not. But it’s out of my control for now.)
Typically there are ~100 matrices of size ~1000x1000. All matrices are square and of the same size. The elements are real-valued: either 0 or a float.
I’d like to parse the JSON file and convert it to a list of regular Julia matrices, and I’m concerned about performance in both time and memory usage. Some strategies I’ve tried, and the problems with each:
- Use JSON.jl to parse. Problem: JSON parsing could be faster, and there seems to be some issue with JSON not freeing memory. Also, parsed arrays have type `Vector{Any}`, which doesn’t seem ideal for performance.
- Use JSON3.jl to parse. JSON3 parsing is fast and memory efficient, but the natural patterns for accessing elements of the parsed nested `JSON3.Array` can be prohibitively slow: hundreds of times slower than accessing the parsed `Array{Any}` returned by JSON.jl (GitHub issue). I’m not sure whether this is a bug or expected, but either way this strategy ends up quite slow. For example, compare converting a list of lists parsed with JSON or JSON3:
mat(arrs) = [arrs[i][j] for i in 1:length(arrs), j in 1:length(arrs)]
julia> arrs_json[1:2]
2-element Array{Any,1}:
Any[0.2727272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090909090909091, 0.0 … 0.0, 0.09090909090909091, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Any[0.0, 0.23529411764705882, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia> typeof(arrs_json)
Array{Any,1}
julia> @belapsed mat($arrs_json)
0.054457167
julia> arrs_j3[1:2]
2-element Array{JSON3.Array,1}:
[0.2727272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090909090909091, 0.0 … 0.0, 0.09090909090909091, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Union{Float64, Int64}[0, 0.23529411764705882, 0, 0, 0, 0, 0, 0, 0, 0 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
julia> typeof(arrs_j3)
JSON3.Array{JSON3.Array,Base.CodeUnits{UInt8,String},SubArray{UInt64,1,Array{UInt64,1},Tuple{UnitRange{Int64}},true}}
julia> @belapsed mat($arrs_j3)
9.723495698 # 180x slower
- Use JSON3.jl to parse, then convert to standard Julia arrays with `copy`. Problem: extra memory and time overhead from an extra copy. Also, the standard Julia arrays returned by `copy`ing the JSON3-parsed object perform better than `JSON3.Array`, but still worse than the `Vector{Any}` from JSON.jl. I’m not sure why the extra type information hurts performance, but here we are:
julia> arrs_j3_copy = copy(arrs_j3);
julia> arrs_j3_copy[1:2]
2-element Array{Array{T,1} where T,1}:
[0.2727272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09090909090909091, 0.0 … 0.0, 0.09090909090909091, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Real[0, 0.23529411764705882, 0, 0, 0, 0, 0, 0, 0, 0 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
julia> typeof(arrs_j3_copy)
Array{Array{T,1} where T,1}
julia> @belapsed mat($arrs_j3_copy)
0.081806462 # 1.5x slower than arrs_json
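For reference, here is a minimal sketch of an iteration-based variant of `mat`. It assumes the slowdown comes from indexing `arrs[i][j]` once per element, which is an assumption and not something confirmed above. It instead visits each row once and writes into a preallocated `Matrix{Float64}`. The name `rows_to_matrix` is hypothetical, the function uses only Base, and it has not been benchmarked against the timings above.

```julia
# Hypothetical helper (not part of JSON.jl or JSON3.jl): build a dense Float64
# matrix from an iterable of equally long rows, visiting each row once
# instead of indexing arrs[i][j] for every element.
function rows_to_matrix(rows)
    n = length(rows)                      # square matrix: n rows of n entries
    M = Matrix{Float64}(undef, n, n)
    for (i, row) in enumerate(rows)
        length(row) == n ||
            throw(DimensionMismatch("row $i has length $(length(row)), expected $n"))
        for (j, x) in enumerate(row)
            M[i, j] = x                   # integer zeros are converted to 0.0
        end
    end
    return M
end

# Small usage example with mixed Int/Float64 rows, like the parsed data above.
rows = Any[Any[0.25, 0, 0], Any[0, 0.5, 0.125], Any[0, 0, 1.0]]
M = rows_to_matrix(rows)
```

Applied to each parsed matrix in the "mats" list, this would give the list of regular Julia matrices. Whether it actually helps with `JSON3.Array` depends on how cheap iteration is compared with indexing there, so it would need the same `@belapsed` comparison as above.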
So any tips would be much appreciated - thanks!