Hi
I want to parse lines of JSON that contain, among other things, a vector of signed integers in which some values are null, like this:
{"data":[98,null,-51,null]}
Currently I use a conversion function like this (I want the data as Float16 to save some memory):
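using JSON3

# Map JSON null (parsed as `nothing`) to NaN16; scale integers by 0.1 and store them as Float16.
function conversion(a)
    return a === nothing ? NaN16 : Float16(a * 0.1)
end

nice_vector = conversion.(JSON3.read(JSON_line)[:data])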
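As a quick sanity check of this conversion on the sample line, without JSON3: the input below is a hand-written `Vector{Union{Int,Nothing}}` standing in for the parsed `data` field (an assumption about the element types), and the function is repeated so that the snippet runs on its own.

```julia
# Same conversion as above, repeated so this snippet is self-contained.
conversion(a) = a === nothing ? NaN16 : Float16(a * 0.1)

# Hand-written stand-in for the parsed "data" field of the sample line (assumed element types).
sample = Union{Int,Nothing}[98, nothing, -51, nothing]

# Broadcasting yields a Vector{Float16} with NaN16 where the input was null.
nice_vector = conversion.(sample)
println(nice_vector)
```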
This works, but the conversion step alone is twice as expensive as the JSON3.read call (roughly 200 μs for the read of a 5000-element vector).
I process each JSON Lines file like this:
function JSON_Lines_read(filename)
    # Use the first line to determine the number of points per line.
    l1 = readline(filename)
    nlines = countlines(filename)
    npoints = l1 |> JSON3.read |> x -> x[:data] |> length
    # One column per line of the file.
    out = Matrix{Float16}(undef, npoints, nlines)
    for (i, line) in enumerate(eachline(filename))
        out[:, i] = conversion.(JSON3.read(line)[:data])
    end
    return out
end
In total it takes 25 s to process each of my 500 MB JSON Lines files, which doesn't seem that fast.
I sometimes have hundreds of GB of these files to process, so it would be nice to be able to read them faster.
I naively tried doing the parsing manually with split and parse, but it is at least twice as slow as this method.
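A manual parser of this kind might look roughly like the sketch below. It is not the exact code that was timed: `manual_parse` is a hypothetical name, and the sketch assumes the `data` array is the only (non-empty) bracketed array on the line and contains no whitespace.

```julia
# Hypothetical helper: extract the "data" array from one line without a JSON library.
# Assumes the first '[' and the last ']' on the line delimit a flat, non-empty array
# of integers and nulls, with no whitespace between tokens.
function manual_parse(line::AbstractString)
    start = findfirst('[', line)
    stop = findlast(']', line)
    body = SubString(line, start + 1, stop - 1)
    # Each comma-separated token is either the literal null or a signed integer.
    return [tok == "null" ? NaN16 : Float16(parse(Int, tok) * 0.1)
            for tok in split(body, ',')]
end

v = manual_parse("{\"data\":[98,null,-51,null]}")
println(v)
```

The `split` call allocates one substring per element, which is one plausible reason this approach does not beat JSON3.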
Is there a better way to do this?