I’m running a somewhat long and slow iteration, and I want to incrementally save the result of each iteration in a single JSON file. The problem is that when the file is closed (because of early termination, or because the loop is simply done) and I load the content of the JSON file with JSON.parsefile, the resulting object should be an Array of Dicts. Is there any way of doing that…?
The rationale is that, in the same way you’d push! the result of each iteration into some Vector you’re collecting the results in (to later do with what you will), I’d like to “push” into a JSON file.
MWE that does not work:
using JSON

x = rand(10)
open("tmp.json", "a") do io
    for i in x
        y = sin(i)
        JSON.print(io, Dict(:k => i, :v => y), 3)
    end
end
You need to actually write the array delimiters into the JSON yourself, so something like:
using JSON

x = rand(10)
open("tmp.json", "a") do io
    print(io, "[")
    for i in x
        y = sin(i)
        JSON.print(io, Dict(:k => i, :v => y), 3)
        i == x[end] || print(io, ",")  # no comma after the last element
    end
    print(io, "]")
end
Thanks! But this won’t work with the append flag for open, nor will it work if the iteration abruptly stops midway.
I guess there’s no real way of accomplishing this with JSON files…
As for the abrupt termination, you could wrap it in try / finally. But the desire to append makes it trickier. As you say, JSON does not seem like a good choice here. What does the data look like for each iteration? How about a CSV file or other simple format where you just add a new line per iteration?
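One line-oriented variant that keeps both the append flag and crash tolerance is to write one compact JSON object per line (“JSON Lines”). The file as a whole is then not a single JSON document, so JSON.parsefile won’t work on it, but every line is independently parseable. A minimal sketch (the tmp.jsonl file name is just for illustration):

```julia
using JSON

# Append one complete, compact JSON object per line. Even if the loop dies
# midway, every line already written is a valid record.
results_path = "tmp.jsonl"
open(results_path, "a") do io
    for x in rand(10)
        JSON.print(io, Dict(:k => x, :v => sin(x)))  # compact: no indent, one line
        println(io)
    end
end

# Reading back: parse each line independently into a Vector of Dicts.
results = [JSON.parse(line) for line in eachline(results_path)]
```

You lose the “single valid JSON file” property, but gain exactly the push!-like append semantics asked for above.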
Currently, it’s a collection of pairs, each key (like a UUID) pointing at a Matrix{Float64} with 3 columns (x, y, and t) and a variable number of rows (anything from 1 row to 1000 rows). Something like this:
using UUIDs, StructArrays
d = StructArray((k = uuid1(), xyt = rand(rand(1:1000), 3)) for _ in 1:100)
So I guess I could flatten it out by printing each row as key, x, y, t, thus “needlessly” repeating the key multiple times…? Hmm…
Like this:
using UUIDs, StructArrays

d = StructArray((k = uuid1(), xyt = rand(rand(1:1000), 3)) for _ in 1:100)
open("tmp.csv", "a") do io
    for (k, xyt) in d
        for row in eachrow(xyt)
            println(io, join([k; row], ','))  # one CSV line per matrix row
        end
    end
end
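Reading that flat CSV back and regrouping rows by their key could then look something like this (a sketch using only Base; regroup is a hypothetical helper name, and it assumes keys were written as plain strings):

```julia
# Rebuild one n×3 Matrix{Float64} (columns x, y, t) per key from the flat file.
function regroup(path)
    rows = Dict{String, Vector{NTuple{3, Float64}}}()
    for line in eachline(path)
        fields = split(line, ',')
        k = fields[1]
        xyt = (parse(Float64, fields[2]),
               parse(Float64, fields[3]),
               parse(Float64, fields[4]))
        push!(get!(rows, k, NTuple{3, Float64}[]), xyt)
    end
    # Turn each vector of (x, y, t) tuples back into an n×3 matrix.
    Dict(k => [v[i][j] for i in eachindex(v), j in 1:3] for (k, v) in rows)
end
```

The repeated keys cost some disk space, but every appended line is self-describing, so a crash mid-run loses at most one row.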
For appending data, I think it’s cleaner to just append new lines that way. (Or use a proper database / message queue.)
If you really want to use JSON, one option would be to read the existing JSON file into an array in memory at startup, append to this array, and serialize the entire array to disk for each iteration. Sure, it won’t be the most efficient, but if the iterations are slow anyway, it might not matter.
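A sketch of that read-everything / rewrite-everything approach (the tmp.json name is just for illustration; it assumes the existing file, if any, already holds a JSON array):

```julia
using JSON

# Load any results from a previous run, then rewrite the whole array each
# iteration so the file on disk is always a single valid JSON document.
path = "tmp.json"
results = isfile(path) ? JSON.parse(read(path, String)) : Any[]
for x in rand(10)
    push!(results, Dict(:k => x, :v => sin(x)))
    open(path, "w") do io   # overwrite, not append
        JSON.print(io, results, 2)
    end
end
```

This rewrites O(n) data on iteration n, but JSON.parsefile works on the result at any point, which is exactly what was asked for.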
using JSON

xs = rand(10)
open("tmp.json", "a") do io
    tmpbuffer = IOBuffer()
    try
        println(io, '[')
        for (i, x) in enumerate(xs)
            y = sin(x)
            i == 8 && error()  # simulate an abrupt termination mid-loop
            JSON.print(tmpbuffer, Dict(:i => i, :x => x, :v => y), 2)
            print(tmpbuffer, ',')
            print(io, String(take!(tmpbuffer)))  # each entry reaches the file whole
        end
    finally
        skip(io, -1)  # step back over the trailing comma, which JSON doesn't allow
        println(io, ']')
    end
end

JSON.parse(read("tmp.json", String))
Should be relatively robust unless the error happens during the JSON.print call itself (e.g. with interrupts), which is why each entry is staged in a temp buffer and only written to the file in one piece. Still doesn’t help with file system errors, but eh…
This is the try / finally solution I referred to, but I would recommend against manually generating JSON this way since it’s easy to create bugs and problems for oneself. This code also doesn’t support appending properly (if I understand OP’s needs). I guess you could start by erasing the last closing bracket, but… no, JSON is just not a good fit for this problem (unless serializing the entire data structure for each iteration, which I think is clean, but inefficient).
I tend to agree with this… JSON is a bad format for this kind of work. CSV would also fail if the process got interrupted mid-row, but it’s easier to deal with. Having said that, it would be cool if there was a way to incrementally write data to a file in a safe way. JuliaDB might want to work on that…?
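For the “incrementally write data in a safe way” wish, one common pattern is to serialize each snapshot to a temporary file and then rename it over the target, so a reader only ever sees a complete file, never a half-written one (this assumes the rename stays on one filesystem, where it is atomic on POSIX systems). A sketch, with save_snapshot as a hypothetical helper name:

```julia
using JSON

# Write the whole data structure to path atomically: serialize to a sibling
# temp file first, then rename it over the target in one step.
function save_snapshot(path, data)
    tmp = path * ".tmp"
    open(tmp, "w") do io
        JSON.print(io, data, 2)
    end
    mv(tmp, path; force = true)  # replace the old snapshot in one rename
end
```

Combined with the rewrite-everything approach above, a crash at any point leaves either the previous complete snapshot or the new one on disk.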