A way to push to a JSON IO like with a collection

I’m running a somewhat long and slow iteration, and I want to incrementally save the result of each iteration to a single JSON file. The requirement is that once the file is closed (because of early termination or because the loop is simply done) and I load its content (with JSON.parsefile), the resulting object should be an Array of Dicts. Is there any way of doing that…?

The rationale is that, just as you’d push! the result of each iteration into some Vector you’re collecting results in (to later do with as you will), I’d like to “push” into a JSON file.
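
For reference, the in-memory pattern I’m trying to mimic is basically this (just a sketch):

using JSON
results = Dict{Symbol,Float64}[]
for i in rand(10)
    y = sin(i)
    push!(results, Dict(:k => i, :v => y))   # collect in memory as usual
end
open("tmp.json", "w") do io
    JSON.print(io, results, 3)   # only ever written if the loop survives to the end
end
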
MWE that does not work:

using JSON
x = rand(10)
open("tmp.json", "a") do io
    for i in x
        y = sin(i)
        JSON.print(io, Dict(:k => i, :v => y), 3)   # writes one Dict right after another, which is not a valid JSON document
    end
end

You need to actually write the array brackets into the JSON yourself, so something like:

using JSON
x = rand(10)
open("tmp.json", "a") do io
    print(io, "[")
    for i in x
        y = sin(i)
        JSON.print(io, Dict(:k => i, :v => y), 3)
        i == x[end] || print(io, ",")   # comma between elements, but not after the last one
    end
    print(io, "]")
end

Thanks! But this won’t work with the append flag for open, nor will it work if the iteration stops abruptly midway.
I guess there’s no real way of accomplishing this with JSON files…

It’s a pretty cool thought though: Link an array with a file. Any change to the array is reflected in the file…

In this case the array is a Vector of Dicts and the file is a JSON file. But it could be any collection and any file format…
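
Something like this toy sketch, maybe (purely hypothetical; the type and the rewrite-the-whole-file strategy are just for illustration):

using JSON

# A "file-linked" vector: every push! rewrites the backing JSON file,
# so the array and the file stay in sync.
struct FileBackedVector{T}
    data::Vector{T}
    path::String
end

function Base.push!(v::FileBackedVector, x)
    push!(v.data, x)
    open(v.path, "w") do io
        JSON.print(io, v.data, 3)
    end
    return v
end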

As for the abrupt termination, you could wrap it in try / finally. But the desire to append makes it trickier. As you say, JSON does not seem like a good choice here. What does the data look like for each iteration? How about a CSV file or other simple format where you just add a new line per iteration?
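
For the simple-format idea, for example (just a sketch with a made-up scalar result per iteration):

open("tmp.csv", "a") do io
    for x in rand(10)
        println(io, join((x, sin(x)), ','))   # one self-contained line per iteration
        flush(io)                             # make sure it reaches disk before the next slow iteration
    end
end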

Currently, it’s a collection of pairs: keys (like a UUID), each pointing at a Matrix{Float64} with 3 columns (x, y, and t) and a variable number of rows (anything from 1 row to 1000 rows). Something like this:

using UUIDs, StructArrays
d = StructArray((k = uuid1(), xyt = rand(rand(1:1000), 3)) for _ in 1:100)

So I guess I could flatten it out by printing each row as key, x, y, t, and thus “needlessly” repeating the key multiple times…? Hmm…
Like this:

using UUIDs, StructArrays
d = StructArray((k = uuid1(), xyt = rand(rand(1:1000), 3)) for _ in 1:100)
open("tmp.csv", "a") do io
    for (k, xyt) in zip(d.k, d.xyt)
        for row in eachrow(xyt)   # one CSV line per row, repeating the key on every line
            println(io, join([k; row], ','))
        end
    end
end

For appending data, I think it’s cleaner to just append new lines that way. (Or use a proper database / message queue.)
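
If you do go the database route, a rough sketch with SQLite.jl could look like this (assuming SQLite.jl and DBInterface.jl are installed; the file, table, and column names are just placeholders):

using SQLite, DBInterface, UUIDs

db = SQLite.DB("results.db")
DBInterface.execute(db, "CREATE TABLE IF NOT EXISTS results (k TEXT, x REAL, y REAL, t REAL)")
stmt = DBInterface.prepare(db, "INSERT INTO results (k, x, y, t) VALUES (?, ?, ?, ?)")

for _ in 1:100                                   # the long-running loop
    k = string(uuid1())
    for row in eachrow(rand(rand(1:1000), 3))
        DBInterface.execute(stmt, (k, row...))   # each INSERT is committed on its own
    end
end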

If you really want to use JSON, one option would be to read the existing JSON file into an array in memory at startup, append to this array, and serialize the entire array to disk for each iteration. Sure, it won’t be the most efficient, but if the iterations are slow anyway, it might not matter.
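
A rough sketch of that (the file name and record shape are just placeholders):

using JSON

file = "tmp.json"
results = isfile(file) ? JSON.parsefile(file) : Any[]   # pick up whatever a previous run left behind
for x in rand(10)
    push!(results, Dict(:k => x, :v => sin(x)))
    open(file, "w") do io                               # rewrite the whole array every iteration
        JSON.print(io, results, 3)
    end
end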


That’s true.

I’m afraid I’m not very SQL savvy. Is there an option like that in, say, JuliaDB or any of the other DBverses?

How about something like

using JSON
xs = rand(10)
open("tmp.json", "a") do io
    tmpbuffer = IOBuffer()
    try
        println(io, '[')
        for (i, x) in enumerate(xs)
            y = sin(x)
            i == 8 && error()   # simulate an abrupt interruption partway through

            JSON.print(tmpbuffer, Dict(:i => i, :x => x, :v => y), 2)
            print(tmpbuffer, ',')

            print(io, String(take!(tmpbuffer)))
        end
    finally
        skip(io, -1) # get rid of the last comma which JSON doesn't allow
        println(io, ']')
    end
end

JSON.parse(read("tmp.json", String))

Should be relatively robust unless the error happens during the JSON.print call itself. If that is possible (e.g. with interrupts), it’s best to write to a temporary buffer first, as above. It still doesn’t help with file system errors, but eh…


This is the try / finally solution I referred to, but I would recommend against manually generating JSON this way, since it’s easy to create bugs and problems for oneself. This code also doesn’t properly support appending to an existing file (if I understand the OP’s needs). I guess you could start by erasing the last closing bracket, but… no, JSON is just not a good fit for this problem (unless you serialize the entire data structure on each iteration, which I think is clean, but inefficient).

I tend to agree with this… JSON is a bad format for this kind of work. CSV would also fail if the process got interrupted mid-row, but it’s easier to deal with. Having said that, it would be cool if there were a way to incrementally write data to a file in a safe way. JuliaDB might want to work on that…?
