Parsing arrays directly from text file

I’m currently working on a project where one of the functions generates a large amount of data and returns it as a vector, where each element is itself a vector with some level of nesting. I use this data later for plotting. However, the data-generating function takes a long time to run, and currently it has to run every time I want to create a plot. I would prefer to have it generate all the data once and write it to a text file, which the various plotting functions can then read in. That way the data generator only has to run once, and plotting will be faster.

Simplified, the code resembles the following:

function datagenerator(filename)
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    c = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    examplearray = [a, b, c]
    
    # println writes each vector in its printed form, one vector per line
    open(filename, "w") do f
        for i ∈ examplearray
            println(f, i)
        end
    end
end

datagenerator("testfile.txt")

This creates the following text file:

[1.0, 2.0, 3.0, 4.0, 5.0]
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

This is the form I want. However, once the data is in the text file, I can’t figure out how to read the vectors back in directly as vectors. Instead, they are read back in as strings, and I haven’t found any way to parse a string directly into a vector. What I need is a function that takes “[1.0, 2.0, 3.0, 4.0, 5.0]” as its input and returns [1.0, 2.0, 3.0, 4.0, 5.0].

Thanks in advance!
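
For the literal question of turning a line such as "[1.0, 2.0, 3.0, 4.0, 5.0]" back into a vector, one stdlib-only option is `Meta.parse` followed by `eval`, since every line the generator writes is a valid Julia array literal. This is a minimal sketch, not something proposed in the thread: the helper names `isarrayliteral`, `parsevector`, and `readarrays` are hypothetical, and the sketch only accepts nested literals of plain numbers, so values that print as names rather than literals (`NaN`, `Inf`, `-Inf`) are rejected. Checking the parsed expression before calling `eval` matters, because `eval` would otherwise run any code that ends up in the file.

```julia
# Hypothetical helpers (names are illustrative): turn a printed vector such as
# "[1.0, 2.0, 3.0]" back into a Vector without eval'ing arbitrary code.

# True only for nested array literals [...] whose leaves are plain numbers.
isarrayliteral(x::Number) = true
isarrayliteral(x::Expr) = x.head === :vect && all(isarrayliteral, x.args)
isarrayliteral(x) = false

function parsevector(s::AbstractString)
    expr = Meta.parse(s)   # text -> unevaluated Julia expression
    isarrayliteral(expr) || error("not a numeric array literal: $s")
    return eval(expr)      # expr can only build arrays of numbers here
end

# Parse every non-blank line of a file written with println(f, vector).
function readarrays(filename)
    return [parsevector(line) for line in eachline(filename) if !isempty(strip(line))]
end
```

With the question’s `datagenerator`, `a, b, c = readarrays("testfile.txt")` should recover the three vectors. This is still text parsing, though, so for large data a dedicated format such as JSON or Arrow is likely to scale better.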

Just store the arrays as JSON. If you insist on your own hacky file format, you can at least use JSON to read them back:

julia> a
"[1.0, 2.0, 3.0, 4.0, 5.0]"

julia> JSON3.read(a)
5-element JSON3.Array{Int64,Base.CodeUnits{UInt8,String},Array{UInt64,1}}:
 1
 2
 3
 4
 5
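
Going the JSON route end to end, a minimal sketch with JSON3.jl might look like the following. `JSON3.write` turns each vector into a JSON string, and `JSON3.read(str, T)` reads it back as a concrete Julia type `T` rather than a `JSON3.Array`. The helper names `writejson` and `readjson`, the one-array-per-line layout, and having to pass the expected types when reading are assumptions of this sketch, not part of the reply above.

```julia
using JSON3   # third-party package, install with ] add JSON3

# Hypothetical helper: write each (possibly nested) vector as one JSON line.
function writejson(filename, arrays)
    open(filename, "w") do f
        for x in arrays
            println(f, JSON3.write(x))   # e.g. [[1.0,2.0],[3.0,4.0]]
        end
    end
end

# Hypothetical helper: read line i back as the concrete type types[i].
function readjson(filename, types)
    lines = readlines(filename)
    return [JSON3.read(line, T) for (line, T) in zip(lines, types)]
end
```

For the question’s data this would be `writejson("data.json", [a, b, c])` followed by `readjson("data.json", (Vector{Float64}, Vector{Vector{Float64}}, Vector{Vector{Vector{Float64}}}))`. Passing the types explicitly keeps the element types concrete; without them you get JSON3’s own array views, as in the REPL output above.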

The Arrow format (via Arrow.jl) directly supports writing arrays of arrays, with arbitrary levels of nesting. It’s an efficient binary format, so not a direct text representation, but that is what makes reading so fast: Arrow.jl can memory-map the file, so opening it is nearly instantaneous no matter how big the data is. Feel free to check it out; happy to answer any questions.
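
A minimal sketch of the Arrow route, assuming Arrow.jl’s `Arrow.write(file, table)` and `Arrow.Table(file)` and reusing the question’s `a`, `b`, `c`. Arrow stores tables, so this sketch wraps each vector as the single entry of its own column in a `NamedTuple`; that one-row layout and the file name `data.arrow` are choices made here for illustration, not requirements from the reply.

```julia
using Arrow   # third-party package, install with ] add Arrow

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
c = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

# Arrow writes tables: a NamedTuple of equal-length column vectors is one.
# Each column here holds a single entry, the (possibly nested) vector itself.
Arrow.write("data.arrow", (a = [a], b = [b], c = [c]))

# Later, in the plotting code: open the file and pull the vectors out by column.
tbl = Arrow.Table("data.arrow")
a2 = tbl.a[1]   # array-like value backed by the file's data
b2 = tbl.b[1]
c2 = tbl.c[1]
println(a2 == a && b2 == b && c2 == c)
```

The values come back as Arrow’s array-like types rather than plain `Vector`s. They behave as `AbstractVector`s, so most plotting code should accept them directly; `collect` gives a plain `Vector` at the outer level if needed.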
