Parsing a string representation of a non-rectangular array

I would like to parse strings like this "[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]" and get the underlying array back. Is there an off-the-shelf function to do it?

I found it:

eval(Meta.parse("[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]"))

That does work, but eval can be slow and unsafe, and its use is very often a sign that there is a better solution to whatever the real problem is. Perhaps you can tell us more about what you’re trying to do. Where do the arrays come from? Are they produced by other Julia code?
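To make the safety concern concrete: Meta.parse only builds an expression, and it is eval that runs whatever that expression says, so a string from an untrusted file could call any function. As a rough sketch (not a recommendation over a proper data format), one could walk the parsed expression and accept only array literals and integer literals instead of calling eval. The helper name literal_to_array is hypothetical, and the sketch assumes the input contains nothing but nested brackets and integers:

```julia
# Hypothetical helper: turn the output of Meta.parse into nested vectors
# WITHOUT eval, accepting only `[...]` literals and integer literals.
literal_to_array(x::Integer) = x

function literal_to_array(ex::Expr)
    # `[a, b, c]` parses to an Expr with head :vect; reject everything else
    # (function calls, macros, matrices, ...).
    ex.head === :vect || throw(ArgumentError("unexpected expression: $ex"))
    return [literal_to_array(a) for a in ex.args]
end

# Anything else (symbols, strings, floats, ...) is rejected in this sketch.
literal_to_array(x) = throw(ArgumentError("unexpected token: $x"))

s = "[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]"
a = literal_to_array(Meta.parse(s))

# A string that would run code under eval is rejected here instead.
try
    literal_to_array(Meta.parse("[1, exit()]"))
catch err
    println("rejected: ", err)
end
```

This keeps the parsing step but never executes the input. It is only an illustration of why eval is the risky part; it does nothing for performance.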

@rdeits The arrays are read from a text file. They are 3-dimensional with shape Nx2xM, but non-rectangular (the last dimension varies in length).

I guess my question was more along the lines of: do you control the production of these text files too? If so, it might make your life easier to use a well-defined format like JSON, BSON, or JLD2 to store the data so that you can avoid eval and parse.

Actually, your data looks like it could be parsed as JSON:

(v1.0) pkg> add JSON
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]

julia> using JSON

julia> JSON.parse("[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]")
3-element Array{Any,1}:
 Any[Any[1], Any[2]]            
 Any[Any[2, 3], Any[4, 5]]      
 Any[Any[6, 7, 8], Any[3, 1, 4]]

which (a) is roughly 70 times faster in the benchmark below and (b) avoids the substantial security and stability risk of eval():

julia> using BenchmarkTools

julia> s = "[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]"
"[[[1], [2]], [[2, 3], [4, 5]], [[6, 7, 8], [3, 1, 4]]]"

julia> @btime JSON.parse($s)
  4.532 μs (45 allocations: 2.59 KiB)
3-element Array{Any,1}:
 Any[Any[1], Any[2]]            
 Any[Any[2, 3], Any[4, 5]]      
 Any[Any[6, 7, 8], Any[3, 1, 4]]

julia> @btime eval(Meta.parse($s))
  318.905 μs (106 allocations: 7.22 KiB)
3-element Array{Array{Array{Int64,1},1},1}:
 [[1], [2]]            
 [[2, 3], [4, 5]]      
 [[6, 7, 8], [3, 1, 4]]

One difference is that JSON.parse (intentionally) produces Array{Any} containers, but it’s pretty straightforward to narrow those array types if you want:

julia> narrow(x::AbstractArray) = collect(narrow.(x))
narrow (generic function with 1 method)

julia> narrow(x) = x
narrow (generic function with 2 methods)

julia> narrow(JSON.parse(s))
3-element Array{Array{Array{Int64,1},1},1}:
 [[1], [2]]            
 [[2, 3], [4, 5]]      
 [[6, 7, 8], [3, 1, 4]]

Thank you. JSON.parse solves my problem. The data comes from a third party, so I’m stuck with their format.