Parsing Julia code with a large array is very slow or results in `StackOverflowError`

The following is based on a real-world case, where I got a huge file full of univariate polynomials, but I’ve tried to simplify it for easier reproduction. Note that in my original test case I actually got a `ReadOnlyMemoryError` instead of a `StackOverflowError`, but I figured the example here is relevant enough by itself.

BTW, I am of course aware that there are “better” ways to serialize univariate polynomials, e.g. as a list of coefficients – but (a) this is the data I get, not my choice, (b) it is less clear what to do when e.g. the coefficients are from special fields where you in turn need special code to serialize them… In short, while long term my “solution” here would be to store this data differently, right now I have to deal with what I have, so there’s no point in telling me to use a different data format.


The following helper function generates the problematic code (originally there was of course just a .jl “data file”):

function badcode(N::Int)
  io = IOBuffer()
  println(io, "x = 2")
  println(io, "ll = [")
  for i in 1:N
    println(io, "[$i, x^7+x^6+x^5+x^4+x+1],")
  end
  println(io, "]")
  return String(take!(io))
end

So for example:

julia> print(badcode(5))
x = 2
ll = [
[1, x^7+x^6+x^5+x^4+x+1],
[2, x^7+x^6+x^5+x^4+x+1],
[3, x^7+x^6+x^5+x^4+x+1],
[4, x^7+x^6+x^5+x^4+x+1],
[5, x^7+x^6+x^5+x^4+x+1],
]

Now let’s try to parse and evaluate this code for different values of N – so a Vector{Vector{Int64}} is produced (I played with changing the inner Vector to a tuple, but that made no difference). Storage for that seems to be about 16N bytes, based on sizeof.
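The 16N figure matches the payload of the inner vectors (two Int64 each); a quick sanity check on a small, hand-built instance (assuming a 64-bit machine):

```julia
# Mimic the parsed result for N = 5 and check the storage estimate.
ll = [[i, 243] for i in 1:5]

sizeof(ll)        # 40: the outer Vector stores one 8-byte reference per entry
sum(sizeof, ll)   # 80: each inner Vector{Int64} holds two 8-byte integers, i.e. 16N total
```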

This is what I see:

julia> bad=badcode(10000) ; @time include_string(Main,bad);
  2.950018 seconds (708.01 k allocations: 42.467 MiB, 0.13% gc time)

julia> bad=badcode(20000) ; @time include_string(Main,bad);
  5.690603 seconds (1.42 M allocations: 84.963 MiB, 0.54% gc time)

julia> bad=badcode(30000) ; @time include_string(Main,bad);
  8.769406 seconds (2.13 M allocations: 127.459 MiB, 0.91% gc time)

julia> bad=badcode(40000) ; @time include_string(Main,bad);
 11.465964 seconds (2.84 M allocations: 169.954 MiB, 0.91% gc time)

julia> bad=badcode(50000) ; @time include_string(Main,bad);
 14.838581 seconds (3.55 M allocations: 212.450 MiB, 0.83% gc time)

julia> bad=badcode(60000) ; @time include_string(Main,bad);
 16.992089 seconds (4.26 M allocations: 254.946 MiB, 1.44% gc time)

julia> bad=badcode(70000) ; @time include_string(Main,bad);
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ none:1
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] eval(x::Expr)
   @ Base.MainInclude ./client.jl:446
 [4] top-level scope
   @ ./timing.jl:210 [inlined]
 [5] top-level scope
   @ ./REPL[96]:0

Extrapolating from this, it seems that ~71 allocations are required per additional line being read. My guess is that it first builds a huge syntax tree and then collapses it.
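That guess is easy to check on a small instance: the parser turns the entire array literal into a single `:vect` Expr with one argument per line, so the breadth of the tree grows with N:

```julia
# Inspect the expression the parser builds for a tiny version of the file.
ex = Meta.parse("[\n[1, 2],\n[3, 4],\n[5, 6],\n]")

ex.head           # :vect — one expression for the whole literal
length(ex.args)   # 3, i.e. one argument per line; 70000 lines means 70000 args
```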

Question 1: Why the crash? Is this “normal” or a bug in the Julia parser?

Question 2: Is there any chance this parsing can be sped up in Julia itself?

As it is, Julia is disappointingly slow here. The previous system that was used to parse these files does so in ~250 milliseconds (for a file with ~65000 lines) and evaluates it in another 2.5 seconds.

My guess is that a lot of time is wasted producing machine code that could be avoided. I wonder if one could add an “immediate execution mode” to deal with such cases (I know that other language implementations have such a thing; I’ve even worked on something like that in the past). I know there is a JuliaInterpreter.jl package… so now I wonder:

Question 3: Is there another, faster way to read this data in Julia, e.g. using JuliaInterpreter.jl or some other parser package?

I’ve contemplated just implementing my own parser for simple polynomials with integer coefficients, but perhaps this has already been done by someone?


For now, my solution is a gross hack: I read these files line by line and then pass all the “interesting” lines to eval(Meta.parse(...)). This works, but is still disappointingly slow. E.g. with such a file with ~65000 lines, it takes 48 seconds on my computer: about 15 seconds for parsing and 30 for evaluation. A similar strategy is to modify the file data to look more like this:

x = 2
ll = Vector{Vector{Int}}(undef, 5)
ll[1] = [1, x^7+x^6+x^5+x^4+x+1]
ll[2] = [2, x^7+x^6+x^5+x^4+x+1]
ll[3] = [3, x^7+x^6+x^5+x^4+x+1]
ll[4] = [4, x^7+x^6+x^5+x^4+x+1]
ll[5] = [5, x^7+x^6+x^5+x^4+x+1]

This solves the crash, but is even slower:

julia> function goodcode(N::Int)
         io = IOBuffer()
         println(io, "x = 2")
         println(io, "ll = Vector{Vector{Int}}(undef, $N)")
         for i in 1:N
           println(io, "ll[$i] = [$i, x^7+x^6+x^5+x^4+x+1]")
         end
         return String(take!(io))
       end
goodcode (generic function with 1 method)

julia> str=goodcode(10000) ; @time include_string(Main, str);
  4.311825 seconds (1.08 M allocations: 67.464 MiB, 0.33% gc time)

julia> str=goodcode(20000) ; @time include_string(Main, str);
  8.783035 seconds (2.16 M allocations: 134.985 MiB, 1.44% gc time)

julia> str=goodcode(30000) ; @time include_string(Main, str);
 13.460448 seconds (3.24 M allocations: 202.505 MiB, 1.14% gc time)

julia> str=goodcode(40000) ; @time include_string(Main, str);
 19.910083 seconds (4.32 M allocations: 270.025 MiB, 1.31% gc time)

julia> str=goodcode(50000) ; @time include_string(Main, str);
 20.537545 seconds (5.40 M allocations: 337.545 MiB, 1.24% gc time)

julia> str=goodcode(60000) ; @time include_string(Main, str);
 24.248670 seconds (6.48 M allocations: 405.065 MiB, 1.25% gc time)

julia> str=goodcode(70000) ; @time include_string(Main, str);
 29.532085 seconds (7.56 M allocations: 472.585 MiB, 1.28% gc time)

System info:

julia> versioninfo()
Julia Version 1.6.4
Commit 35f0c911f4 (2021-11-19 03:54 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

Trying goodcode like this:

$ ~/julia-1.7.0/bin/julia -O0 --compile=min

gave only slightly different results: fewer allocations (by a constant amount, it seems), and it took slightly longer. I tried different opt levels, and they do not seem to matter much.

EDIT: Outdated, since the original post was corrected:

julia> @time eval(Meta.parse(bad));
ERROR: Base.Meta.ParseError("extra token after end of expression")

Sorry, I accidentally pasted an old version of the test. The new one uses/requires include_string(Main, bad). I’ll edit the post if I can (but I am only on my phone right now).

EDIT: I accidentally ran this with the following options (the timing is still possibly useful to know, since it excludes code generation):

$ ~/julia-1.7.0/bin/julia -O0 --compile=min

I simplified your code to generate a simple array (skipping the polynomial), and it keeps the (roughly) linear allocation pattern, but a file twice as large is almost 4x slower (and the factor may be growing). That doesn’t seem right.

julia> bad2=badcode2(10000) ; @time include_string(Main,bad2);
  1.487639 seconds (146.04 k allocations: 5.245 MiB)

julia> bad2=badcode2(20000) ; @time include_string(Main,bad2);
  5.113136 seconds (296.04 k allocations: 10.700 MiB)

julia> bad2=badcode2(30000) ; @time include_string(Main,bad2);
 10.691408 seconds (446.04 k allocations: 15.850 MiB)

julia> bad2=badcode2(40000) ; @time include_string(Main,bad2);
 18.339167 seconds (596.04 k allocations: 21.458 MiB)

julia> bad2=badcode2(50000) ; @time include_string(Main,bad2);
 28.258053 seconds (746.04 k allocations: 26.836 MiB)

..

julia> bad2=badcode2(180000) ; @time include_string(Main,bad2);
346.111683 seconds (2.70 M allocations: 96.760 MiB, 0.02% gc time)

julia> function badcode2(N::Int)
         io = IOBuffer()
         println(io, "x = 2")
         println(io, "ll = [")
         for i in 1:N
           println(io, "[$i],")
         end
         println(io, "]")
         return String(take!(io))
       end

If you want to isolate this to the parser you should probably use Meta.parse and not include, which does much more.
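For what it’s worth, a multi-statement string needs to be wrapped first (a plain Meta.parse stops after the first expression and complains about extra tokens), so isolating parse time from eval time could look like this (a sketch; the begin/end wrapper is just one way to make the whole string a single expression):

```julia
# Wrap the multi-statement file contents in begin/end so Meta.parse sees a
# single expression, then time parsing and evaluation separately.
src = "x = 2\nll = [\n" * join("[$i, x^7+x^6+x^5+x^4+x+1],\n" for i in 1:10_000) * "]\n"
@time ex = Meta.parse("begin\n" * src * "\nend")
@time Core.eval(Main, ex)
```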

With default options here (see the edited post above). I’m just curious, but will not look further into it. Meta.parse gives an error.

julia> bad2=badcode2(45000) ; @time include_string(Main,bad2);
  2.186840 seconds (448.06 k allocations: 19.371 MiB, 6.98% gc time)

julia> bad2=badcode2(90000) ; @time include_string(Main,bad2);
  3.891472 seconds (898.06 k allocations: 38.769 MiB)

julia> bad2=badcode2(180000) ; @time include_string(Main,bad2);
  7.856997 seconds (1.80 M allocations: 77.564 MiB)

Somewhere around this size I get:

julia> bad2=badcode2(700000) ; @time include_string(Main,bad2);
ERROR: LoadError: syntax: expression too large

and for smaller:

julia> bad2=badcode2(600000) ; @time include_string(Main,bad2);
ERROR: LoadError: StackOverflowError:

I only used include_string to simplify the code for this posting. I actually also used Meta.parse plus eval, but it makes no substantial difference; when used to parse the input line by line, in the end 1/3 of the time is spent in Meta.parse and 2/3 in eval – namely ~15 resp. ~30 seconds for an example with ~65000 lines of code. (Oh, and the old code for reading these files, in an interpreted language, takes ~250 milliseconds to parse the code and ~2.5 seconds to evaluate it; so it’s a bit difficult right now to sell Julia as a win to some of my colleagues ;-). Of course it shines in other areas, and this example is rather unfair, I’ll admit, but I still unfortunately have to deal with it, and many similar ones :confused: )
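For the record, the line-by-line hack is roughly this (a sketch, assuming the file has been rewritten so that every line is a complete statement, as in the goodcode variant; the filter for “interesting” lines is just a placeholder):

```julia
# Evaluate a data file one statement at a time, so the parser never sees
# one giant array-literal expression.
function eval_line_by_line(path::AbstractString)
    for line in eachline(path)
        s = strip(line)
        isempty(s) && continue          # placeholder filter for "interesting" lines
        Core.eval(Main, Meta.parse(s))
    end
end
```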

You might not need the polynomials as code in the array. You’re generating a lot of code. It might be faster to only have an array of polynomial coefficients and run the evalpoly function (Horner’s rule) on them at runtime.

I suppose you do not have a constant polynomial, but then this would be much faster…

function badcode(N::Int)
         io = IOBuffer()
         println(io, "x = 2; f(x) = x^7+x^6+x^5+x^4+x+1")
         println(io, "ll = [")
         for i in 1:N
           println(io, "[$i, f(x)],")
         end
         println(io, "]")
         return String(take!(io))
       end

I think you’re misusing Julia by doing that. As explained in my preceding comment, there’s a workaround (much faster at parse time, and likely at runtime too), and I think what you’re doing would be very odd in any other compiled language.

BTW, what’s the other language you’re comparing with? In the case of Python: yes, it’s technically compiled, but only minimally, to interpreted bytecode. I’m not even sure this is the fastest way there – it may be, or my workaround may be.

You can never say never – people have huge code bases, such as in C++, but would people likely include functions in an array there?

Thanks, but I think you missed the second paragraph of my posting?

Also, of course in the real data file, every line has a different polynomial.

Also, there is no function in my arrays, there are expressions which get evaluated. Here, for simplicity, I used the same expression on each line, and set x = 2 to highlight that the issue really is in the parsing & evaluating of Julia code. In the real dataset, x is a symbolic polynomial variable. And again: those data files are something I get externally (as I wrote in my original posting), so telling me to use a different format misses the point.

I am also not sure what your point about “compiled languages” here is – parsing speed is independent from execution speed, and a compiled language implementation can have a slow parser or a fast parser, just like an interpreted language implementation can have a slow or a fast parser, and so on.

E.g. doing the equivalent in C – i.e. compiling something like

typedef struct data { int a; int b; } data;
const int x = 2;
const data arr[] = {
{1, x^7+x^6+x^5+x^4+x+1},
{2, x^7+x^6+x^5+x^4+x+1},
{3, x^7+x^6+x^5+x^4+x+1},
{4, x^7+x^6+x^5+x^4+x+1},
{5, x^7+x^6+x^5+x^4+x+1},
};

takes under a second (even in the variant with 65000 lines). Of course there is no JITting here etc., but as I wrote, Julia takes 15 seconds just for the parsing. That strikes me as really slow.

Julia can parse the code just fine, but fails during eval.
CSTParser.jl is generally faster than Meta.parse, but unfortunately not in this case (neither does very well for big array literals, as you can see). It probably wouldn’t be too hard to get better performance out of CSTParser for this specific case (also, I haven’t checked whether the parsing or the conversion to an Expr is the bottleneck).
It might be possible to use JuliaInterpreter.jl in this case, but I doubt it’s going to help much – parsing and lowering are already too slow (10s/20s on my machine for the 70k-lines case).

So overall: Write your own parser for this specific format.
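Such a parser could be quite short – a sketch, assuming each data line has the form `[index, polynomial],` where the polynomial is a sum of `x^k`, `x`, and integer-constant terms (general coefficients would need a bit more work):

```julia
# Parse one data line like "[3, x^7+x^6+x^5+x^4+x+1]," and evaluate the
# polynomial at the given x, without going through Julia's parser at all.
function parse_line(line::AbstractString, x::Integer)
    body = strip(line, ['[', ']', ',', ' '])
    idxstr, polystr = split(body, ','; limit=2)
    val = 0
    for term in split(polystr, '+')
        t = strip(term)
        if t == "x"
            val += x
        elseif startswith(t, "x^")
            val += x^parse(Int, t[3:end])
        else
            val += parse(Int, t)   # constant term
        end
    end
    return [parse(Int, idxstr), val]
end

parse_line("[3, x^7+x^6+x^5+x^4+x+1],", 2)  # → [3, 243]
```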


Yes, I misread these as anonymous functions – that would have been something like this line:

println(io, "$i x->x^7+x^6+x^5+x^4+x+1;")

It might help to use a real 2D array instead of a jagged array:

function badcode(N::Int)
         io = IOBuffer()
         println(io, "x = 2")
         println(io, "ll = [")
         for i in 1:N
           println(io, "$i x^7+x^6+x^5+x^4+x+1;")
         end
         println(io, "]")
         return String(take!(io))
       end

It has somewhat fewer allocations…

You’re doing about 70 allocations per line, indirectly, and allocating about 4249 bytes per line (and 59 bytes on average per allocation).

With a better string type in Base, I’m sure some of these allocations could be eliminated (stack-allocated instead).

Could you clarify what you mean here? “Fail” as in “is slower than it should be”?

I may have to… But in the meantime I wonder: if the Julia parser really is this slow, won’t it also cause issues elsewhere down the road? For reference, I tried equivalent code (with 50000 lines) in various other languages (C, Python, and GAP – a CAS for computational group theory, where many of these data files come from), and they all parsed it in under a second (while Julia takes > 10 seconds). I generated their code as follows:

function badcode_c(N::Int)
  io = IOBuffer()
  print(io, """
  typedef struct data { int a; int b; } data;
  const int x = 2;
  const data arr[] = {
  """)
  for i in 1:N
    println(io, "{$i, x^7+x^6+x^5+x^4+x+1},")
  end
  println(io, "};")
  return String(take!(io))
end

function badcode_py(N::Int)
  io = IOBuffer()
  print(io, """
  x = 2
  arr = [
  """)
  for i in 1:N
    println(io, "[$i, x^7+x^6+x^5+x^4+x+1],")
  end
  println(io, "]")
  return String(take!(io))
end

Fail as in “throws that StackOverflowError”.

Possibly? Parsing all of Base takes > 20s, but I don’t think anyone has really complained about that so far.

Not an expert in Python, but I don’t think ^ is power there (it’s bitwise XOR; power is **)?
If your data already comes in language-specific syntax, I find it hard to compare it across different languages without parsing and rewriting it for the other syntax, like replacing ^ with **.

Not an answer to your question, but here is the Julia interoperability that I really like.

using PyCall
function python_code(N::Int)
  io = IOBuffer()
  print(io, "[")
  for i in 1:N
    print(io, "[$i, x**7+x**6+x**5+x**4+x+1],")
  end
  print(io, "]")
  return String(take!(io))
end
julia> @time pyeval(python_code(10_000), x=2)
  0.158671 seconds (149.03 k allocations: 5.674 MiB)
10000×2 Matrix{Int64}:
     1  243
     2  243
     3  243
     4  243

Or MATLAB

using MATLAB
function matlab_code(N::Int)
  io = IOBuffer()
  println(io, "x = 2;")
  println(io, "ll = [")
  for i in 1:N
    println(io, "$i, x^7+x^6+x^5+x^4+x+1;")
  end
  println(io, "]")
  return String(take!(io))
end
@time MATLAB.eval_string(matlab_code(10_000))

First time including starting MATLAB took ~3 seconds, second time is this

        2619         243
        2620         243
        2621         243
      0.314543 seconds (59.51 k allocations: 3.304 MiB)

It is faster (usually), but I tried scaling it up, and for me the time eventually got way longer:

julia> @time pyeval(python_code(70_000), x=2);
  1.703727 seconds (1.05 M allocations: 38.763 MiB, 1.63% gc time)

strange:

julia> @time pyeval(python_code(170_000), x=2);
 27.944659 seconds (2.55 M allocations: 97.006 MiB, 0.19% gc time)

julia> @time pyeval(python_code(170_000), x=2);
  4.143577 seconds (2.55 M allocations: 97.006 MiB, 3.80% gc time)

julia> @time pyeval(python_code(270_000), x=2);
  6.458355 seconds (4.05 M allocations: 150.430 MiB, 1.28% gc time)

julia> @time pyeval(python_code(2_200_000), x=2);
 52.249545 seconds (33.68 M allocations: 1.254 GiB, 2.63% gc time, 0.55% compilation time)

julia> @time pyeval(python_code(2_500_000), x=2);
Killed

Have you considered parsing out the coefficients from the polynomials as you read them in and building a matrix of the coefficients?

Say you had polynomials

 x^7 + x^6 + x^5 + x^4 + x + 1
0.5x^7 + x^5 + x^4 + x^3 + 0.5x^2 + x + 3 

Build the matrix

julia> pa=[1 1 1 1 0 0 1 1;0.5 0 1 1 1 0.5 1 3]
2×8 Matrix{Float64}:
 1.0  1.0  1.0  1.0  0.0  0.0  1.0  1.0
 0.5  0.0  1.0  1.0  1.0  0.5  1.0  3.0

Then create a vector of your x to the powers:

julia> px = [x^7;x^6;x^5;x^4;x^3;x^2;x;1]
8-element Vector{Int64}:
 128
  64
  32
  16
   8
   4
   2
   1

Then multiply to evaluate:

julia> pa*px
2-element Vector{Float64}:
 243.0
 127.0

That should be way faster.


I tried it, and the equivalent with evalpoly is faster:

julia> function badcode(N::Int)
                io = IOBuffer()
                println(io, "x = 2;")
                println(io, "ll = [")
                for i in 1:N
                  println(io, "[$i, evalpoly(x, (1, 1, 0, 0, 1, 1, 1, 1))],")
                end
                println(io, "]")
                return String(take!(io))
              end

and more scalable, though still with a limit (as is natural):

julia> bad=badcode(10000) ; @time include_string(Main,bad);
  2.001641 seconds (248.25 k allocations: 13.526 MiB)

julia> bad=badcode(100000) ; @time include_string(Main,bad);
 20.156537 seconds (2.50 M allocations: 135.405 MiB, 0.80% gc time)

julia> bad=badcode(1000000) ; @time include_string(Main,bad);
ERROR: LoadError: syntax: expression too large

but trying to find out where/how high I can safely go, I get:

julia> bad=badcode(300000) ; @time include_string(Main,bad);
ERROR: LoadError: StackOverflowError:
...

julia> bad=badcode(500000) ; @time include_string(Main,bad);

signal (11): Segmentation fault
in expression starting at string:2
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:720
...

Thanks, but these recent answers all miss the point: I do not control the input format. The only reason I provided code to produce sample input files is that this is much easier to scale and adapt than pasting a couple of files with 65000 lines each here.
