Julia: compiling strings

I’m writing stochastic simulation software in Julia.

I need to store expressions as strings because I want to save them in a spreadsheet or database, where they are easy to change without touching the Julia source files directly.

The normal solution is eval(), but it’s painfully slow.

In a test I did with the time() function, eval() in my real code was around 900x slower than native user-defined functions.

It would be great to have a way to compile strings (beyond what the parse function does) in order to generate a compiled thunk (anonymous function).

Below I’ve coded a very simple demo comparing eval() and an anonymous function:

const iter = 1000000
const niter = 10000

function compiled()  # uses an anonymous function
  local x::Float64 = 45
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Function = () -> sind(x)
  for i = 1:iter
    res = f()
  end
  println("Compiled => Elapsed Time: ",
      @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end

y = 45.0

function evaluated() # uses eval
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Expr = parse("sind(y)")
  for i = 1:niter
    res = eval(f)
  end
  println("Evaluated => Elapsed Time: ",
     @sprintf("%0.3f", (time()-ini)/niter * 1000000), " microseconds")
end


The results are amazing: the anonymous function is 2950x faster!

Compiled => Elapsed Time: 0.020 microseconds
Evaluated => Elapsed Time: 59.000 microseconds

My syntax dream: in the compiled() function, replace

local f::Function = y -> sind(y)

with

local f::Function = compile("sind(y)")

Is there a hidden trick that I’m not getting?

There are some major disadvantages to using eval() in this way, beyond performance, as it operates at global scope (even in a function) and is extremely insecure. But if you must use eval() in this way, then at least don’t do it in a loop. You’re creating a brand-new function which must be recompiled at every single iteration of your loop, which is hugely expensive. Instead, you can create the function definition once and use it multiple times:

julia> code = "y -> sin(y + 10) / y"
"y -> sin(y + 10) / y"

julia> f = eval(parse(code))
(::#1) (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark $(f)(π)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     13.151 ns (0.00% GC)
  median time:      14.086 ns (0.00% GC)
  mean time:        14.236 ns (0.00% GC)
  maximum time:     35.443 ns (0.00% GC)
  samples:          10000
  evals/sample:     998

But maybe you could explain more about why you need to store code as strings in the first place?

Also, just FYI, the local declarations and type annotations are not necessary at all. If they help you organize your own code, that’s fine, but they won’t affect its performance or correctness. Also, const inside a function body is a no-op (in julia v0.6) and is likewise unnecessary.

Many thanks for your feedback!

I need strings to store the expressions because there will be an Excel front end where the equations and constraints are stored in worksheet cells. It’s very handy for the end user.

It would be awkward if I had to develop software that writes out a Julia source package to be include()d for each simulation. That strategy is even worse anyway, because include doesn’t allow local variables, so I would need to use global variables, which is inefficient.

Since I store expressions as strings, I need to eval the parsed strings in each loop iteration because, like any simulation software, I invoke random number generation in each iteration, which changes the values the evaluation uses.

By the way, I use local type annotations just for clarity. I did not know that const local declarations are a no-op; I will change this. Thanks for that tip.

A feature that compiles string expressions into a compiled anonymous function would be amazing for me. I could avoid eval forever…

A feature that compiles string expressions into a compiled anonymous function would be amazing for me.

This sounds like exactly what f = eval(parse("y -> sin(y + 10)")) does. It evaluates the definition of an anonymous function and returns a Julia function that you can call with different values of y. You simply need to change what your strings store.

For example, instead of:

code = "sin(d)"
parsed = parse(code)
for i in 1:10
  d = randn()
  res = eval(parsed)   # re-evaluated (slowly) every iteration
end

you can do:

code = "d -> sin(d)"
f = eval(parse(code))
for i in 1:10
  d = randn()
  res = f(d)   # calls the compiled function; no re-evaluation
end

And, if you have some convention for what your input variables are named, you can of course transform "sin(d)" into "d -> sin(d)" automatically.
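Such a transformation can be a one-liner. A minimal sketch, assuming the input-variable name follows a convention (here `d`); `make_fn` is a hypothetical helper, and `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`:

```julia
# Hypothetical helper: wrap a bare expression string in an anonymous
# function of the conventional input variable, then eval it once.
function make_fn(expr_str::AbstractString, var::AbstractString = "d")
    code = string(var, " -> ", expr_str)   # "sin(d)" becomes "d -> sin(d)"
    return eval(Meta.parse(code))          # plain parse() in Julia v0.6
end

f = make_fn("sin(d)")
f(0.0)   # calls sin(0.0)
```

Note that the call happens in a later top-level statement than the `eval`, so it sees the newly defined method.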

Also it’s worth considering whether your functions are members of a parametric class of functions. For example, sin(ϕ), cos(ϕ), sinh(ϕ) and cosh(ϕ) all belong to a 2-parameter family of functions.
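If the user formulas really do fall into such a family, eval can be avoided entirely: store only a family name plus parameters and dispatch to precompiled closures. A rough sketch (the two-parameter scaling `a*f(b*ϕ)` and all names here are illustrative):

```julia
# Sketch: instead of eval'ing arbitrary strings, recognize a known
# family of base functions and store only a name plus parameters.
const FAMILY = Dict{String,Function}(
    "sin"  => sin,
    "cos"  => cos,
    "sinh" => sinh,
    "cosh" => cosh,
)

# Build a precompiled closure ϕ -> a*f(b*ϕ) from a name and parameters.
make_member(name::String, a::Float64, b::Float64) = ϕ -> a * FAMILY[name](b * ϕ)

g = make_member("sin", 2.0, 1.0)   # 2*sin(ϕ), no eval involved
```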

Also, if you are going through all this trouble anyway, why not just create some macros to make it easy for users to generate their own code in a normal source code (*.jl) file?

I’m moving this to #usage so we can focus on how to do things with the status quo first — and then decide if we need a language extension or not.


@rdeits, great tip! I didn’t know eval had that kind of power. I’m new to Julia; I was using Python, which has very slow running times compared with Julia.

In fact, when I run eval outside the loop to produce an anonymous function, everything works like a charm. Thanks for sharing your experience. It’s much faster than my previous solution.

To cover all my needs, I generate a function that changes the value of a global variable.

To create an anonymous function from string:

s = "() -> global x = rnd()"
func = eval(parse(s))

To use that function, just call func().

So it’s possible to read a string from an Excel cell, process it, and load its contents into a variable.

However, one important issue remains: speed (even with the great improvement over the previous situation).

eval only works in global scope, which is slower than local scope. Besides, using eval(parse(code)), even outside the main loop, is much slower than a direct anonymous function.

Using a direct anonymous function stored in an array (to make the comparison fair) is around 8.6x faster than using eval.

I don’t know why this happens. Maybe eval(parse(string)) doesn’t generate real machine code the way a direct function does; maybe it stays in LLVM IR.

It’s always possible to change source code programmatically, but that’s not very practical (I could use a source file as an include). It’s more intuitive and easier to read cells from an Excel file, apply some syntax fixes to the string, and compile it with eval(parse(s)).

However, maybe it’s better to do it anyway, because the speed advantage is still big.
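The 8.6x gap is most likely not about machine code at all: the eval’d function sits in a non-const global binding, so its type is unknown at each call site and every call is dispatched dynamically. Making the binding `const`, or passing the function through a function barrier, usually recovers full speed. A sketch (names are illustrative; `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# The eval'd function is just as fast once compiled; the overhead comes
# from reading it out of an untyped global. A const binding fixes that.
const fu_const = eval(Meta.parse("z -> exp(z)"))

# Function barrier: inside bench, f has a concrete type, so the
# call f(rand()) compiles to a direct (fast) call.
function bench(f, n)
    s = 0.0
    for _ in 1:n
        s += f(rand())
    end
    return s
end
```

Calling `bench(fu_const, 10^6)` should then run at roughly the speed of a hand-written anonymous function.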

How many different global variables are the functions changing?
Why do they need to be globals?
Instead, could you maybe pass a mutable struct to the functions?
i.e. (str) -> (str.x = rnd())
Also, instead of always compiling the strings, use a Dict to point to the functions.

mutable struct MyStruct ; x::Float64 ; str::String ; z::Complex{Float64} ; end
const funtab = Dict{String, Function}()
const mystr = MyStruct(1.0, "foo", 5im)
# When you read the Excel files, you do:
funtab[str] = eval(parse(str))
# and to call the function, then simply:
funtab[str](mystr)
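Putting those pieces together, a runnable sketch of the Dict-of-functions idea (the names here are illustrative, and `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# Sketch: one global Dict of compiled functions plus one mutable struct
# holding all simulation state; no other globals are needed.
mutable struct SimState
    x::Float64
end

const FUNTAB = Dict{String,Function}()
const STATE  = SimState(0.0)

code = "s -> (s.x = 2 * s.x + 1)"       # would come from an Excel cell
FUNTAB[code] = eval(Meta.parse(code))   # compile once, reuse many times
```

Then `FUNTAB[code](STATE)` runs the compiled update against the shared state.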

Inspired by @rdeits, I’ve redone and improved the initial tests, using a type annotation in the formal-parameter part:

y = 0; z = 0
const iter = 1000000
const niter = 10000
fu = eval(parse("(z) -> exp(z)"))  # Type annotation here is irrelevant

function compiled()
  local x::Float64 = 45
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Function = (x::Float64) -> exp(x) # Type annotation on a local variable is irrelevant
  for i = 1:iter
    res = f(rand())
  end
  println("Compiled => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
  # println(x, " . ", res)
end

function evalOutLoop()
  global fu
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  for i = 1:iter
    res = fu(rand())
  end
  println("EvalOutLoop => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end

function evalInLoop()
  global y
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Expr = parse("exp(y)")
  for i = 1:niter
    y = rand()
    res = eval(f)
  end
  println("EvalInLoop => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/niter * 1000000), " microseconds")
end


The results:

Compiled => Elapsed Time: 0.019 microseconds
EvalOutLoop => Elapsed Time: 0.330 microseconds (18 x slower)
EvalInLoop => Elapsed Time: 61.400 microseconds (3200 x slower)


Lastly, as a bonus: testing over and over, I’ve learned a few things.

  • Type annotations inside a function have an additional safety utility: they prevent you from changing a variable’s type by mistake, but, for local variables, they do not improve speed.
function typeLoc()
  ta::Float64 = 2.32   # Without the type annotation, the line below raises no error.
  ta = "baba"          # Error: cannot convert a String to Float64
end
  • I also noticed that an anonymous function created by eval needs to be stored in a global variable.

  • For global variables, a type assertion (::Float64 on the right-hand side) improves speed a lot. My experiments show about a 4x speedup.

ini = time()
for i = 1:iter
   ts = rand()
   ts = ts::Float64^2
end
println("Elapsed Time: ", @sprintf("%0.5f",
      (time()-ini)/iter * 1000000), " microseconds")

Elapsed Time: 0.04100 microseconds (with the ::Float64 assertion)
Elapsed Time: 0.17300 microseconds (without it)

Hi, @ScottPJones, thanks for your suggestions. I will study the Dict option.

My idea is to make a stochastic simulation program. There are one to many input variables in the simulation, and each input expression changes just one variable, the corresponding input variable. Likewise, there are one to many output variables: each output expression changes just one variable and depends on one or more input variables.

In short, I need to change just one variable in each function.

The variables in the expressions need to be globals because:

1) The eval(parse(string)) expression only accepts global variables, because it runs in module scope.
2) If I return the function to the main code and assign the result to a local variable, it cannot be used by other eval-generated functions, which only see global variables.
3) If I return the function to the main code and assign the result to a global variable, it’s complicated, because only the string data in Excel knows the variable names, so the core code can’t be mixed with things it doesn’t know about.

But if you simply pass in a structure that contains the values, you don’t need to use any global variables; nothing needs to run in global scope.

The dictionary that holds the functions would be global, as would the mutable struct that holds the “variables”,
so both could be seen by other functions.

I don’t quite understand what you are trying to say here.
You wouldn’t be using global variable names in the string data in Excel.
Instead of global foo, you simply have var.foo (where var would be the mutable struct containing all of the state visible to the string functions stored in the Excel file).
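If the spreadsheet formulas use bare names like foo, a small preprocessing step can add the prefix before parsing. A sketch, assuming variable names are plain identifiers (`rewrite_vars` is a hypothetical helper):

```julia
# Sketch: rewrite bare variable names in a user formula into fields of
# a state struct, then wrap the result in an anonymous-function string.
# `vars` lists the names that live in the struct; everything else
# (sin, rand, ...) is left untouched.
function rewrite_vars(expr_str::AbstractString, vars::Vector{String})
    out = expr_str
    for v in vars
        # \b word boundaries, so "x" does not match inside "exp"
        out = replace(out, Regex("\\b" * v * "\\b") => string("var.", v))
    end
    return string("var -> (", out, ")")
end

rewrite_vars("y = 2*x + rand()", ["x", "y"])
# => "var -> (var.y = 2*var.x + rand())"
```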

Using a Dict with eval(parse(string)) is an interesting alternative, and I’ve tried it; however, it is much slower than using eval(parse(string)) with global variables.

dic = Dict("x" => 0., "y" => 0., "z" => 0.) # global dictionary
x = 0.0; y = 0.0; z = 0.0                   # global variables
i = 0

vDicExpr = ["""dic["x"] = rand()""", """dic["y"] = rand()""",
            """dic["z"] = 2*dic["x"] + 3*dic["y"]"""]
vDicCod = string.("() -> ", vDicExpr)
vDicFun = Array{Function}(length(vDicCod))
for i in 1:length(vDicCod)
     vDicFun[i] = eval(parse(vDicCod[i]))
end

vExprExpr = ["x=rand()", "y=rand()", "z=2*x + 3*y"]
vExprCod = string.("() -> global ", vExprExpr)
vExprFun = Array{Function}(length(vExprCod))
for i in 1:length(vExprCod)
     vExprFun[i] = eval(parse(vExprCod[i]))
end

const iter = 1000000

function evalExpr(nome, fu)
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  for i = 1:iter
    res = fu()
  end
  println(nome, " => Elapsed Time: ",
    @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end


This is a relevant difference for me, because simulation software consists of loops, with stochastic random-number generation inside, that run billions of times.

Global => Elapsed Time: 0.240 microseconds
Dictionary => Elapsed Time: 0.639 microseconds (~2.7x slower)
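That ~2.7x is mostly the cost of hashing the key on every `dic["x"]` access. The mutable struct suggested earlier replaces the hash lookup with a plain field load. A sketch of the same update in struct form (names are illustrative; `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# Sketch: the struct version of the z = 2x + 3y update. A field access
# compiles to a memory load, with no hashing, so it is typically much
# faster than a Dict lookup inside a hot loop.
mutable struct Vars
    x::Float64
    y::Float64
    z::Float64
end

const V = Vars(0.0, 0.0, 0.0)

# The string would come from the spreadsheet, already rewritten to
# take the state struct as an argument:
update = eval(Meta.parse("v -> (v.z = 2*v.x + 3*v.y)"))
```

A loop can then call `update(V)` repeatedly with no global-variable or Dict overhead.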