Julia: compiling strings

I’m writing stochastic simulation software in Julia.

I need to store expressions as strings because I want to save them in a spreadsheet or database, where they are easy to change without touching the Julia source files directly.

The normal solution is eval(), but it’s painfully slow.

In a test I did with the time() function, eval() in my real code was around 900x slower than native user-defined functions.

It would be great to have a way to compile strings (beyond what the parse function does) in order to generate a compiled thunk (anonymous function).

Below I’ve coded a very simple demo comparing eval() and an anonymous function:

const iter = 1000000
const niter = 10000

function compiled()  # uses an anonymous function
  local x::Float64 = 45
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Function = () -> sind(x)
  for i = 1:iter
    res = f()
  end
  println("Compiled => Elapsed Time: ",
      @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end

y = 45.0

function evaluated() # uses eval
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Expr = parse("sind(y)")
  for i = 1:niter
    res = eval(f)
  end
  println("Evaluated => Elapsed Time: ",
     @sprintf("%0.3f", (time()-ini)/niter * 1000000), " microseconds")
end


The results are amazing: the anonymous function is 2950x faster!

Compiled => Elapsed Time: 0.020 microseconds
Evaluated => Elapsed Time: 59.000 microseconds

My syntax dream: in the compiled() function, replace

local f::Function = y -> sind(y)

with

local f::Function = compile("sind(y)")

Is there a hidden trick that I’m not getting?

There are some major disadvantages to using eval() in this way, beyond performance, as it operates at global scope (even in a function) and is extremely insecure. But if you must use eval() in this way, then at least don’t do it in a loop. You’re creating a brand-new function which must be recompiled at every single iteration of your loop, which is hugely expensive. Instead, you can create the function definition once and use it multiple times:

julia> code = "y -> sin(y + 10) / y"
"y -> sin(y + 10) / y"

julia> f = eval(parse(code))
(::#1) (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark $(f)(π)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     13.151 ns (0.00% GC)
  median time:      14.086 ns (0.00% GC)
  mean time:        14.236 ns (0.00% GC)
  maximum time:     35.443 ns (0.00% GC)
  samples:          10000
  evals/sample:     998

But maybe you could explain more about why you need to store code as strings in the first place?

Also, just FYI, the local declarations and type annotations are not necessary at all. If they help you organize your own code, that’s fine, but they won’t affect its performance or correctness. Also, const inside a function body is a no-op (in julia v0.6) and is likewise unnecessary.

Many thanks for your feedback!

I need strings to store the expressions because there will be an Excel front end where the equations and constraints are stored in worksheet cells. It’s very handy for the end user.

It would be awkward if I had to develop software that writes out a Julia source package to be include()d for each simulation. That strategy is even worse anyway, because include doesn’t allow local variables, so I would need to use global variables, which is inefficient.

Since I store expressions as strings, I need to eval the parsed strings in each loop iteration because, like any simulation software, I invoke random number generation in each iteration, which changes the values the evaluation uses.

By the way, I use local type annotations just for clarity. I did not know that const local declarations are a no-op; I will change this. Thanks for that tip.

A feature that compiles string expressions into a compiled anonymous function would be amazing for me. I could avoid eval forever…

A feature that compiles string expressions into a compiled anonymous function would be amazing for me.

This sounds like exactly what f = eval(parse("y -> sin(y + 10)")) does. It evaluates the definition of an anonymous function and returns a Julia function that you can call with different values of y. You simply need to change what your strings store.

For example, instead of:

code = "sin(d)"
parsed = parse(code)
for i in 1:10
  d = randn()
  res = eval(parsed)   # re-evaluated (slowly) every iteration
end

you can do:

code = "d -> sin(d)"
f = eval(parse(code))
for i in 1:10
  d = randn()
  res = f(d)   # calls the compiled function; no re-evaluation
end

And, if you have some convention for what your input variables are named, you can of course transform "sin(d)" into "d -> sin(d)" automatically.
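Such a transformation can be a one-liner. A minimal sketch, assuming the input-variable name follows a convention (here `d`); `make_fn` is a hypothetical helper, and `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`:

```julia
# Hypothetical helper: wrap a bare expression string in an anonymous
# function of the conventional input variable, then eval it once.
function make_fn(expr_str::AbstractString, var::AbstractString = "d")
    code = string(var, " -> ", expr_str)   # "sin(d)" becomes "d -> sin(d)"
    return eval(Meta.parse(code))          # plain parse() in Julia v0.6
end

f = make_fn("sin(d)")
f(0.0)   # calls sin(0.0)
```

Note that the call happens in a later top-level statement than the `eval`, so it sees the newly defined method.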

Also it’s worth considering whether your functions are members of a parametric class of functions. For example, sin(ϕ), cos(ϕ), sinh(ϕ) and cosh(ϕ) all belong to a 2-parameter family of functions.
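If the user formulas really do fall into such a family, eval can be avoided entirely: store only a family name plus parameters and dispatch to precompiled closures. A rough sketch (the two-parameter scaling `a*f(b*ϕ)` and all names here are illustrative):

```julia
# Sketch: instead of eval'ing arbitrary strings, recognize a known
# family of base functions and store only a name plus parameters.
const FAMILY = Dict{String,Function}(
    "sin"  => sin,
    "cos"  => cos,
    "sinh" => sinh,
    "cosh" => cosh,
)

# Build a precompiled closure ϕ -> a*f(b*ϕ) from a name and parameters.
make_member(name::String, a::Float64, b::Float64) = ϕ -> a * FAMILY[name](b * ϕ)

g = make_member("sin", 2.0, 1.0)   # 2*sin(ϕ), no eval involved
```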

Also, if you are going through all this trouble anyway, why not just create some macros to make it easy for users to generate their own code in a normal source code (*.jl) file?

I’m moving this to #usage so we can focus on how to do things with the status quo first — and then decide if we need a language extension or not.


@rdeits, great tip! I didn’t know eval had that kind of power. I’m new to Julia; I was using Python, which has very slow running times compared with Julia.

In fact, when I run eval outside the loop to produce an anonymous function, everything works like a charm. Thanks for sharing your experience. It’s much faster than my previous solution.

To cover all my needs, I generate a function that changes the value of a global variable.

To create an anonymous function from string:

s = "() -> global x = rnd()"
func = eval(parse(s))

To use that function, just call func().

So it’s possible to read a string from an Excel cell, process it, and load its contents into a variable.

However, one important issue remains: speed (even with the great improvement over the previous situation).

eval only works in global scope, which is slower than local scope. Besides, using eval(parse(code)), even outside the main loop, is much slower than a direct anonymous function.

Using a direct anonymous function stored in an array (to make the comparison fair) is around 8.6x faster than using eval.

I don’t know why this happens. Maybe eval(parse(string)) doesn’t generate real machine code the way a direct function does; maybe it stays in LLVM IR.

It’s always possible to change source code programmatically, but that’s not very practical (I could use a source file as an include). It’s more intuitive and easier to read cells from an Excel file, apply some syntax fixes to the string, and compile it with eval(parse(s)).

However, maybe it’s better to do it anyway, because the speed advantage is still big.
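The 8.6x gap is most likely not about machine code at all: the eval’d function sits in a non-const global binding, so its type is unknown at each call site and every call is dispatched dynamically. Making the binding `const`, or passing the function through a function barrier, usually recovers full speed. A sketch (names are illustrative; `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# The eval'd function is just as fast once compiled; the overhead comes
# from reading it out of an untyped global. A const binding fixes that.
const fu_const = eval(Meta.parse("z -> exp(z)"))

# Function barrier: inside bench, f has a concrete type, so the
# call f(rand()) compiles to a direct (fast) call.
function bench(f, n)
    s = 0.0
    for _ in 1:n
        s += f(rand())
    end
    return s
end
```

Calling `bench(fu_const, 10^6)` should then run at roughly the speed of a hand-written anonymous function.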

How many different global variables are the functions changing?
Why do they need to be globals?
Instead, could you maybe pass a mutable struct to the functions?
i.e. (str) -> (str.x = rnd())
Also, instead of always compiling the strings, use a Dict to point to the functions.

mutable struct MyStruct ; x::Float64 ; str::String ; z::Complex{Float64} ; end
const funtab = Dict{String, Function}()
const mystr = MyStruct(1.0, "foo", 5im)
# When you read the Excel files, you do:
funtab[str] = eval(parse(str))
# and to call the function, then simply:
funtab[str](mystr)
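Putting those pieces together, a runnable sketch of the Dict-of-functions idea (the names here are illustrative, and `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# Sketch: one global Dict of compiled functions plus one mutable struct
# holding all simulation state; no other globals are needed.
mutable struct SimState
    x::Float64
end

const FUNTAB = Dict{String,Function}()
const STATE  = SimState(0.0)

code = "s -> (s.x = 2 * s.x + 1)"       # would come from an Excel cell
FUNTAB[code] = eval(Meta.parse(code))   # compile once, reuse many times
```

Then `FUNTAB[code](STATE)` runs the compiled update against the shared state.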

Inspired by @rdeits, I’ve redone and improved the initial tests, using a type annotation in the formal-parameter part:

y = 0; z = 0
const iter = 1000000
const niter = 10000
fu = eval(parse("(z) -> exp(z)"))  # Type annotation here is irrelevant

function compiled()
  local x::Float64 = 45
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Function = (x::Float64) -> exp(x) # Type annotation on a local variable is irrelevant
  for i = 1:iter
    res = f(rand())
  end
  println("Compiled => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
  # println(x, " . ", res)
end

function evalOutLoop()
  global fu
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  for i = 1:iter
    res = fu(rand())
  end
  println("EvalOutLoop => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end

function evalInLoop()
  global y
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  local f::Expr = parse("exp(y)")
  for i = 1:niter
    y = rand()
    res = eval(f)
  end
  println("EvalInLoop => Elapsed Time: ", @sprintf("%0.3f", (time()-ini)/niter * 1000000), " microseconds")
end


The results:

Compiled => Elapsed Time: 0.019 microseconds
EvalOutLoop => Elapsed Time: 0.330 microseconds (18 x slower)
EvalInLoop => Elapsed Time: 61.400 microseconds (3200 x slower)


Lastly, as a bonus: testing over and over, I’ve learned a few things.

  • Type annotations inside a function have an additional safety utility: they prevent you from changing a variable’s type by mistake, but, for local variables, they do not improve speed.
function typeLoc()
  ta::Float64 = 2.32   # Without the type annotation, the line below raises no error.
  ta = "baba"          # Error: cannot convert a String to Float64
end
  • I also noticed that an anonymous function created by eval needs to be stored in a global variable.

  • For global variables, a type assertion (::Float64 on the right-hand side) improves speed a lot. My experiments show about a 4x speedup.

ini = time()
for i = 1:iter
   ts = rand()
   ts = ts::Float64^2
end
println("Elapsed Time: ", @sprintf("%0.5f",
      (time()-ini)/iter * 1000000), " microseconds")

Elapsed Time: 0.04100 microseconds (with the ::Float64 assertion)
Elapsed Time: 0.17300 microseconds (without it)

Hi, @ScottPJones, thanks for your suggestions. I will study the Dict option.

My idea is to make a stochastic simulation program. There are one to many input variables in the simulation, and each input expression changes just one variable, the corresponding input variable. Likewise, there are one to many output variables: each output expression changes just one variable and depends on one or more input variables.

In short, I need to change just one variable in each function.

The variables in the expressions need to be globals because:

1) The eval(parse(string)) expression only accepts global variables, because it runs in module scope.
2) If I return the function to the main code and assign the result to a local variable, it cannot be used by other eval-generated functions, which only see global variables.
3) If I return the function to the main code and assign the result to a global variable, it’s complicated, because only the string data in Excel knows the variable names, so the core code can’t be mixed with things it doesn’t know about.

But if you simply pass in a structure that contains the values, you don’t need to use any global variables; nothing needs to run in global scope.

The dictionary that holds the functions would be global, as would the mutable struct that holds the “variables”,
so both could be seen by other functions.

I don’t quite understand what you are trying to say here.
You wouldn’t be using global variable names in the string data in Excel.
Instead of global foo, you simply have var.foo (where var would be the mutable struct containing all of the state visible to the string functions stored in the Excel file).
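If the spreadsheet formulas use bare names like foo, a small preprocessing step can add the prefix before parsing. A sketch, assuming variable names are plain identifiers (`rewrite_vars` is a hypothetical helper):

```julia
# Sketch: rewrite bare variable names in a user formula into fields of
# a state struct, then wrap the result in an anonymous-function string.
# `vars` lists the names that live in the struct; everything else
# (sin, rand, ...) is left untouched.
function rewrite_vars(expr_str::AbstractString, vars::Vector{String})
    out = expr_str
    for v in vars
        # \b word boundaries, so "x" does not match inside "exp"
        out = replace(out, Regex("\\b" * v * "\\b") => string("var.", v))
    end
    return string("var -> (", out, ")")
end

rewrite_vars("y = 2*x + rand()", ["x", "y"])
# => "var -> (var.y = 2*var.x + rand())"
```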

Using a Dict with eval(parse(string)) is an interesting alternative, and I’ve tried it; however, it is much slower than using eval(parse(string)) with global variables.

dic = Dict("x" => 0., "y" => 0., "z" => 0.) # global dictionary
x = 0.0; y = 0.0; z = 0.0                   # global variables
i = 0

vDicExpr = ["""dic["x"] = rand()""", """dic["y"] = rand()""",
            """dic["z"] = 2*dic["x"] + 3*dic["y"]"""]
vDicCod = string.("() -> ", vDicExpr)
vDicFun = Array{Function}(length(vDicCod))
for i in 1:length(vDicCod)
     vDicFun[i] = eval(parse(vDicCod[i]))
end

vExprExpr = ["x=rand()", "y=rand()", "z=2*x + 3*y"]
vExprCod = string.("() -> global ", vExprExpr)
vExprFun = Array{Function}(length(vExprCod))
for i in 1:length(vExprCod)
     vExprFun[i] = eval(parse(vExprCod[i]))
end

const iter = 1000000

function evalExpr(nome, fu)
  local i::Int64
  local res::Float64
  local ini::Float64 = time()
  for i = 1:iter
    res = fu()
  end
  println(nome, " => Elapsed Time: ",
    @sprintf("%0.3f", (time()-ini)/iter * 1000000), " microseconds")
end


This is a relevant difference for me, because simulation software consists of loops, with stochastic random-number generation inside, that run billions of times.

Global => Elapsed Time: 0.240 microseconds
Dictionary => Elapsed Time: 0.639 microseconds (~2.7x slower)
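That ~2.7x is mostly the cost of hashing the key on every `dic["x"]` access. The mutable struct suggested earlier replaces the hash lookup with a plain field load. A sketch of the same update in struct form (names are illustrative; `Meta.parse` is the Julia ≥ 0.7 spelling of `parse`):

```julia
# Sketch: the struct version of the z = 2x + 3y update. A field access
# compiles to a memory load, with no hashing, so it is typically much
# faster than a Dict lookup inside a hot loop.
mutable struct Vars
    x::Float64
    y::Float64
    z::Float64
end

const V = Vars(0.0, 0.0, 0.0)

# The string would come from the spreadsheet, already rewritten to
# take the state struct as an argument:
update = eval(Meta.parse("v -> (v.z = 2*v.x + 3*v.y)"))
```

A loop can then call `update(V)` repeatedly with no global-variable or Dict overhead.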