[ANN] DaemonMode.jl: a package to run scripts faster in Julia

Julia is a great language, but its just-in-time compiler means that loading a package can take considerable time. This is known as the "time to first plot" problem.

It is true that this time is only required the first time (and there are options, such as the package Revise). However, it is a great disadvantage when we want to use Julia to create small scripts.

This package solves that problem. It is available at

Inspired by the daemon mode of Emacs, this package uses a server/client model. This lets Julia run scripts much faster, because the packages stay loaded in memory between runs of several scripts (or repeated runs of the same script).

Usage

  • The server is responsible for running all Julia scripts.

    julia -e 'using DaemonMode; serve()'
    
  • A client sends the server the file to run and returns the output obtained.

    julia -e 'using DaemonMode; runargs()' program.jl <arguments>
    

    You can use an alias:

    alias juliaclient='julia -e "using DaemonMode; runargs()"'
    

    Then, instead of julia program.jl you can run juliaclient program.jl. The output should be the same, but it takes a lot less time.

Process

The process is the following:

  1. The client process sends the program program.jl with the required arguments to the server.

  2. The server receives the program name and runs it, returning the output to the client process.

  3. The client process receives the output and shows it on the console.
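The three steps above can be sketched with Julia's Sockets standard library. This is only a minimal illustration under stated assumptions, not DaemonMode's actual implementation: the function names handle_one_request and request_run are hypothetical, the server handles a single request, and it sends back the value of the script's last expression instead of redirecting printed output.

```julia
using Sockets

# Server side (hypothetical helper): accept one client, read the script
# path it sends, evaluate the file in a fresh module, and reply.
function handle_one_request(server)
    sock = accept(server)
    try
        path = readline(sock)                   # step 1: receive the file name
        result = Base.include(Module(), path)   # step 2: run the script
        println(sock, repr(result))             # step 2: send the result back
    finally
        close(sock)
    end
end

# Client side (hypothetical helper): send the path, then read the reply.
function request_run(port, path)
    sock = connect(port)                        # connects to localhost
    try
        println(sock, path)
        return readline(sock)                   # step 3: receive the output
    finally
        close(sock)
    end
end

# Demo inside a single process: a throwaway script and a server on a free port.
script = tempname() * ".jl"
write(script, "x = 20\nx + 22\n")
port, server = listenany(3000)
task = @async handle_one_request(server)
reply = request_run(port, script)
wait(task)
close(server)
rm(script)
println(reply)
```

Only the file name travels over the socket here, so this sketch assumes that server and client see the same file system.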

Example

Suppose that we have the script test.jl:

using CSV, DataFrames

fname = only(ARGS)
df = CSV.File(fname) |> DataFrame
println(first(df, 3))

The normal method is:

$ time julia test.jl tsp_50.csv
...
3×2 DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m18.831s
user	0m18.670s
sys	0m0.476s

Just loading CSV and DataFrames and reading a simple file takes 18 seconds on my computer (I accept donations :-)). Every time you run the program, it is going to take these 18 seconds.

Using DaemonMode:

$ julia -e 'using DaemonMode; serve()' &
$ time juliaclient test.jl tsp_50.csv
3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m18.596s
user	0m0.329s
sys	0m0.318s

But on subsequent runs it is a lot faster:

$ time juliaclient test.jl tsp_50.csv
3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m0.355s
user	0m0.336s
sys	0m0.317s

A reduction from 18 s to about 0.35 s: subsequent runs take only about 2% of the original time.

Also, you can change the file (or run another one) and the performance is maintained:

test2.jl:

using CSV, DataFrames

fname = only(ARGS)
df = CSV.File(fname) |> DataFrame
println(last(df, 10))

$ time juliaclient test2.jl tsp_50.csv
10×2 DataFrames.DataFrame
│ Row │ x        │ y        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.25666  │ 0.405932 │
│ 2   │ 0.266308 │ 0.426364 │
│ 3   │ 0.865423 │ 0.232437 │
│ 4   │ 0.462485 │ 0.049489 │
│ 5   │ 0.994926 │ 0.887222 │
│ 6   │ 0.867568 │ 0.302558 │
│ 7   │ 0.475654 │ 0.607708 │
│ 8   │ 0.18198  │ 0.592476 │
│ 9   │ 0.327458 │ 0.354397 │
│ 10  │ 0.765927 │ 0.806685 │

real	0m0.372s
user	0m0.369s
sys	0m0.300s

This is great! How would this handle lots of juliaclient calls? Could each one run on its own thread?

A model like this is a good idea, but I think for serious usage, it would be better to pay the slight extra cost to create a module with a gensym'd name for each new script and eval that script into the module. The reason is so that you avoid namespace collisions like this:

mason$ cat foo.jl
f(x) = x + 1
@show f(1)

mason$ cat bar.jl
f = 1
@show f + 1

mason$ juliaclient foo.jl
f(1) = 2

mason$ juliaclient bar.jl
ERROR in line 1: 'invalid redefinition of constant f'

If you're worried about the cost of dynamically creating and using modules, you can always just create a few thousand of them at startup.
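A minimal sketch of that idea, assuming each script is available as a string. The helper name run_isolated is hypothetical and not part of DaemonMode; it only shows that two scripts using the same name f no longer collide when each one is evaluated in its own freshly named module.

```julia
# Hypothetical helper: evaluate source code inside a brand-new module
# whose name comes from gensym, so no two scripts share a namespace.
function run_isolated(code::AbstractString)
    m = Module(gensym("script"))
    return include_string(m, code)
end

# Contents of foo.jl and bar.jl from the example above, minus the @show.
foo = """
f(x) = x + 1
f(1)
"""
bar = """
f = 1
f + 1
"""

foo_result = run_isolated(foo)   # f is a function in the first module
bar_result = run_isolated(bar)   # f is an integer in the second module
println(foo_result)
println(bar_result)
```

Each call pays for creating a module, which is the "slight extra cost" mentioned above.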

Thank you for your interest. Well, this is the first version, I wrote it from scratch this morning, and it is not really ready yet to run multiple threads.
The problem is that I redirect the output of the server to the socket, and I think this simple approach would conflict with several threads. In order to allow several threads, I think I should change the approach. However, it is going on my TODO list.

Yes, you are right about the problem of conflicting names. I did not realise it. An option is to run each file in its own module, good suggestion.

That's a beautiful package, thanks a lot!

I always wanted to (reasonably) use Julia scripts in Makefiles. With DaemonMode, I can finally do this. Still need to start and stop the server manually before and after calling make, but that's okay for me.

One feature I want to suggest is time-outs, though. When I call julia -e 'using DaemonMode; runargs()' foo.jl without having started the server, the client seems to wait forever. The connection error message is only displayed after hitting Ctrl+C. A more conventional behaviour would be to exit and display this message after a time-out.

Also for the server-side, an optional time-out would be nice. If, say, the server does not receive a request after idling for X seconds, it exits. With a feature like this, I could start the server as a background process without worrying about cleaning it up. As for the Makefile use case, this would stop me from having to start and stop the server process before and after calling make.
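The client-side time-out could look roughly like the sketch below. It is an illustration of the suggestion only, not how DaemonMode behaves: connect_with_timeout is a hypothetical helper built on the Sockets standard library and Base's timedwait, and the demo connects to a port whose listener has just been closed.

```julia
using Sockets

# Hypothetical helper: try to connect to a local port, but give up after
# `timeout` seconds. Returns the socket, or nothing if no server answered.
function connect_with_timeout(port::Integer; timeout::Real=2.0)
    task = @async connect(port)
    status = timedwait(() -> istaskdone(task), Float64(timeout))
    if status === :ok && !istaskfailed(task)
        return fetch(task)
    end
    # On a real time-out the pending connect task is simply abandoned here.
    return nothing
end

# Demo: grab a free port, close the listener again, then try to connect.
port, server = listenany(3000)
close(server)
sock = connect_with_timeout(port; timeout=1.0)
println(sock === nothing ? "no server reachable" : "connected")
```

The server-side idle time-out from the paragraph above would need a different mechanism (for example, a timer that is reset on every request) and is not covered by this sketch.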

Great work!

This is pretty smart. Not immediately, but can we think about distributed computing?
Running a juliaserver on a remote system in a cluster. Function As A Service

Going a bit further, could we imagine having a juliaserver running on an accelerator equipped node (GPU or something else) and making a remote call to it?

I am glad you find it interesting. I am a little busy these days, but when I have more free time, I expect to improve it a little before officially registering it.

I usually start the daemon and never stop it (only when my computer is off, obviously). I suggest not stopping it unless there is a problem (conflict of names, …).

You say that it seems to wait forever. That is strange to me, because my output is:

$ julia -e "using DaemonMode; runargs()" test/aux.jl 
Error, cannot connected with server. It is running?

There is actually a time-out, the system default (it works on my Linux machine with Julia 1.4 without any problem). Maybe your system does not have that default. Which OS and Julia version are you using? I think I should set the time-out explicitly to be sure that it behaves this way on all systems.

Thanks for your interest.

I was using Ubuntu with Julia 1.3 (I saw that 1.3 is not tested on travis, but gave it a shot, anyways).

Update: After re-booting, I tried again and it worked just like you said. Might have been an issue on my side.

Of course it could be modified for distributed computing.

The current version would not work for that, because it does not actually transfer the file through the socket, only the directory, the filename, and the current parameters. But it is a logical addition in functionality. Sorry for the long delay, I am very busy these days. When I have more free time, I expect to release a new and better version that addresses the suggestions.

This is wonderful, but could you add a ServerId to your creation so that different Julia programs can run on different individual servers?

POSTSCRIPT: Maybe, I don't know, make the ServerId the socket port number? Or make the socket port number the ServerId.

  • The server is responsible for running all Julia scripts.

    julia -e 'using DaemonMode; serve(id=1234)'

  • A client sends the server the file to run and returns the output obtained.

    julia -e 'using DaemonMode; runargs(id=1234)' program.jl <arguments>

You can use an alias:

    alias juliaclient1234='julia -e "using DaemonMode; runargs(id=1234)"'

Then, instead of julia program.jl you can run juliaclient1234 program.jl. The output should be the same, but it takes a lot less time.

Following your suggestion, runargs() and serve() now accept the port directly (it is optional, 3000 by default). This gives us more flexibility with the port, and lets us use the port to identify the daemon.
Example (I use 3500, but any port is possible, preferably a high one for security reasons):

julia -e 'using DaemonMode; serve(3500)'

and in the client:

julia -e 'using DaemonMode; runargs(3500)' program.jl <arguments>

However, the idea is to share the server between different clients, but I guess sometimes you would like to have different servers (in different environments or using different CPUs).

Thanks for the feedback.

It would be nice to have the choice between reusing the same module (maybe just Main?) or running a file in its own module.

I'm planning to use this module with Irace to do some parameter tuning of my algorithm; reusing the same module allows me to parse my instances only once instead of in every script I run :)

@Mason I am happy to announce that I have updated the package, and now each file runs in its own module, avoiding name conflicts.

After looking at many options, I changed include(fname) to

m = Module()
content = join(readlines(dname), "\n")
include_string(m, content)

and it is currently working!

Also, I want to thank the users, especially @gsoleilhac and @Palli, for the suggestions and changes. Now:

New changes

  • The port can be specified in server and client.
  • Each file is run in its own Module to avoid conflict of names.
  • The tests have been improved so that they can run in parallel (with a different port for each testset).
  • You can now send specific code to be run in the server:

using DaemonMode
runexpr("using CSV, DataFrames")
fname = "tsp_50.csv";

runexpr("""begin
      df = CSV.File("$fname") |> DataFrame
      println(last(df, 3))
  end""")

3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

I am going to submit the package to the official repository, but I would like to know if you agree with the name 'DaemonMode' or prefer another name.

Sorry for the delay, but I have been very busy (after some crazy weeks I have finally got the tenure position at my University!).

I have worked a lot with Irace and parameter tuning, and I actually know its authors. If you need any help, do not hesitate to contact me, I can help you.

As for reusing the same module, I do not know yet how to make it optional. Anyway, the default should be to run each file in its own module, because it is safer.

Congrats @dmolina! I know very well that tenure in Spain is a major achievement…

I am going to submit the package to the official repository, but I would like to know if you agree with the name 'DaemonMode' or prefer another name.

If you are going to do it, I say do it with STYLE

DaemonMaster

First of all, thank you very much, I needed this very much!

It only works under Ubuntu for me, not under Windows… Did you only test under Linux? Is there a reason this should not work under Windows?

I'm getting (under Windows)

julia --startup-file=no -e 'using DaemonMode; serve()' <filename>
ERROR in line 1: 'could not spawn `'C:\Users\Christian\AppData\Local\Programs\Julia\Julia-1.4.0\bin\julia.exe' -Cnative '-JC:\Users\Christian\AppData\Local\Programs\Julia\Julia-1.4.0\lib\julia\sys.dll' -g1 -O0 --output-ji 'C:\Users\Christian\.julia\compiled\v1.4\DataFrames\AR9oZ_tt90P.ji' --output-incremental=yes --startup-file=no --history-file=no --warn-overwrite=yes --color=no --eval "while !eof(stdin)
    code = readuntil(stdin, '\0')
    eval(Meta.parse(code))
end
"`: operation not supported on socket (ENOTSUP)'

julia --startup-file=no -e 'using DaemonMode; serve()' itself works

Thank you for your interest. Sorry for the delay.
It is supposed to work on Windows. I have tested it through Travis, but not directly (I mainly work on Linux). It is strange, because it uses the standard library, which I assumed works on any OS. I have to check it.
One question: does it fail only when you run an expression, or also when you run a complete file? Running files is covered by more tests, so I guess it should work better.

I reread my question. It should've been

julia --startup-file=no -e 'using DaemonMode; runargs()' <filename>

for running files instead and I only screwed that up when trying out on Windows