Multiple connected scripts, all sharing the same dependencies: how do I set up the project?

I can’t get my head around how to set up the project so it handles this case. EITHER it’s really simple (just put the scripts in the src folder), OR it’s not wise to do that and they should go in three separate projects?

I have three scripts that share the same dependency. They run as separate processes (separate PIDs) from cron jobs, but they are connected by ZMQ: there is a server that receives the data streams and two clients that send data to the server.

They all depend on ZMQ but cannot be combined into one script, as they are designed to give me a choice of when and where to run the clients.

This is not an architectural question.

SO I start the server:

using ZMQ
context = Context()
socket = Socket(context, REP)
ZMQ.bind(socket, "tcp://*:5556")

while true
    # Wait for next request from client
    message = String(ZMQ.recv(socket))
    println("Received request: $message")
    if message == "END"
       println("dying")
       break
    end

    ZMQ.send(socket, "World")
end

ZMQ.close(socket)
ZMQ.close(context)

then another cron job starts the next script:

using ZMQ
context = Context()
# Socket to talk to server
println("Connecting to hello world server...")
socket = Socket(context, REQ)
ZMQ.connect(socket, "tcp://localhost:5556")

for request in 1:10
    println("Sending script one request $request ...")
    msg = "script one request $request ..."
    ZMQ.send(socket, msg)

    message = String(ZMQ.recv(socket))
    println("Received reply $request [ $message ]")
end

ZMQ.close(socket)
ZMQ.close(context)

and finally a cron job starts the third program:

using ZMQ
context = Context()

# Socket to talk to server
println("Connecting to hello world server...")
socket = Socket(context, REQ)
ZMQ.connect(socket, "tcp://localhost:5556")

for request in 1:10
    println("Sending script two request $request ...")
    msg = "two request $request ..."
    ZMQ.send(socket, msg)
    message = String(ZMQ.recv(socket))
    println("Received reply $request [ $message ]")
end

ZMQ.send(socket, "END")

ZMQ.close(socket)
ZMQ.close(context)

You don’t need a src/ folder if you’re just running a bunch of scripts. If they all live in the same folder, the easiest way is to create a new environment in that folder and point Julia to it when running each script (e.g. using julia --project myscript.jl).
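Concretely, that could look something like the following (directory and script names here are made up for illustration; the key point is that a single `Project.toml` lives next to the scripts):

```
# one-time setup: create an environment in the scripts' folder and add ZMQ to it
cd ~/zmq_scripts
julia --project=. -e 'using Pkg; Pkg.add("ZMQ")'

# each cron job then points at that same environment
julia --project=. zmq_server.jl
julia --project=. zmq_client_1.jl
julia --project=. zmq_client_2.jl
```

All three processes get their own PID but resolve `using ZMQ` against the same Project.toml/Manifest.toml pair.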


nope STILL not getting it. Thank you so much for taking the time.

If I generate a project

generate zmq_example
  Generating  project zmq_example:
    zmq_example/Project.toml
    zmq_example/src/zmq_example.jl

so I put the server code into zmq_example.jl.

THEN I save the two client scripts to the src/ directory (bear with me here):

 zmq_client_2.jl
 zmq_client_1.jl
 zmq_example.jl

so, using your example, this would start the server zmq_example.jl:

julia --project zmq_example.jl

so the server is now started and listening, but NOT the clients. Using the commands below seems to give me three separate projects. Remember, each of these scripts is an entity unto itself, ONLY communicating via ZMQ as per the code examples.

cronjob 1 -->   julia --project zmq_client_1.jl
cronjob 2 -->   julia --project zmq_client_2.jl

I still don’t see how to create a project where the scripts all share the same dependencies but are not included into a main project module. I am REALLY missing something here and want to get this right.

What is your definition of a project? I think you might be thinking about a package (the src/ directory is not needed for a Julia project, only for a “package”).

Dependencies in Julia are handled by environments and if you just want three completely independent scripts that share the same dependencies, as @ToucheSir pointed out, putting them in one environment (= a directory with a Project.toml file) is the most straightforward solution.

You can start them all independently with their own PID; the --project option just defines which environment the Julia process should use (in this case the one at ../Project.toml, assuming you are invoking the scripts from src/). It doesn’t mean that the scripts are related, though.

Generating a package directory with the src/ directory using Pkg or PkgTemplates is just a convenience thing if you want to create a package with a dedicated main file which you would want to import into another piece of code. Just make a new directory, put the scripts there and set up your environment.
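For instance, the resulting directory could be as flat as this (names are placeholders; the two .toml files are generated automatically when you add packages to the environment):

```
zmq_scripts/
├── Project.toml     # records ZMQ as a direct dependency
├── Manifest.toml    # machine-generated, pins exact versions
├── zmq_server.jl
├── zmq_client_1.jl
└── zmq_client_2.jl
```

No src/ directory, no module, no package — just scripts sharing one environment.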

As far as I know, one cannot prevent one file from being included into another, but that only happens if files are explicitly pulled in with include in the code anyway.

Here is also more information about how environments, packages, etc. work:
https://docs.julialang.org/en/v1/manual/code-loading/#Environments

But perhaps I’m not getting what you want to do exactly.


first of all thanks to both @ToucheSir and @Sevi for taking the time to help me out. I’m being slow to pick this up as it’s something I know little about. You have both helped tremendously, if only to point out that I suck at explaining what I want to do.

In my mind it’s clear: start up a server that lives in a PlutoHooks cell, accepts a stream of data, and populates a dataframe in real time with the data items. The data items are streamed from a pair of data pumps, and the streams are delivered to the server via ZMQ. All the scripts share the same base dependency (ZMQ), and Pluto will take over the package management once the clients are running.

NOW I think that @ToucheSir did a fine job in setting me on course and @Sevi added the fine tuning, thank you so much for taking the time.

A couple of other points I have come across:

--project[={<dir>|@.}]
	Set <dir> as the home project/environment. The default @. option will search through parent directories until a Project.toml or JuliaProject.toml file is found.

and

https://docs.julialang.org/en/v1/manual/command-line-options/
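If I’m reading that @. default correctly, it means a script can be launched from inside the project directory (or a subdirectory like src/) and Julia walks up until it finds the environment; a sketch with hypothetical directory names:

```
cd zmq_example/src
julia --project zmq_client_1.jl    # --project defaults to @., which finds ../Project.toml
```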

I’ve found it more robust, instead of using --project, to have this at the top of each script:

using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()

This has the advantage of not being sensitive to the current directory of the caller (the activate line) and automatically installing missing packages (the instantiate line) if run on a new computer.


Note that this will disable Pluto’s package management - which is probably reasonable in this case but OP mentioned using it currently so just thought I’d make this explicit.

What would you suggest, please? I SERIOUSLY want to use PlutoHooks. But the clients are outside Pluto, feeding in via ZMQ to a PlutoHooks cell à la @Pangoraw, which excites me when married with @lungben’s dataframe grid approach.

It seems to me NOW (reading your post) that I have to have one set of dependencies for the clients and another for the server inside Pluto.

Sorry I didn’t want to confuse the issue. I’ll try to clarify:

What everyone in this thread has been saying is that dependencies of a project are recorded in a Project.toml file, with a fully resolved environment including transitive dependencies stored in a Manifest.toml file. Both of these are simple plain text files which are automatically generated by Julia’s package manager.

They are also independent of any specific bit of code (e.g. a script like the ones you seem to have); that is, I can create an environment and add some packages to it (which will generate Project.toml and Manifest.toml files) and then activate that environment by doing Pkg.activate(path_to_the_toml_files) from any script and use the packages recorded in the Project.toml (as well as any precompiled code).
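To make that concrete: for the setup in this thread the generated Project.toml would contain little more than the following. (Don’t write it by hand — `Pkg.add("ZMQ")` generates it; the UUID shown is what I believe is ZMQ.jl’s registered UUID, but let Pkg fill it in rather than trusting my memory.)

```toml
[deps]
ZMQ = "c2297ded-f4af-51ae-bb23-16f91089e4e1"
```

The Manifest.toml that accompanies it is much longer, since it pins every transitive dependency to an exact version.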

Now Pluto has its own package management system, which basically creates a separate environment for each notebook and records the equivalent of a Project.toml and Manifest.toml file straight in the notebook. To see this, open any of your Pluto notebooks in a text editor (they are just *.jl files after all), and at the bottom you’ll find something like this snippet from a random Pluto notebook of mine:

# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"

[...]

[compat]
CSV = "~0.10.9"
DataFrames = "~1.4.4"
GLM = "~1.8.1"

[...]
"""

# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised

julia_version = "1.9.0-beta3"
manifest_format = "2.0"
project_hash = "6b6d5e67885866f045a1fc0fcd9fcaa2c29b1d17"

[[deps.AbstractFFTs]]
deps = ["ChainRulesCore", "LinearAlgebra"]
git-tree-sha1 = "69f7020bd72f069c219b5e8c236c1fa90d2cb409"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.2.1"

[...]
"""

What that means is that by default environments and dependencies are not shared across Pluto notebooks. Note that this does not mean that packages get installed multiple times - if more than one notebook used DataFrames 1.4.4 then the same package stored in ~/.julia/packages will be used across notebooks.

What it does mean is that if you have one notebook that does using DataFrames and another notebook that does using DataFrames, SomeOtherPackage then those notebooks might be using different DataFrames versions if SomeOtherPackage somehow restricts the version of DataFrames which can be installed.

If you do what Gunnar says above, Pluto’s automatic package management will be disabled and the project dependencies will be read from the Project.toml and Manifest.toml files next to the notebook (that’s what Pkg.activate(@__DIR__) does: @__DIR__ expands to the directory containing the current file, so it activates the project found there). This means that all scripts/Pluto notebooks stored in this directory will use the package versions specified in the same Project.toml/Manifest.toml file, and if you do any package operations in one notebook this will affect all other notebooks.
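In a Pluto notebook this looks the same as in a plain script; a single cell like the one below is what switches Pluto from its built-in package management to the shared environment (this assumes the notebook file sits next to the shared Project.toml, and that ZMQ is already in that environment):

```julia
begin
    using Pkg
    Pkg.activate(@__DIR__)  # @__DIR__ = directory containing this notebook file
    Pkg.instantiate()       # install anything missing from the shared Manifest
    using ZMQ               # now resolved from the shared environment
end
```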

As far as I can tell none of this has anything to do with PlutoHooks or your usage of ZMQ. You might want to think about whether it is required to have your clients and servers use the same version of packages (in which case a shared environment might be sensible), but for the most part it seems to me the question of whether you want to have one or multiple Project.toml files is irrelevant to the actual running of your project (caveat here is that it’s not 100% clear to me what you are doing, I’m guessing based on our interactions over the years!)


thank you for taking the time to clarify. Most helpful, as this is the first time I am considering using projects in my code. I’m not sure how I can explain what I am trying to do other than what I already have, but happy to try again.

OUTSIDE Pluto there will be a bunch of data pumps (clients) that feed a SERVER which will be INSIDE Pluto. The connective tissue is a queue provided by ZMQ.

To me this means that the code INSIDE Pluto will use the internal package manager (thus one dependency set) and the clients OUTSIDE will have a different set of dependencies.

I do NOT wish to have the clients inside Pluto for architectural reasons. Having them outside makes my Pluto code cleaner (just handling the dataframe) and allows me to feed the datastream as and when I wish with ease (just cron or the CLI).

I am FINALLY getting to do this after a year of doing other things and I want to get it right first time. I tried using Stipple but it kept freezing, and I have ALWAYS wanted to use Pluto. So I know the process works and the dataframe is populated effectively from the realtime feed (yeah, I know I could use arrays but I prefer the utility of dataframes).