Is Julia the right tool for the job?


#1

Hi everyone.

So I have to create something that runs/calls ffmpeg/libav and takes screenshots of a video at specific timestamps.
I may want to spawn multiple “screenshot operations” in parallel.

I thought about using Julia because:

Question marks for me:

  • GC pauses
  • Working with Julia in a containerized environment (Docker)

I could spawn a child process for this in Python or Node.js, but I want to have full control of what’s happening with the video processing (or maybe there’s a way to do that with a child process, but I don’t know how).

Either way, apart from this being a Julia-related forum, is Julia the right tool, or at least a good one, for this job?
Let me know your opinions :slight_smile:

Thanks!


#2
  • GC pauses

I’d check how much garbage VideoIO.jl creates. If the amount isn’t large, you should be able to run in nearly constant time with very few stop-the-world pauses.

  • Working with Julia in a containerized environment (Docker)

Julia is much more self-contained than Python, so you usually don’t need Docker as badly. However, if Docker is already in your tech stack, there should be no issues with running Julia inside a container.
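
On the child-process part of the question: Julia can launch ffmpeg itself and still keep track of each process. The following is only a minimal sketch, assuming ffmpeg is on the PATH; the function names, the output file naming, and the choice of flags are illustrative assumptions, not something taken from VideoIO.jl.

```julia
# Build the ffmpeg command for one screenshot. Values interpolated into
# backticks are passed as single arguments, so no shell quoting is needed.
function screenshot_cmd(video::AbstractString, t::Real, out::AbstractString)
    # -ss before -i seeks on the input; -frames:v 1 writes one frame; -y overwrites.
    return `ffmpeg -y -loglevel error -ss $t -i $video -frames:v 1 $out`
end

# Run one ffmpeg child process per timestamp, concurrently.
# Returns a vector of (timestamp, succeeded) pairs.
function take_screenshots(video, timestamps; outdir = ".")
    tasks = map(enumerate(timestamps)) do (i, t)
        out = joinpath(outdir, "shot_$(i).png")   # hypothetical naming scheme
        @async begin
            # ignorestatus keeps a non-zero exit code from throwing,
            # so the failure can be reported per timestamp instead.
            p = run(ignorestatus(screenshot_cmd(video, t, out)))
            (t, success(p))
        end
    end
    return fetch.(tasks)
end
```

A call such as take_screenshots("input.mp4", [1.0, 12.5, 60]) would start three ffmpeg processes at once. Because each task only waits on an external process, plain tasks are enough here and no extra Julia threads are needed.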


#3

This is how long a GC sweep takes when there is nothing to do:

julia> using BenchmarkTools

julia> @btime GC.gc()
  46.886 ms (0 allocations: 0 bytes)

It is entirely possible (and in my opinion easier than in any other garbage collected language I know of) to write code that doesn’t allocate dynamically (so that you can turn off the GC), but it still takes some effort.
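
As a rough illustration of what “doesn’t allocate dynamically” can look like, here is a minimal sketch: the buffer is allocated once up front, the work is done in place, and @allocated is used to check the hot call. The function names smooth! and count_bytes are made up for this example.

```julia
# In-place three-point moving average; writes into a preallocated buffer.
function smooth!(out::Vector{Float64}, x::Vector{Float64})
    @inbounds for i in 2:length(x)-1
        out[i] = (x[i-1] + x[i] + x[i+1]) / 3
    end
    out[1] = x[1]
    out[end] = x[end]
    return out
end

x = rand(1_000)
out = similar(x)                    # allocate once, outside the hot path

# Measuring inside a function avoids overhead from global variables.
count_bytes(out, x) = @allocated smooth!(out, x)
count_bytes(out, x)                 # warm-up call compiles both functions
println("bytes allocated: ", count_bytes(out, x))

# With no allocation in the critical section, the GC can be switched off around it.
GC.enable(false)
try
    smooth!(out, x)
finally
    GC.enable(true)                 # always switch collection back on
end
```

If the reported byte count is not zero, something in the hot path still allocates and the GC should stay enabled.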


#4

No, it isn’t. Try GC.gc(false).
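
A minimal sketch for comparing the two on your own machine. GC.gc() defaults to a full collection, while GC.gc(false) requests an incremental one. Timings depend heavily on the machine and on how much is live in the session, so no numbers are given here.

```julia
# GC.gc(full::Bool = true): `true` forces a full collection,
# `false` asks for an incremental one.
GC.gc(true); GC.gc(false)           # warm-up so compilation is not timed

t_full = @elapsed GC.gc(true)
t_incr = @elapsed GC.gc(false)

println("full:        ", round(t_full * 1e3, digits = 3), " ms")
println("incremental: ", round(t_incr * 1e3, digits = 3), " ms")
```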


#5

Interesting. Is a full GC sweep never run automatically?


#6

Doing this is completely trivial; it should be no harder than it is with any other language (and maybe easier). There is one important caveat, though: as of now, Julia still needs to recompile everything every time the julia process itself restarts, so if you expect to use Docker containers that repeatedly start fresh, the startup time will be rather poor. You can mitigate this somewhat by executing "using Packages" in your Dockerfile (where Packages are the packages you expect to use in whatever you want to run), but it doesn’t solve the problem. You could experiment with PackageCompiler, but in my experience it takes some effort to get this working.

If, on the other hand, you intend to keep your Docker container up all the time, or you don’t really care about start-up time anyway, you have nothing to worry about.
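
For the fresh-container case, the PackageCompiler route could look roughly like the sketch below, run once while the Docker image is built. It assumes PackageCompiler and VideoIO are already installed in the active project; the file names are hypothetical, and the package list is a guess at what the screenshot job would load.

```julia
# build_sysimage.jl: run once at image build time, not per request.
using PackageCompiler

create_sysimage(
    ["VideoIO"];                                  # packages to bake in (assumption)
    sysimage_path = "screenshots_sysimage.so",    # hypothetical output file
    precompile_execution_file = "warmup.jl",      # hypothetical script that exercises typical calls
)
```

The container would then start Julia with julia --sysimage screenshots_sysimage.so main.jl, where main.jl stands in for the actual entry point. Whether this pays off depends on how much of the startup time is package loading and compilation.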


#7

No, but that’s not something it’ll do “when there’s nothing to do”. That is, a lightly allocating program without a lot of persistent allocation should be able to run without hitting a full sweep.


#8

Thanks very much for providing first-hand knowledge of garbage collector internals. While it would be good to have tight guaranteed bounds on GC time for e.g. serious robot control purposes in the presence of dynamic memory allocation, this is already much better than what I thought was the state of affairs.


#9

Thanks for the answer!
I see you also work on the Kafka-related libs for Julia.
Thanks so much for the work, I will be needing to use them as well (RDKafka) :smiley:


#10

Thanks for your answer!
Yes, I will spawn a new container for every request I get for a new series of screenshots. This way I can scale it horizontally, since I am using Kafka to process such requests.