Task local storage and scoped values: What they are used for

A recent Slack thread discussed what the purpose of task local storage (TLS) and scoped values is. Since these features of Julia are a little obscure, @Krastanov suggested I write the explanation up on Discourse for posterity.

@vchuravy already gave a talk on scoped values at JuliaCon 2024, which some people may prefer to a post like this.

TLS has been part of Julia since at least 2013, long before Julia 1.0. In contrast, scoped values are new in Julia 1.11, released about one year ago. Their use cases are distinct, but they exist for the same underlying reason: how to handle global, mutable state.

A background: Shared values

In Julia, nearly all the data we process is passed to functions as arguments, and is therefore part of the local scope of the function processing it:

data = [1, 2, 3]
sm = sum(data) # passed as an argument to `sum`
print("The sum of data is: $sm")

This is the most useful pattern for accessing data, and therefore by far the most widespread.
Let’s look at some exceptions to this pattern.

Global constants

Suppose I want to compute the molecular weight of a poly-A tail[1]. For this, I need to access the molecular weight of adenosine monophosphate, water, hydroxyl, and the five-prime cap. For example:

function poly_a_tail_weight(
        nucleotides::Integer,
        amp_weight::Real,
        hydroxyl_weight::Real,
        water_weight::Real,
        five_prime_cap_weight::Real,
    )
    return nucleotides * amp_weight -
        (nucleotides - 1) * water_weight +
        hydroxyl_weight +
        five_prime_cap_weight
end

While it works, this signature is kind of silly for two reasons:

  1. It feels semantically wrong that the molecular weights of these molecules are passed into the function call, because the weights are constant, and have nothing to do with this particular instance of computation. In contrast, the number of nucleotides really is local information relevant to precisely this computation. These two kinds of information, constant knowledge and information local to the function, should be separated somehow.

  2. It’s annoying to pass all these arguments to the function - that means the arguments must also be part of the caller’s signature, and that caller’s caller and so on, all the way up the call chain. So you would end up having tonnes and tonnes of arguments at the top level functions.

I guess there are also minor questions about efficiency: Why should this constant information be stored on the stack through the call chain?

Anyway, the solution is clear here: We store it outside the signature, as global constants:

const AMP_WEIGHT = 347.22
const HYDROXYL_WEIGHT = 17.007
const WATER_WEIGHT = 18.015
const FIVE_PRIME_CAP_WEIGHT = 803.40

function poly_a_tail_weight(nucleotides::Integer)
    ...
end

Much nicer!
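
For completeness, here is a sketch of what the full version might look like. It assumes the elided body is the same formula as in the first version, with the arguments replaced by the constants:

```julia
# Values copied from the constants above.
const AMP_WEIGHT = 347.22
const HYDROXYL_WEIGHT = 17.007
const WATER_WEIGHT = 18.015
const FIVE_PRIME_CAP_WEIGHT = 803.40

# Only the information local to this computation remains in the signature.
function poly_a_tail_weight(nucleotides::Integer)
    return nucleotides * AMP_WEIGHT -
        (nucleotides - 1) * WATER_WEIGHT +
        HYDROXYL_WEIGHT +
        FIVE_PRIME_CAP_WEIGHT
end

poly_a_tail_weight(200)
```

Callers further up the chain now only need to pass along the number of nucleotides.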

Global mutable data?

One problem with const is that it’s… constant. We sometimes want to mutate data.

For example, suppose I have a set of genomes and, for each genome, a set of gene positions. I need to write out a file containing each genome’s instance of a given gene.
The files should be gzip-compressed, and because compression is a bottleneck, I reach for the high performance LibDeflate.jl.

This package requires us to allocate a mutable Compressor struct to handle our compression.
The function gzip_compress! takes the compressor, and mutates it in the process of compressing the input data.

If I was careless, I might do something like the following:

using LibDeflate

# OncePerProcess is new in Julia 1.12. In Julia 1.11, 
# I might use `Base.Lockable(Ref{Union{Nothing, Compressor}}(nothing))`
const COMPRESSOR = OncePerProcess{Compressor}(() -> Compressor())

function compress_genes(
        genome::Genome,
        positions::Vector{<:UnitRange},
        path::AbstractString
    )
    buffer = IOBuffer()
    for gene_position in positions
        write(buffer, get_gene(genome, gene_position))
    end
    open(path, "w") do io
        # A OncePerProcess object is called like a function to get its value
        write(io, gzip_compress!(COMPRESSOR(), UInt8[], take!(buffer)))
    end
end

This works perfectly fine… until I try to run compress_genes in multiple tasks concurrently.
Compressor is not thread-safe, so if it’s used by multiple threads at once, it will malfunction and most likely crash the process with a segfault.

In fact, even if I never use multiple tasks, it’s still dangerous. Someone else might use my package and spawn multiple tasks which call into compress_genes.

This is the problem that both task-local storage and scoped values are intended to solve. They allow you to access data which is not passed as an argument, while also handling concurrent access better than a simple global mutable variable.

Task-local storage

Let’s revisit the problem above. The issue with using OncePerProcess here is that we must never have more than one thread mutating COMPRESSOR at once.

In Julia, by design, we have little control over threads. Threads are a resource transparently provided by the operating system, kind of like CPU time or memory. We request a thread when we schedule a task, but which thread we get, and when, is a decision made by the Julia runtime, and out of our control. This is analogous to how we can’t (and shouldn’t attempt to) control at which memory address our data is allocated.

Anyway, all this is to say that, in Julia, we shouldn’t attempt to directly interact with threads; we should interact with tasks. Since every Julia task runs on at most one thread at any given time (although a task may jump between threads), if we ensure that only one task accesses our compressor, we also guarantee that at most one thread accesses it at any one time.

Therefore, we can solve the thread safety issue by giving a new instance of Compressor to each task. The function task_local_storage accesses, or writes to, a dictionary which is specific to the current running task.

struct CompressorKey end

function compress_genes( ... )
    # Get compressor from task local storage if we already created it,
    # else create a new one.
    tls = task_local_storage()
    key = CompressorKey()
    compressor = if haskey(tls, key)
        tls[key]
    else
        tls[key] = Compressor()
    end
    
    [ ... ]

    # Read from the task local storage
    write(io, gzip_compress!(task_local_storage(CompressorKey()), UInt8[], take!(buffer)))
end

Above, we could also have used the new OncePerTask interface instead of task_local_storage. The former is a wrapper around the latter with a nicer API.

Note that in the case above, if we spawn a task to run compress_genes 100 times, it will only create a Compressor once.
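
To make that behaviour concrete, here is a small self-contained sketch. `Scratch`, `ScratchKey`, `task_scratch` and `do_work` are hypothetical names made up for this illustration; `Scratch` merely stands in for a mutable, non-thread-safe object like `Compressor`. It uses `get!` on the dictionary returned by `task_local_storage()`, which is a more compact way to write the `haskey` check above:

```julia
# Hypothetical stand-in for a mutable, non-thread-safe resource such as a Compressor.
mutable struct Scratch
    uses::Int
end

struct ScratchKey end

# Fetch the current task's Scratch, creating it on first use within that task.
function task_scratch()
    return get!(() -> Scratch(0), task_local_storage(), ScratchKey())::Scratch
end

function do_work()
    scratch = task_scratch()
    scratch.uses += 1
    return scratch
end

# A single task calling do_work 100 times reuses one Scratch ...
one_task = fetch(Threads.@spawn begin
    for _ in 1:99
        do_work()
    end
    do_work()
end)

# ... whereas four separate tasks each create their own.
per_task = [fetch(t) for t in [Threads.@spawn(do_work()) for _ in 1:4]]
```

Here `one_task.uses` ends up counting all 100 calls, while every element of `per_task` is a distinct object that was used exactly once.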

The pattern above is not optimally efficient in its specific case - if we have, say 8 threads and 100 tasks, we create 100 individual compressors, where in reality, we only need 8 to avoid threading issues. A better example might have state which truly needs to be tied to the task, and not to the thread. Nonetheless, I hope the example gets the point across.
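
If you really only want a fixed number of instances, one possible alternative (not something TLS provides, just a common pattern) is a small pool backed by a `Channel`. This is a rough sketch: `Resource`, `make_pool` and `with_resource` are hypothetical names, and `Resource` stands in for something like `Compressor`:

```julia
# Hypothetical stand-in for a non-thread-safe resource such as a Compressor.
mutable struct Resource
    uses::Int
end

# A pool holding a fixed number of resources.
function make_pool(n::Integer)
    pool = Channel{Resource}(n)
    for _ in 1:n
        put!(pool, Resource(0))
    end
    return pool
end

# Borrow a resource, run `f` on it, and always hand it back.
# `take!` blocks while all resources are in use by other tasks.
function with_resource(f, pool::Channel{Resource})
    resource = take!(pool)
    try
        return f(resource)
    finally
        put!(pool, resource)
    end
end

pool = make_pool(8)
tasks = [Threads.@spawn(with_resource(r -> (r.uses += 1), pool)) for _ in 1:100]
foreach(wait, tasks)
```

With this design, 100 tasks share 8 resources, and each resource is held by at most one task at a time. The cost is that tasks may have to wait for a free resource.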

Scoped values

Scoped values are used when you:

  • Have some global data D
  • Want D to take on different values at different points in the program
  • Need to access D from multiple different tasks

The canonical example is logging. You have some logger object, and it would be a pain to pass this object as argument through all your function calls, so this really should be a global variable.

The logger is an IO object, and should probably be protected behind a lock, so thread safety is not a concern.

However, the logger has settings, which can be changed during the program - one part of the program may need one set of settings, and another part another set. Or, your library could be called by two tasks, one which requires one logger setting, another which requires another setting. Your library makes use of multiple tasks itself, so using TLS is not appropriate - one logger state is shared between all the many tasks spawned by your library.

What you want is some kind of ‘task local state’ which is inherited by all child tasks spawned by the current task. And that is what scoped values are.

Here’s how to use them - assuming we have some kind of logger package:

using Base.ScopedValues

const LOGGER = ScopedValue(new_logger(DEFAULT_SETTINGS))

# Call some code using a modified logger
function do_computation(data; logger_settings::LoggerSettings=DEFAULT_SETTINGS)
    with(LOGGER => new_logger(logger_settings)) do
        [ ... ]
    end
end

Inside the with function’s scope, the constant LOGGER will be a ScopedValue set to new_logger(logger_settings) - even if inside that scope, I spawn multiple new tasks.
Outside the scope, LOGGER retains its old value with default settings, even if code outside the scope also runs multiple tasks, and even if all these tasks, both outside and inside the scope, run concurrently with each other.

Here is an example where the same global scoped value has two distinct values at the same time, as accessed by four different tasks.

julia> begin
           using Base.ScopedValues
           const SCOPED = ScopedValue(1)

           with(SCOPED => 2) do
               Threads.@spawn begin
                   sleep(2)
                   println("In scope: ", SCOPED[])
               end
               Threads.@spawn begin
                   sleep(1)
                   println("In scope: ", SCOPED[])
               end
           end

           println("Outside scope: ", SCOPED[])

           task = Threads.@spawn begin
               sleep(1.5)
               println("Outside scope: ", SCOPED[])
               sleep(1)
               println("Outside scope: ", SCOPED[])
           end
           wait(task)
       end
Outside scope: 1
In scope: 2
Outside scope: 1
In scope: 2
Outside scope: 1

The way this works is that the tasks are not really accessing the same, global scoped value. Instead, they are reading a certain kind of task-local storage, which is inherited by child tasks.
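
The difference in inheritance can be seen in a short sketch like the following, assuming Julia 1.11 or later. `SETTING` and the `:setting` key are hypothetical names for illustration:

```julia
using Base.ScopedValues  # requires Julia 1.11 or later

const SETTING = ScopedValue("default")

# Plain task-local storage, set in the parent task only.
task_local_storage(:setting, "parent only")

# Spawn a child task inside a scope and ask it what it can see.
child_sees_tls, child_sees_scoped = with(SETTING => "custom") do
    fetch(Threads.@spawn begin
        (haskey(task_local_storage(), :setting), SETTING[])
    end)
end
```

The child task starts with empty task-local storage, so `child_sees_tls` is `false`, but it inherits the scope it was spawned in, so `child_sees_scoped` is `"custom"`. Back outside the `with` block, `SETTING[]` is `"default"` again.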

TL;DR:

  • Task local storage (TLS) and scoped values are both answers to how to access global mutable data with multiple tasks
  • TLS is used when each task needs to operate on a unique piece of data, typically when the data is not threadsafe and therefore can’t be accessed by multiple tasks
  • Scoped values are used when you want different parts of your code to use different values for some global variable, and each of these parts may make use of more than one task so TLS cannot be used

  1. This is just an example, the biological details here are questionable. ↩︎


That’s a good explanation.

I’d just like to add something about another application of scoped values. When doing computations for various different scenarios, they come in handy.

A case I had was some simulations of power systems, where the future isn’t well known. There are a number of parameters which can change (like 20 or so in our simulations), so I collected them in a struct. These parameters are needed deep inside the simulation, inside parallel tasks, inside helper functions, everywhere. It would be cumbersome and would clutter the code to have them as arguments to every function in the simulation, just to pass them down to other functions which actually might use them or pass them further down to other functions.

@kwdef struct Parameters
    costoftransmission = 12.0
    maxcapacity = 150.0
    tax = 10.0
    ...
end
const defaultparams = Parameters()
const prohibitivelyhightax = Parameters(tax = 150.0)
const cheaptransmission = Parameters(costoftransmission=6.0)
...
const parameters = ScopedValue(defaultparams)

Then different scenarios can easily be run with

hightaxresult = @with parameters => prohibitivelyhightax simulate(...)
cheaptransresult = @with parameters => cheaptransmission simulate(...)
...

and the parameters are picked up anywhere with:

tax = parameters[].tax
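
Put together, a minimal runnable sketch of this pattern might look like the following, assuming Julia 1.11 or later. The `transfer_cost` helper and the body of `simulate` are made up purely for illustration and have nothing to do with the real simulation:

```julia
using Base.ScopedValues  # requires Julia 1.11 or later

Base.@kwdef struct Parameters
    costoftransmission::Float64 = 12.0
    tax::Float64 = 10.0
end

const parameters = ScopedValue(Parameters())

# Hypothetical helper deep in the call chain; it does not take the parameters as an argument.
transfer_cost(energy) = energy * (parameters[].costoftransmission + parameters[].tax)

# Hypothetical simulation that fans out over several tasks.
function simulate(energies)
    tasks = [Threads.@spawn(transfer_cost(e)) for e in energies]
    return sum(fetch, tasks)
end

baseline = simulate([1.0, 2.0])
hightax = @with parameters => Parameters(tax = 150.0) simulate([1.0, 2.0])
```

The tasks spawned inside `simulate` pick up whichever `Parameters` were in scope when `simulate` was called, so the two runs use different tax values without any change to the function signatures.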