I get a deadlock with only one lock

Hi all 🙂

I’ve got the following in one of my packages. I put a print before and after it, but in about one run out of 100 it just stops there and never prints the second time. I’m at a loss as to how to debug this further and fix it. It looks like correct Julia code to me. (PS: I can’t reproduce the issue with the example code in the REPL.)

julia> l = ReentrantLock()
ReentrantLock(nothing, Base.GenericCondition{Base.Threads.SpinLock}(Base.InvasiveLinkedList{Task}(nothing, nothing), Base.Threads.SpinLock(0)), 0)

julia> d = Dict()
Dict{Any, Any}()

julia> lock(l) do
           d[:key] = :value
       end
:value

Cheers

Edit: Julia versions 1.6.3 and 1.7.2

It’s a bit hard to help when your MWE works just fine 🙂

Even this runs without a deadlock:

for i in 1:1_000_000
       lock(l) do 
          print("a");
          d[:key] = :value
       end
end

Your example looks like serial code. Why use a lock then in the first place?
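For reference, a serial loop cannot show contention on the lock at all. A minimal sketch of a threaded variant is below; `stress_insert!` is a hypothetical helper, not part of the package discussed here, and it only exercises the lock-plus-Dict pattern from the first post, so it says nothing about the rest of the real code. It is only meaningful when Julia is started with more than one thread (for example `julia -t 4`).

```julia
# Hypothetical helper: spawn `ntasks` tasks that each insert one key
# into `dict` while holding `lk`, then wait for all of them to finish.
function stress_insert!(dict::Dict{Int, Symbol}, lk::ReentrantLock, ntasks::Int)
    tasks = map(1:ntasks) do i
        Threads.@spawn begin
            lock(lk) do
                dict[i] = :value
            end
        end
    end
    foreach(wait, tasks)      # blocks forever if any task deadlocks
    return length(dict)
end

n = stress_insert!(Dict{Int, Symbol}(), ReentrantLock(), 10_000)
println("inserted ", n, " entries using ", Threads.nthreads(), " thread(s)")
```

If this finishes reliably with several threads, the lock around the Dict is unlikely to be the part that hangs.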


Base.@kwdef struct _ActiveBars
    dict::Dict{Int, BarManager} = Dict{Int, BarManager}()
    last_id::Threads.Atomic{Int} = Threads.Atomic{Int}(1)
    lock::ReentrantLock = ReentrantLock()
end

const _ACTIVE_BARS = _ActiveBars()

function _start(; 
    bar_structure::Type{BarStructureType},
    initial_bars::Union{Vector{Dict{UniqueSymbol{ExchangeSymbolType},BarStructureType}}, Nothing},
    interval_between_bars::Second,
    bar_calculation::Function,    

    unique_symbols::Set{UniqueSymbol{ExchangeSymbolType}},
    quote_change_types::Vector{Messages.QuoteChangeTypes.QuoteChangeType},

    bar_output::Function,

    include_with_these_last_trade_conditioncodes::Vector{String},
    drop_message::Function,
    history_length::Int,
    future_length::Int,
    active_bars::_ActiveBars = _ACTIVE_BARS
)::Int where {BarStructureType, ExchangeSymbolType <: AbstractExchangeSymbol} 

    id = Threads.atomic_add!(active_bars.last_id, 1)

    bar_manager = BarManager{BarStructureType, ExchangeSymbolType}(;
        name = "bar_$id",
        bar_output,
        bar_calculation,    

        unique_symbols,
        quote_change_types,
        initial_bars,
        interval_between_bars,
        include_with_these_last_trade_conditioncodes,
        drop_message,
        history_length,
        future_length,
    )

    _start(bar_manager)

    lock(active_bars.lock) do 
        active_bars.dict[id] = bar_manager
    end

    return id
end



function _stop(
    id::Int;
    active_bars::_ActiveBars = _ACTIVE_BARS
)::Nothing

    bar_manager = lock(active_bars.lock) do
        pop!(
            active_bars.dict,
            id,
            nothing
        )
    end

    isnothing(bar_manager) && error()
    
    _stop(bar_manager)

    return
end

That’s the code. I want to make sure that start and stop can be called from different threads… That’s why I use the lock

The example problem works for me as well… That’s the problem

That’s just the example. I want to make sure that start and stop are thread safe

Depending on the number of _ActiveBars instances floating around you could have more than one lock?

To verify if locking is really the problem here you could use Base.trylock?


There should only be one, because of the const in front of it.

What does _start(bar_manager) ultimately call? As far as I can tell, it’s not part of the code you posted.

function _start(bar_manager::BarManager)::Nothing
    
    bar_manager.is_running[] = true

    _HistoryManagement.start(bar_manager.history)

    bar_manager.market_subscribtion_id[] = Markets.subscribe(
        (message) -> _HistoryManagement.getting_messages(bar_manager.history, message),
        [
            Markets.SymbolSubscriptionInfo(;
                unique_symbol = unique_symbol,
                quote_change_types = bar_manager.history.quote_change_types
            )
            for unique_symbol in bar_manager.history.unique_symbols |> values
        ]
    )

    bar_manager.producing_task[] = Utils.@my_spawn _start_producing_bars!(bar_manager)

    return
end

Here is what I would try, assuming the problem is in my own code (and it would be a pain in the …): first I’d try to make the problem more reproducible, mainly by artificially increasing the CPU workload (eliminating file I/O and network I/O, increasing the number of threads, etc.). Then I’d try to produce a trace of the multithreading activity (spawns, joins, locks, channel reads/writes). With a bit of luck, one of these traces could reveal a deadlock…

There is a chance that you are encountering a problem in Julia itself, but even in that case the steps above could help you find a reproducer…
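One cheap way to start on such a trace is to run each suspicious step under a timeout and log which one never returns. This is only a sketch: `run_with_watchdog` is a made-up helper name, the labels and timeouts are placeholders, and it relies on nothing beyond `Threads.@spawn` and `timedwait` from Base.

```julia
# Hypothetical helper: run `f` in its own task and report when it does
# not finish within `timeout` seconds. Returns the result of `f`, or
# `nothing` if the step timed out (the task is left running).
function run_with_watchdog(f, label::AbstractString; timeout::Real = 5.0)
    task = Threads.@spawn f()
    status = timedwait(() -> istaskdone(task), timeout; pollint = 0.05)
    if status === :ok
        return fetch(task)    # rethrows if the task itself failed
    end
    @error "step did not finish in time" label timeout
    return nothing
end

fast = run_with_watchdog(() -> 1 + 1, "fast step")
slow = run_with_watchdog(() -> (sleep(1); 1), "slow step"; timeout = 0.2)
```

Wrapping the individual calls around the place where the output stops (the start call, the subscription, the lock block) narrows down which one is actually stuck.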

I tried it now with Base.trylock. Same behavior - one in 100 stops

But trylock doesn’t wait: it returns false if the lock is not available.
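A small sketch of that behaviour, using only a fresh ReentrantLock and none of the package code (the variable names are just for illustration). It also shows that the task that already owns a ReentrantLock can take it again, so a single task cannot deadlock on it by itself:

```julia
l = ReentrantLock()

lock(l)                          # the current task now owns the lock
same_task = trylock(l)           # the owner may re-acquire a ReentrantLock
same_task && unlock(l)           # undo that extra acquisition

# A different task cannot take the lock, and trylock does not wait for it.
other_task = fetch(Threads.@spawn trylock(l))
unlock(l)

# Once released, another task can take the lock (and must release it again).
t = Threads.@spawn begin
    ok = trylock(l)
    ok && unlock(l)
    ok
end
after_release = fetch(t)

println((same_task, other_task, after_release))
```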


That’s the strange part about it. Maybe I did something wrong.

    if Base.trylock(active_bars.lock)
        try
            active_bars.dict[id] = bar_manager
        finally
            # release the lock even if the assignment throws
            unlock(active_bars.lock)
        end
    else
        error("Not able to lock")
    end
    # lock(active_bars.lock) do 
    #     active_bars.dict[id] = bar_manager
    # end

No, I’d extend it to something like

    l = ReentrantLock()
    retries = 5
    while retries > 0
        if Base.trylock(l)
            unlock(l)
            break
        end
        sleep(1)
        @info "retrying"
        retries -= 1
    end
    if retries == 0
        @error "deadlock?"
    end

A couple more questions:

  • do you see any Julia activity in your system monitor when the problem occurs?
  • did you check with different Julia versions?

Quick update… it wasn’t the lock, but the start function before it. I still don’t know why it stops completely, though…

Thank you all
