Function argument, binding, name, assignment, mutation

You may just say: instead of doing

for j = 1:J @spawn f(x)

you can do

for j = 1:J @spawn f($x)
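A minimal sketch of the difference ($x is evaluated at spawn time; f, J, and the arrays are illustrative):

f(v) = println(sum(v))   # illustrative worker
J = 4
x = zeros(3)
for j = 1:J
    Threads.@spawn f($x) # $x captures the value x has right now
end
x = ones(3)              # rebinding x afterwards does not affect the already-spawned tasks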

But this is also suboptimal (and I’m reluctant to do so), because I need to spawn tasks with the latest x.

Neither of these expectations is met by Julia. If you alter anything that is shared, without explicitly synchronizing with a lock or atomics, the result is undefined. This applies to other languages as well, like C, C++, Rust, etc. That it might succeed is a property of the underlying architecture you run on, not anything that has been defined or even coded by the Julia developers.
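For illustration, a minimal sketch of such explicit synchronization (the Ref, the lock, and the arrays are just placeholders):

lk = ReentrantLock()
r = Ref(zeros(3))
Threads.@spawn begin
    lock(lk) do
        println(sum(r[]))   # read the shared reference only while holding the lock
    end
end
lock(lk) do
    r[] = ones(3)           # likewise, replace the contents only under the lock
end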

1 Like

I also have some knowledge of the C language, so let me put it in those terms.

The action “let x be associated with a new object” should be as easy as “let a pointer point to a new region of memory”.

The main work is done in mutating a block of memory (which is what in-place functions like setindex! are perceived as doing).

But “let a pointer point to a new region of memory” should be easy enough to happen in an instant, shouldn’t it? I don’t think there is any technical hindrance here.

Otherwise the definition of “variable assignment”/“name-value binding” becomes questionable. At least I perceive assignment as an all-or-nothing, two-point (Bernoulli-like) concept: either the name refers to the old object or to the new one. (Otherwise assignment is merely another sort of mutation, and then it shouldn’t be called a real “assignment” after all.)
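A tiny Julia illustration of the distinction I mean:

a = [1, 2, 3]
b = a           # binding: b now refers to the same array as a
a = [4, 5, 6]   # assignment: rebinds the name a; b still sees [1, 2, 3]
b[1] = 0        # mutation: changes the array itself, in place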

So, after all, I don’t care whether Core.Box is slow. But I really do care about the correctness and properness of the “assignment” operation.

There is a technical hindrance, even if you code in assembly.
Something like

loadmem reg0, memref

where reg0 is a register and memref is a memory address, and loadmem moves 8 bytes from that address into the register, is not necessarily atomic on all architectures. It might be done with two loads in the microcode, and if run in parallel with a similar storemem, you may end up with incomplete data. It can e.g. be atomic if the 8 bytes are in a single cache line, and non-atomic if they cross a cache line boundary. Some architectures fault on such unaligned access, some give a warning, some do it non-atomically.

Anyway, the Julia developers leave such details to LLVM, though I think this load is atomic by chance on all architectures Julia runs on.

5 Likes

Why would you expect to see invalid behavior? As I said, these compiler optimizations should only happen when they are provably unobservable.

Presumably, because the compiler would not generate code that caused that unexpected (invalid) behavior. And, BTW, how can an example prove that the compiler is always right?

I’m a bit puzzled about your recent questions. You appear to be expecting the compiler to do wrong and illegal stuff, and it confuses you that it doesn’t?

2 Likes

To prove, with facts, that my belief is wrong.

It cannot. But it can make me certain about which styles of code are not proper to write.

I guess you didn’t get my idea. I think Benny and sgaure understood though.

My current idea (after reading sgaure’s assembly code) is that:

  • I can write my assignment code without introducing any locks. It is going to be 99% correct.

The reason is that:
I don’t deploy my Julia code in industrial applications; I just do scientific computations on my own computer. So I don’t need to worry about segfaults.

If a segfault happens, then I should change my mind: I’ll have to add locks from then on.
If a segfault does not happen, then probably I’m getting what I want (which is the current situation, i.e. writing m = gen(c) in my #11 post and getting the expected results).

Looking for, and trying to trigger invalid behavior is fine, I get that. I just don’t understand why you appear to be surprised that the behavior is correct.

I would hope that the percentage is significantly higher than that, as long as the code itself is correct.

1 Like

I’m glad you don’t have to understand, because I have discovered something fresh and marvelous:

This bold revision of my #11 post has neither a Core.Box nor any @lock, and it seems to work as expected (i.e. you cannot see “EEEEEEEEEEEE” on your screen). I’m wondering why I hadn’t thought of this.

import Random.seed! as seed!
N::Int = 999
function gen(cnt)
    if cnt % 2 == 1
        seed!(1)
        return rand(Int128, N, N)
    else
        seed!(3)
        return rand(Int128, N, N)
    end
end;
function check(r)
    s = getfield(r, :x)
    if s == gen(0)
        println(0)
    elseif s == gen(1)
        println(1)
    else # I cannot see this in practice
        println("\tEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE")
    end
end;
function test(r)
    for c = 1:10000000
        for j = 1:14
            Threads.@spawn check(r)
        end
        println("c=$c")
        setfield!(r, :x, gen(c))
    end
end

test(Ref(gen(0)))

Two weeks ago, Oscar told me to use a mutable container. I didn’t believe him. And he probably wasn’t expecting either that I would implement it in this manner today and get the expected results. :upside_down_face:

(If interested, you can run this code on your computer to see whether you can observe the “EEEEEEEEEEEEE”. You can tune the hyperparameters 999 and 14, and JULIA_NUM_THREADS.)

I don’t quite get what you’re trying to check here.
In this code, you cannot see EEEE because you don’t overwrite the matrix stored in the reference, as you did in #11. That way s is indeed equal to either gen(1) or gen(0) at any given instant (provided s = r.x is atomic).
But a slight change may trigger the error again:

function check(r)
    if r[] == gen(0)        # first read of r[]
        println(0)
    elseif r[] == gen(1)    # second read: r[] may have been replaced in between
        println(1)
    else                    # so this branch becomes reachable
        println("\tEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE")
    end
end
1 Like

Very sharp observation, thanks. I can indeed reproduce it.

But I would like to ask: can you see “EEEE” with my original code in the #28 post?


I want to get away with deliberately dropping @lock; I wonder if that is feasible.
Via these tests, if I can see “EEEEEE”, then I need to revert to the conservative approach of using locks.

In the #11 post, I used a bare name m, which gets captured as a Core.Box.

In the #28 post, I used a RefValue. The difference here is that every name is now read-only (by “read-only” I mean it is never reassigned, though the object it refers to may be mutated!).

The test in the #28 post works fine for me (I can’t see “EEEEEEE”). This suggests that I can perform the

  • one thread writes, multiple threads read

job without needing locks.

As I said, no, and I wouldn’t expect to, because s is local to check there, so that it cannot be mutated by another thread.

But not all tasks spawned in the same outer loop iteration necessarily see r with the same parity because r gets mutated elsewhere.

The model in the #28 post has a realistic background. The for c = 1:1000... loop is the so-called master loop, while for j = 1:14 represents 14 subproblems. As long as the Matrix{Int128} is generated by gen(c), it is valid, and the tasks distributed to the subproblems might be valuable. This was discussed in another Discourse thread of mine, titled “How to write a master-subproblem iterative algorithm asynchronously?”.

I’ll change to s = getfield(r, :x).

That should be the same as r.x unless getproperty has been overloaded.

It’s super complicated: for a RefValue we have getfield, getproperty, and getindex.
I don’t know whether the compiler can reduce the layers of pointless function calls.

It can. Otherwise every Julia program would be slow. These functions are in general inlined and constant-folded, unless you deliberately try to avoid it, write a very complicated getproperty, mark it @noinline, or some such thing.
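For a plain RefValue you can check this yourself; all three access forms boil down to the same field load (a quick sketch, assuming a recent Julia):

using InteractiveUtils                  # for @code_llvm outside the REPL
r = Ref(1)
r[] == r.x == getfield(r, :x)           # getindex and getproperty both fall back to getfield
@code_llvm debuginfo=:none getindex(r)  # a single load; the call layers are inlined away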

Normally you would use either Threads.Atomic or atomic struct fields (@atomic) for such small things, not locks. These generate atomic directives for LLVM, which are removed when compiling to native code if they are unnecessary.
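For example, a minimal sketch of the @atomic field approach (the Shared struct is illustrative; gen is from the code above). Note that Threads.Atomic only holds primitive types, so for an object reference like a Matrix the @atomic field form is the one that applies:

mutable struct Shared
    @atomic x::Matrix{Int128}   # field declared atomic (Julia ≥ 1.7)
end

s = Shared(gen(0))
@atomic s.x = gen(1)    # atomic store
m = @atomic s.x         # atomic load; no torn reads, no lock needed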

Also, even if you use a lock, it will attempt atomics a few times before entering a wait state. So if you generally hold a lock for just a few clock ticks, it will work almost like the much faster atomics. The @time macro reports the number of lock conflicts (i.e. the time-consuming part).
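A small sketch of that: with a critical section of just a few instructions, the lock is mostly uncontended, and recent Julia versions report any lock conflicts in @time (acc is an illustrative counter):

lk = ReentrantLock()
acc = Ref(0)
@time @sync for j in 1:14
    Threads.@spawn begin
        lock(lk) do
            acc[] += 1   # short critical section: behaves almost like an atomic increment
        end
    end
end   # @time prints "... N lock conflicts" if tasks actually contended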

1 Like

The question is how the iterations of the master loop and the subproblems are coupled.
I would assume that there is some coupling, otherwise nesting the loops wouldn’t be necessary.
Now, the problem is that it’s totally legitimate for all @spawned tasks to actually start only after the master loop has finished. Then, obviously, all tasks are going to receive the same data.
I would assume that even in the simplest case, when the subproblems are decoupled, you still need at least barrier synchronization for the inner loop:

for c in 1:a_lot
    @sync for j in 1:14          # barrier: wait until all 14 subproblem tasks finish
        Threads.@spawn worker(r)
    end
    update!(r)                   # only mutate r once no task is reading it
end

I purposely leave them unsynchronized, because this might improve the utilization of the logical processors. You can take a quick look at the real-application (and runnable) code.

@odow can understand what I’m doing (probably; he can run the code, and he knows the gist of the algorithm). And since I am now really using a mutable container instead of names/scopes/closure capture, I consider this a big step forward.

I have been watching this thread, and since I’ve now been tagged, I’m going to chime in and link everyone back to this post by Guillaume:

This whole thread has been a debate about some very minor semantics of the language. Much of it has been trying to reason about what the compiler will do from trivial examples.

I’ll also link people back to:

If you want mutation across threads, the only correct answer is to use the proper primitives exposed by the language. This means either a lock, or some explicitly Atomic data structure.

Anything else may work, or it may not, because it is undefined behavior.

Before any one replies, I’ll also ask that you step back and consider whether continuing this conversation is a productive use of your time.

5 Likes

Okay, you’re right. I won’t add more words. :blush: