Mutating function parameters

using DataFrames

function doit!(flag, df)
    if flag==true
        flag=false
    end
    if df[1, :flag]==true
        df[1, :flag]=false
    end
    println("Within doit, flag=",flag,", df=",df[1, :flag])
end

function main()
    df = DataFrame(flag = [true])
    flag=true
    for i = 1:2
        println("$i - Before,  flag=", flag, ", df=", df[1, :flag])
        doit!(flag, df)
        println("$i - After,   flag=", flag, ", df=", df[1, :flag])
    end
end

main()

Generates the output

1 - Before,  flag=true, df=true
Within doit, flag=false, df=false
1 - After,   flag=true, df=false
2 - Before,  flag=true, df=false
Within doit, flag=false, df=false
2 - After,   flag=true, df=false

And the docs make clear that this is the expected behaviour. They say about passing by sharing that “modifications to mutable values (such as Arrays) made within a function will be visible to the caller.” They also give an example of how assigning a new value to a simple (numeric) parameter “changes the binding (“name”) to refer to a new value”.

So, two questions:

  • Why don’t simple variables mutate like arrays? What other types have this non-mutating behaviour?
  • Can I cause simple parameters to mutate in the same way as arrays, or is this not possible? (I realise I could simply return the changed value.)

Thanks,

Tim

Julia’s “variables” are about naming and that’s it. I really wish that we wouldn’t use the word “variable” because that — to me — implies some sort of mutable state. And I wish we’d use a “naming” operator that looks more distinct from mutating operations. But this is the world we live in:

  • flag = true is you deciding that you want to use the name flag to identify true (or whatever you put on the right hand side). You can re-use names as much as you want, but new names you (re-)define only affect your current scope by default. This is even the case for the op= shorthands — x += 1 is you saying that you have a better use for the name x, and it’s going to be whatever x was + 1.

  • Indexed assignments like df[1, :flag]=false — despite looking similar to that first bullet’s syntax — don’t have anything to do with naming! They update (mutate) the object that df names at the passed indices/columns to contain some value.

  • Property assignments (like x.property = blah) and broadcasted assignment (like x .= blah) are similar: they update (mutate) the object that x names.

So this has nothing to do with the type — it’s about the syntax you use!
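To make the distinction in the bullets above concrete, here is a minimal sketch using a plain Vector instead of a DataFrame. The function names rebind, mutate_index! and mutate_broadcast! are made up for this illustration; only the assignment syntax inside each one matters.

```julia
# Rebinding: `=` with a bare name on the left only changes what the
# local name `v` refers to; the caller's array is untouched.
function rebind(v)
    v = [0, 0, 0]
    return nothing
end

# Mutation: indexed assignment updates the object that `v` names,
# and the caller holds that very same object.
function mutate_index!(v)
    v[1] = 0
    return nothing
end

# Mutation: broadcasted assignment also writes into the existing object.
function mutate_broadcast!(v)
    v .= 0
    return nothing
end

a = [1, 2, 3]
rebind(a)
println(a)             # [1, 2, 3]
mutate_index!(a)
println(a)             # [0, 2, 3]
mutate_broadcast!(a)
println(a)             # [0, 0, 0]
```

The argument is an array in all three calls, yet only the last two change what the caller sees, which is the sense in which the behaviour follows the syntax and not the type.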


Take a look at the Ref type (see “C Interface” in the Julia manual; those docs appear a bit cryptic on first read, and a better mental model is given in post #14 by Oscar_Smith in the thread “What is Ref?”). You can use it like this:

using DataFrames

function doit!(flag::Ref, df)
    if flag[]==true
        flag[]=false
    end
    if df[1, :flag]==true
        df[1, :flag]=false
    end
    println("Within doit, flag=",flag[],", df=",df[1, :flag])
end

function main()
    df = DataFrame(flag = [true])
    flag = Ref(true)
    for i = 1:2
        println("$i - Before,  flag=", flag[], ", df=", df[1, :flag])
        doit!(flag, df)
        println("$i - After,   flag=", flag[], ", df=", df[1, :flag])
    end
end

main()

(See also the Assignment vs. Mutation section of the manual.)

While you can use Ref for this, it’s more idiomatic to simply use the function result to return any non-mutable value that you want the caller to see.
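A minimal sketch of that return-the-value style, assuming a plain Vector{Bool} stands in for the DataFrame column so the example needs no packages. The name doit! is reused from the question, but this variant (and its col argument) is only an illustration.

```julia
# Hypothetical variant of doit! that returns the new flag instead of
# trying to change the caller's binding.
function doit!(flag, col)
    if flag
        flag = false      # rebinds the local name only
    end
    if col[1]
        col[1] = false    # mutates the caller's vector
    end
    return flag           # hand the new value back to the caller
end

col = [true]
flag = true
flag = doit!(flag, col)   # the caller rebinds its own name to the result
println("flag=", flag, ", col=", col[1])   # flag=false, col=false
```

The `!` in the name still signals that the second argument is mutated; the immutable value travels back through the return value.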


Thank you for this.
It is very like defining flag as a one element array and mutating that. Is Ref a lighter weight mechanism than this?

So I could use flag .= false to achieve what I want?

[Edit: Answer: no! This generates an error.]
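For comparison, here is a small sketch of the three cases discussed in these replies: a Ref, a one-element array, and a bare Bool. The helper names are invented for the example. Whether Ref is noticeably lighter than a one-element Vector is not measured here, so treat that as something to check yourself.

```julia
# Two mutable one-slot containers for a Bool.
flag_ref = Ref(true)    # a Base.RefValue{Bool}
flag_vec = [true]       # a one-element Vector{Bool}

# Hypothetical helpers; both mutate the container they are given.
function clear_ref!(r::Ref)
    r[] = false
    return nothing
end

function clear_vec!(v::AbstractVector{Bool})
    v .= false
    return nothing
end

# Broadcasted assignment needs a mutable destination, so a bare Bool fails.
function clear_plain(flag::Bool)
    flag .= false
    return flag
end

clear_ref!(flag_ref)
clear_vec!(flag_vec)
println(flag_ref[], " ", flag_vec[1])    # false false

# Record whether the call on a bare Bool raised an error.
threw = try
    clear_plain(true)
    false
catch
    true
end
println("plain Bool threw an error: ", threw)
```

So `.=` does work when the name refers to an array; it fails for a plain Bool because there is no container to write into.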

Thank you.
I had read this at least once before. However, it doesn’t seem to matter how many times I read the docs, I only learn their content when I trip over something they explain. :roll_eyes:


I think the “why” here is the complicated part. To start, just note that other languages (like Python) behave the same. Others, like Fortran, “mutate” these “simple values” (probably the exception), while others require specific syntax for allowing mutation or not (meaning, copying the value upon passing, or passing the memory reference).

My understanding is that the “why” is basically a performance optimization. When a “simple value” (more precisely, an “immutable value”) is passed around by value, all the functions and operations can work on it without having to care about external effects. Once the function receives the value, it owns it. If the memory address were passed instead, every function would have to constantly fetch the value from that address to operate on it. That would be very expensive, and prohibitive when operating on simple things like numbers. Many performance optimizations that compilers do depend on the guarantee that a value does not change within a subset of operations.

Then why not do that with everything? That would actually be very good: every function would be “pure”, meaning it would not modify any input value, and the lives of compilers and programmers would be much easier.

The problem is that “values” can be very large in memory, as is the case with large arrays, and then passing the value around can be too expensive. In that case, “the value” is the address of the array in memory; the actual content of the array is not passed around, just fetched or mutated directly in memory.

So, basically, do not fight against this behavior. Simple “immutable values” should be returned from the functions and reassigned if necessary. That’s how the compiler is expecting you to code, and that’s what will give you the most clear and efficient behavior.
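As a rough way to check which values fall on which side, Base provides ismutable (available since Julia 1.5, as far as I know). The sketch below also shows that a large array is handed to the function as the same object, not as a copy; zero_first! is a made-up helper for the illustration.

```julia
# ismutable reports whether a value can be changed in place.
println(ismutable(true))       # false
println(ismutable(1.5))        # false
println(ismutable((1, 2)))     # false
println(ismutable([1, 2]))     # true
println(ismutable(Ref(1)))     # true

# Hypothetical helper: mutates its argument and returns it so that
# identity with the caller's array can be compared.
function zero_first!(v)
    v[1] = 0.0
    return v
end

big = ones(1_000_000)
out = zero_first!(big)
println(out === big)    # true: the function worked on the caller's array
println(big[1])         # 0.0
```

Numbers, Bools and tuples are immutable, so the only way for a caller to see a new value is to receive it as a return value and rebind its own name.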

That happens to everyone. I think the manual is rather clear now, but some time ago I compiled some answers about this behavior on the “Assignment and mutation” page of JuliaNotes.jl, which might help.


That’s really helpful and clear. Thank you.
