Scope of variable - is this expected?

I found this very weird. When I call f(y), the global value of y changes. Apparently the line y[y.<0] = 0.01 is acting on the global y, which doesn’t make sense to me.
Is it supposed to be like this?

function f(y)
  y[y.<0] = 0.01
  return mean(y)
end

using Distributions  # provides Normal()

srand(1234)
y = rand(Normal(),100)

minimum(y)  # -3.21
calculated_mean = f(y)
minimum(y)  # 0.01

This has nothing to do with scope. Arrays are passed by reference in Julia, so the function can change the contents of the caller’s arrays. (This is essential for high performance in many algorithms; otherwise it would be impossible to write in-place functions.) See https://docs.julialang.org/en/latest/manual/functions.html#Argument-Passing-Behavior-1

The behavior might be clearer if you renamed your variables:

function f(y)
    y[y.<0] = 0.01
    return y
end

z = [-1,-2,3,4.0]
f(z)
println(z)

outputs [0.01,0.01,3.0,4.0]: the array z was changed by f(z), because the argument y is a reference to the same underlying array.

(By convention, if a Julia function modifies its arguments, its name ends with !, i.e. you would normally call your function above f!(y) rather than f(y). This is merely a custom, however.)
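
A minimal sketch of that naming convention, assuming Julia 1.0 or later. The names clamp_negatives! and clamp_negatives are hypothetical and chosen only for illustration. Note that on Julia 1.0 and later, assigning one value to many indexed locations has to be written with `.=`; the plain `=` form used in the posts above only works on older versions.

```julia
# Mutating version: the trailing ! signals that the argument is modified.
function clamp_negatives!(y)
    y[y .< 0] .= 0.01   # overwrite negative entries in the caller's array
    return y
end

# Non-mutating version: operate on a copy so the caller's array is untouched.
clamp_negatives(y) = clamp_negatives!(copy(y))

z = [-1, -2, 3, 4.0]
w = clamp_negatives(z)
println(z)            # z still holds its original values
println(w)            # w holds the modified copy

clamp_negatives!(z)
println(z)            # now z itself has been modified
```

Pairing a mutating f! with a thin copying wrapper is a common pattern: callers who care about allocations use the ! version, and everyone else gets the safe one.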

This is different from Matlab, which behaves as if passing an array argument made a copy of the array. (Internally, it actually only makes a copy if the array is written to.) However, Matlab is unusual here; almost every other mainstream language works more like Julia.

It is also instructive to compare to the following function:

function g(y)
    y = [x < 0 ? 0.01 : x for x in y]
    return y
end

z = [-1,-2,3,4.0]
g(z)
println(z)

which prints [-1.0,-2.0,3.0,4.0]: the caller’s array z was not modified by g, even though g(z) returns the same values as f(z). The reason is that the assignment y = [...] does not change the contents of the existing array y; instead, it allocates a new array [...] and makes the variable y “point” to the new array.
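
One way to see the difference directly is to compare object identity with ===. The sketch below assumes Julia 1.0 or later. f_inplace and g_rebind are hypothetical names standing in for f and g above, and h_broadcast is an extra variant that is not in the original posts, added to show that `.=` writes into the existing array instead of rebinding the name.

```julia
function f_inplace(y)
    y[y .< 0] .= 0.01                    # indexed assignment: mutates the existing array
    return y
end

function g_rebind(y)
    y = [x < 0 ? 0.01 : x for x in y]    # rebinds the local name y to a new array
    return y
end

function h_broadcast(y)
    y .= ifelse.(y .< 0, 0.01, y)        # broadcast assignment: fills the existing array
    return y
end

z1 = [-1, -2, 3, 4.0]
z2 = copy(z1)
z3 = copy(z1)

println(f_inplace(z1) === z1)     # true: the caller's own array comes back
println(g_rebind(z2) === z2)      # false: a newly allocated array comes back
println(h_broadcast(z3) === z3)   # true: same array, contents overwritten
```

So y = ... and y .= ... look almost identical but do different things: the first changes what the name refers to, the second changes the data the name already refers to.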

(Again, this is not unique to Julia. In most programming languages, you have to distinguish between the underlying data and the variable name that is “bound” to it or “points” to it. You can have multiple variable names pointing/referring/bound to the same underlying data X, and you can change a variable to point to different data Y without affecting variables that point to X.)
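
The same distinction shows up without any function calls. A small sketch (the variable names a and b are just for illustration):

```julia
a = [1, 2, 3]
b = a            # b is bound to the same array as a; no copy is made
b[1] = 99        # mutating through b is visible through a
println(a[1])    # 99

b = [7, 8, 9]    # rebinding b to new data leaves a alone
println(a[1])    # still 99
```

If an independent array is needed, copy(a) creates one explicitly.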
