Why can't I use hcat in a for-loop?

Hi guys,
I am trying to concatenate columns onto an empty DataFrame using a for loop like this:

sensitivity_mean_df = DataFrame()
base_df = DataFrame()
for variable in variables
    if isempty(base_df)
        base_df = DataFrame(parameter=unique_parameters)
    end
    sensitivity_mean_df = hcat(sensitivity_mean_df, some_dataset, makeunique=true)
end

However, when I run the code, I get:

ERROR: UndefVarError: `base_df` not defined

Even if I comment out the lines involving base_df, I still get an error:

ERROR: UndefVarError: `sensitivity_mean_df` not defined

Why is that?

You need to declare global base_df inside the for loop if you’re in global scope. The issue would also go away if you placed this code inside a function.

Thanks! But why? In Python, I could do this directly without declaring the variable global…

This is not Python, so you can’t generally expect everything to behave exactly the same. Here, the for loop introduces its own scope.

Generally, it’s best practice to place code in functions for good performance, and you would not have this issue.
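A minimal sketch of that advice, using plain Base matrices instead of DataFrames so that it runs without any packages. The function name collect_columns and the sample data are made up for illustration; with DataFrames the same pattern should apply with hcat(acc, df, makeunique=true).

```julia
# Hypothetical helper: horizontally concatenate a list of column blocks.
# Inside a function, `acc` is a local variable, so the loop body can
# reassign it without any `global` declaration.
function collect_columns(blocks)
    acc = zeros(Int, size(first(blocks), 1), 0)  # matrix with zero columns
    for block in blocks
        acc = hcat(acc, block)
    end
    return acc
end

blocks = [[1, 2], [3, 4], [5, 6]]   # made-up sample data
result = collect_columns(blocks)
@show size(result)
```

Because everything lives in the function’s local scope, the same file behaves identically in the REPL and when run as a script.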


@Xu_Shan you can avoid manual loops and use a higher-level construct to reduce a list of dataframes with hcat.

dfs = [DataFrame(...), DataFrame(...), DataFrame(...)]

f(df1, df2) = hcat(df1, df2, makeunique=true)

df = reduce(f, dfs)
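For a version that runs without DataFrames, the same reduce pattern can be sketched with plain vectors standing in for the data frames. The values and the name g are invented for illustration, and the DataFrames-only makeunique keyword is dropped.

```julia
cols = [[1, 2], [3, 4], [5, 6]]   # stand-ins for the data frames

# Same shape as `f` above, minus the `makeunique` keyword.
g(a, b) = hcat(a, b)

# `reduce` combines the elements with `g`; no loop variable is ever
# reassigned, so the global-scope issue never comes up.
m = reduce(g, cols)
@show m
```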

Also worth pointing out that this would work in interactive contexts like the REPL and Jupyter notebooks, because they relax the global scope rules a little for soft scopes. However, in source files or evaluated expressions, base_df = ... implicitly declares base_df as a new local variable in the for loop, so the earlier access in isempty(base_df) hits an undefined local variable.

Local scopes like for loops automatically reassign outer local variables but not global variables because global scopes can easily get very large. With include, you can split one global scope into arbitrarily many files, and you can eval any number of further expressions into it. This rule guards against accidental reassignment of global variables and encourages more local scopes, which can’t be split across multiple files or expressions.
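A small sketch of the local-variable half of that rule. The function name scope_demo is made up; the second loop uses local to opt out of the automatic reassignment.

```julia
function scope_demo()
    total = 0
    for i in 1:3
        total += i        # reassigns the outer local `total`
    end
    for i in 1:3
        local total = i   # `local` forces a new variable for this loop body
    end
    return total          # the second loop did not touch the outer `total`
end

@show scope_demo()
```

Here the outer total ends up as 1 + 2 + 3 = 6, since only the first loop writes to it.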

Note that Python makes every file its own module. That pushes people toward many smaller modules that they have to import variables between, but it makes it safer for more blocks not to introduce local scope. On the other hand, Python’s few locally scoped blocks don’t share variables automatically when nested and need nonlocal declarations. Different languages have different designs and thus different rules; we just have to adjust.
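To illustrate the nested-scope contrast from the Julia side, here is a sketch in which an inner function reassigns a variable of its enclosing function without any declaration (the names make_counter, hits, and increment are hypothetical):

```julia
function make_counter()
    hits = 0
    # The inner function assigns to the enclosing local `hits` directly;
    # no declaration comparable to Python's `nonlocal` is needed.
    increment() = (hits += 1)
    return increment
end

counter = make_counter()
counter()
counter()
third = counter()
@show third
```

Each call updates the same captured hits, so the third call returns 3.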


Thanks! Then how can we write something like the code below?

n = 0
while something
  n += 1
end

Wrap it in a function like so:

function main()
    n = 0
    while n < 10
        n += 1
    end
    @show(n)
end
main()

Thanks! So within a function, it’s not a global variable anymore? But even if it is a local variable, aren’t they in different scopes?

Check out the documentation on the scoping rules: Scope of Variables · The Julia Language

As an example, consider

n = 0
while rand() < 0.9
	n += 1
end
println(n)

When run in the REPL, this works as intended (cf. @Benny 's reply):

julia> n = 0

julia> while rand() < 0.9
           n += 1
       end

julia> println(n)
18

If you put it in a file called script.jl, then running julia script.jl in the terminal (or include("script.jl") in the Julia REPL) yields

> julia script.jl
┌ Warning: Assignment to `n` in soft scope is ambiguous because a global variable by the same name exists: `n` will be treated as a new local. Disambiguate by using `local n` to suppress this warning or `global n` to assign to the existing global variable.
└ @ (...)\script.jl:3
ERROR: LoadError: UndefVarError: `n` not defined in local scope
Suggestion: check for an assignment to a local variable that shadows a global of the same name.
Stacktrace:
 ...

The warning hints that

n = 0
while rand() < 0.9
    global n
    n += 1
end
println(n)

in script.jl will solve the problem, as it indeed does:

> julia script.jl
8

In most situations the best solution (also for performance) is just to avoid (non-typed / non-const) global variables entirely, e.g. by using a function (cf. @ArchieCall ). If you put

function main()
	n = 0
	while rand() < 0.9
		n += 1
	end
	println(n)
end

main()

in our script.jl, you get

julia> include("script.jl")
4

This also works in the REPL:

julia> function f()
           n = 0
           while rand() < 0.9
               n += 1
           end
           println(n)
       end
f (generic function with 1 method)

julia> f()
8

If you had previously defined a global variable called n, then that is completely unrelated to the local n inside f:

julia> n 
18
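If a global really is needed, one possible reading of the "non-const" remark above is to keep the state in a const container and mutate its contents. This is only a sketch, and the name COUNTER is made up:

```julia
# A `const` binding to a mutable container. The binding itself is never
# reassigned; only the value stored inside the Ref changes.
const COUNTER = Ref(0)

for _ in 1:5
    # `COUNTER[] += 1` calls setindex! on the Ref. It is not an assignment
    # to the variable COUNTER, so no `global` declaration is required.
    COUNTER[] += 1
end

@show COUNTER[]
```

Since the loop only mutates the container, this also runs from a script without the soft-scope warning.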