# How to store data from a nested for loop?

Hello!

I have a nested for loop, which should result in a vector of vectors. Unfortunately, I’m only getting the end result, rather than each iteration. I’m probably missing something obvious about nested for loops; minimum (not actually) working example below.

``````
using DataFrames

# Replicate minimum working data
# Generate the DataFrame column names
dfnames = ["a", "b", "c", "d", "e", "f", "g", "h", "max", "years"]
# Generate the data for the "years" column
years = collect(1:10)
# Generate a vector of DataFrames
dfvector = []
for i in years
    dfvector = push!(dfvector, DataFrame(hcat(rand(10,9), years), dfnames))
end

# Identify the maximum value for the first DataFrame
maxvaluesyr1 = zeros(length(years))
for i in years
    maxvaluesyr1[i] = maximum(dfvector[1][dfvector[1].years .== i,:].max)
end

# Identify the maximum value for the second DataFrame
maxvaluesyr2 = zeros(length(years))
for i in years
    maxvaluesyr2[i] = maximum(dfvector[2][dfvector[2].years .== i,:].max)
end
``````

The above is what I would like, but I need to loop the “for i in years” for loop over each dataframe. I tried the below example, but I kept receiving the last j loop and not the first 9. Any help is incredibly appreciated and just let me know if I can clarify further!

``````
# Attempt 1: push! after the inner loop
maxvalues = zeros(length(years))
maxvaluesvector = []
for j=1:length(dfvector)
    for i in years
        maxvalues[i] = maximum(dfvector[j][dfvector[j].years .== i,:].max)
    end
    maxvaluesvector = push!(maxvaluesvector, maxvalues)
end

# Attempt 2: push! inside the inner loop
maxvalues = zeros(length(years))
maxvaluesvector = []
for j=1:length(dfvector)
    for i in years
        maxvalues[i] = maximum(dfvector[j][dfvector[j].years .== i,:].max)
        maxvaluesvector = push!(maxvaluesvector, maxvalues)
    end
end

# Attempt 3: preallocate with repeat, assign inside the inner loop
maxvalues = zeros(length(years))
maxvaluesvector = repeat([maxvalues], length(dfvector))
for j=1:length(dfvector)
    for i in years
        maxvalues[i] = maximum(dfvector[j][dfvector[j].years .== i,:].max)
        maxvaluesvector[j] = maxvalues
    end
end

# Attempt 4: preallocate with repeat, assign after the inner loop
maxvalues = zeros(length(years))
maxvaluesvector = repeat([maxvalues], length(dfvector))
for j=1:length(dfvector)
    for i in years
        maxvalues[i] = maximum(dfvector[j][dfvector[j].years .== i,:].max)
    end
    maxvaluesvector[j] = maxvalues
end
``````

Let’s take the first attempt.

The line `maxvalues = zeros(length(years))` allocates a single `maxvalues` vector, once, before either loop starts. For each value of `j`, the inner loop

``````
for i in years
    maxvalues[i] = ...
end
``````

overwrites the values in this vector. Then for each `j` the line

``````
maxvaluesvector = push!(maxvaluesvector, maxvalues)
``````

appends this vector `maxvalues` to `maxvaluesvector`. (By the way, the assignment is not necessary: `push!` modifies its first argument in place, so you can write `push!(v, ...)` instead of `v = push!(v, ...)`.)

The problem is that this always appends the same vector (the same container). You need to allocate a new vector for each `j`.
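Here is a minimal sketch of that fix. To keep it runnable without DataFrames, it uses plain vectors as a stand-in for the data frames; the `tables` name, its contents, and the "maximum of each half" computation are made up for illustration and are not part of the original code.

```julia
# Hypothetical stand-in data: three "tables", each a vector of 4 numbers.
tables = [[1.0, 5.0, 2.0, 4.0], [7.0, 3.0, 9.0, 6.0], [8.0, 0.0, 2.0, 1.0]]

maxvaluesvector = Vector{Vector{Float64}}()
for j in eachindex(tables)
    maxvalues = zeros(2)   # a fresh vector is allocated for every j
    for i in 1:2
        # maximum over the i-th half of table j (illustrative computation)
        maxvalues[i] = maximum(tables[j][(2i - 1):(2i)])
    end
    push!(maxvaluesvector, maxvalues)   # each entry is now a distinct object
end

maxvaluesvector   # [[5.0, 4.0], [7.0, 9.0], [8.0, 2.0]]
```

Because `maxvalues` is created inside the outer loop, each `push!` stores a different vector, so later iterations no longer overwrite earlier results.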


This is probably the issue. `push!` is pushing the same vector to `maxvaluesvector` every time, so every time you edit it you’re editing all the entries simultaneously. Here’s a simple example:

``````
julia> results = []
Any[]

julia> x = [0]
1-element Vector{Int64}:
 0

julia> for i in 1:3
           x[1] = i
           push!(results, x)  # Pushing the *same* vector every time
       end

julia> results
3-element Vector{Any}:
 [3]
 [3]
 [3]
``````

If you want your results to be different vectors, then you need to make that explicit. One easy way in this case is to `copy` when you `push!`:

``````
julia> results = []
Any[]

julia> for i in 1:3
           x[1] = i
           push!(results, copy(x))
       end

julia> results
3-element Vector{Any}:
 [1]
 [2]
 [3]
``````

Edit: Yup, what @sudete said


Thanks @sudete and @rdeits. I understand that overwriting the initial vector is the expected behaviour, and I expected that too. What I didn’t expect is that it wouldn’t append a separate vector for each iteration of the outer loop, i.e. after each complete pass of the inner loop.

I thought nested for loops followed logic like:

1. Iterate over each inner loop value `i` and store the results, e.g. in a vector, for the first outer loop value `j`.
2. `push!` stores the result of that complete inner loop for the first outer loop value `j`.
3. The outer loop value changes from `j` to `j + 1`, and the inner loop repeats and overwrites the initial storage (i.e. the first vector).
4. `push!` then appends the overwritten vector to the outer vector, which becomes a vector of vectors.

It’s clear the logic is false, but can you help me understand where? If my thoughts aren’t clear above, just let me know how I can clarify!

The problem is that the overwritten vector you are saving is always the same object, not a new object. You have two solutions:

1. Make the line `maxvalues = zeros(length(years))` the first line of the outer loop, so a new object is created each time.
2. Change the last line of the outer loop to `push!(maxvaluesvector, copy(maxvalues))` so you save a copy of the vector.

If you do neither, all positions of `maxvaluesvector` refer to the same object, which keeps being overwritten until the last iteration. You can check this by making a further change (for example, setting the first element to zero) to one of the vectors inside `maxvaluesvector`: the change shows up in every inner vector, not just the one you modified.
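The check described above can be sketched with a small self-contained example. The names `shared`, `aliased`, and `copied` are hypothetical and only stand in for `maxvalues` and `maxvaluesvector`; the second container uses the `copy` approach from solution 2 for comparison.

```julia
shared = zeros(2)                      # one vector, allocated once
aliased = Vector{Vector{Float64}}()    # will hold references to `shared`
copied = Vector{Vector{Float64}}()     # will hold independent copies

for j in 1:3
    shared .= j                  # overwrite the contents in place
    push!(aliased, shared)       # stores a reference to the same object
    push!(copied, copy(shared))  # stores an independent snapshot
end

same_object = aliased[1] === aliased[3]   # true: identical object
separate = copied[1] !== copied[3]        # true: distinct objects

aliased[1][1] = 0.0   # modify "one" entry of the aliased result

aliased   # [[0.0, 3.0], [0.0, 3.0], [0.0, 3.0]]
copied    # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```

The `===` operator tests object identity, so it is a quick way to tell whether two entries are the same vector or merely equal in value.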


Maybe it helps to look at a simpler case:

``````
julia> v = [1,2];

julia> v_vector = [v, v, v]
3-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]
 [1, 2]

julia> v[1] = 4;

julia> v_vector
3-element Vector{Vector{Int64}}:
 [4, 2]
 [4, 2]
 [4, 2]
``````

Here the line `v_vector = [v, v, v]` is equivalent to

``````
v_vector = typeof(v)[]   # Empty vector of values with type like `v`
push!(v_vector, v)
push!(v_vector, v)
push!(v_vector, v)
``````

In both cases `v_vector` contains three references to the same object.
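For contrast, here is a small sketch of how the container could be built so that each entry is its own object. The names `same`, `repeated`, and `independent` are made up for illustration. It also shows that `repeat([v], 3)`, the pattern used in the later attempts in the question, repeats the reference rather than copying the vector.

```julia
v = [1, 2]

same = [v, v, v]                       # three references to one object
repeated = repeat([v], 3)              # also three references to `v`
independent = [copy(v) for _ in 1:3]   # three separate vectors

v[1] = 4   # mutate the original

same          # [[4, 2], [4, 2], [4, 2]]
repeated      # [[4, 2], [4, 2], [4, 2]]
independent   # [[1, 2], [1, 2], [1, 2]]

all_same = all(x -> x === v, same)           # true
all_repeated = all(x -> x === v, repeated)   # true
```

The comprehension calls `copy(v)` once per element, so later changes to `v` do not affect the stored vectors.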
