Conditional SUM when the indices of one vector match the indices of another

Dimi · February 28, 2021, 12:44pm

Hello guys.

I am quite new to Julia and to coding in general, so I’d appreciate some help with the probably dumb question below.

I am trying to solve an optimisation problem for a nodal system of 24 points. At the moment I am trying to sum all the demands attached to each node and store the totals in a vector with 24 elements (with 0 wherever a node has no demand). But I am messing up somewhere and getting only 0 for all 24 nodes.

The code so far looks like this:

#SETS
Demand = collect(1:17) #Energy Demand Points
Nodes = collect(1:24) #Nodal points

#DATA
Consumption = [84,75,139,58,55,106,97,132,135,150,205,150,245,77,258,141,100]

Demand_Node = [1,2,3,4,2,6,3,8,9,10,20,14,17,16,18,19,20]

for n in length(Nodes)
q_Demand_Nodal[n] ==
if n == (Demand_Node[d] for d in Demand)
sum(Consumption[d] for d in Demand)
else
0
end
end

I’ve even tried with this one below, but still nothing.

if Nodes[n] == (Demand_Node[d] for d in Demand)
sum(Consumption[d])
else
0
end

Could anyone help me with this easy issue?

Thank you in advance.

jules · February 28, 2021, 1:12pm

Hi! First of all, code is easier to read here if you put it between pairs of ``` like this:

```
# code here
```

What you wrote doesn’t do what you intend. Let’s see:

n == (Demand_Node[d] for d in Demand) can never be true because you’re comparing n to the whole expression in parentheses, which is a generator object. That’s why you get only 0. You want to run this if branch for every d in Demand, right?
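To make the comparison issue concrete, here is a minimal sketch using the data from the question. The names cmp_generator, has_demand, matching and visited are only illustrative. It also shows a second pitfall in the original loop that is easy to miss: `for n in length(Nodes)` iterates over the single number 24 rather than over the 24 nodes.

```julia
Demand = collect(1:17)
Nodes = collect(1:24)
Demand_Node = [1,2,3,4,2,6,3,8,9,10,20,14,17,16,18,19,20]

n = 2
gen = (Demand_Node[d] for d in Demand)

# Comparing an integer with a generator object is never true
cmp_generator = (n == gen)

# Asking whether n occurs among the generated values does work
has_demand = n in gen

# Indices of the demand points attached to node n
matching = findall(==(n), Demand_Node)

# A number iterates as a one-element collection, so this loop
# body runs exactly once, with k equal to 24
visited = Int[]
for k in length(Nodes)
    push!(visited, k)
end

println(cmp_generator, " ", has_demand, " ", matching, " ", visited)
```

Looping with `for n in Nodes` (or `for n in 1:length(Nodes)`) visits every node.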

Also, q_Demand_Nodal[n] = ... doesn’t work in Julia: you can’t create an array just by indexing into a variable that doesn’t exist yet. You have three options. You can create an empty array and push! values to it. You can create an array with as many placeholder values as you need and then store your results in it. Or you can use one of the tools that automate this for you, such as map, which goes over all elements of a collection and stores the result of each function call, or a list comprehension, which is like a loop in [ ] brackets that also stores the result of each iteration.
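The options listed above can be sketched side by side. The computation `2n` is only a placeholder chosen for illustration, and the names a, b, c and d are hypothetical.

```julia
Nodes = collect(1:24)

# 1. Start with an empty array and push! one value per node
a = Int[]
for n in Nodes
    push!(a, 2n)
end

# 2. Preallocate placeholder values, then overwrite them
b = zeros(Int, length(Nodes))
for n in Nodes
    b[n] = 2n
end

# 3. map applies a function to each element and collects the results
c = map(n -> 2n, Nodes)

# 4. A comprehension is a loop written inside brackets
d = [2n for n in Nodes]

println(a == b == c == d)
```

All four produce the same 24-element vector; which one reads best mostly depends on how complicated the loop body is.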

Here’s a reformulation with simple loops:

q_Demand_Nodal = zeros(length(Nodes))
for n in Nodes
    s = 0
    for d in Demand
        if Demand_Node[d] == n
            s += Consumption[d]
        end
    end
    q_Demand_Nodal[n] = s
end
q_Demand_Nodal

And here is a shorter expression that uses a generator inside a list comprehension, so you can compare (the init keyword of sum is only available from the brand new Julia 1.6 onwards):

q_Demand_Nodal = [sum(
    (Consumption[d] for d in Demand if Demand_Node[d] == n),
    init = 0
) for n in Nodes]
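For Julia versions before 1.6, two possible alternatives are sketched below. Both are assumptions about what fits the problem rather than part of the original answer: the first relies on the node numbers being usable directly as array indices (1 to 24 here), and the second swaps sum for reduce, which to my knowledge has accepted init since Julia 1.0. The names q_single_pass and q_reduce are hypothetical.

```julia
Nodes = collect(1:24)
Consumption = [84,75,139,58,55,106,97,132,135,150,205,150,245,77,258,141,100]
Demand_Node = [1,2,3,4,2,6,3,8,9,10,20,14,17,16,18,19,20]

# Single pass: add each demand straight into the slot of its node
q_single_pass = zeros(Int, length(Nodes))
for (d, n) in enumerate(Demand_Node)
    q_single_pass[n] += Consumption[d]
end

# Same generator as above, reduced with + and an explicit start value
q_reduce = [reduce(+,
    (Consumption[d] for d in eachindex(Demand_Node) if Demand_Node[d] == n);
    init = 0
) for n in Nodes]

println(q_single_pass == q_reduce)
```

The single pass touches each demand once instead of scanning all demands for every node, which only starts to matter for much larger systems.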

Dimi · February 28, 2021, 1:50pm

Hey Jules.

First things first, your fast and well-detailed response is much appreciated. I think I understand what the issue was and the rationale behind your solution.

I’ve tried to run the code, and it runs, but the results are still 0 in all 24 places, which they shouldn’t be.

I’m using the code below to show the results:

show(value.(q_Demand_Nodal))

Which results in 24 zeros. Is there an issue with the statement just before the last end, which forces every q_Demand_Nodal[n] to equal s and thus 0? (I’d expect that only for the nodes that have no corresponding demand, but apparently that is not what happens.)

Thank you again for your time and help.

nilshg · February 28, 2021, 2:09pm

Another option is to use a groupby operation, like the one in DataFrames:

julia> using DataFrames

julia> df = DataFrame(c = Consumption, node = Demand_Node)
17×2 DataFrame
 Row │ c      node  
     │ Int64  Int64 
─────┼──────────────
   1 │    84      1
   2 │    75      2
  ⋮  │   ⋮      ⋮
  16 │   141     19
  17 │   100     20
     13 rows omitted

julia> combine(groupby(df, :node, sort = true), 
         nrow => :rows,
         :c => sum => :total_demand)
14×3 DataFrame
 Row │ node   rows   total_demand 
     │ Int64  Int64  Int64        
─────┼────────────────────────────
   1 │     1      1            84
   2 │     2      2           130
  ⋮  │   ⋮      ⋮         ⋮
  13 │    19      1           141
  14 │    20      2           305
                   10 rows omitted

This might introduce way too much overhead in your code, but for exploration I find it quite convenient.
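For comparison, the same grouping can be sketched in plain Julia with dictionaries, without loading a package. This is only an illustration of what the groupby and combine steps compute; the names totals and counts are hypothetical.

```julia
Consumption = [84,75,139,58,55,106,97,132,135,150,205,150,245,77,258,141,100]
Demand_Node = [1,2,3,4,2,6,3,8,9,10,20,14,17,16,18,19,20]

totals = Dict{Int,Int}()   # node => summed consumption
counts = Dict{Int,Int}()   # node => number of demand points
for (node, c) in zip(Demand_Node, Consumption)
    totals[node] = get(totals, node, 0) + c
    counts[node] = get(counts, node, 0) + 1
end

# Print the groups in node order, similar to sort = true in groupby
for node in sort(collect(keys(totals)))
    println(node, ": ", counts[node], " rows, total demand ", totals[node])
end
```

As with the DataFrames version, only nodes that actually have demand show up as keys.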

jules · February 28, 2021, 3:48pm

I don’t know, it worked for me with your variables. Check that you don’t have old variables lying around that have been changed in the meantime. To be sure, wrap my code in a function and make all external arrays arguments of that function. Then you know that everything inside the function is independent of your global variables. That’s better for performance anyway, as you’ll no doubt learn later.
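A minimal sketch of that suggestion, using the loop from the earlier reply. The function name nodal_demand and its lowercase argument names are hypothetical.

```julia
# Everything the loop needs is passed in, so no global variable
# (stale or otherwise) can influence the result
function nodal_demand(nodes, demand, demand_node, consumption)
    q = zeros(length(nodes))
    for n in nodes
        s = 0
        for d in demand
            if demand_node[d] == n
                s += consumption[d]
            end
        end
        q[n] = s
    end
    return q
end

Demand = collect(1:17)
Nodes = collect(1:24)
Consumption = [84,75,139,58,55,106,97,132,135,150,205,150,245,77,258,141,100]
Demand_Node = [1,2,3,4,2,6,3,8,9,10,20,14,17,16,18,19,20]

result = nodal_demand(Nodes, Demand, Demand_Node, Consumption)
println(result)
```

Storing the return value under a fresh name such as result also avoids clashing with anything else in the session that is already called q_Demand_Nodal.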

yha · February 28, 2021, 4:03pm

What is this value function? You haven’t provided code for it. Perhaps the problem is there?
What happens if you just show q_Demand_Nodal directly in the REPL?

Dimi · March 1, 2021, 9:43am

That seems like an interesting approach, especially for grouping and getting an overall glimpse of the data, so I’ll definitely give it a go as I’m progressing.

It seems that this approach omits all of the rows with 0 values and so doesn’t present the complete overview of the 24 nodes, but I’ll definitely have a deeper look shortly.

Your help is much appreciated.

All the best.

Dimi · March 1, 2021, 9:46am

I’ll have a closer look, as the mathematical model is already about 300 lines long. I guess that either a constraint or some old variables interfere with the result, especially since you’ve mentioned that it ran correctly for you!

I’ll try to see how to implement the approach you’ve described and I’m sure it’ll work! Thank you for your constructive comments, I’m glad for your help!

All the best!

Dimi · March 1, 2021, 9:50am

It seems that if I only run

show(q_Demand_Nodal)

all I get back is 24 rows naming each element, i.e. q_Demand_Nodal[1] to q_Demand_Nodal[24]. That’s why I used value. in between, and then got all 0.

yha · March 1, 2021, 10:40am

That’s very odd. It’s not the usual behavior of show. Are you working in the REPL?
(You can also just type q_Demand_Nodal – there should be no need to call show explicitly)
After running @jules’s code with your variable definitions, I get

julia> q_Demand_Nodal
24-element Array{Float64,1}:
  84.0
 130.0
 236.0
  58.0
   0.0
 106.0
   0.0
 132.0
 135.0
 150.0
   0.0
   0.0
   0.0
 150.0
   0.0
  77.0
 245.0
 258.0
 141.0
 305.0
   0.0
   0.0
   0.0
   0.0

julia> show(q_Demand_Nodal)
[84.0, 130.0, 236.0, 58.0, 0.0, 106.0, 0.0, 132.0, 135.0, 150.0, 0.0, 0.0, 0.0, 150.0, 0.0, 77.0, 245.0, 258.0, 141.0, 305.0, 0.0, 0.0, 0.0, 0.0]

Dimi · March 1, 2021, 11:25am

I think I got to the root of the problem.

In my initial approach I had declared q_Demand_Nodal as a variable indexed by [n in Nodes].

When I ran the whole code, that declaration was probably interfering with the results. Indeed, when I ran the code line by line, q_Demand_Nodal returned exactly the results that you see. But when I ran everything at once and printed the results, things got odd.

After deleting q_Demand_Nodal from the variable list, the results are as everyone expected.

A big thank you again

nilshg · March 1, 2021, 2:13pm

Sorry, I didn’t realize that including the zero rows was required, but of course that’s what your original code does. You can do this by initializing the DataFrame with all 24 nodes, then left-joining the consumption data and replacing missing values with zero:

# Initial DataFrame with all nodes
df = DataFrame(node = 1:24)

# Consumption data by node
consumption_df = DataFrame(node = Demand_Node, Consumption = Consumption)

# Join consumption data onto initial data - this will create multiple rows if a node has more than one consumption data point
df = leftjoin(df, consumption_df, on = :node)

# Consumption will be missing for those nodes that don't have demand, set this to zero
df.Consumption = coalesce.(df.Consumption, 0)

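# Same groupby as before - now gives back 24 rows
# (note: rows is 1, not 0, for nodes without demand, because the left join keeps one placeholder row for them)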
combine(groupby(df, :node, sort = true), 
         nrow => :rows,
         :Consumption => sum => :total_demand)
