I’m trying to figure out how to read in data from an HVAC system with 75 units. For each unit I have the Zone Temperature (what the temperature is now), the Set Point (what the temperature needs to be at start time), and the minutes till start time from now, plus the current energy being used by all units. I need it to tell me (in minutes till start) when to optimally start each unit so the energy being used by all units doesn’t go above a certain limit. The rewards would be: 1. getting the room temperature to set point on time, 2. not letting the energy used by all units go above the limit. I’m trying to figure it out using a weird Blackjack game with 75 players and a game that goes to the energy limit instead of 21… It’s wrong… I’ve got no idea… I need one of you smart guys to give me a hint.
This is more or less an optimal control problem. You need to define an objective function that tells you how bad a given solution is, and then try to minimize that badness.
Here people are eager to help you with whatever Julia programming issues you have. But the difficulties you are describing still reside on the engineering problem-formulation side, I guess. It is rather unclear what your main Julia-related problem really is.
You start by writing “I’m trying to figure out how to read in data from an HVAC system…”. So, is your problem that of data acquisition/import? Online or offline? But then you also write about “using a weird Blackjack game”, which together with the title gives the impression that you want to solve the underlying problem of control system design with some machine learning concept.
I guess it is the latter problem of the computational design of a control algorithm that you are after. Then my modest suggestion is to describe the (control) problem a bit more explicitly. I will give it a try myself based on what you provided so far but you may correct or complement me.
Controller inputs (arguments for the algorithm)

current_measured_temperature: a vector of reals with 75 components, received every sampling period of length Ts.
next_setpoint_temperature: represented by two arrays (vectors) of length 75, the first giving the next setpoint temperatures in the individual zones and the second giving the times at which the corresponding setpoint temperatures are required. A fixed plan provided just once (?).

Controller output (aka manipulated variable)

control: represented by a vector of length 75 giving the times at which the corresponding control is applied (the heater is switched on in the given zone). In this format it assumes that currently the control is zero (not heating right now). Is that realistic?
Constraints
Provided that the correct interpretation of the control variables is that they are on/off states of heaters, each individual control variable determines the power consumed by its heater. Now, do you really mean that there is a constraint on energy? Or rather on its instantaneous rate of change, aka power? If you actually meant the latter (which I guess), there will be an upper-bound constraint on the (possibly weighted) sum of all control variables.
Objective
Minimize the total energy, that is, the (possibly weighted) sum of sums (integrals with respect to time) of all the control inputs (corresponding to power), while achieving the setpoint tracking.
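To make the constraint and the objective above concrete, here is a minimal sketch of how a candidate schedule could be scored. It is only an illustration under stated assumptions: the function name evaluate_schedule, the 1-minute time grid, and the rule that unit i draws a constant power_kw[i] from its start time until its deadline are all hypothetical, not something taken from the actual system.

```python
def evaluate_schedule(start_min, deadline_min, power_kw, power_limit_kw):
    """Score a schedule of on/off units on a 1-minute grid.

    Assumption: unit i draws power_kw[i] from start_min[i] (inclusive)
    until deadline_min[i] (exclusive) and nothing outside that interval.
    Returns (energy_kwh, peak_kw, feasible).
    """
    horizon = max(deadline_min)
    total_kw = [0.0] * horizon              # total power in each 1-minute slot
    for s, d, p in zip(start_min, deadline_min, power_kw):
        for t in range(s, d):
            total_kw[t] += p
    energy_kwh = sum(total_kw) / 60.0       # kW summed over minutes -> kWh
    peak_kw = max(total_kw)                 # instantaneous (per-minute) peak
    return energy_kwh, peak_kw, peak_kw <= power_limit_kw


# Three hypothetical units that all need their setpoint at minute 60.
energy, peak, ok = evaluate_schedule(
    start_min=[0, 30, 30],
    deadline_min=[60, 60, 60],
    power_kw=[10.0, 5.0, 5.0],
    power_limit_kw=25.0,
)
```

An optimizer (or a learning agent, for that matter) would then search over start_min to minimize energy while keeping the schedule feasible. Whether a unit was started early enough to actually reach its setpoint is not checked here; that needs a thermal model of the zone.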
Other info available?
Any a priori knowledge about the whole system? Max power of the heaters? Are they really just on/off? Estimates of the thermal capacities of the zones? …
Honestly, while I believe that there is some potential for RL techniques in control, in this particular case I would perhaps give it a try and design some controller manually just using the knowledge of time constants of the individual zones (upon inspection of the data or even some engineering guess). Such engineering solutions based on physical insight could serve as an initial version for some optimization algorithms that could perhaps improve on the rough solution.
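As a rough illustration of what "using the time constants" could look like, the sketch below assumes each zone behaves like a first-order system: with the unit running, the zone temperature approaches some settling temperature exponentially with time constant tau. The function name lead_time_min and all of the numbers are hypothetical; the time constant and settling temperature would have to be estimated from logged data or an engineering guess.

```python
import math


def lead_time_min(zone_temp, setpoint, settle_temp, tau_min):
    """Minutes a unit must run to move a zone from zone_temp to setpoint.

    Assumed model: T(t) = settle_temp + (zone_temp - settle_temp) * exp(-t / tau_min),
    where settle_temp is the temperature the zone would eventually reach
    if the unit ran forever.
    """
    num = zone_temp - settle_temp
    den = setpoint - settle_temp
    if den == 0 or num / den <= 0:
        raise ValueError("setpoint is not reachable under this model")
    ratio = num / den
    if ratio <= 1.0:
        return 0.0                  # already at or past the setpoint
    return tau_min * math.log(ratio)


# Hypothetical cooling example: zone at 80, setpoint 72, unit would settle at 60,
# time constant 45 minutes, room needed 120 minutes from now.
lead = lead_time_min(80.0, 72.0, 60.0, 45.0)
start_in_min = max(0.0, 120.0 - lead)   # minutes from now at which to start
```

Start times computed this way ignore the power limit, so they would only be the initial guess that a staggering rule or an optimizer then adjusts.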
You’ve got the problem exactly right. Sorry I didn’t word it very well. I’ve got all kinds of code running to do all of this, both on the controls system itself and in Julia. I was hoping for something better with the RL idea. As it is, it’s just a guess as to when and in what order to start the units in order to minimize the total power used at one time. This is known as “demand”. I get charged for the highest 30 min of demand and I get charged for regular usage, like you do at your house. Usage is cheap, demand is expensive. So I wanted to have a way to say: start 4 units, then after they get the room temp close to the setpoint, start 4 more… and so on. Some units are bigger than others, some affect other rooms… I already measure the total tonnage of units starting at, say, 8am, split those in thirds, and stagger them in 30 min increments: 7a, 7:30a, then 8a. It works, just not very exact… The problem is all in the RL, everything else is easy. Also, not all 75 units come on at the same time… might be 30 of them, might be 4 of them. Depends on what’s going on that day, how many rooms are in use, and how long they stay in use… might be 30 min, might be 10 hours.
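For reference, the demand figure described here can be sketched as the highest average power over any 30 consecutive minutes. Everything in the example is an assumption made for illustration: the sliding (rather than fixed clock-aligned) window, the helper names, and the toy load model in which each group pulls 30 kW for 30 minutes and then holds at 10 kW. The actual tariff and unit behaviour may differ.

```python
def peak_demand_kw(total_kw, window_min=30):
    """Highest average of total_kw over any window_min consecutive minutes."""
    if len(total_kw) < window_min:
        raise ValueError("profile is shorter than the demand window")
    window_sum = sum(total_kw[:window_min])
    best = window_sum
    for t in range(window_min, len(total_kw)):
        window_sum += total_kw[t] - total_kw[t - window_min]   # slide by one minute
        best = max(best, window_sum)
    return best / window_min


def load_profile(start_min, pulldown_kw, hold_kw, pulldown_min, horizon_min):
    """Per-minute total kW for identical groups started at the given minutes."""
    profile = [0.0] * horizon_min
    for s in start_min:
        for t in range(s, horizon_min):
            profile[t] += pulldown_kw if t < s + pulldown_min else hold_kw
    return profile


# Three hypothetical groups, rooms needed at minute 90 of a 120-minute horizon.
all_at_once = load_profile([60, 60, 60], 30.0, 10.0, 30, 120)
staggered = load_profile([0, 30, 60], 30.0, 10.0, 30, 120)

demand_all = peak_demand_kw(all_at_once)
demand_staggered = peak_demand_kw(staggered)
energy_all_kwh = sum(all_at_once) / 60.0
energy_staggered_kwh = sum(staggered) / 60.0
```

In this toy case the staggered start has the lower 30-minute demand but uses more energy, because the early groups spend extra time holding their setpoint. That is the trade-off any scheduler, learned or hand-designed, has to weigh.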
I am just curious: what makes you think that reinforcement learning is the most suitable approach here? There must be a lot of a priori knowledge in the HVAC domain, packaged in the form of relatively simple mathematical models (say, those equivalent RC circuits), so why not use it? Why ignore these and rely merely on (learning from) measured data? Sorry for being so curious. I am just eager to learn what drives people to these “you-need-no-knowledge-of-the-underlying-physics” methods. Of course you choose the approach, go ahead.
Speaking about advanced optimization-based methods that rely on models, I have noticed that there are gazillions of papers and reports on model predictive control (MPC) for HVAC, and some of those methods have even found their way into real practical (even commercial) implementations. As a control engineer, I would perhaps choose to go in that direction if I were assigned such a task.
Finally, I would just repeat my original advice that in this particular forum you have a good chance of getting help with some Julia-language-related problem, but you must formulate the Julia question yourself. Perhaps if you know how to proceed with RL in your domain but with another tool/language (say, Python or Matlab), post the code here and people could perhaps help you find a Julia equivalent. Good luck.
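For readers who have not seen one, the simplest equivalent RC circuit is a single resistance and a single capacitance per zone (often called 1R1C). The sketch below steps such a model forward with explicit Euler for a cooling unit. The function name, the units, and every parameter value are hypothetical; real values would have to be identified from measured data.

```python
def simulate_zone(temp0, outdoor_temp, r, c, q_cool, minutes, dt=1.0):
    """Zone temperature trajectory under the assumed 1R1C model

        c * dT/dt = (outdoor_temp - T) / r - q_cool

    r: thermal resistance in degC/kW, c: thermal capacitance in kW*min/degC,
    q_cool: cooling power removed from the zone in kW (0 when the unit is off).
    Integrated with explicit Euler steps of dt minutes.
    """
    temps = [temp0]
    for _ in range(minutes):
        t_now = temps[-1]
        dtemp = ((outdoor_temp - t_now) / r - q_cool) / c
        temps.append(t_now + dt * dtemp)
    return temps


# Hypothetical zone: time constant r * c = 60 minutes, and the unit would
# eventually settle the zone at outdoor_temp - r * q_cool = 20 degC.
temps = simulate_zone(temp0=28.0, outdoor_temp=30.0, r=2.0, c=30.0,
                      q_cool=5.0, minutes=600)
```

With a model like this per zone, the question of when to start each unit becomes a search over on/off sequences that can be simulated ahead of time, which is essentially what model-based methods exploit.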
Perhaps you would like some consulting. This sounds like a great project and I would be happy to work on it professionally, both from the engineering perspective and on the Julia code.
RL might not be the best way. I’ll do some research on MPC and maybe come back.
Thanks …
Objective
Minimize the total energy
Is the goal to minimize total energy or total cost?
You might, for example, precool some rooms before people get there to reduce the demand charge. That can lower the cost without minimizing the energy usage.
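A small sketch of why the two objectives can disagree, assuming a bill made of a usage charge plus a demand charge. The rates and the two strategies' numbers are made up purely for illustration and are not taken from any real tariff.

```python
def bill(energy_kwh, peak_demand_kw, energy_rate, demand_rate):
    """Assumed bill structure: usage charge plus demand charge."""
    return energy_kwh * energy_rate + peak_demand_kw * demand_rate


# Hypothetical rates: 0.10 per kWh of usage, 15.00 per kW of billed demand.
ENERGY_RATE = 0.10
DEMAND_RATE = 15.0

# Hypothetical strategies: precooling uses more energy but flattens the peak.
cost_no_precool = bill(1000.0, 90.0, ENERGY_RATE, DEMAND_RATE)
cost_precool = bill(1100.0, 50.0, ENERGY_RATE, DEMAND_RATE)
cheaper = "precool" if cost_precool < cost_no_precool else "no precool"
```

With numbers like these, the strategy that uses more energy ends up cheaper, so the objective should be stated in terms of cost if the bill is what matters.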