AI models are too costly

Currently, fossil fuels make up 78% of US energy production, so renewables cannot simply substitute for them. If Google said, “we want to use 100% renewable energy”, that is currently impossible. So the only way they can meaningfully reduce CO2 emissions is by reducing energy consumption.

A structured pipeline could be a good starting point! My own workflows (mostly Rasters.jl manipulation into DynamicGrids.jl models) are doable, but there is a lot of custom mess involved that may be difficult to analyse. I’m thinking about type piracy hacks like adding counters to packages like DiskArrays.jl/Arrow.jl/CSV.jl to count data use. Maybe there are lower-level system monitoring ways to achieve these goals. For energy use we can just track Julia’s use of cores over time somehow.
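
Something like this minimal sketch is what I have in mind (the wrapper and counter names are made up, and it sidesteps actual type piracy by routing reads through our own function):

```julia
using CSV, DataFrames

const BYTES_READ = Ref(0)      # running total of input bytes touched
const READ_TIME  = Ref(0.0)    # wall-clock seconds spent in tracked reads

# Hypothetical wrapper: count the on-disk size and time of each input we load,
# rather than pirating methods inside CSV.jl/DiskArrays.jl themselves.
function tracked_read(path::AbstractString; kwargs...)
    BYTES_READ[] += filesize(path)
    t = @elapsed df = CSV.read(path, DataFrame; kwargs...)
    READ_TIME[] += t
    return df
end

report() = (bytes = BYTES_READ[], seconds = READ_TIME[], threads = Threads.nthreads())
```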

But generally, providing accurate feedback is what we need, more than telling people what to do.

I think in many cases efficient science is also efficient energy use (“This algorithm in Julia is 10,000 times faster so I just run it on my laptop”), but in some cases it’s not (“I can just parallelise this on 100 servers in the cloud with these three lines of code”). But just having that information clear and easy to access would make those decisions and tradeoffs much easier to make.
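
As a purely illustrative back-of-envelope comparison (every wattage and run time here is a made-up assumption, just to show the kind of feedback that would help):

```julia
laptop_kwh  = 30 / 1000 * 10           # ~30 W laptop running for 10 hours ≈ 0.3 kWh
cluster_kwh = 100 * 300 / 1000 * 0.5   # 100 servers at ~300 W each for half an hour ≈ 15 kWh
println("laptop: $laptop_kwh kWh, cluster: $cluster_kwh kWh")
```

The cluster run finishes sooner but can easily use an order of magnitude more energy; surfacing that number is the whole point.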

1 Like

This may be relevant: https://thenewstack.io/which-programming-languages-use-the-least-electricity/

I wonder where Julia is on this criterion…

And also: The 10 most energy efficient programming languages | by Kasper Groes Albin Ludvigsen | Medium

I guess you meant “some think”.

@yvikhlya said he thinks CO2 is a problem. I think with appreciable probability it is a problem.

We don’t think it is the problem.

2 Likes

Reducing energy consumption (among other resources) is a good idea regardless of CO2 or climate change, and this is what policies should be focused on. Focusing on CO2 emissions alone is a shortsighted approach, as is focusing on US statistics when talking about a global problem.

1 Like

I focus on the US because I know where to find the relevant data, but indeed it is a global problem, and globally 80.1% of energy is from fossil fuels. So just slightly more than the US on its own.

But again, the point I was trying to make is that even if the focus and the hype is on CO2, that is far from the only benefit. Burning coal, natural gas, etc. produces a huge number of pollutants. So I can say “let’s quit coal because of CO2”, and that also brings other benefits, like reducing fine particulate pollution.

These things are not separable. Consider someone trying to lower their blood pressure; that is only one thing. But if focusing on that one thing gets that person to lose weight, start taking daily walks, and eat less salt, then they get the side benefits: lower cholesterol, a healthier cardiovascular system, etc.

If tech giants and AI programmers start thinking about reducing CO2, then there will be all kinds of other non-CO2 benefits: less water usage, less land usage, etc.

1 Like

First, thanking somebody doesn’t mean agreeing in all points.

Second, I didn’t say “we shouldn’t try and track or understand resource use at all because it’s hard”. We should just be aware of whether what we aim for is possible (in part it is not), and if it is possible, what the price and the risks are - in terms of money, in terms of ecology, and in terms of the most important resource of Western societies, which is freedom. Before trying a societal change, ensure you are aware of what can go wrong. And things can go terribly wrong.

And now back to my question. The discussion at that point was not about society as a whole, but just about the admission of scientific publications. So, in your opinion it is desirable to provide a full report on the resources spent to produce a paper; did I understand you correctly? Now, on some days I work from home, on other days I commute to the laboratory by car. Should an account of my commuting be part of the report?

2 Likes

Trying to reduce CO2 emissions in no way threatens freedom. Just no.

5 Likes

@yvikhlya and @Eben60 it seems to me you’re arguing with straw men here… there is no-one in this thread saying that CO2 is the only problem. No one is arguing for huge damaging CO2 only responses that ignore all other problems.

We are just worried that computing has an increasingly large energy footprint, this thread specifically AI. And from our own small sphere of influence, we wonder what we can do about that. What practical steps could exist that might improve things?

If you really care a lot about your own issues it may be more productive for you to make threads with actual ideas of what to do about them, here or on other forums where it fits.

This is a scientific computing forum, so it seems to me that we are likely to discuss the environmental costs of scientific computing and of computing more broadly, like AI. Energy use is the most obvious one, right? And CO2 from energy use one of the most obvious downsides?

We don’t have to fix every problem in the world to make discussing this one problem valid.

5 Likes

You know, you would think so, but strangely enough I’ve encountered situations where being “fast” is not enough – which is wild to me, given that I work in health research. I could belabor that point but we could save it for JuliaCon :wink:

Alright, just to have this thought written down for a potential JuliaCon where I am present and we could hack on something, here is what I am imagining for a proof of concept based on your suggestions:

Proof of concept functionalities (a rough sketch of these follows the lists below):

  • Size of gzipped significant code used
  • Run time tracking
  • Core use

Discussion points:

  • What constitutes a workflow?
  • Where to define a pipeline “starts” and “stops”?
    • Anything done locally? Remotely and locally?
    • What processes to track?
    • What is total run time? Do locks count?
  • Size of input data
    • Type piracy approaches?
    • How to handle interactions with non-local data (i.e., remote databases)?
  • Thread tracking
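
Here is a minimal sketch of the three proof-of-concept functionalities, assuming CodecZlib.jl for the gzip step (the helper names are made up):

```julia
using CodecZlib

# 1. Gzipped size of the "significant" source files, as a crude code-size proxy.
gzipped_size(files) = sum(length(transcode(GzipCompressor, read(f))) for f in files)

# 2. Wall-clock run time of whatever we decide counts as "the workflow".
function timed(f)
    t0 = time()
    result = f()
    return result, time() - t0
end

# 3. Core use: how many threads Julia was started with vs. what the machine has.
core_report() = (julia_threads = Threads.nthreads(), machine_cores = Sys.CPU_THREADS)
```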

Anything else you could imagine, @Raf ?

EDIT: I should ask, are you thinking of going to JuliaCon as well Rafael?

1 Like

Not necessarily. I have no doubt that people are able to invent “cheating” ways of reducing CO2 which are no good for the environment. Just like weight loss is not necessarily good for health. Substituting solutions to different problems is not a good idea. If we want to cut water, land, and other usage, we should focus directly on those problems; no need to mangle them with something else.

2 Likes

That’s what I was arguing against.

As for what “we” can do - I didn’t see any constructive ideas here yet, but maybe some will come.

Every regulation limits freedom.

1 Like

And yet every energy/environmental topic ever started on any public platform immediately devolves into CO2 discussion and pretty much nothing else. IMO, this is silly.

We are just worried that computing has an increasingly large energy footprint, this thread specifically AI. And from our own small sphere of influence, we wonder what we can do about that. What practical steps could exist that might improve things?

This is a legit concern, imo. Why not address the issue directly? I am not saying that CO2 should be taboo to mention; let’s just not be distracted from the main topic too much.

2 Likes

Those are really good points!

A few other thoughts:

Defining where to “cut the tail” is always a hard problem - for a lot of analyses I’ll precompute some datasets before running things and write them to disk, so they don’t run every time. How do you count those costs as a proportion of the whole analysis?

How do you count the fact that you do a bunch of exploratory runs before the main one? If they are small, does it matter? Workflows can be really messy; how do you make sure that the significant parts are being tracked?
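
One possible way to attribute the precompute cost (a hypothetical scheme, not an existing package) is to store the one-off cost next to the cached file and amortize it over the number of times the cache is reused:

```julia
using Serialization

# cached(f, path): run `f` once, cache the result at `path`, and on later runs
# return the cached result together with an amortized share of the original cost.
function cached(f, path)
    meta = path * ".cost"
    if isfile(path)
        cost, uses = deserialize(meta)
        serialize(meta, (cost, uses + 1))
        return deserialize(path), cost / (uses + 1)  # this run's share of the precompute time
    else
        t = @elapsed result = f()
        serialize(path, result)
        serialize(meta, (t, 1))
        return result, t                             # the first run bears the full cost
    end
end
```

So a dataset precomputed once and reused across ten runs would show up as a tenth of its build time in each run’s report.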

1 Like

And by the way, there is a proxy for resources spent on something. It is called money.

1 Like

This is very close to the definition of a straw man argument.

You are projecting what you experience as “every energy/environmental topic” onto this one specific discussion we are having, as if we somehow represent all those people you disagree with, or agree with them on the things you disagree with, when you don’t know that at all.

Thanks! I’ll leave the list as is right now – hopefully I can manage coming out to Eindhoven, and if not, at least we have this as a reference to revisit sometime!

And yeah, I think another core discussion point I’ll add right now is the very broad question “What constitutes a workflow?”, in the context of the benchmarking we care about. I think anything we prototype will just start with a lot of assumptions to constrain the problem.

But hey, seems like a nice start!

What else can I do if this discussion turned out to be not much different from the millions of others, starting right from the first reply? :slight_smile:

To get back on topic (sort of): I have been involved in climate research for 20 years, which involves heavy HPC simulations on clusters with thousands of CPUs/GPUs, which consume a HUGE amount of energy, produce a lot of heat, and require complex cooling systems. The result is weather/short-term climate forecasts, petabytes of data sets which are used for different research including climate change, and a bunch of publications. TBH, I am not aware of anybody in the field ever raising a concern about how energy efficient we are. How much energy do we consume per petabyte or per publication? How does this compare with research in other fields? Which raises a question: how much of a concern is this? Should we start to worry about our energy efficiency too? We don’t use much Julia (unfortunately), but we use ML/AI in some models.

1 Like

But who pays your electricity bill?