It’s interesting to put computer activities in context, but I don’t think comparisons with unrelated domains make sense when considering the energy consumption of Julia specifically… That excuse can always be used; there’s always something else consuming more energy.
I agree that when considering my general choice of activity, 1% is not that much. But if I can save 0.8% of my energy consumption just by switching languages, and still accomplish my goals, that’s also very relevant! I think that’s the core question of this thread…
To get back to the original topic here (which was a nice and positive message, thanks!)
I think it will be extremely difficult to quantitatively show that Julia saves energy. Things to consider:
Julia should be used “in production” as part of a large, potentially ongoing computation or a very widely deployed application. These are the cases where a significant amount of energy is actually used.
It should replace the fixed amount of work that another system was doing and do it more efficiently or better.
The hard part is that many systems don’t have a fixed amount of computational work to do. It seems we can always consume as much compute as possible given hardware and software improvements over many years. The actual limitation on energy use from computing seems more economic than computational.
For example HPC workloads do consume a lot of energy, but a lot of scientific workloads have the property that the simulation just expands to fill the compute available. Larger meshes, higher resolution, more samples. So the amount of energy used in these situations where Julia is great may be somewhat independent of the language, and more dependent on the budget to buy hardware and compute time. There’s better scientific return with the ability to do higher resolutions or whatever. But I think making a clear comparison is really difficult!
Crypto is another good example of where the practical energy usage of the system has nothing to do with computational efficiency of the components. Better efficiency just means the network adjusts to make hashing more difficult and the same total amount of energy is used. When bitcoin went to GPUs and then ASICs nothing really changed energy-wise. By design!
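To make the retargeting argument concrete, here’s a toy back-of-envelope model (all numbers are made up for illustration and are not real network figures; the function names are mine):

```python
# Toy model of proof-of-work difficulty adjustment (illustrative numbers).
# The network retargets difficulty so blocks keep arriving at a fixed rate,
# so total energy use tracks hashrate * energy-per-hash, not efficiency.

def retargeted_difficulty(network_hashrate, target_block_time):
    # In this toy model, expected block time = difficulty / hashrate,
    # so the network scales difficulty to hit the target block time.
    return network_hashrate * target_block_time

def network_power(hashrate, joules_per_hash):
    return hashrate * joules_per_hash  # watts

# Scenario: hardware gets 10x more efficient, miners redeploy the same
# electricity budget, so hashrate rises 10x at the same power draw.
target = 600.0                       # seconds per block
h1, j1 = 1e18, 1e-10                 # hashes/s, joules/hash (before)
h2, j2 = 1e19, 1e-11                 # 10x hashrate, 10x efficiency (after)

print(network_power(h1, j1))         # 1e8 W before
print(network_power(h2, j2))         # 1e8 W after: same power draw
print(retargeted_difficulty(h2, target) /
      retargeted_difficulty(h1, target))  # difficulty just rose 10x
```

Same total wattage before and after; all the efficiency gain is absorbed by the difficulty adjustment, which is the point above.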
This has been discussed here a few times. Although I mostly agree with the points of @Chris_Foster that it is not clear whether these measures actually mean anything for the real-world use of energy in computing, there are these two studies:
In this one Julia is not initially present, but linked within it is an updated version in which Julia is included and fares well:
And there is this Nature Astronomy commentary, which includes Julia:
Perhaps that’s not what this post is about, but it seems worthwhile to mention that Julia is very much used in green tech, from climate modelling to energy grid optimization. We’re using it as well in an industrial setting to reduce our client’s energy consumption. Those initiatives can move the needle a lot, even when the algorithms themselves take a lot of CPU time to run…
> 100 minutes is common for the primary packages I work on.
Now, I’m suddenly concerned about the amount of energy I’m apparently using.
But I suspect they’re using much less than 100W, given that CI is restricted to a single core (2 threads).
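For a rough sense of scale, a back-of-envelope estimate (the wattage is a guessed assumption, not a measurement):

```python
# Back-of-envelope CI energy estimate. All figures are rough assumptions:
# a single CI core (2 threads) might draw on the order of 10-20 W,
# far below a fully loaded 100 W socket.

def ci_energy_kwh(minutes, watts):
    return watts * minutes / 60 / 1000  # watt-minutes -> kWh

run_kwh = ci_energy_kwh(100, 15)  # one 100-minute run at an assumed 15 W
print(round(run_kwh, 3))          # 0.025 kWh per run
# Even at 1000 runs/year, that's ~25 kWh: roughly one load of laundry
# per fortnight, so probably not worth losing sleep over.
```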
This is great. Effectively a case where Julia is being used to make a fixed amount of work more efficient. And if it’s not computational work but an industrial process (or whatever), the amount of energy saved could be huge. (Assuming that your client has a fixed amount of work and they don’t scale up their operations until they’re using the same amount of energy again?)
Two ideas where I think Julia has strong potential to yield massive energy savings, if researchers extend Julia’s already excellent solvers/automatic differentiation and build on current research implemented in other languages:
Gradient estimation or more efficient gradient-free optimization/tuning of discrete simulators
Where I work we have (a thousand? more?) cloud VMs running nearly every day, running long (e.g. 6-CPU-hour per iteration) simulations of CPUs and GPUs in development. It’s not my day job, but I’ve been on the lookout for innovations (papers, etc.) that would allow us to take these discontinuous/discrete simulations (often done using grid search or primitive black-box optimization) and somehow speed up the search, ideally using gradient estimation techniques so SGD would “just work”. If anybody is aware of recent breakthroughs in this area (especially a Julia-based one), I think in relatively little time I could rally some engineers in the company and rewrite some of the simulators from C++ to Julia to take advantage of it.
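For what it’s worth, one classic gradient-estimation family that needs only black-box evaluations is SPSA (simultaneous perturbation stochastic approximation). A minimal sketch (in Python for brevity; the quadratic objective, step sizes, and iteration count are all illustrative stand-ins, not a recommendation for a real simulator):

```python
import random

# Minimal SPSA sketch: estimates a descent direction for a black-box
# objective using only two evaluations per step, so SGD-style tuning can
# work even when the simulator exposes no analytic gradient.

def objective(x):
    # Toy stand-in for an expensive simulator metric (lower is better).
    return sum((xi - 3.0) ** 2 for xi in x)

def spsa_step(x, f, a=0.1, c=0.1):
    # Perturb all coordinates simultaneously with random +-1 signs.
    delta = [random.choice((-1.0, 1.0)) for _ in x]
    plus  = [xi + c * di for xi, di in zip(x, delta)]
    minus = [xi - c * di for xi, di in zip(x, delta)]
    scale = (f(plus) - f(minus)) / (2 * c)
    # Per-coordinate gradient estimate is scale/delta_i; take an SGD step.
    return [xi - a * scale / di for xi, di in zip(x, delta)]

random.seed(0)
x = [0.0, 0.0, 0.0]
for _ in range(200):
    x = spsa_step(x, objective)
print([round(xi, 2) for xi in x])  # converges near [3.0, 3.0, 3.0]
```

The appeal for simulation tuning is the cost profile: two simulator runs per step regardless of the number of parameters, versus one run per parameter for finite differences.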
One thing which will save energy is choosing the correct type for calculations.
Look at the machine learning community’s use of Float32 and the recent lower-precision types from Nvidia.
I read a fantastic paper years ago on the topic of watts per instruction and choosing types - sadly I did not keep the reference. If anyone has it please let me know.
Yeah, for sure, Int types (especially low-precision ones like Int4) are crazy cheap in area, power usage, etc. compared to Float16 or Float32. And at the extreme, binarizing a neural network can achieve an extra order of magnitude of power and performance savings, since multiplications get converted into adds, etc.
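A tiny illustration of that multiply-to-add conversion (toy vectors, not a real network):

```python
# With weights constrained to +-1, a dot product needs no multiplies:
# each term is just xi or -xi, i.e. a conditional add/subtract.
# (In hardware this goes further, to XNOR + popcount.)

def dot_float(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))  # one multiply per element

def dot_binary(w_signs, x):
    # w_signs holds only +-1, so no multiplications are needed.
    return sum(xi if wi > 0 else -xi for wi, xi in zip(w_signs, x))

w = [1, -1, 1, 1, -1]
x = [0.5, 2.0, -1.0, 3.0, 0.25]
print(dot_float(w, x))   # 0.25
print(dot_binary(w, x))  # 0.25, computed with adds/subtracts only
```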