Not good: AI is a problem, not a solution

My point is that the usage of MFLOP is completely arbitrary. Your argument would not work for GFLOP. So what is special about MFLOP? If there is no reason why that unit is special, then you can find a “square-root relationship” for any 2 numbers…

Did you notice that MFLOP is in non-dimensional units? If I switch to GFLOP, I use C*2^10, but I get the same energetic cost…

Yes, so there is an arbitrary factor, isn’t there? With this factor you can find a “square-root” relationship for any pair of numbers.
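To make the objection concrete, here is a minimal sketch. The helper name `scale_for_sqrt` and the sample numbers are made up for illustration and do not come from the thread. For any positive pair `a`, `b`, one can solve for a unit rescaling `s` such that `sqrt(a*s) == b*s`:

```julia
# Hypothetical helper: the rescaling s for which sqrt(a*s) == b*s.
# From sqrt(a*s) = b*s it follows that s = a / b^2 (for a, b > 0).
scale_for_sqrt(a, b) = a / b^2

a, b = 1500.0, 7.0          # arbitrary sample pair
s = scale_for_sqrt(a, b)

# In the rescaled unit, the pair satisfies the "square-root relationship".
println(sqrt(a * s) ≈ b * s)
```

So if the scale factor is free to choose, the relationship by itself carries no information.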

No, it is not arbitrary. C is measured (presumably). It works like this:

julia> C = 0.3; # J / GFLOP

julia> M = 5000; # GFLOP

julia> train = M*10^9 * C/10^9
1.50000e+03

julia> query = sqrt(M*10^9) * C/10^9
6.70820e-04

julia> 

julia> C = 0.3 / 1000; # J / MFLOP

julia> M = 5000*1000; # MFLOP

julia> train = M*10^6 * C/10^6
1.50000e+03

julia> query = sqrt(M*10^6) * C/10^6
6.70820e-04

julia> 

Everything needs to be converted to a common base unit, FLOP.
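The same calculation can be wrapped in a function, as a rough sketch. The names `PREFIX` and `energies` are assumptions for illustration, and the values 0.3 J/GFLOP and 5000 GFLOP are simply the ones used above:

```julia
# Assumed prefix factors relative to the base unit, FLOP.
const PREFIX = Dict("FLOP" => 1.0, "MFLOP" => 1e6, "GFLOP" => 1e9)

# Energy in joules for training and for a query, given
#   count - operation count expressed in `unit`
#   c     - cost in J per `unit`
function energies(count, c, unit)
    f = PREFIX[unit]
    flop = count * f          # operation count in base FLOP
    c_flop = c / f            # cost in J per FLOP
    return (train = flop * c_flop, query = sqrt(flop) * c_flop)
end

g = energies(5000, 0.3, "GFLOP")
m = energies(5000 * 1000, 0.3 / 1000, "MFLOP")
println(g.train ≈ m.train && g.query ≈ m.query)
```

Because the square root is always taken of the count in base FLOP, the prefix used for the input does not change the result.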

Edit: @abraemer You are right, as I wrote it above (Not good: AI is a problem, not a solution - #20 by PetrKryslUCSD) it wouldn’t work.

I suppose you are underestimating the amount of data here by considering only the relative increase instead of the entire amount. It is as if you compared the time needed to train on gigabytes of data with the time needed to incorporate a few more bytes of additional data into a ready-to-use model.

Yes, that’s basically what I said: you cannot sqrt energy.

Not quite: the number of operations is just a number without units, and one can take its square root. This is not an arbitrary unit, but a concrete number.

I don’t agree: “cycles” is a unit - it just has the same dimension as a number. I could totally define “kilo cycles = 1000 cycles” and have the same conceptual problem. The same happens if you decide to count something like muladds instead of FLOP.

You could also define “kilo ones = 1000 ones”, and then sqrt(100) = 10 but sqrt(0.1 kilos) != 0.01 kilos (:
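Spelled out as a small sketch (the variable names are hypothetical, the numbers are the ones from the sentence above), taking the root and converting to “kilo ones” do not commute:

```julia
ones_count = 100            # the quantity, counted in "ones"
kilo = 1000                 # hypothetical prefix: 1 kilo-one = 1000 ones

in_kilos = ones_count / kilo            # 0.1 kilo-ones

# Taking the root first and converting afterwards ...
root_then_convert = sqrt(ones_count) / kilo     # 0.01
# ... differs from converting first and taking the root afterwards.
convert_then_root = sqrt(in_kilos)              # about 0.316

println(root_then_convert == convert_then_root)
```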

Technically yes, but the result is basically the same no matter if you choose basic arithmetic or muladds. The statement isn’t “inference #ops = sqrt(training #ops)” exactly, it’s just an order of magnitude.

As shown here, the decisive quantity is the number of FLOP. One can certainly take the square root of a non-dimensional (ND) number. The FLOP total is then multiplied by a suitable constant that converts FLOP to energy.