In Binary Floating Point, you can multiply without multiplying

StevenSiew · October 12, 2024, 12:15pm

I discovered that In Binary Floating Point, you can multiply without multiplying.

Oscar_Smith · October 12, 2024, 12:58pm

how are you planning on squaring without multiplication?

screw_dog · October 12, 2024, 12:59pm

It’s in the picture. Squaring can be done with just additions and change of exponents

stevengj · October 12, 2024, 1:17pm

Your “call this part recursively” is implementing schoolbook multiplication in binary.

Palli · October 12, 2024, 7:43pm

Trick 1. Division by 4

Division is even slower than multiplication, slowest elementary math operation on CPU done in one instructions (some might have sqrt too, I think actually faster?, and sin etc. and arbitrary power, which is worse for just power of two).

You can shift by two for fast division, but not for floating point, or well multiply by 1/4…

StevenSiew · October 13, 2024, 2:11am

To divide a binary floating point number by 4, you just divide a binary floating point number by 2 TWICE.

To divide a binary floating point number by 2, you leave the mantissa alone and reduce the exponent by 1. Really easy bit twiddling. Super fast.

Benny · October 13, 2024, 3:23am

Plus checking for underflow (or overflow for doublings).

StevenSiew · October 13, 2024, 4:59am

You could use this


function DescribeFloat64(x::Float64;verbose=false)
    # Local function convert binary string 2 Int64
    cbs2Int64(s) = parse(Int64,"0b" * s)

    # Local function vprintln
    function vprintln(args...)
        if verbose == true
            println(args...)
        end
    end

    # Local ALTERNATIVE significand function
    function significand_alt(T::DataType,manArr::Array{Int64,1})
        # Check if all the bits in MantissaArray is 0
        if sum(manArr) == 0
            return convert(T,0)
        end
        setprecision(BigFloat, 52) do
            acc = BigFloat(0.0)
            multiplier = BigFloat(0.5)
            for k in manArr
                acc += k == 1 ? multiplier : BigFloat(0.0)
                multiplier /= BigFloat(2.0)
            end
            acc += BigFloat(1.0) 
            return convert(Float64,acc)
        end
    end

    # Local variables used in this function #
    local expoBiasValue = 1023
    local specialflags = false # Boolean flag
    local bs = bitstring(x)  # Character bit string
    local signbit = bs[1] == '1' ? 1 : 0 # Sign Bit Int64
    local expoView = @view bs[2:12] # a View of bit stream relevant to exponent
    # exponent Array{Int64,1}
    local expoArray = [ k == '1' ? 1 : 0 for k in expoView ]
    local expo  = cbs2Int64(expoView) # exponent Int64
    local mantissaView = @view bs[13:64] # a View of bit stream relevant to mantissa
    # mantissa Array{Int64,1}
    local mantissaArray = Int64[ k == '1' ? 1 : 0 for k in mantissaView ] 
    local mysignificand = string(significand(x)) # significand Float64
    # Strip the negative sign if second character is a number digit
    if mysignificand[1] == '-' &&  mysignificand[2] == '1' || mysignificand[2] == '0'
        mysignificand = lstrip(mysignificand,'-')
    end

    # List of special Boolean flags
    local INF_flag = false # Boolean flag
    local NaN_flag = false # Boolean flag
    local subnormal_flag = false
    # End of local variables

    println("------------------------------------------------------------")
    println("Float64 value : ",x)
    println("Raw Bitstring : ",bs,"₂")
    println("======================")
    println("Sign Bit : ",bs[1],"₂")
    println("Sign Value : ",signbit)

    println("======================")
    vprintln("Expo Array is ",expoArray)
    println("Exponent Bits : ",expoView,"₂")  
    println("Exponent raw value : ",expo)
    println("Exponent normalised value : ",expo," - $(expoBiasValue) = ",expo - expoBiasValue)

    println("======================")
    vprintln("Typeof mantissa is ",typeof(mantissaArray))
    vprintln("mantissa Array is ",mantissaArray)
    println("Mantissa : I.",bs[13:64],"₂")
    println("Significand : ",mysignificand)
 
    println("======================")
    ##############################################
    # Now check ALL conditions for special flags #
    ##############################################

    # Step 1: Checking for INF
    #     If all bits in mantissa are ZERO 
    #        and all bits in exponent are ONE
    #     then INF_flag is true
    if sum(mantissaArray) == 0 && prod(expoArray) == 1
        specialflags = true
        INF_flag = true
    end

    # Step 2: Checking for NaN
    #     If some bits in mantissa are NOT ZERO 
    #        and all bits in exponent are ONE
    #     then NaN_flag is true
    if sum(mantissaArray) > 0 && prod(expoArray) == 1
        specialflags = true
        NaN_flag = true
    end

    # Step 2: Checking for subnormal
    #     If some bits in mantissa are NOT ZERO 
    #        and all bits in exponent are ZERO
    #     then subnormal_flag is true
    if sum(mantissaArray) > 0 && sum(expoArray) == 0
        specialflags = true
        subnormal_flag = true
    end

    print("Value is ")

    if specialflags == false
      println("(-1)^",signbit," * ",mysignificand," * 2^",expo - 1023)
    else # specialflags == true
        if INF_flag
            print(signbit == 1 ? "Negative " : "Positive ")
            println("INF")
        end
        if NaN_flag
            println("NaN")
        end
        if subnormal_flag
            println("subnormal")
        end
    end

    println("Value is ",x)
    println("------------------------------------------------------------")
end

println("Testing 1.5"); DescribeFloat64(1.5); print()

#=
println("Testing -1.5"); DescribeFloat64(-1.5); print()
println("Testing 1.1"); DescribeFloat64(1.1); print()
println("Testing -1.1"); DescribeFloat64(-1.1); print()
println("Testing 6.02214076e23"); DescribeFloat64(6.02214076e23); print()
println("Testing 1.0/0.0 == +INF"); DescribeFloat64(1.0/0.0); print()
println("Testing -1.0/-0.0 == +INF"); DescribeFloat64(-1.0/-0.0); print()
println("Testing -1.0/0.0 == -INF"); DescribeFloat64(-1.0/0.0); print()
println("Testing 1.0/-0.0 == -INF"); DescribeFloat64(1.0/-0.0); print()
println("Testing 0.0/0.0 == NaN"); DescribeFloat64(0.0/0.0); print()
println("Testing 0.0 == normal Zero"); DescribeFloat64(0.0); print()
println("Testing -0.0 == sign Zero"); DescribeFloat64(-0.0); print()
println("Testing 5.0e-324 == subnormal"); DescribeFloat64(5.0e-324); print()
println("Testing floatmin(Float64) == 2.2250738585072014e-308"); DescribeFloat64(2.2250738585072014e-308); print()
=#

with output

Testing 1.5
------------------------------------------------------------
Float64 value : 1.5
Raw Bitstring : 0011111111111000000000000000000000000000000000000000000000000000₂
======================
Sign Bit : 0₂
Sign Value : 0
======================
Exponent Bits : 01111111111₂
Exponent raw value : 1023
Exponent normalised value : 1023 - 1023 = 0
======================
Mantissa : I.1000000000000000000000000000000000000000000000000000₂
Significand : 1.5
======================
Value is (-1)^0 * 1.5 * 2^0
Value is 1.5
------------------------------------------------------------

julia>

StevenSiew · October 13, 2024, 5:02am

Minor details. Get the major details correct then spend hours on testing the edge cases.

StevenSiew · October 13, 2024, 5:04am

ulia> DescribeFloat64(π)
ERROR: MethodError: no method matching DescribeFloat64(::Irrational{:π})
The function `DescribeFloat64` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  DescribeFloat64(::Float64; verbose)
   @ Main ~/juliascript/DescribeFloat64_simple.jl:2

Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> DescribeFloat64(Float64(π))
------------------------------------------------------------
Float64 value : 3.141592653589793
Raw Bitstring : 0100000000001001001000011111101101010100010001000010110100011000₂
======================
Sign Bit : 0₂
Sign Value : 0
======================
Exponent Bits : 10000000000₂
Exponent raw value : 1024
Exponent normalised value : 1024 - 1023 = 1
======================
Mantissa : I.1001001000011111101101010100010001000010110100011000₂
Significand : 1.5707963267948966
======================
Value is (-1)^0 * 1.5707963267948966 * 2^1
Value is 3.141592653589793
------------------------------------------------------------

julia>

StevenSiew · October 13, 2024, 5:13am

I have updated the explanation

StevenSiew · October 13, 2024, 5:36am

I found this when searching around in the internet.
Bit twidder’s paradise

Rob Wentworth says:

December 1, 2011 at 1:37 pm

Okay, if this thread is attracting “math heads” and “bit twiddlers”, perhaps these links are of interest:

The Aggregate Magic Algorithms:
http://aggregate.org/MAGIC/

Bit Twiddling Hacks:
http://graphics.stanford.edu/~seander/bithacks.html

And for something really hardcore CPU hackers, check out Agner Fog’s Optimization Manuals:

photor · October 13, 2024, 5:41am

And the total number of operations (shifts and additions) are actually larger than in the ordinary scheme of multiplication.

Benny · October 13, 2024, 6:20am

Still, it is the kind of work done to implement multiplication, though this would be more a topic for hardware than software today, typical processors are far ahead.

Not really an edge case once you step out of the bare-bones abstraction, it’s very easy to run into overflow 2.0^1024 for real data types and your algorithm needs to deal with that, preferably in compliance with IEEE-754 so mainstream architectures aren’t forced to emulate your floating points in software.

StevenSiew · October 13, 2024, 10:13am

I agree completely but I just want to show that it can be done (not that it is efficient or practical).

photor · October 13, 2024, 11:55am

If a hardware is designed to do squaring very efficiently, then this algorithm is worthy.

StevenSiew · October 14, 2024, 3:15am

Finally code to prove that it actually works

# global ALTERNATIVE significand function
function significand_alt(expoArr::Array{Int64,1},manArr::Array{Int64,1};implicitbitvalue=1.0)
    # Check if all the bits in MantissaArray is 0 and all bits in expoArr is 0
    if sum(manArr) == 0 && sum(expoArr) == 0
        return convert(Float64,0)
    end

    local expostr = String([ k == 0 ? '0' : '1'  for k in reverse(expoArr) ])
    local expoVal = parse(Int64,"0b" * expostr)

    if sum(manArr) == 0 && expoVal == 1023
        return convert(Float64,1)
    end
    setprecision(BigFloat, 52) do
        accumulator = BigFloat(0.0)
        multiplier = BigFloat(0.5)
        for k in reverse(manArr)
            accumulator += k == 1 ? multiplier : BigFloat(0.0)
            multiplier /= BigFloat(2.0)
        end
        accumulator += BigFloat(implicitbitvalue) 
        return convert(Float64,accumulator)
    end
end

function Float64toInt64BitsArray(x::Float64)
    return Int64[ k == '1' ? 1 : 0 for k in reverse(bitstring(x)) ]
end

function Int64BitsArraytoFloat64(ba::Array{Int64,1})
    local mysign = ba[64]
    local mysignificand = significand_alt(ba[1:52])
    expostr = String([ k == 0 ? '0' : '1'  for k in reverse(ba[53:63]) ])
    local myexpo = parse(Int64,"0b" * expostr) - 1023
    return (-1.0)^mysign * mysignificand * 2.0^myexpo 
end

function Float64toUInt64(x::Float64)
    local bs = bitstring(x)
    local myUINT64 = parse(UInt64,"0b" * s)
    return myUINT64
end

function GetSignBitofInt64BitsArray(ba)
    return ba[64]
end

function GetExponentofInt64BitsArray(ba)
    return ba[53:63]
end

function GetMantissaofInt64BitsArray(ba)
    return ba[1:52]
end

function StringExponent(expoarray)
    return String([k == 0 ? '0' : '1'  for k in reverse(expoarray)])
end

function StringMantissa(mantissaarray)
    return String([k == 0 ? '0' : '1'  for k in reverse(mantissaarray)])
end

function GetRawExpoValue(ExpoArray::Array{Int64,1})
    local expostr = String([ k == 0 ? '0' : '1'  for k in reverse(ExpoArray) ])
    return parse(Int64,"0b" * expostr)
end

function GetNormalExpoValue(ExpoArray::Array{Int64,1})
    return GetRawExpoValue(ExpoArray) - 1023
end

function SetNormalExpoValue(val::Int64)
    return SetRawExpoValue(val + 1023)
end

function SetRawExpoValue(val::Int64)
    local ExpoArr = Int64[ 0  for k = 1:11 ]
    local count = 1
    local x = val
    while x > 0
        if x % 2 == 1
            ExpoArr[count] = 1
        end
        count += 1
        x = div(x,2)
    end
    return ExpoArr
end

#=
    Get Everthing about Float 64 and return a tuple of
    1.  rawbitsarray
    2.  signbit
    3.  expobitsarray
    4.  mantissabitsarray
    5.  normalexpovalue
    6.  mysignificand
=#
function GetEverythingAboutFloat64(num::Float64)
    local rawbitsarray = Float64toInt64BitsArray(num)
    local signbit = GetSignBitofInt64BitsArray(rawbitsarray)
    local expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
    local mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
    local normalexpovalue = GetNormalExpoValue(expobitsarray)
    local mysignificand = significand_alt(expobitsarray,mantissabitsarray)
    return (rawbitsarray,signbit,expobitsarray,mantissabitsarray,
            normalexpovalue,mysignificand)
end


function square(x::Float64)
    # The square of any real number is positive
    # so we might as well treat the input as a positive number
    local remainder = abs(x)
    local result = 0.0
    while remainder > 0.0
        local mytuple = GetEverythingAboutFloat64(remainder)
        local n = mytuple[5]
        result += 2.0 ^ (2*n)
        # calc new remainder from old remainder
        remainder = remainder - (2.0^n)
        if remainder > 0.0
            # result = result + 2 * 2^n * remainder
            local newtuple = GetEverythingAboutFloat64(remainder)
            local newsignificand = newtuple[6]
            local newexpo = newtuple[5]
            result += newsignificand * 2.0^(newexpo+n+1)
        end
    end
    return result
end

function divbyfour(x)
    local mytuple = GetEverythingAboutFloat64(x)
    local signbit = mytuple[2]
    local mysignificand = mytuple[6]
    local n = mytuple[5]
    return (-1.0)^signbit * mysignificand * 2.0^(n-2)
end

function mymul(A,B)
    return divbyfour(square(A+B) - square(A-B))    
end

num = 1.5
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"₂")
println("Sign bit is ",signbit,"₂")
println("Expo string is ",StringExponent(expobitsarray),"₂")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"₂")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()



num = Float64(π)
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"₂")
println("Sign bit is ",signbit,"₂")
println("Expo string is ",StringExponent(expobitsarray),"₂")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"₂")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()

println("Now test mymul")
println("sqrt(2.0) * π               = ",sqrt(2.0) * Float64(π))
println("mymul(sqrt(2.0),Float64(π)) = ",mymul(sqrt(2.0),Float64(π)))

with output

Float64 num is 1.5
Rawbits is 0011111111111000000000000000000000000000000000000000000000000000₂
Sign bit is 0₂
Expo string is 01111111111₂
Mantissa string is 1000000000000000000000000000000000000000000000000000₂

converting bitsarray back to Float64 : 1.5

Float64 num is 3.141592653589793
Rawbits is 0100000000001001001000011111101101010100010001000010110100011000₂
Sign bit is 0₂
Expo string is 10000000000₂
Mantissa string is 1001001000011111101101010100010001000010110100011000₂

converting bitsarray back to Float64 : 3.141592653589793

Now test mymul
sqrt(2.0) * π               = 4.442882938158366
mymul(sqrt(2.0),Float64(π)) = 4.442882938158366

StevenSiew · October 14, 2024, 4:03am

All right I admit it. I am obsess with the proof of concept of a binary floating point multiplication without multiplying.

I don’t really want to spend hours getting all the edge cases to work. I just want to see it work for the most general cases. I doubt that I want to waste any more time getting INF,NaNs and subnormals working properly.

Once I got the general cases to work, the “fun” element is gone. After all the function is inefficient and non-practical for general floating point multiplication.

The thrill is just getting it to work. Aka proof of concept.

Benny · October 14, 2024, 10:56am

Semi-related but I ran across a paper that approximated away the mantissa multiplication, effectively eliminating a looping or recursive step (Addition is All You Need for Energy-Efficient Language Models, Hongyin Luo & Wei Sun). Short version is (1+m1)*2^e1 * (1+m2)*2^e2 (m for mantissa<1, e for exponent) going from

(1+m1+m2+m1*m2)*2^(e1+e2)

to

(1+m1+m2+2^-3 )*2^(e1+e2)

which introduces an error for real numbers, but this error can be acceptable or negligible in applications that use low precision floating point. Float8s.jl doesn’t cover all the 8-bit types they examined, and I won’t implement anything to corroborate their findings, but it seems plausible.

photor · October 14, 2024, 1:31pm

haven’t had time to read their paper, but your introduction just made me clear

Topic		Replies	Views
Fast division by 2 Numerics	13	430	November 3, 2024
Large numbers multiplication New to Julia question	2	176	July 3, 2024
A question regarding BigFloat General Usage question	11	1803	November 5, 2018
AbstractFloat math (Is anybody already doing this?) Numerics	4	1286	June 20, 2017
I will never understand floating-point arithmetic Offtopic float , precision	18	956	March 8, 2024

In Binary Floating Point, you can multiply without multiplying

Related topics