In Binary Floating Point, you can multiply without multiplying

I discovered that In Binary Floating Point, you can multiply without multiplying.

1 Like

how are you planning on squaring without multiplication?

1 Like

It’s in the picture. Squaring can be done with just additions and change of exponents

Your “call this part recursively” is implementing schoolbook multiplication in binary.

13 Likes

Trick 1. Division by 4

Division is even slower than multiplication, slowest elementary math operation on CPU done in one instructions (some might have sqrt too, I think actually faster?, and sin etc. and arbitrary power, which is worse for just power of two).

You can shift by two for fast division, but not for floating point, or well multiply by 1/4…

To divide a binary floating point number by 4, you just divide a binary floating point number by 2 TWICE.

To divide a binary floating point number by 2, you leave the mantissa alone and reduce the exponent by 1. Really easy bit twiddling. Super fast.

3 Likes

Plus checking for underflow (or overflow for doublings).

You could use this


function DescribeFloat64(x::Float64;verbose=false)
    # Local function convert binary string 2 Int64
    cbs2Int64(s) = parse(Int64,"0b" * s)

    # Local function vprintln
    function vprintln(args...)
        if verbose == true
            println(args...)
        end
    end

    # Local ALTERNATIVE significand function
    function significand_alt(T::DataType,manArr::Array{Int64,1})
        # Check if all the bits in MantissaArray is 0
        if sum(manArr) == 0
            return convert(T,0)
        end
        setprecision(BigFloat, 52) do
            acc = BigFloat(0.0)
            multiplier = BigFloat(0.5)
            for k in manArr
                acc += k == 1 ? multiplier : BigFloat(0.0)
                multiplier /= BigFloat(2.0)
            end
            acc += BigFloat(1.0) 
            return convert(Float64,acc)
        end
    end

    # Local variables used in this function #
    local expoBiasValue = 1023
    local specialflags = false # Boolean flag
    local bs = bitstring(x)  # Character bit string
    local signbit = bs[1] == '1' ? 1 : 0 # Sign Bit Int64
    local expoView = @view bs[2:12] # a View of bit stream relevant to exponent
    # exponent Array{Int64,1}
    local expoArray = [ k == '1' ? 1 : 0 for k in expoView ]
    local expo  = cbs2Int64(expoView) # exponent Int64
    local mantissaView = @view bs[13:64] # a View of bit stream relevant to mantissa
    # mantissa Array{Int64,1}
    local mantissaArray = Int64[ k == '1' ? 1 : 0 for k in mantissaView ] 
    local mysignificand = string(significand(x)) # significand Float64
    # Strip the negative sign if second character is a number digit
    if mysignificand[1] == '-' &&  mysignificand[2] == '1' || mysignificand[2] == '0'
        mysignificand = lstrip(mysignificand,'-')
    end

    # List of special Boolean flags
    local INF_flag = false # Boolean flag
    local NaN_flag = false # Boolean flag
    local subnormal_flag = false
    # End of local variables

    println("------------------------------------------------------------")
    println("Float64 value : ",x)
    println("Raw Bitstring : ",bs,"â‚‚")
    println("======================")
    println("Sign Bit : ",bs[1],"â‚‚")
    println("Sign Value : ",signbit)

    println("======================")
    vprintln("Expo Array is ",expoArray)
    println("Exponent Bits : ",expoView,"â‚‚")  
    println("Exponent raw value : ",expo)
    println("Exponent normalised value : ",expo," - $(expoBiasValue) = ",expo - expoBiasValue)

    println("======================")
    vprintln("Typeof mantissa is ",typeof(mantissaArray))
    vprintln("mantissa Array is ",mantissaArray)
    println("Mantissa : I.",bs[13:64],"â‚‚")
    println("Significand : ",mysignificand)
 
    println("======================")
    ##############################################
    # Now check ALL conditions for special flags #
    ##############################################

    # Step 1: Checking for INF
    #     If all bits in mantissa are ZERO 
    #        and all bits in exponent are ONE
    #     then INF_flag is true
    if sum(mantissaArray) == 0 && prod(expoArray) == 1
        specialflags = true
        INF_flag = true
    end

    # Step 2: Checking for NaN
    #     If some bits in mantissa are NOT ZERO 
    #        and all bits in exponent are ONE
    #     then NaN_flag is true
    if sum(mantissaArray) > 0 && prod(expoArray) == 1
        specialflags = true
        NaN_flag = true
    end

    # Step 2: Checking for subnormal
    #     If some bits in mantissa are NOT ZERO 
    #        and all bits in exponent are ZERO
    #     then subnormal_flag is true
    if sum(mantissaArray) > 0 && sum(expoArray) == 0
        specialflags = true
        subnormal_flag = true
    end

    print("Value is ")

    if specialflags == false
      println("(-1)^",signbit," * ",mysignificand," * 2^",expo - 1023)
    else # specialflags == true
        if INF_flag
            print(signbit == 1 ? "Negative " : "Positive ")
            println("INF")
        end
        if NaN_flag
            println("NaN")
        end
        if subnormal_flag
            println("subnormal")
        end
    end

    println("Value is ",x)
    println("------------------------------------------------------------")
end

println("Testing 1.5"); DescribeFloat64(1.5); print()

#=
println("Testing -1.5"); DescribeFloat64(-1.5); print()
println("Testing 1.1"); DescribeFloat64(1.1); print()
println("Testing -1.1"); DescribeFloat64(-1.1); print()
println("Testing 6.02214076e23"); DescribeFloat64(6.02214076e23); print()
println("Testing 1.0/0.0 == +INF"); DescribeFloat64(1.0/0.0); print()
println("Testing -1.0/-0.0 == +INF"); DescribeFloat64(-1.0/-0.0); print()
println("Testing -1.0/0.0 == -INF"); DescribeFloat64(-1.0/0.0); print()
println("Testing 1.0/-0.0 == -INF"); DescribeFloat64(1.0/-0.0); print()
println("Testing 0.0/0.0 == NaN"); DescribeFloat64(0.0/0.0); print()
println("Testing 0.0 == normal Zero"); DescribeFloat64(0.0); print()
println("Testing -0.0 == sign Zero"); DescribeFloat64(-0.0); print()
println("Testing 5.0e-324 == subnormal"); DescribeFloat64(5.0e-324); print()
println("Testing floatmin(Float64) == 2.2250738585072014e-308"); DescribeFloat64(2.2250738585072014e-308); print()
=#

with output

Testing 1.5
------------------------------------------------------------
Float64 value : 1.5
Raw Bitstring : 0011111111111000000000000000000000000000000000000000000000000000â‚‚
======================
Sign Bit : 0â‚‚
Sign Value : 0
======================
Exponent Bits : 01111111111â‚‚
Exponent raw value : 1023
Exponent normalised value : 1023 - 1023 = 0
======================
Mantissa : I.1000000000000000000000000000000000000000000000000000â‚‚
Significand : 1.5
======================
Value is (-1)^0 * 1.5 * 2^0
Value is 1.5
------------------------------------------------------------

julia> 
1 Like

Minor details. Get the major details correct then spend hours on testing the edge cases.

ulia> DescribeFloat64(Ď€)
ERROR: MethodError: no method matching DescribeFloat64(::Irrational{:Ď€})
The function `DescribeFloat64` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  DescribeFloat64(::Float64; verbose)
   @ Main ~/juliascript/DescribeFloat64_simple.jl:2

Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> DescribeFloat64(Float64(Ď€))
------------------------------------------------------------
Float64 value : 3.141592653589793
Raw Bitstring : 0100000000001001001000011111101101010100010001000010110100011000â‚‚
======================
Sign Bit : 0â‚‚
Sign Value : 0
======================
Exponent Bits : 10000000000â‚‚
Exponent raw value : 1024
Exponent normalised value : 1024 - 1023 = 1
======================
Mantissa : I.1001001000011111101101010100010001000010110100011000â‚‚
Significand : 1.5707963267948966
======================
Value is (-1)^0 * 1.5707963267948966 * 2^1
Value is 3.141592653589793
------------------------------------------------------------

julia> 

I have updated the explanation

I found this when searching around in the internet.
Bit twidder’s paradise

Rob Wentworth says:

December 1, 2011 at 1:37 pm

Okay, if this thread is attracting “math heads” and “bit twiddlers”, perhaps these links are of interest:

The Aggregate Magic Algorithms:
http://aggregate.org/MAGIC/

Bit Twiddling Hacks:
http://graphics.stanford.edu/~seander/bithacks.html

And for something really hardcore CPU hackers, check out Agner Fog’s Optimization Manuals:

1 Like

And the total number of operations (shifts and additions) are actually larger than in the ordinary scheme of multiplication.

5 Likes

Still, it is the kind of work done to implement multiplication, though this would be more a topic for hardware than software today, typical processors are far ahead.

Not really an edge case once you step out of the bare-bones abstraction, it’s very easy to run into overflow 2.0^1024 for real data types and your algorithm needs to deal with that, preferably in compliance with IEEE-754 so mainstream architectures aren’t forced to emulate your floating points in software.

I agree completely but I just want to show that it can be done (not that it is efficient or practical).

1 Like

If a hardware is designed to do squaring very efficiently, then this algorithm is worthy.

Finally code to prove that it actually works

# global ALTERNATIVE significand function
function significand_alt(expoArr::Array{Int64,1},manArr::Array{Int64,1};implicitbitvalue=1.0)
    # Check if all the bits in MantissaArray is 0 and all bits in expoArr is 0
    if sum(manArr) == 0 && sum(expoArr) == 0
        return convert(Float64,0)
    end

    local expostr = String([ k == 0 ? '0' : '1'  for k in reverse(expoArr) ])
    local expoVal = parse(Int64,"0b" * expostr)

    if sum(manArr) == 0 && expoVal == 1023
        return convert(Float64,1)
    end
    setprecision(BigFloat, 52) do
        accumulator = BigFloat(0.0)
        multiplier = BigFloat(0.5)
        for k in reverse(manArr)
            accumulator += k == 1 ? multiplier : BigFloat(0.0)
            multiplier /= BigFloat(2.0)
        end
        accumulator += BigFloat(implicitbitvalue) 
        return convert(Float64,accumulator)
    end
end

function Float64toInt64BitsArray(x::Float64)
    return Int64[ k == '1' ? 1 : 0 for k in reverse(bitstring(x)) ]
end

function Int64BitsArraytoFloat64(ba::Array{Int64,1})
    local mysign = ba[64]
    local mysignificand = significand_alt(ba[1:52])
    expostr = String([ k == 0 ? '0' : '1'  for k in reverse(ba[53:63]) ])
    local myexpo = parse(Int64,"0b" * expostr) - 1023
    return (-1.0)^mysign * mysignificand * 2.0^myexpo 
end

function Float64toUInt64(x::Float64)
    local bs = bitstring(x)
    local myUINT64 = parse(UInt64,"0b" * s)
    return myUINT64
end

function GetSignBitofInt64BitsArray(ba)
    return ba[64]
end

function GetExponentofInt64BitsArray(ba)
    return ba[53:63]
end

function GetMantissaofInt64BitsArray(ba)
    return ba[1:52]
end

function StringExponent(expoarray)
    return String([k == 0 ? '0' : '1'  for k in reverse(expoarray)])
end

function StringMantissa(mantissaarray)
    return String([k == 0 ? '0' : '1'  for k in reverse(mantissaarray)])
end

function GetRawExpoValue(ExpoArray::Array{Int64,1})
    local expostr = String([ k == 0 ? '0' : '1'  for k in reverse(ExpoArray) ])
    return parse(Int64,"0b" * expostr)
end

function GetNormalExpoValue(ExpoArray::Array{Int64,1})
    return GetRawExpoValue(ExpoArray) - 1023
end

function SetNormalExpoValue(val::Int64)
    return SetRawExpoValue(val + 1023)
end

function SetRawExpoValue(val::Int64)
    local ExpoArr = Int64[ 0  for k = 1:11 ]
    local count = 1
    local x = val
    while x > 0
        if x % 2 == 1
            ExpoArr[count] = 1
        end
        count += 1
        x = div(x,2)
    end
    return ExpoArr
end

#=
    Get Everthing about Float 64 and return a tuple of
    1.  rawbitsarray
    2.  signbit
    3.  expobitsarray
    4.  mantissabitsarray
    5.  normalexpovalue
    6.  mysignificand
=#
function GetEverythingAboutFloat64(num::Float64)
    local rawbitsarray = Float64toInt64BitsArray(num)
    local signbit = GetSignBitofInt64BitsArray(rawbitsarray)
    local expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
    local mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
    local normalexpovalue = GetNormalExpoValue(expobitsarray)
    local mysignificand = significand_alt(expobitsarray,mantissabitsarray)
    return (rawbitsarray,signbit,expobitsarray,mantissabitsarray,
            normalexpovalue,mysignificand)
end


function square(x::Float64)
    # The square of any real number is positive
    # so we might as well treat the input as a positive number
    local remainder = abs(x)
    local result = 0.0
    while remainder > 0.0
        local mytuple = GetEverythingAboutFloat64(remainder)
        local n = mytuple[5]
        result += 2.0 ^ (2*n)
        # calc new remainder from old remainder
        remainder = remainder - (2.0^n)
        if remainder > 0.0
            # result = result + 2 * 2^n * remainder
            local newtuple = GetEverythingAboutFloat64(remainder)
            local newsignificand = newtuple[6]
            local newexpo = newtuple[5]
            result += newsignificand * 2.0^(newexpo+n+1)
        end
    end
    return result
end

function divbyfour(x)
    local mytuple = GetEverythingAboutFloat64(x)
    local signbit = mytuple[2]
    local mysignificand = mytuple[6]
    local n = mytuple[5]
    return (-1.0)^signbit * mysignificand * 2.0^(n-2)
end

function mymul(A,B)
    return divbyfour(square(A+B) - square(A-B))    
end

num = 1.5
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"â‚‚")
println("Sign bit is ",signbit,"â‚‚")
println("Expo string is ",StringExponent(expobitsarray),"â‚‚")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"â‚‚")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()



num = Float64(Ď€)
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"â‚‚")
println("Sign bit is ",signbit,"â‚‚")
println("Expo string is ",StringExponent(expobitsarray),"â‚‚")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"â‚‚")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()

println("Now test mymul")
println("sqrt(2.0) * π               = ",sqrt(2.0) * Float64(π))
println("mymul(sqrt(2.0),Float64(Ď€)) = ",mymul(sqrt(2.0),Float64(Ď€)))

with output

Float64 num is 1.5
Rawbits is 0011111111111000000000000000000000000000000000000000000000000000â‚‚
Sign bit is 0â‚‚
Expo string is 01111111111â‚‚
Mantissa string is 1000000000000000000000000000000000000000000000000000â‚‚

converting bitsarray back to Float64 : 1.5

Float64 num is 3.141592653589793
Rawbits is 0100000000001001001000011111101101010100010001000010110100011000â‚‚
Sign bit is 0â‚‚
Expo string is 10000000000â‚‚
Mantissa string is 1001001000011111101101010100010001000010110100011000â‚‚

converting bitsarray back to Float64 : 3.141592653589793

Now test mymul
sqrt(2.0) * π               = 4.442882938158366
mymul(sqrt(2.0),Float64(Ď€)) = 4.442882938158366

All right I admit it. I am obsess with the proof of concept of a binary floating point multiplication without multiplying.

I don’t really want to spend hours getting all the edge cases to work. I just want to see it work for the most general cases. I doubt that I want to waste any more time getting INF,NaNs and subnormals working properly.

Once I got the general cases to work, the “fun” element is gone. After all the function is inefficient and non-practical for general floating point multiplication.

The thrill is just getting it to work. Aka proof of concept.

1 Like

Semi-related but I ran across a paper that approximated away the mantissa multiplication, effectively eliminating a looping or recursive step (Addition is All You Need for Energy-Efficient Language Models, Hongyin Luo & Wei Sun). Short version is (1+m1)*2^e1 * (1+m2)*2^e2 (m for mantissa<1, e for exponent) going from

(1+m1+m2+m1*m2)*2^(e1+e2)

to

(1+m1+m2+2^-3 )*2^(e1+e2)

which introduces an error for real numbers, but this error can be acceptable or negligible in applications that use low precision floating point. Float8s.jl doesn’t cover all the 8-bit types they examined, and I won’t implement anything to corroborate their findings, but it seems plausible.

2 Likes

haven’t had time to read their paper, but your introduction just made me clear :wink: