I discovered that In Binary Floating Point, you can multiply without multiplying.
how are you planning on squaring without multiplication?
It’s in the picture. Squaring can be done with just additions and change of exponents
Your “call this part recursively” is implementing schoolbook multiplication in binary.
Trick 1. Division by 4
Division is even slower than multiplication, slowest elementary math operation on CPU done in one instructions (some might have sqrt too, I think actually faster?, and sin etc. and arbitrary power, which is worse for just power of two).
You can shift by two for fast division, but not for floating point, or well multiply by 1/4…
To divide a binary floating point number by 4, you just divide a binary floating point number by 2 TWICE.
To divide a binary floating point number by 2, you leave the mantissa alone and reduce the exponent by 1. Really easy bit twiddling. Super fast.
Plus checking for underflow (or overflow for doublings).
You could use this
function DescribeFloat64(x::Float64;verbose=false)
# Local function convert binary string 2 Int64
cbs2Int64(s) = parse(Int64,"0b" * s)
# Local function vprintln
function vprintln(args...)
if verbose == true
println(args...)
end
end
# Local ALTERNATIVE significand function
function significand_alt(T::DataType,manArr::Array{Int64,1})
# Check if all the bits in MantissaArray is 0
if sum(manArr) == 0
return convert(T,0)
end
setprecision(BigFloat, 52) do
acc = BigFloat(0.0)
multiplier = BigFloat(0.5)
for k in manArr
acc += k == 1 ? multiplier : BigFloat(0.0)
multiplier /= BigFloat(2.0)
end
acc += BigFloat(1.0)
return convert(Float64,acc)
end
end
# Local variables used in this function #
local expoBiasValue = 1023
local specialflags = false # Boolean flag
local bs = bitstring(x) # Character bit string
local signbit = bs[1] == '1' ? 1 : 0 # Sign Bit Int64
local expoView = @view bs[2:12] # a View of bit stream relevant to exponent
# exponent Array{Int64,1}
local expoArray = [ k == '1' ? 1 : 0 for k in expoView ]
local expo = cbs2Int64(expoView) # exponent Int64
local mantissaView = @view bs[13:64] # a View of bit stream relevant to mantissa
# mantissa Array{Int64,1}
local mantissaArray = Int64[ k == '1' ? 1 : 0 for k in mantissaView ]
local mysignificand = string(significand(x)) # significand Float64
# Strip the negative sign if second character is a number digit
if mysignificand[1] == '-' && mysignificand[2] == '1' || mysignificand[2] == '0'
mysignificand = lstrip(mysignificand,'-')
end
# List of special Boolean flags
local INF_flag = false # Boolean flag
local NaN_flag = false # Boolean flag
local subnormal_flag = false
# End of local variables
println("------------------------------------------------------------")
println("Float64 value : ",x)
println("Raw Bitstring : ",bs,"â‚‚")
println("======================")
println("Sign Bit : ",bs[1],"â‚‚")
println("Sign Value : ",signbit)
println("======================")
vprintln("Expo Array is ",expoArray)
println("Exponent Bits : ",expoView,"â‚‚")
println("Exponent raw value : ",expo)
println("Exponent normalised value : ",expo," - $(expoBiasValue) = ",expo - expoBiasValue)
println("======================")
vprintln("Typeof mantissa is ",typeof(mantissaArray))
vprintln("mantissa Array is ",mantissaArray)
println("Mantissa : I.",bs[13:64],"â‚‚")
println("Significand : ",mysignificand)
println("======================")
##############################################
# Now check ALL conditions for special flags #
##############################################
# Step 1: Checking for INF
# If all bits in mantissa are ZERO
# and all bits in exponent are ONE
# then INF_flag is true
if sum(mantissaArray) == 0 && prod(expoArray) == 1
specialflags = true
INF_flag = true
end
# Step 2: Checking for NaN
# If some bits in mantissa are NOT ZERO
# and all bits in exponent are ONE
# then NaN_flag is true
if sum(mantissaArray) > 0 && prod(expoArray) == 1
specialflags = true
NaN_flag = true
end
# Step 2: Checking for subnormal
# If some bits in mantissa are NOT ZERO
# and all bits in exponent are ZERO
# then subnormal_flag is true
if sum(mantissaArray) > 0 && sum(expoArray) == 0
specialflags = true
subnormal_flag = true
end
print("Value is ")
if specialflags == false
println("(-1)^",signbit," * ",mysignificand," * 2^",expo - 1023)
else # specialflags == true
if INF_flag
print(signbit == 1 ? "Negative " : "Positive ")
println("INF")
end
if NaN_flag
println("NaN")
end
if subnormal_flag
println("subnormal")
end
end
println("Value is ",x)
println("------------------------------------------------------------")
end
println("Testing 1.5"); DescribeFloat64(1.5); print()
#=
println("Testing -1.5"); DescribeFloat64(-1.5); print()
println("Testing 1.1"); DescribeFloat64(1.1); print()
println("Testing -1.1"); DescribeFloat64(-1.1); print()
println("Testing 6.02214076e23"); DescribeFloat64(6.02214076e23); print()
println("Testing 1.0/0.0 == +INF"); DescribeFloat64(1.0/0.0); print()
println("Testing -1.0/-0.0 == +INF"); DescribeFloat64(-1.0/-0.0); print()
println("Testing -1.0/0.0 == -INF"); DescribeFloat64(-1.0/0.0); print()
println("Testing 1.0/-0.0 == -INF"); DescribeFloat64(1.0/-0.0); print()
println("Testing 0.0/0.0 == NaN"); DescribeFloat64(0.0/0.0); print()
println("Testing 0.0 == normal Zero"); DescribeFloat64(0.0); print()
println("Testing -0.0 == sign Zero"); DescribeFloat64(-0.0); print()
println("Testing 5.0e-324 == subnormal"); DescribeFloat64(5.0e-324); print()
println("Testing floatmin(Float64) == 2.2250738585072014e-308"); DescribeFloat64(2.2250738585072014e-308); print()
=#
with output
Testing 1.5
------------------------------------------------------------
Float64 value : 1.5
Raw Bitstring : 0011111111111000000000000000000000000000000000000000000000000000â‚‚
======================
Sign Bit : 0â‚‚
Sign Value : 0
======================
Exponent Bits : 01111111111â‚‚
Exponent raw value : 1023
Exponent normalised value : 1023 - 1023 = 0
======================
Mantissa : I.1000000000000000000000000000000000000000000000000000â‚‚
Significand : 1.5
======================
Value is (-1)^0 * 1.5 * 2^0
Value is 1.5
------------------------------------------------------------
julia>
Minor details. Get the major details correct then spend hours on testing the edge cases.
ulia> DescribeFloat64(Ď€)
ERROR: MethodError: no method matching DescribeFloat64(::Irrational{:Ď€})
The function `DescribeFloat64` exists, but no method is defined for this combination of argument types.
Closest candidates are:
DescribeFloat64(::Float64; verbose)
@ Main ~/juliascript/DescribeFloat64_simple.jl:2
Stacktrace:
[1] top-level scope
@ REPL[1]:1
julia> DescribeFloat64(Float64(Ď€))
------------------------------------------------------------
Float64 value : 3.141592653589793
Raw Bitstring : 0100000000001001001000011111101101010100010001000010110100011000â‚‚
======================
Sign Bit : 0â‚‚
Sign Value : 0
======================
Exponent Bits : 10000000000â‚‚
Exponent raw value : 1024
Exponent normalised value : 1024 - 1023 = 1
======================
Mantissa : I.1001001000011111101101010100010001000010110100011000â‚‚
Significand : 1.5707963267948966
======================
Value is (-1)^0 * 1.5707963267948966 * 2^1
Value is 3.141592653589793
------------------------------------------------------------
julia>
I found this when searching around in the internet.
Bit twidder’s paradise
Rob Wentworth says:
Okay, if this thread is attracting “math heads” and “bit twiddlers”, perhaps these links are of interest:
The Aggregate Magic Algorithms:
http://aggregate.org/MAGIC/
Bit Twiddling Hacks:
http://graphics.stanford.edu/~seander/bithacks.html
And for something really hardcore CPU hackers, check out Agner Fog’s Optimization Manuals:
And the total number of operations (shifts and additions) are actually larger than in the ordinary scheme of multiplication.
Still, it is the kind of work done to implement multiplication, though this would be more a topic for hardware than software today, typical processors are far ahead.
Not really an edge case once you step out of the bare-bones abstraction, it’s very easy to run into overflow 2.0^1024
for real data types and your algorithm needs to deal with that, preferably in compliance with IEEE-754 so mainstream architectures aren’t forced to emulate your floating points in software.
I agree completely but I just want to show that it can be done (not that it is efficient or practical).
If a hardware is designed to do squaring very efficiently, then this algorithm is worthy.
Finally code to prove that it actually works
# global ALTERNATIVE significand function
function significand_alt(expoArr::Array{Int64,1},manArr::Array{Int64,1};implicitbitvalue=1.0)
# Check if all the bits in MantissaArray is 0 and all bits in expoArr is 0
if sum(manArr) == 0 && sum(expoArr) == 0
return convert(Float64,0)
end
local expostr = String([ k == 0 ? '0' : '1' for k in reverse(expoArr) ])
local expoVal = parse(Int64,"0b" * expostr)
if sum(manArr) == 0 && expoVal == 1023
return convert(Float64,1)
end
setprecision(BigFloat, 52) do
accumulator = BigFloat(0.0)
multiplier = BigFloat(0.5)
for k in reverse(manArr)
accumulator += k == 1 ? multiplier : BigFloat(0.0)
multiplier /= BigFloat(2.0)
end
accumulator += BigFloat(implicitbitvalue)
return convert(Float64,accumulator)
end
end
function Float64toInt64BitsArray(x::Float64)
return Int64[ k == '1' ? 1 : 0 for k in reverse(bitstring(x)) ]
end
function Int64BitsArraytoFloat64(ba::Array{Int64,1})
local mysign = ba[64]
local mysignificand = significand_alt(ba[1:52])
expostr = String([ k == 0 ? '0' : '1' for k in reverse(ba[53:63]) ])
local myexpo = parse(Int64,"0b" * expostr) - 1023
return (-1.0)^mysign * mysignificand * 2.0^myexpo
end
function Float64toUInt64(x::Float64)
local bs = bitstring(x)
local myUINT64 = parse(UInt64,"0b" * s)
return myUINT64
end
function GetSignBitofInt64BitsArray(ba)
return ba[64]
end
function GetExponentofInt64BitsArray(ba)
return ba[53:63]
end
function GetMantissaofInt64BitsArray(ba)
return ba[1:52]
end
function StringExponent(expoarray)
return String([k == 0 ? '0' : '1' for k in reverse(expoarray)])
end
function StringMantissa(mantissaarray)
return String([k == 0 ? '0' : '1' for k in reverse(mantissaarray)])
end
function GetRawExpoValue(ExpoArray::Array{Int64,1})
local expostr = String([ k == 0 ? '0' : '1' for k in reverse(ExpoArray) ])
return parse(Int64,"0b" * expostr)
end
function GetNormalExpoValue(ExpoArray::Array{Int64,1})
return GetRawExpoValue(ExpoArray) - 1023
end
function SetNormalExpoValue(val::Int64)
return SetRawExpoValue(val + 1023)
end
function SetRawExpoValue(val::Int64)
local ExpoArr = Int64[ 0 for k = 1:11 ]
local count = 1
local x = val
while x > 0
if x % 2 == 1
ExpoArr[count] = 1
end
count += 1
x = div(x,2)
end
return ExpoArr
end
#=
Get Everthing about Float 64 and return a tuple of
1. rawbitsarray
2. signbit
3. expobitsarray
4. mantissabitsarray
5. normalexpovalue
6. mysignificand
=#
function GetEverythingAboutFloat64(num::Float64)
local rawbitsarray = Float64toInt64BitsArray(num)
local signbit = GetSignBitofInt64BitsArray(rawbitsarray)
local expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
local mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
local normalexpovalue = GetNormalExpoValue(expobitsarray)
local mysignificand = significand_alt(expobitsarray,mantissabitsarray)
return (rawbitsarray,signbit,expobitsarray,mantissabitsarray,
normalexpovalue,mysignificand)
end
function square(x::Float64)
# The square of any real number is positive
# so we might as well treat the input as a positive number
local remainder = abs(x)
local result = 0.0
while remainder > 0.0
local mytuple = GetEverythingAboutFloat64(remainder)
local n = mytuple[5]
result += 2.0 ^ (2*n)
# calc new remainder from old remainder
remainder = remainder - (2.0^n)
if remainder > 0.0
# result = result + 2 * 2^n * remainder
local newtuple = GetEverythingAboutFloat64(remainder)
local newsignificand = newtuple[6]
local newexpo = newtuple[5]
result += newsignificand * 2.0^(newexpo+n+1)
end
end
return result
end
function divbyfour(x)
local mytuple = GetEverythingAboutFloat64(x)
local signbit = mytuple[2]
local mysignificand = mytuple[6]
local n = mytuple[5]
return (-1.0)^signbit * mysignificand * 2.0^(n-2)
end
function mymul(A,B)
return divbyfour(square(A+B) - square(A-B))
end
num = 1.5
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"â‚‚")
println("Sign bit is ",signbit,"â‚‚")
println("Expo string is ",StringExponent(expobitsarray),"â‚‚")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"â‚‚")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()
num = Float64(Ď€)
rawbitsarray = Float64toInt64BitsArray(num)
signbit = GetSignBitofInt64BitsArray(rawbitsarray)
expobitsarray = GetExponentofInt64BitsArray(rawbitsarray)
mantissabitsarray = GetMantissaofInt64BitsArray(rawbitsarray)
println("Float64 num is ",num)
println("Rawbits is ",bitstring(num),"â‚‚")
println("Sign bit is ",signbit,"â‚‚")
println("Expo string is ",StringExponent(expobitsarray),"â‚‚")
println("Mantissa string is ",StringMantissa(mantissabitsarray),"â‚‚")
println()
print("converting bitsarray back to Float64 : ",Int64BitsArraytoFloat64(rawbitsarray))
println()
println()
println("Now test mymul")
println("sqrt(2.0) * π = ",sqrt(2.0) * Float64(π))
println("mymul(sqrt(2.0),Float64(Ď€)) = ",mymul(sqrt(2.0),Float64(Ď€)))
with output
Float64 num is 1.5
Rawbits is 0011111111111000000000000000000000000000000000000000000000000000â‚‚
Sign bit is 0â‚‚
Expo string is 01111111111â‚‚
Mantissa string is 1000000000000000000000000000000000000000000000000000â‚‚
converting bitsarray back to Float64 : 1.5
Float64 num is 3.141592653589793
Rawbits is 0100000000001001001000011111101101010100010001000010110100011000â‚‚
Sign bit is 0â‚‚
Expo string is 10000000000â‚‚
Mantissa string is 1001001000011111101101010100010001000010110100011000â‚‚
converting bitsarray back to Float64 : 3.141592653589793
Now test mymul
sqrt(2.0) * π = 4.442882938158366
mymul(sqrt(2.0),Float64(Ď€)) = 4.442882938158366
All right I admit it. I am obsess with the proof of concept of a binary floating point multiplication without multiplying.
I don’t really want to spend hours getting all the edge cases to work. I just want to see it work for the most general cases. I doubt that I want to waste any more time getting INF,NaNs and subnormals working properly.
Once I got the general cases to work, the “fun” element is gone. After all the function is inefficient and non-practical for general floating point multiplication.
The thrill is just getting it to work. Aka proof of concept.
Semi-related but I ran across a paper that approximated away the mantissa multiplication, effectively eliminating a looping or recursive step (Addition is All You Need for Energy-Efficient Language Models, Hongyin Luo & Wei Sun). Short version is (1+m1)*2^e1 * (1+m2)*2^e2
(m
for mantissa<1, e
for exponent) going from
(1+m1+m2+m1*m2)*2^(e1+e2)
to
(1+m1+m2+2^-3 )*2^(e1+e2)
which introduces an error for real numbers, but this error can be acceptable or negligible in applications that use low precision floating point. Float8s.jl doesn’t cover all the 8-bit types they examined, and I won’t implement anything to corroborate their findings, but it seems plausible.
haven’t had time to read their paper, but your introduction just made me clear