Help defining a custom IBMFloat32 primitive type

juliohm · November 15, 2025, 7:17pm

I need to consume binary data from an old file format that uses 4-byte IBM floating point as opposed to IEEE floating point (i.e., Float32). The 4-byte IBM floating point type is also known as IBM HFP (hexadecimal floating point):

By reading the Julia documentation, I tried to define the following primitive type with a conversion method to Float32:

"""
    IBMFloat32 <: AbstractFloat

32-bit (4-byte) IBM floating point number type (a.k.a. IBM HFP).
See https://en.wikipedia.org/wiki/IBM_hexadecimal_floating-point.

The 4 bytes stored in the type are organized as follows:

[S | EEEEEEE | FFFFFFFF | FFFFFFFF | FFFFFFFF]

S is the sign bit, EEEEEEE is the 7-bit exponent (base 16, excess-64),
and FFFFFFFF (x3) are the 24 bits (3 bytes) of the fraction.

If S = 1, the number is negative; if S = 0, the number is positive.

The value represented is given by

    (-1)^S × (FFFFFFFF|FFFFFFFF|FFFFFFFF) × 16^(EEEEEEE - 64)

If the fraction is zero, the value is zero regardless of the exponent or sign.
"""
primitive type IBMFloat32 <: AbstractFloat 32 end

function convert(::Type{Float32}, x::IBMFloat32)
  # extract 4 bytes from IBMFloat32
  b1, b2, b3, b4 = reinterpret(NTuple{4,UInt8}, x)

  # compute fraction from last 3 bytes
  F = reinterpret(UInt32, (zero(UInt8), b2, b3, b4))

  # if fraction is zero, value is zero
  iszero(F) && return zero(Float32)

  # compute sign from first bit
  S = b1 >> 7 # shift first bit all the way to the right

  # compute exponent from remaining 7 bits
  E = b1 & 0b01111111 # knock out first bit in byte

  Float32((-1)^S * F * 16^(E - 64))
end

I am not sure if the implementation is correct yet. In order to test the implementation I need to read bytes from an IO stream as IBMFloat32. I tried the following:

io = IOBuffer("abcdefghijklmnopqrstuvwxyz")

ibmfloat = read(io, IBMFloat32)

but the method failed with

ERROR: The IO stream does not support reading objects of type IBMFloat32.

What do I need to implement to make this work?

jakobnissen · November 15, 2025, 7:50pm

You just need to implement the read method:

Base.read(io::IO, ::Type{IBMFloat32}) = reinterpret(IBMFloat32, read(io, UInt32))

Benny · November 15, 2025, 8:27pm

I don’t think this conversion is feasible, the base in IBM hexadecimal floating point is 16 instead of 2 (the intermediate powers of 2 are presumably represented by the significand with more leading zeros, effectively less precision), so it has a much wider range (10^-79 to 10^75) compared to Float32 (10^-45 to 10^38). Maybe Float64? The link does say

in order to have similar exponent range in binary, 9 exponent bits would be required.

juliohm · November 15, 2025, 8:40pm

Should I define a method to convert to Float64 instead? Do you have other suggestions to improve the implementation @Benny ?

matthias314 · November 15, 2025, 11:06pm

You cannot raise an integer to a negative power. If you replace 16 by 16.0, it still gives wrong results.

Here is a (not very polished) implementation that works directly with the bit representations. I’m using UInt32 as input.

function f(x::UInt32)
    f0 = x & 0x00ffffff
    iszero(f0) && return Float64(0)
    e0 = x & 0x7f000000
    s0 = x & 0x80000000
    k = leading_zeros(f0)-8
    f1 = (Int64(f0) << (28 + k + 1)) & 0x000fffffffffffff
    e1 = (Int64(e0) >> 22 - 4*64 - k - 1 + 1023) << 52
    s1 = Int64(s0) << 32
    reinterpret(Float64, s1 | e1 | f1)
end

Examples from the Wikipedia page:

julia> x = 0b1_1000010_011101101010000000000000; f(x)
-118.625

julia> x = 0b0_1111111_111111111111111111111111; f(x)
7.2370051459731155e75

julia> x = 0b0_0000000_000100000000000000000000; f(x)
5.397605346934028e-79

juliohm · November 15, 2025, 11:16pm

Thank you @matthias314. Could you please comment on the implementation? How do you come up with those hexadecimal literals?

matthias314 · November 15, 2025, 11:31pm

s0, e0, f0 are the bit fields of the sign, the exponent and the fraction in the input. s1, e1 and f1 are those for the output. k is the number of leading zeros in the 24-bit number f0. You need it to adjust the exponent and the fraction. For example, if k == 2, then the (binary) fraction starts with 0.001, but you want it to start with 1., which is a shift by k+1 positions. Int64(e0) >> 22 is 4 times the exponent. (We have to change from base 16 to base 2.) You have to subtract 4*64 and k+1 and add 1023 to get the right bias. Int64(f0) << (28 + k + 1) moves the fraction to the correct position in Float64 (changing from width 24 to 52). However, now the most significant bit set in f0 is at position 2^52. Since it is not stored in Float64, you have to clear it, which I do with & 0x000fffffffffffff.

Does this help?

juliohm · November 15, 2025, 11:42pm

It certainly helps! I will study it carefully. Thank you.

Benny · November 16, 2025, 12:24am

Unless there’s a smaller binary format with 9 exponent bits and 24 explicit significand bits, then probably. Bonus performance from Float64 operations being implemented in hardware.

I’d also agree with starting from matthias314’s f(x::UInt32). There’s no need to make a new type IBMFloat32 just to read bits to begin with; starting at UInt32 is enough, though you wouldn’t use the function or method convert(::T, x::UInt32) to reinterpret it. It also makes more sense to manipulate bits and reinterpret directly to the output type, rather than implementing the formula (-1)^S * F * 16^(E - 64) with numeric types that don’t need to be involved. I would also point out that the link’s formula is (−1)^{sign} × 0.<significand> × 16^{exponent−64}, meaning the significand’s bits represent a binary fraction <1, not F::UInt32.

Implementing the IBMFloat32 type, its read, and its convert would be more worth it if you also implement its particular operations, and so far it doesn’t appear you intend to emulate IBM hexadecimal computation and its notably wobbling precision.

matthias314 · November 16, 2025, 12:31am

I agree with @Benny. Apart from the missing factor 2^{-24}, I think the bit order is wrong in the OP code. Here is a modification that should work:

function juliohm(x::UInt32)
  # extract 4 bytes from IBMFloat32
  b1, b2, b3, b4 = reinterpret(NTuple{4,UInt8}, x)

  # compute fraction from first 3 bytes
  F = reinterpret(UInt32, (b1, b2, b3, zero(UInt8)))

  # if fraction is zero, value is zero
  iszero(F) && return zero(Float32)

  # compute sign from highest bit
  S = b4 >> 7 # shift highest bit all the way to the right

  # compute exponent from remaining 7 bits
  E = b4 & 0b01111111 # knock out first bit in byte

  Float64((-1)^S * F * 2.0^(-24) * 16.0^(E - 64))
end

Topic		Replies	Views
Parsing Float32 literals General Usage	3	548	June 12, 2019
How do I convert a binary string to a floating point number? General Usage question , bitstring	3	5572	March 2, 2019
Why isn't Float32 == Float64 (Converting from Float32 to Float64) General Usage question	5	689	July 2, 2020
BigFloat number conversion problem New to Julia	15	1301	January 21, 2019
Converting Float32 to N0f32 in FixedPointNumbers.jl Numerics question	2	1019	January 24, 2018

Help defining a custom IBMFloat32 primitive type

Related topics