Reading binary file to Julia

Hi all.
I have a binary file that I need to load to julia. I have a python scrip that reads it, can anyone help me a bit to turn this to julia, it is not working for me

   # omit the first 4 values (header information) and reshape
    dtype = np.dtype([
        ("x", "<f4"),
        ("y", "<f4"),
        ("z", "<f4"),
        ("pdf", "<f4")])
    data = np.fromfile(filename, dtype=dtype)[4:]
    if coordinate_converter:
        data["x"], data["y"], data["z"] = coordinate_converter(
            data["x"], data["y"], data["z"])
    return data```

I don’t have some test data to be certain, but perhaps this could work:

struct Data{T}

function read_file(path; coordinate_conversion=nothing)
    data = Data{Float32}[]
    open(path, "r") do io
        # buffer of raw bytes for each item of data
        buffer = Vector{UInt8}(undef, sizeof(Data{Float32}))
        while !eof(io)
            readbytes!(io, buffer)
            push!(data, (reinterpret(Data{Float32}, buffer)[1]))
    data = @views data[5:end]
    if !isnothing(coordinate_conversion)
        data .= coordinate_conversion.(data) # convert in-place

    return data

I believe reinterpret is little endian for the most part (see this post).

This isn’t the most performant implementation as it keeps resizing the array, but it should get you part of the way there.

1 Like

I will try tonight.

Here is a test file if you get some time to check:

It should output 4 columns.

Makes sense?

using GMT

julia> gmtconvert("mine.20230703.195143.grid0.loc.scat", binary="4f")
BoundingBox: [2.801335760031742e-41, 3.8999335765838623, 0.10017163306474686, 1.939615754657435e38, 0.0, 1.8999238014221191, 0.0, 107.7950668334961]
19992Γ—4 GMTdataset{Float64, 2}
   Row β”‚       col.1       col.2     col.3     col.4
       β”‚     Float64     Float64   Float64   Float64
     1 β”‚ 2.80134e-41  1.93962e38  0.0         0.0
     2 β”‚ 3.74212      0.491428    1.06699   106.243
     3 β”‚ 3.75118      0.487598    1.07511   106.243
     4 β”‚ 3.74868      0.499743    1.07752   106.243
     5 β”‚ 3.74007      0.496765    1.06352   106.243

and add header=16 to skip the text headers.


Hi Joaquim!. The last column should be between 0 and 1 if the manual es correct. Quote from the quotes manual:

  • Scatter file (Binary , FileExtension= *.scat )The Scatter file contains the x,y,z locations and PDF value of each sample of the location PDF. The number of samples to save is specified in the LOCSEARCH statement in the NLLoc Statements section of the Input Control File.Header: (required ) one integer and 3 float valuesnSamples dummy dummy dummy Fields:
    nSamples (integer )
    number of PDF samples in the following buffer
    dummy (float )
    unusedBuffer: (required ) Sequence of four float values for each PDF samplex(N), y(N), z(N), pdf(N) (N = 0, nSamples - 1) Fields:
    x(N), y(N), z(N) (float )
    Non-GLOBAL: x, y and z location of the sample in kilometers relative to the geographic origin.
    GLOBAL: longitude and latitude in degrees, and z location in kilometers of the sample.
    pdf(N) (float )
    PDF value of the sample, normalized so that the volume integral over the corresponding search grid of the PDF = 1.0

Not at computer right now. See the -bi documentation of GMT. It’s pretty flexible in column data type.

1 Like

Suspect all values shown are good. When reading binaries and use wrong data types all values are garbage, but in this case they are all reasonable.

Doesn’t this imply the integral sums to 1? Which means that individual PDF values can be larger than one if the volume element is small.

My code produces nonsense output, so probably isn’t right. I have a feeling @joa-quim’s solution is the best one.

Does it or does it not match the Python implementation you started off with?

IΒ΄ve been trying to map @joa-quim values and they look right from this perspective.
I will try something else this afternoon and report back

Also see StructIO.jl: