Reading binary file to Julia

marianoarnaiz · August 24, 2023, 12:23pm

Hi all.
I have a binary file that I need to load into Julia. I have a Python script that reads it. Can anyone help me turn it into Julia? My attempts are not working.

```python
import numpy as np

# function name and signature assumed; the post showed only the body
def read_scat(filename, coordinate_converter=None):
    # omit the first 4 values (header information) and reshape
    dtype = np.dtype([
        ("x", "<f4"),
        ("y", "<f4"),
        ("z", "<f4"),
        ("pdf", "<f4")])
    data = np.fromfile(filename, dtype=dtype)[4:]
    if coordinate_converter:
        data["x"], data["y"], data["z"] = coordinate_converter(
            data["x"], data["y"], data["z"])
    return data
```

jmair · August 24, 2023, 1:02pm

I don’t have any test data to be certain, but perhaps this could work:

```julia
struct Data{T}
    x::T
    y::T
    z::T
    pdf::T
end

function read_file(path; coordinate_conversion=nothing)
    data = Data{Float32}[]
    open(path, "r") do io

        # buffer of raw bytes for each item of data
        buffer = Vector{UInt8}(undef, sizeof(Data{Float32}))
        while !eof(io)
            readbytes!(io, buffer)
            push!(data, (reinterpret(Data{Float32}, buffer)[1]))
        end
    end
    data = @views data[5:end]
    if !isnothing(coordinate_conversion)
        data .= coordinate_conversion.(data) # convert in-place
    end

    return data
end
```

I believe reinterpret uses the machine's native byte order, which is little-endian on most common hardware (see this post).
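
If byte order is a concern, it can be made explicit instead of being left to the host. A minimal sketch, assuming the file stores little-endian 32-bit floats (which is what `<f4` in the Python dtype says). `ltoh` is a Base function that converts little-endian data to host order and does nothing on a little-endian machine. The bytes below are simply the IEEE 754 encoding of 1.0, chosen for illustration:

```julia
# The four bytes of Float32 1.0 (bit pattern 0x3f800000) in little-endian order.
bytes = UInt8[0x00, 0x00, 0x80, 0x3f]

# reinterpret reads the bytes in the host's native order ...
native = reinterpret(Float32, bytes)[1]

# ... so passing the result through ltoh gives the intended value on any host.
value = ltoh(native)

println(value)
```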

This isn’t the most performant implementation as it keeps resizing the array, but it should get you part of the way there.
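
One way to avoid the repeated resizing is to size the vector from the file length and fill it with a single `read!` call. This is only a sketch under assumptions: the names `Sample` and `read_records` are made up for the example, the file is taken to be a plain sequence of 16-byte records in host byte order, and `skip_records = 4` mirrors the Python slice `[4:]`, which on a structured array drops four whole records (64 bytes) rather than four individual values.

```julia
# One record: four 32-bit floats, 16 bytes with no padding.
struct Sample
    x::Float32
    y::Float32
    z::Float32
    pdf::Float32
end

# Read every record in one call, then drop the leading records.
function read_records(path; skip_records=4)
    n = filesize(path) ÷ sizeof(Sample)   # number of whole records in the file
    records = Vector{Sample}(undef, n)
    open(path, "r") do io
        read!(io, records)                # fills the preallocated vector in place
    end
    return records[skip_records+1:end]
end
```

If the header turns out to be shorter than four records, only `skip_records` needs to change.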

marianoarnaiz · August 24, 2023, 1:27pm

I will try tonight.

Here is a test file if you get some time to check:

https://github.com/marianoarnaiz/JULIA/blob/main/mine.20230703.195143.grid0.loc.scat

It should output 4 columns.

joa-quim · August 24, 2023, 1:36pm

Makes sense?

```julia
using GMT

julia> gmtconvert("mine.20230703.195143.grid0.loc.scat", binary="4f")
BoundingBox: [2.801335760031742e-41, 3.8999335765838623, 0.10017163306474686, 1.939615754657435e38, 0.0, 1.8999238014221191, 0.0, 107.7950668334961]
19992×4 GMTdataset{Float64, 2}
   Row │       col.1       col.2     col.3     col.4
       │     Float64     Float64   Float64   Float64
───────┼─────────────────────────────────────────────
     1 │ 2.80134e-41  1.93962e38  0.0         0.0
     2 │ 3.74212      0.491428    1.06699   106.243
     3 │ 3.75118      0.487598    1.07511   106.243
     4 │ 3.74868      0.499743    1.07752   106.243
     5 │ 3.74007      0.496765    1.06352   106.243
...
```

Add header=16 to skip the 16-byte header.

marianoarnaiz · August 24, 2023, 2:18pm

Hi Joaquim! The last column should be between 0 and 1 if the manual is correct. Quote from the manual:

Scatter file (Binary, FileExtension = *.scat)

The Scatter file contains the x, y, z locations and PDF value of each sample of the location PDF. The number of samples to save is specified in the LOCSEARCH statement in the NLLoc Statements section of the Input Control File.

Header: (required) one integer and 3 float values
nSamples dummy dummy dummy
Fields:
nSamples (integer)
number of PDF samples in the following buffer
dummy (float)
unused

Buffer: (required) Sequence of four float values for each PDF sample
x(N), y(N), z(N), pdf(N) (N = 0, nSamples - 1)
Fields:
x(N), y(N), z(N) (float)
Non-GLOBAL: x, y and z location of the sample in kilometers relative to the geographic origin.
GLOBAL: longitude and latitude in degrees, and z location in kilometers of the sample.
pdf(N) (float)
PDF value of the sample, normalized so that the volume integral over the corresponding search grid of the PDF = 1.0
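
Read literally, that layout could be parsed in Julia along these lines. This is a sketch, not a tested reader for real NLLoc output: the function name `read_scat` is made up, and it assumes the "integer" is 32 bits, the "float" values are 32 bits, and everything is little-endian (the manual excerpt does not state sizes or byte order).

```julia
# Header: nSamples (assumed Int32) followed by three unused Float32 values.
# Buffer: nSamples records of (x, y, z, pdf), each an assumed Float32.
function read_scat(path)
    open(path, "r") do io
        nsamples = ltoh(read(io, Int32))     # first header field
        skip(io, 3 * sizeof(Float32))        # the three unused "dummy" floats
        samples = Matrix{Float32}(undef, 4, nsamples)
        read!(io, samples)                   # column k holds x, y, z, pdf of sample k
        return permutedims(ltoh.(samples))   # nsamples × 4, one row per sample
    end
end
```

Because the sample count comes from the header, `read!` throws an `EOFError` if the file is shorter than the header claims, which gives a cheap consistency check.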

joa-quim · August 24, 2023, 3:04pm

Not at computer right now. See the -bi documentation of GMT. It’s pretty flexible in column data type.

joa-quim · August 24, 2023, 9:43pm

I suspect all the values shown are good. When you read a binary file with the wrong data types, every value comes out as garbage, but in this case they all look reasonable.
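
One way to sanity-check row 1 of the earlier output: assuming the first header field is a 32-bit integer (the manual says only "integer"), the odd-looking first value 2.80134e-41 is that integer's bit pattern displayed as a float. Reinterpreting it gives 19991, one less than the 19992 rows reported, which fits one 16-byte header record followed by nSamples data records.

```julia
# First value of row 1, copied from the BoundingBox line of the GMT output.
as_float = Float32(2.801335760031742e-41)

# View the same 32 bits as an integer instead of a float.
nsamples = reinterpret(Int32, as_float)

println(nsamples)   # 19991
```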

jmair · August 24, 2023, 10:20pm

Doesn’t this only imply that the integral is 1? That means individual PDF values can be larger than one if the volume element is small.
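
A toy illustration of that point, not taken from the file: a uniform density over a cube 0.1 km on a side (a made-up size) has to equal 1/volume everywhere for its integral to be 1, which puts every value near 1000.

```julia
side = 0.1                  # cube edge in km (hypothetical)
volume = side^3             # about 0.001 km^3
density = 1 / volume        # uniform PDF value inside the cube

println(density)            # about 1000, far above 1
println(density * volume)   # integral over the cube, about 1
```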

My code produces nonsense output, so probably isn’t right. I have a feeling @joa-quim’s solution is the best one.

GunnarFarneback · August 25, 2023, 5:51am

Does it or does it not match the Python implementation you started off with?

marianoarnaiz · August 25, 2023, 10:49am

I've been mapping @joa-quim's values and they look right from this perspective.
I will try something else this afternoon and report back.

mkitti · August 25, 2023, 10:53am

Also see StructIO.jl.
