Reading binary file containing mixed data types


Hi, I am trying to read a binary file generated by an eigenpair computation using the SLEPc library in C. I write the file with the following calls:

PetscViewerBinaryOpen(PETSC_COMM_WORLD, fvalname, FILE_MODE_WRITE, &viewer);  /* open a binary viewer for writing */
EPSValuesView(eps, viewer);                                                   /* write the computed eigenvalues to it */

which generates a file that I then try to read in Julia like this:

y = Array{Float64}(undef, filesize("eigval_L=4_h=2.700_J=1.0_itr_1.dat")÷8)
read!("eigval_L=4_h=2.700_J=1.0_itr_1.dat", y)
12-element Vector{Float64}:
 -2.5148627646627907e138
  0.0
  8.082852581098644e181
  0.0
 -2.797789591600618e77
  0.0
 -6.743903390763595e41
  0.0
 -1.5503462044558484e-227
  0.0
 -9.890240184768374e-89
  0.0

As you can see, it produces garbage values. This is probably because the data also contains some string values. For instance, if I write my data in ASCII format instead, it looks like this:

Eigenvalues =
   1.77528
   2.31031
   1.29962
   2.66192
   0.08095
   4.08478

Furthermore, the eigenvector file looks even more complicated:

Vec Object: Xr0_EPS_0x84000004_0 3 MPI processes
 type: mpi
Process [0]
-0.909229
0.415619
Process [1]
-0.00206267
0.00410161
Process [2]
0.0145993
0.01813
Vec Object: Xi0_EPS_0x84000004_0 3 MPI processes
 type: mpi
Process [0]
0.
0.
Process [1]
0.
0.
Process [2]
0.
0.

So how do I go about reading this type of file in Julia?

I suspect you have an endianness problem. Do you get better values from ntoh.(y)?
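If that is indeed the problem, something like this should give you sensible numbers (a minimal sketch, reusing the file name from your post and assuming the file really is nothing but big-endian Float64 values):

fname = "eigval_L=4_h=2.700_J=1.0_itr_1.dat"

# read the raw 8-byte values, then swap each one from big-endian
# (network byte order) to the host byte order with ntoh
y = Array{Float64}(undef, filesize(fname) ÷ 8)
read!(fname, y)
vals = ntoh.(y)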


Thank you!

This works! But could you please explain what was wrong?

Also, I see that the eigenvectors are padded with some values very close to zero.

2.5701910099582143e-308
 -0.08658646381578468
  0.21664079294930008
  0.2768194902948219
  0.18273895339582577
 -0.34904098005790685
 -0.8448179626754538
  2.5701910099582143e-308
 -0.1958704084404449
 -0.7848867729177095
  0.48579694676651525
  0.15122989738969175
  0.2757865081468033
 -0.1032481639944464
  2.5701910099582143e-308

How can I get rid of these?

Moreover, I realized that the HDF5 format is much nicer to work with than the binary one. However, the HDF5 file is almost twice the size of the binary file for the same data. Is there any workaround for this?

When you have multi-byte data types it is not obvious how they should be stored in memory, least significant byte first or most significant byte first, and historically different choices have been made. This carries over to file storage: if all you have is raw bytes you don't know which convention was used, and if you pick the wrong one every value is read with its bytes reversed and looks like garbage. See Endianness - Wikipedia for much more information.
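A tiny illustration in Julia (the value is just an arbitrary Float64):

x = 1.77528          # an arbitrary value
swapped = hton(x)    # reorder the bytes into network (big-endian) order;
                     # on a little-endian machine this prints as a very different number
ntoh(swapped) == x   # swapping back recovers the original value, so this is true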

All file formats carry a certain amount of metadata and structural overhead compared to a raw dump of the data. How much depends on the format, but the overhead usually becomes relatively smaller as the amount of data grows. Some formats also support compression, which can make the files smaller, but how much depends on what data is stored.
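If your datasets get large, HDF5 compression can offset part of that overhead. A sketch with HDF5.jl, assuming recent keyword names (chunk, deflate) and a made-up dataset name; for a handful of eigenvalues the metadata overhead will still dominate:

using HDF5

vals = rand(6)   # stand-in for the eigenvalues

h5open("eigvals.h5", "w") do fid
    # chunked, deflate-compressed dataset; compression only pays off
    # once the data is reasonably large
    dset = create_dataset(fid, "eigenvalues", datatype(Float64), dataspace(vals);
                          chunk=(length(vals),), deflate=3)
    write(dset, vals)
end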

Thanks for explaining, I will check out the reference. Also, I have updated my last comment; could you please take another look?

No idea, but in general, unless the data is very simple, you need some kind of specification to be able to make sense of raw bytes.
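That said, the PETSc manual pages do describe the binary format VecView uses: each vector is written as a 32-bit class id, a 32-bit length, and then the values, all big-endian by default. The recurring 2.5701910099582143e-308 in your list looks consistent with that 8-byte header being reinterpreted as a single Float64. A sketch under that assumption (the file name here is made up; check the documentation for your PETSc version before relying on it):

function read_petsc_vecs(fname)
    vecs = Vector{Vector{Float64}}()
    open(fname, "r") do io
        while !eof(io)
            classid = ntoh(read(io, Int32))   # object type marker (the Vec class id)
            n       = ntoh(read(io, Int32))   # number of entries in this vector
            push!(vecs, [ntoh(read(io, Float64)) for _ in 1:n])
        end
    end
    return vecs
end

vecs = read_petsc_vecs("eigvec_L=4_h=2.700_J=1.0_itr_1.dat")   # hypothetical file name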