Reading binary file containing mixed data types


Hi, I am trying to read a binary file generated by an eigenpair computation using the SLEPc library in C. I write the file with the following calls:

PetscViewerBinaryOpen(PETSC_COMM_WORLD, fvalname, FILE_MODE_WRITE, &viewer);  /* open a binary viewer for writing */
EPSValuesView(eps, viewer);                                                   /* write the computed eigenvalues to it */

which generates a file that I then try to read in Julia like this:

y = Array{Float64}(undef, filesize("eigval_L=4_h=2.700_J=1.0_itr_1.dat")÷8)
read!("eigval_L=4_h=2.700_J=1.0_itr_1.dat", y)
12-element Vector{Float64}:
 -2.5148627646627907e138
  0.0
  8.082852581098644e181
  0.0
 -2.797789591600618e77
  0.0
 -6.743903390763595e41
  0.0
 -1.5503462044558484e-227
  0.0
 -9.890240184768374e-89
  0.0

As you can see, it produces garbage values. This is probably because the data also contains some string values. For instance, if I write my data in ASCII format instead, it looks like this:

Eigenvalues =
   1.77528
   2.31031
   1.29962
   2.66192
   0.08095
   4.08478

Furthermore, the eigenvector file looks even more complicated:

Vec Object: Xr0_EPS_0x84000004_0 3 MPI processes
 type: mpi
Process [0]
-0.909229
0.415619
Process [1]
-0.00206267
0.00410161
Process [2]
0.0145993
0.01813
Vec Object: Xi0_EPS_0x84000004_0 3 MPI processes
 type: mpi
Process [0]
0.
0.
Process [1]
0.
0.
Process [2]
0.
0.

So how do I go about reading this type of file in Julia?

I suspect you have an endianness problem. Do you get better values from ntoh.(y)?
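If that is indeed the problem, something like this should give you sensible numbers (a minimal sketch, reusing the file name from your post and assuming the file really is nothing but big-endian Float64 values):

fname = "eigval_L=4_h=2.700_J=1.0_itr_1.dat"

# read the raw 8-byte values, then swap each one from big-endian
# (network byte order) to the host byte order with ntoh
y = Array{Float64}(undef, filesize(fname) ÷ 8)
read!(fname, y)
vals = ntoh.(y)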


Thank you!

This works! But could you please explain what was wrong?

Also, I see that the eigenvectors are padded with some values very close to zero.

2.5701910099582143e-308
 -0.08658646381578468
  0.21664079294930008
  0.2768194902948219
  0.18273895339582577
 -0.34904098005790685
 -0.8448179626754538
  2.5701910099582143e-308
 -0.1958704084404449
 -0.7848867729177095
  0.48579694676651525
  0.15122989738969175
  0.2757865081468033
 -0.1032481639944464
  2.5701910099582143e-308

How can I get rid of these?

Moreover, I realized that the HDF5 format is much nicer to work with than the binary one. However, the HDF5 file is almost twice the size of the binary file for the same data. Is there any workaround for this?

When you have multi-byte data types it is not obvious how they should be stored in memory, least significant byte first or most significant byte first, and historically different choices have been made. This carries over to file storage: if all you have is raw bytes you don't know which convention was used, and if you pick the wrong one every value is read with its bytes reversed and looks like garbage. See Endianness - Wikipedia for much more information.
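A tiny illustration in Julia (the value is just an arbitrary Float64):

x = 1.77528          # an arbitrary value
swapped = hton(x)    # reorder the bytes into network (big-endian) order;
                     # on a little-endian machine this prints as a very different number
ntoh(swapped) == x   # swapping back recovers the original value, so this is true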

All file formats carry a certain amount of metadata and structural overhead compared to a raw dump of the data. How much depends on the format, but the overhead usually becomes relatively smaller as the amount of data grows. Some formats also support compression, which can make the files smaller, but how much depends on what data is stored.
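If your datasets get large, HDF5 compression can offset part of that overhead. A sketch with HDF5.jl, assuming recent keyword names (chunk, deflate) and a made-up dataset name; for a handful of eigenvalues the metadata overhead will still dominate:

using HDF5

vals = rand(6)   # stand-in for the eigenvalues

h5open("eigvals.h5", "w") do fid
    # chunked, deflate-compressed dataset; compression only pays off
    # once the data is reasonably large
    dset = create_dataset(fid, "eigenvalues", datatype(Float64), dataspace(vals);
                          chunk=(length(vals),), deflate=3)
    write(dset, vals)
end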

Thanks for explaining, I will check out the reference. Also, I have updated my last comment; could you please take another look?

No idea, but in general, unless the data is very simple, you need some kind of specification to be able to make sense of raw bytes.
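That said, the PETSc manual pages do describe the binary format VecView uses: each vector is written as a 32-bit class id, a 32-bit length, and then the values, all big-endian by default. The recurring 2.5701910099582143e-308 in your list looks consistent with that 8-byte header being reinterpreted as a single Float64. A sketch under that assumption (the file name here is made up; check the documentation for your PETSc version before relying on it):

function read_petsc_vecs(fname)
    vecs = Vector{Vector{Float64}}()
    open(fname, "r") do io
        while !eof(io)
            classid = ntoh(read(io, Int32))   # object type marker (the Vec class id)
            n       = ntoh(read(io, Int32))   # number of entries in this vector
            push!(vecs, [ntoh(read(io, Float64)) for _ in 1:n])
        end
    end
    return vecs
end

vecs = read_petsc_vecs("eigvec_L=4_h=2.700_J=1.0_itr_1.dat")   # hypothetical file name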