I’m trying to familiarize myself with the sample code, which runs correctly.
There is a line in the epoch iteration, train.save(…), which saves a number of binary files labelled from 95 to 99. I understand that they can be recovered with train.restore(), but I have not been able to read them.
It’s quite possible these are not meant for diagnostic (human-readable) purposes, just for internal use?
Docs for saving and restoring are here.
http://malmaud.github.io/TensorFlow.jl/latest/saving.html
They should be correct and clear;
if not, please raise an issue in the repository.
The reason they are numbered is that:
save can be passed a global_step keyword parameter, which is an integer that will be suffixed to the variable file name. The Saver constructor accepts an optional max_to_keep argument, which is an integer specifying how many of the latest versions of the variable files to save (older ones will be discarded to save space) …
By the end of this loop, file “variable_file_95” contains the variable values during the 95th iteration, “variable_file_96” the 96th iteration, etc.
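The loop described above might look something like the following sketch. This is a hypothetical illustration, not the actual sample code: the session setup, the variable `x`, the training step, and the checkpoint prefix "variable_file" are all placeholder assumptions.

```julia
using TensorFlow

# Build a trivial graph with one variable (placeholder for the real model).
sess = Session(Graph())
x = get_variable("x", [1], Float64)
run(sess, global_variables_initializer())

# max_to_keep=5 keeps only the 5 most recent checkpoints,
# which matches the observed files labelled 95 to 99.
saver = train.Saver(max_to_keep=5)

for epoch in 1:99
    # ... run a training step here ...
    # global_step suffixes the integer to the file name,
    # producing files like "variable_file_95" as described above.
    train.save(saver, sess, "variable_file", global_step=epoch)
end
```

With `max_to_keep=5`, after the 99th iteration only the checkpoints for iterations 95–99 remain on disk, which is exactly the set of files the question describes.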
They are in the JLD/HDF5 file format, so they are not directly human-readable.
You can read them with JLD.jl,
or with standard HDF5 tools.
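For example, a minimal sketch of inspecting one of these files with JLD.jl. This assumes a checkpoint file named "variable_file_95" exists in the working directory; the names of the entries inside depend on the variables defined in your graph.

```julia
using JLD

# load returns a Dict mapping saved names to their values.
d = JLD.load("variable_file_95")

# Print each saved entry and a summary of its contents.
for (name, value) in d
    println(name, " => ", summary(value))
end
```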
They are not really intended for that, though; they are meant to be read by loading them back into a graph.
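Loading a checkpoint back into a graph might look like the following sketch. It assumes the same graph definition (here a placeholder variable `x`) and the checkpoint file name "variable_file_95" from the discussion above.

```julia
using TensorFlow

# Rebuild the same graph that produced the checkpoint.
sess = Session(Graph())
x = get_variable("x", [1], Float64)

# Restore the saved values for iteration 95 into the session.
saver = train.Saver()
train.restore(saver, sess, "variable_file_95")

# The variables in `sess` now hold their values from the 95th iteration.
println(run(sess, x))
```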
If you want to inspect things for diagnostic purposes, use TensorBoard.
See these docs.
Excellent, thanks. I was trying to make sense of the main TensorFlow docs, which are harder to read.