Reading binary files in pieces

ExpandingMan · February 28, 2017, 9:47pm

Hello all. I suspect the following problem is very simple from a programming perspective, but unfortunately I have so little experience with serialization/deserialization code that I’m a bit baffled about where to start.

A problem I face very often: I have a computationally expensive call f(x) that returns a modest-sized object (at most a few hundred MB, and sometimes something as simple as a single float!), which I store in a variable y. I then want to rerun the program containing f(x) after changing other parts of it, without having to recompute f(x).

An imperfect solution that would nevertheless work in most cases is to write a macro so that I can instead call

@autoserialize y = f(x) "somelocalfile.bin"

which, the first time I run the program, would store y in a local file along with some metadata (for example the expression :(f(x)), possibly together with the value of x), and on subsequent runs would load y from disk instead. Ideally the file would use a format that converts directly back into a Julia object of any type, as the built-in functions serialize and deserialize allow.
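A minimal sketch of such a macro, assuming Julia 1.x (where serialize and deserialize live in the Serialization standard library). It takes the file name as the first argument, a change from the syntax above, so the call parses unambiguously. It does not yet record or check any metadata: if the file exists, its contents are loaded, so delete the file to force recomputation.

```julia
using Serialization

"""
    @autoserialize file y = f(x)

Hypothetical sketch: if `file` exists, load `y` from it; otherwise evaluate
`f(x)`, bind the result to `y`, and serialize it to `file` for later runs.
"""
macro autoserialize(file, assign)
    Meta.isexpr(assign, :(=)) || error("expected an assignment such as `y = f(x)`")
    lhs, rhs = assign.args
    quote
        local path = $(esc(file))
        $(esc(lhs)) = if isfile(path)
            open(deserialize, path)                     # cached: read the stored value back
        else
            local val = $(esc(rhs))                     # not cached: compute it once...
            open(io -> serialize(io, val), path, "w")   # ...and store it for next time
            val
        end
    end
end
```

Usage would look like `@autoserialize "somelocalfile.bin" y = f(x)`; on the second run the right-hand side is never evaluated.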

My problem is that I would really like to store the metadata in the same file as the object I'm serializing, in such a way that I can inspect the metadata and decide what needs to be reloaded before loading the object itself. This I really don't know how to do. What I do know how to do is create multiple files: a small one holding metadata that loads quickly, and one or more large ones holding the values I want to store. My macro could then load the metadata files, determine whether the objects stored there need to be reloaded, and do so where necessary. This isn't a terrible solution, but rather than cluttering things up with a whole directory structure for every program that calls my macro, I would much prefer a single binary file from which I can quickly read the metadata before deserializing the whole thing.
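A minimal sketch of the single-file layout, assuming Julia 1.x and the Serialization standard library. serialize writes one object at a time, so the metadata can be written first and the value second to the same stream; a single deserialize call then reads back only the first object. The function names save_with_meta, read_meta and load_with_meta are hypothetical.

```julia
using Serialization

# Write a small metadata object first, then the (possibly large) value, into one file.
function save_with_meta(path::AbstractString, meta, value)
    open(path, "w") do io
        serialize(io, meta)    # header: cheap to read back on its own
        serialize(io, value)   # payload
    end
end

# Read only the metadata: deserialize the first object, then close the file.
read_meta(path::AbstractString) = open(deserialize, path)

# Read both objects back, in the order they were written.
function load_with_meta(path::AbstractString)
    open(path) do io
        meta = deserialize(io)
        value = deserialize(io)
        return meta, value
    end
end
```

For example, `save_with_meta("cache.bin", (expr = :(f(x)), x = x), y)` followed by `read_meta("cache.bin").x` inspects the stored argument without deserializing y. Keep in mind that the serialize format is not guaranteed to be readable by a different Julia version, so this suits a cache better than an archive.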

Perhaps my question is too broad, but does anybody have any ideas on how best to get started on this? Or perhaps I am just being silly and should cave in and use multiple separate files? Can I somehow combine Mmap.mmap with serialize and deserialize to get what I want?
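On the Mmap question, a sketch assuming Julia 1.x: a memory-mapped byte vector can be wrapped in an IOBuffer and handed to deserialize, so the two do combine. Deserializing only the first object touches only the start of the mapping, so the operating system should only need to page in that part of the file. Mmap is optional here, though; reading the first object from an ordinary file stream behaves similarly. The names write_cached and read_meta_mmap are hypothetical.

```julia
using Serialization, Mmap

# Write a small metadata object followed by the (possibly large) value.
function write_cached(path::AbstractString, meta, value)
    open(path, "w") do io
        serialize(io, meta)
        serialize(io, value)
    end
end

# Map the whole file as a Vector{UInt8} and deserialize only the first object.
function read_meta_mmap(path::AbstractString)
    bytes = Mmap.mmap(path)               # bytes backed by the file, paged in on access
    return deserialize(IOBuffer(bytes))   # first object written = metadata
end
```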

omalled · February 28, 2017, 10:29pm

The ReusableFunctions package may be useful to you. ReusableFunctions provides similar functionality to the Memoize package, except it stores the results on disk (and doesn’t use macros). ReusableFunctions isn’t very well documented, but here’s an example of how it might work for you:

import ReusableFunctions
import JLD

function f(x)
  println("I'm running f(x)!")
  return x + 1
end
# "f_results" means that the files will be stored in a directory called "f_results"
rf = ReusableFunctions.maker3function(f, "f_results")
rf(1)  # calls f(1) and prints "I'm running f(x)!"
rf(1)  # doesn't call f(1); loads the result from a file...
# ...or you could load things manually:
x, f_of_x = JLD.load(ReusableFunctions.gethashfilename("f_results", 1), "x", "result")
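For comparison, a minimal stdlib-only sketch of the same idea, assuming Julia 1.x. This is not how ReusableFunctions is implemented, and diskcached is a hypothetical name. The argument is hashed to choose a file name, and the argument is stored next to the result, much like the "x"/"result" pair in the JLD example above, so a hash collision is detected rather than silently returning the wrong value. Hash values are not guaranteed stable across Julia versions (or across sessions for types hashed by identity), so treat the directory as a disposable cache.

```julia
using Serialization

# Hypothetical helper: wrap `f` so each result is cached on disk in `dir`,
# in a file named after a hash of the argument.
function diskcached(f, dir::AbstractString)
    mkpath(dir)
    return function (x)
        path = joinpath(dir, string(hash(x), base = 16) * ".jls")
        if isfile(path)
            stored_x, stored_result = open(deserialize, path)
            stored_x == x && return stored_result   # guard against hash collisions
        end
        result = f(x)
        open(io -> serialize(io, (x, result)), path, "w")   # store argument and result together
        return result
    end
end
```

Calling `rf = diskcached(f, "f_results")` then behaves like the maker3function example: the first `rf(1)` runs f, later calls read the file.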
