Reading binary files in pieces


#1

Hello all. I suspect the following problem is very simple from a programming perspective, but unfortunately I have so little experience with serialization/deserialization code that I’m a bit baffled about where to start.

A very common problem I face is this: I have some computationally expensive function call f(x) that returns a modest-sized object (no more than a few hundred MB, and sometimes something as simple as a single float!), which I store in some variable y. I then want to run the program containing the call f(x) again after having made changes to other parts of the program, without recomputing f(x).

An imperfect solution which would nevertheless be effective in most cases would be to write some macro so that I can instead call

@autoserialize y = f(x) "somelocalfile.bin"

which would store y in some local file along with some metadata (for example the expression :(f(x)), possibly along with the value of x) the first time I run the program, but on subsequent runs would load y from disk. It would be really nice if I could store this in a format that I can easily convert directly back into a Julia object (regardless of its type), as one can with the built-in functions serialize and deserialize.
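
For concreteness, here is a minimal sketch of what such a macro might look like, using only the Serialization standard library. The macro body, the variable names, and the choice to store just the text of the expression as metadata are all assumptions made for illustration. This is not an existing package API.

```julia
using Serialization

# Hypothetical macro (an illustration, not an existing API):
# caches the right-hand side of `y = f(x)` in the file `path`.
macro autoserialize(assignment, path)
    (assignment isa Expr && assignment.head === :(=)) ||
        error("@autoserialize expects an assignment such as `y = f(x)`")
    lhs, rhs = assignment.args
    exprtext = string(rhs)    # metadata: the expression as text, e.g. "f(x)"
    quote
        local cachefile = $(esc(path))
        # The file holds a tuple (expression text, value) if it exists.
        local cached = isfile(cachefile) ? deserialize(cachefile) : nothing
        $(esc(lhs)) = if cached !== nothing && cached[1] == $exprtext
            cached[2]                      # reuse the stored result
        else
            local value = $(esc(rhs))      # run the expensive call
            serialize(cachefile, ($exprtext, value))
            value
        end
    end
end
```

Note that this version only compares the text of the expression, so changing the value of x without changing the text would still return the stale cached result. Storing x (or a hash of it) in the metadata would be one way to address that.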

My problem is that I would really like to store the aforementioned metadata right along with the object I’m serializing in my file in such a way that I can go in and look at the metadata to determine what I’d need to re-load before I load it. This I really don’t know how to do. What I do know how to do is create multiple files, a small one with metadata that can be loaded quickly, and one or several big ones with the values I want to store. Then I can write my macro to load the metadata files, determine if the objects stored there need to be reloaded, and then do that where necessary. This isn’t a terrible solution, but I would really like it if instead of cluttering things up with a whole directory structure for every program that calls my macro, I could just keep this in a single binary file from which I can quickly read metadata before deserializing the whole file.

Perhaps my question is too broad, but does anybody have any ideas on how best to get started on this? Or perhaps I am just being silly and should cave in and use multiple separate files? Can I somehow combine Mmap.mmap with serialize and deserialize to get what I want?
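
One way the single-file idea could be sketched: serialize writes one object at a time to a stream and deserialize reads one object at a time, so the metadata can be written first and read back without deserializing the payload that follows it. The helper names below (save_with_metadata, read_metadata, load_value) are hypothetical, and this sketch does not use Mmap at all.

```julia
using Serialization

# Hypothetical helpers: metadata is written first, the payload second.
function save_with_metadata(path, metadata, value)
    open(path, "w") do io
        serialize(io, metadata)   # small record, written first
        serialize(io, value)      # potentially large payload, written second
    end
end

# Deserializes only the first object in the file; the payload is not deserialized.
read_metadata(path) = open(deserialize, path)

# Deserializes both objects in order and returns the payload.
function load_value(path)
    open(path) do io
        deserialize(io)           # read past the metadata
        deserialize(io)           # the payload
    end
end
```

A caller could then check read_metadata(path) first and call load_value(path) only when the metadata says the stored result is still valid.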


#2

The ReusableFunctions package may be useful to you. ReusableFunctions provides similar functionality to the Memoize package, except it stores the results on disk (and doesn’t use macros). ReusableFunctions isn’t very well documented, but here’s an example of how it might work for you:

import ReusableFunctions
import JLD

function f(x)
  println("I'm running f(x)!")
  return x + 1
end
# the "f_results" argument means the files are stored in a directory called "f_results"
rf = ReusableFunctions.maker3function(f, "f_results")
rf(1)  # calls f(1) and prints "I'm running f(x)!"
rf(1)  # doesn't call f(1); loads the result from a file...
# ...or you could load things manually:
x, f_of_x = JLD.load(ReusableFunctions.gethashfilename("f_results", 1), "x", "result")