Disk backed Dict?

question

#1

Is there a disk backed Dict somewhere? I would like to be able to do the following:

d = DiskBackedDict{K,V}(mypath)::AbstractDict{K,V}  # load, or create a new one if none exists
x = d[foo]  # this should be as fast as an ordinary Dict lookup
d[bar] = y  # this may be dog slow, but the result must be stored at mypath

d2 = DiskBackedDict{K,V}(mypath)  # error: mypath already in use

#2

Wouldn’t d[foo] be slower since it’s disk-based?


#3

I would like DiskBackedDict to cache stuff in memory for fast access.
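
A minimal sketch of what such a cached, write-through dictionary could look like. This is not an existing package: the type name is taken from the question, and using the Serialization standard library as the on-disk format is an assumption made for illustration (its format is not guaranteed to be stable across Julia versions). Locking of the path is not handled here.

```julia
using Serialization

# Hypothetical type, illustrative only. Reads are served from `cache`,
# every write rewrites the whole file at `path`.
struct DiskBackedDict{K,V} <: AbstractDict{K,V}
    path::String
    cache::Dict{K,V}
end

# Load the dictionary stored at `path`, or start empty if the file does not exist.
function DiskBackedDict{K,V}(path::AbstractString) where {K,V}
    cache = isfile(path) ? deserialize(path)::Dict{K,V} : Dict{K,V}()
    return DiskBackedDict{K,V}(String(path), cache)
end

# Lookups and iteration only touch the in-memory cache.
Base.getindex(d::DiskBackedDict, key) = d.cache[key]
Base.get(d::DiskBackedDict, key, default) = get(d.cache, key, default)
Base.length(d::DiskBackedDict) = length(d.cache)
Base.iterate(d::DiskBackedDict, state...) = iterate(d.cache, state...)

# Writes update the cache, then persist everything via a temporary file
# so that a crash mid-write does not leave a truncated file at `path`.
function Base.setindex!(d::DiskBackedDict, value, key)
    d.cache[key] = value
    tmp = d.path * ".tmp"
    serialize(tmp, d.cache)
    mv(tmp, d.path; force=true)
    return d
end

# Usage: write through one handle, then reload from disk.
path = joinpath(mktempdir(), "store.jls")
d = DiskBackedDict{String,Int}(path)
d["bar"] = 42
d2 = DiskBackedDict{String,Int}(path)
@assert d2["bar"] == 42
```

Rewriting the whole file on every `setindex!` is the "dog slow" part; it is only reasonable for small dictionaries.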


#4

How big is the dictionary you are thinking about using? If it's small, one can roll a simple solution fairly quickly, in a couple of hours.
I'm not sure whether one already exists.


#5

Not big, maybe 100 MB at most. I guess it's indeed not too hard to implement this. However, if there is an existing solution I would prefer that. Also, there are some implementation issues I am unsure about:

  1. For storage of arbitrary Julia objects, JLD is the only option I can think of. But I don't know if there is a way to modify a value in a JLD file. The only way I am aware of is to delete the file and save the modified version.
  2. I am not sure how to lock a path that belongs to an active DiskBackedDict.

How to lock a file?

#6

  1. This is why I think it's only good for a small dict. Maybe there are solutions that let you append to a file, but then you have to design the format.
  2. I think you can acquire an exclusive lock on the file when you open it. There should be an open-file function for this somewhere, maybe under another name.
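
One portable way to claim a path is sketched below. It does not use an OS-level exclusive file lock as suggested above; instead it swaps in a simpler technique, relying on `mkdir` failing when the directory already exists. The helper names `acquire_lock` and `release_lock` are hypothetical. A lock directory left behind by a crashed process has to be removed by hand; the Pidfile.jl package addresses stale locks more carefully.

```julia
# Hypothetical helper: claim `path` by creating the directory `path.lock`.
# `mkdir` throws if the directory already exists, so only one caller succeeds.
function acquire_lock(path::AbstractString)
    lockdir = path * ".lock"
    try
        mkdir(lockdir)
    catch err
        err isa Base.IOError || rethrow()
        error("$path already in use")
    end
    return lockdir
end

# Release the claim by removing the (empty) lock directory.
release_lock(lockdir::AbstractString) = rm(lockdir)

# Usage: the second attempt on the same path fails until the first is released.
path = joinpath(mktempdir(), "store.jls")
lock1 = acquire_lock(path)
second_failed = try
    acquire_lock(path)
    false
catch
    true
end
@assert second_failed
release_lock(lock1)
```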

#7

I recently needed something like this and thought about using an mmapped array for storage of the values. The problem with that is that

  1. you cannot grow mmapped arrays, so they have to be reallocated occasionally,
  2. only bits types would work.

But otherwise it should be simple to adapt the code from DataStructures.jl or Base.
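
To illustrate the two limitations, here is a small sketch using the Mmap standard library. The file name and sizes are made up. "Growing" is done by mapping the same file a second time with a larger length, and the element type has to be a bits type such as `Int64`.

```julia
using Mmap

path = joinpath(mktempdir(), "values.bin")

# Only bits types can be memory-mapped.
@assert isbitstype(Int64)
@assert !isbitstype(String)

# Map 4 Int64 slots; the file is extended to fit because the stream is writable.
io = open(path, "w+")
vals = Mmap.mmap(io, Vector{Int64}, 4)
vals .= 1:4
Mmap.sync!(vals)   # flush the changes to disk

# The existing array cannot be resized, so map the file again with more slots.
bigger = Mmap.mmap(io, Vector{Int64}, 8)
bigger[5:8] .= 5:8
Mmap.sync!(bigger)
close(io)
```

A dictionary built on top of this would need to redo the mapping (and rehash) whenever it runs out of slots, much like an in-memory `Dict` reallocates its arrays.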


#8

Just to correct your statement regarding JLD value modification:
it is possible to delete a single dataset in a JLD file and replace it with a new one:

using JLD

# create a JLD file with two datasets
file = jldopen("mydata.jld", "w")
file["a"] = collect(1:100)
file["b"] = collect(1:10)
close(file)

# replace the content of dataset "a"
file = jldopen("mydata.jld", "r+")
delete!(file["a"])
file["a"] = "changed"
close(file)

# load all variables stored in the file into the current scope
@load "mydata.jld"
show(a)
show(b)

#9

I put something together here, based on @slowbrain’s remark.


#10

What you are essentially doing is making something like a key-value store, and it's not that difficult to make a Julia wrapper for one. I made one back in 2015 for Aerospike, which we use for our product at http://www.dynactionize.com.


#11

I have tried several solutions to this problem, including JLD.

For me, in the end TOML was the best solution:

It is an INI-like file format, but it fits Julia much better since the types of the values can be specified. If you don't have overly complicated types this could be a very good solution. One big plus for me was that I can hand-edit the files.
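
For illustration, a small round trip with the TOML standard library (available since Julia 1.6). The keys and values are made up; TOML is limited to string keys and simple value types such as numbers, strings, arrays and nested tables.

```julia
using TOML

path = joinpath(mktempdir(), "store.toml")

# Example data, made up for illustration.
d = Dict("threshold" => 0.5, "name" => "run1", "sizes" => [1, 2, 3])

# Write the dictionary as human-editable TOML.
open(path, "w") do io
    TOML.print(io, d)
end

# Read it back; values come back typed (Float64, String, Vector{Int64}).
loaded = TOML.parsefile(path)
@assert loaded["threshold"] == 0.5
@assert loaded["sizes"] == [1, 2, 3]
```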