Writing (key => value) pairs to a file and reading them back with Mmap.mmap


#1

There are two numbers, B and A. You must save them to a file as [B => A]. Then, using Mmap.mmap, read this file and search for B; if B is found, get its value A.

Please tell me how to do it. I cannot understand how nested arrays work in Julia. And, as I understand it, Mmap.mmap works only with arrays?!


#2

Is this a homework problem?


#3

What do you mean by “homework”?
This task is my personal …


#4

The wording sounds like a homework problem (“you must”).

Can you post the solution you have tried?


#5

Understood, that wording comes from the translator, since I speak English poorly …
I have not really tried much. This is what I have, but it is not what I need:

	using Mmap

	function setFile()
		a = 5
		b = 6
		a1 = 7
		b1 = 9
		A = [a b; a1 b1]              # 2×2 Matrix{Int64}, one row per pair
		s = open("mmap1.bin", "w+")
		write(s, size(A, 1))          # header: the two dimensions...
		write(s, size(A, 2))
		write(s, A)                   # ...followed by the raw matrix data
		close(s)
	end

	function readFile()
		# Test by reading it back in
		s = open("mmap1.bin")         # default is read-only
		m = read(s, Int)
		n = read(s, Int)
		A2 = Mmap.mmap(s, Matrix{Int}, (m, n))   # maps the data right after the header
		println(A2)
		close(s)
		finalize(A2)
	end

	setFile()
	readFile()

#6

I need something like this:

	B = 8
	A = 12
	ArBA = Dict(B => A)
	s = open("mmap1.bin", "w+")
	write(s, ArBA)                  # the idea: save the Dict as is
	close(s)

	s = open("mmap1.bin")
	A2 = Mmap.mmap(s, Dict, ...)    # pseudocode: map the file back as a Dict
	if haskey(A2, 8)
		println(A2[8])
	end
	close(s)

But mmap does not work with a Dict.


#7

Use collect and Dict to convert between Dict and a Vector of Pairs.

julia> d = Dict(1 => 3, 2 => 4)
Dict{Int64,Int64} with 2 entries:
  2 => 4
  1 => 3

julia> collect(d)
2-element Array{Pair{Int64,Int64},1}:
 2 => 4
 1 => 3

julia> Dict(collect(d))
Dict{Int64,Int64} with 2 entries:
  2 => 4
  1 => 3
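Converting to a Vector of Pairs is what makes memory mapping possible here: Mmap.mmap maps raw bytes, so the element type has to be a fixed-size "bits" type with no pointers. A minimal check, assuming 64-bit Julia (where `1 => 3` is a `Pair{Int64,Int64}`):

```julia
# Mmap.mmap maps raw bytes, so the element type must be a plain "bits" type.
@show isbitstype(Pair{Int64,Int64})   # true: two Int64 fields, no pointers
@show isbitstype(Dict{Int64,Int64})   # false: a Dict is a mutable, pointer-based structure

d = Dict(1 => 3, 2 => 4)
v = collect(d)                        # Vector{Pair{Int64,Int64}} on 64-bit Julia
@show sizeof(v)                       # 2 pairs × 16 bytes each = 32
```

So the vector is just a contiguous block of bytes that can be written with `write` and mapped back, while the Dict itself cannot.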

#8

Thank you!
It really works!

	using Mmap

	function setFile()
		d = Dict(1324 => 3234, 2879 => 4564)
		A = collect(d)                # Vector{Pair{Int64,Int64}}
		s = open("mmap4.bin", "w+")
		write(s, length(A))           # header: only the length, since readFile reads one value
		write(s, A)                   # raw pair data
		close(s)
	end

	function readFile()
		s = open("mmap4.bin")
		m = read(s, Int)              # read the header back
		A2 = Mmap.mmap(s, Vector{Pair{Int64,Int64}}, (m,))   # maps the pairs after the header
		arBp = Dict(collect(A2))
		println(arBp[1324])
		close(s)
		finalize(A2)
	end

	setFile()
	readFile()

Also, please tell me: with a very large file of about a billion records, will I have performance problems?


#9

It may not be fast, but if you don’t have enough memory then I guess this is the best solution.


#10

Wait, no, this is not how memory mapping works. By reading it into a dictionary that way you'll defeat the whole purpose of memory mapping, and no, it won't work with billions of objects (unless you have an absurd amount of RAM). You probably want to work with arrays (not dictionaries) and loop over the entire memory-mapped array until you find the element you're looking for. There is a description of memory mapping, with some code examples very similar to what you want to do, available here.
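A minimal sketch of that approach, assuming the file layout used earlier in the thread (one Int64 count followed by the raw `Pair{Int64,Int64}` records). The helper names `write_pairs` and `lookup_linear` are hypothetical; nothing is loaded into a Dict, the loop simply walks the mapped array.

```julia
using Mmap

# Hypothetical helper: write an Int64 count, then the raw key => value records.
function write_pairs(path::AbstractString, kv::Vector{Pair{Int64,Int64}})
    open(path, "w") do io
        write(io, Int64(length(kv)))
        write(io, kv)
    end
end

# Hypothetical helper: map the records and scan them one by one.
# Returns the value stored for `key`, or `nothing` if the key is absent.
function lookup_linear(path::AbstractString, key::Integer)
    open(path) do io
        n = read(io, Int64)
        kv = Mmap.mmap(io, Vector{Pair{Int64,Int64}}, (n,))  # offset defaults to position(io), i.e. 8
        for p in kv
            p.first == key && return p.second
        end
        return nothing
    end
end

path = tempname()
write_pairs(path, Pair{Int64,Int64}[1324 => 3234, 2879 => 4564])
println(lookup_linear(path, 2879))   # 4564
println(lookup_linear(path, 42))     # nothing
```

Each lookup is linear in the number of records, which is what the next note is about.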

Note, of course, that if you'll be doing this operation more than once, you'd probably want to index or sort the list in some way to avoid the linear-time lookup.
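One way to do the sorting, sketched under the same assumed layout (Int64 count, then `Pair{Int64,Int64}` records): open the file read-write, map the records, and sort them in place by key. The helper names are hypothetical. For data far larger than RAM this will page heavily, and an external merge sort may be more practical; this only shows the idea.

```julia
using Mmap

# Hypothetical helper: Int64 count followed by the raw records.
function write_pairs(path::AbstractString, kv::Vector{Pair{Int64,Int64}})
    open(path, "w") do io
        write(io, Int64(length(kv)))
        write(io, kv)
    end
end

# Sort the records by key in place, through a writable (shared) mapping.
function sort_file_by_key!(path::AbstractString)
    open(path, "r+") do io
        n = read(io, Int64)
        kv = Mmap.mmap(io, Vector{Pair{Int64,Int64}}, (n,))
        sort!(kv; by = first)   # reorders the records directly in the mapped file
        Mmap.sync!(kv)          # flush the changes to disk
    end
    return path
end

# Read the records back into an ordinary Vector to check the result.
function read_pairs(path::AbstractString)
    open(path) do io
        n = read(io, Int64)
        read!(io, Vector{Pair{Int64,Int64}}(undef, n))
    end
end

path = tempname()
write_pairs(path, Pair{Int64,Int64}[2879 => 4564, 8 => 12, 1324 => 3234])
sort_file_by_key!(path)
println(read_pairs(path))
```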


#11

Indeed, I need to work with arrays, but as I wrote in the first post, I do not understand how to convert the structure I need into an array in Julia. For example, in PHP I would do this:

$ar[0][b][0][a]

and it would be sorted by b.
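In Julia there is no need for a nested structure: the same data can live in one flat array. A sketch of two possible layouts, assuming Int64 keys (b) and values (a) and reusing numbers from this thread as sample data. Both are plain bits, so either can be written with `write` and mapped with Mmap.mmap.

```julia
# Option 1: a flat Vector of key => value pairs, sorted by the key b.
kv = Pair{Int64,Int64}[2879 => 4564, 8 => 12, 1324 => 3234]
sort!(kv; by = first)
b, a = kv[2].first, kv[2].second      # the second record after sorting

# Option 2: a 2×N matrix, one column per record (row 1 = key b, row 2 = value a).
# Julia stores matrices column by column, so each (b, a) record is contiguous.
M = Int64[2879 8 1324;
          4564 12 3234]
M = M[:, sortperm(M[1, :])]           # reorder the columns by key
```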

If I understand you correctly, this code will try to load all the data into RAM?

Dict(collect(A2))

#12

Yes, it would.

In my response, I was assuming you serialize for storage only.

If you want data larger than memory, and also fast lookup, you need to do something more complicated. E.g., you could sort on the key column, then use the searchsorted* functions.
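A sketch of the sorted approach, assuming the records are stored as an Int64 count followed by `Pair{Int64,Int64}` records already sorted by key. `searchsortedfirst` with `by = first` performs a binary search, so a lookup only touches about log2(n) records of the mapped file. The helper names are hypothetical.

```julia
using Mmap

# Hypothetical helper: write the records sorted by key, after an Int64 count.
function write_sorted_pairs(path::AbstractString, kv::Vector{Pair{Int64,Int64}})
    open(path, "w") do io
        write(io, Int64(length(kv)))
        write(io, sort(kv; by = first))
    end
end

# Binary search over the mapped records; returns the value or `nothing`.
function lookup_sorted(path::AbstractString, key::Integer)
    open(path) do io
        n = read(io, Int64)
        kv = Mmap.mmap(io, Vector{Pair{Int64,Int64}}, (n,))
        # `by = first` is applied to both the records and the probe pair.
        i = searchsortedfirst(kv, Int64(key) => 0; by = first)
        (i <= n && kv[i].first == key) ? kv[i].second : nothing
    end
end

path = tempname()
write_sorted_pairs(path, Pair{Int64,Int64}[2879 => 4564, 8 => 12, 1324 => 3234])
println(lookup_sorted(path, 1324))   # 3234
println(lookup_sorted(path, 9999))   # nothing
```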


#13

A2 in your example is already an array, no? So if you need to search for a value, you could loop over the array, row by row, until you find the value you're looking for.
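A sketch of that row-by-row loop, assuming the m×2 matrix layout from the earlier matrix code in this thread: the two dimensions as Int64 header values, then one row per record with the key B in column 1 and the value A in column 2. The helper names are hypothetical. Because Julia is column-major, column 1 (all the keys) is contiguous in the file, so the scan reads the keys sequentially.

```julia
using Mmap

# Hypothetical helper: write the dimensions, then the matrix data.
function write_matrix(path::AbstractString, A::Matrix{Int64})
    open(path, "w") do io
        write(io, Int64(size(A, 1)), Int64(size(A, 2)))
        write(io, A)
    end
end

# Scan the mapped matrix row by row; return the value for `key` or `nothing`.
function lookup_rows(path::AbstractString, key::Integer)
    open(path) do io
        m = read(io, Int64)
        n = read(io, Int64)
        A = Mmap.mmap(io, Matrix{Int64}, (m, n))
        for i in 1:m
            A[i, 1] == key && return A[i, 2]
        end
        return nothing
    end
end

path = tempname()
write_matrix(path, Int64[5 6; 7 9])
println(lookup_rows(path, 7))   # 9
```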

But what is your use case exactly? For a billion records, this will be a very slow operation. If you’ll be doing repeated reads, you probably want to preprocess your data (e.g. sort the numbers), or perhaps use an existing database.