Hello, I have these two kinds of variables, x and y, that I want to save to a file and reload later.
julia> x
Dict{Any,Any} with 2 entries:
"day" => [1, 2, 3]
"beautifull" => [7, 5, 10, 11, 23, 56]
julia> y = [["WORD1","WORD2","WORD3"],[43,54,16]]
2-element Array{Array{T,1} where T,1}:
["WORD1", "WORD2", "WORD3"]
[43, 54, 16]
What is the best way to store them? I have done it using the JLD2.jl package, but perhaps there are other ways, using another package or something already in Julia Base?
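One option that needs no extra package is the Serialization standard library. The following is only a minimal sketch of a save/reload round trip for the two variables above; the file names and the temporary directory are hypothetical. Keep in mind that the serialization format is not guaranteed to be readable by a different Julia version, so it suits caches better than long-term storage.

```julia
using Serialization  # standard library, no extra package to install

x = Dict{Any,Any}("day" => [1, 2, 3], "beautifull" => [7, 5, 10, 11, 23, 56])
y = [["WORD1", "WORD2", "WORD3"], [43, 54, 16]]

# Hypothetical file names inside a temporary directory.
dir = mktempdir()
xfile = joinpath(dir, "x.dat")
yfile = joinpath(dir, "y.dat")

# Save each variable to its own file.
serialize(xfile, x)
serialize(yfile, y)

# Reload them later.
x2 = deserialize(xfile)
y2 = deserialize(yfile)
```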
Is it better for performance if I declare the types of these variables, or does it not matter, in terms of speed, for later processing of these variables?
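On the type question, the general advice in the Julia performance tips is to avoid containers with abstract element types such as Dict{Any,Any}, because the compiler cannot specialize code that reads from them. Here is a sketch of converting x and y to concretely typed containers; the names x_typed, y_typed and total are made up for illustration, and whether the difference is noticeable for data of this size would have to be benchmarked.

```julia
x = Dict{Any,Any}("day" => [1, 2, 3], "beautifull" => [7, 5, 10, 11, 23, 56])
y = [["WORD1", "WORD2", "WORD3"], [43, 54, 16]]

# Rebuild the dictionary with concrete key and value types.
x_typed = Dict{String,Vector{Int}}(x)

# Split the mixed array into a NamedTuple of two concretely typed vectors.
y_typed = (words = String.(y[1]), counts = Int.(y[2]))

# Example of later processing: the same function accepts both versions,
# but the compiler can only specialize it fully for the typed one.
total(d) = sum(sum, values(d))

total(x_typed)
```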
Mmm, I have run some tests. I was loading and saving 3 variables (a 1.2 MByte dict, a 1.7 MByte array and a small 1.5 KByte text). The loading times for the stored files, which is what I am interested in, are about the same.
# with JLD2, 3 variables in a single file
06/03/2020 15:49 17.481.362 julia-data-mv.ev.01.txt.jld2
# with serialize, 3 separate files
06/03/2020 15:46 1.555 julia-data-mv.ev.01.txt.parts.dat
06/03/2020 15:46 1.205.052 julia-data-mv.ev.01.txt.wap.dat
06/03/2020 15:46 1.688.390 julia-data-mv.ev.01.txt.was.dat
The big difference is really in the size of the stored files.
With serialization, almost 3 MByte.
With the JLD2 package, 17 MByte (!!!) … but from what I have read, JLD2 has more features … and storing all variables in a single file is not so bad, to be honest, while with serialization it seems to me that I load and save one variable at a time.
Ok, so, if for now I am only interested in fast loading of data stored in some variables, I can use serialization without needing the JLD2 package.
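Serialization is not limited to one variable per file: any single Julia value can be written, so several variables can be bundled into one value first. A minimal sketch using a NamedTuple follows; the names parts, wap and was are borrowed from the file names above, while their contents and the file name all-vars.dat are placeholders.

```julia
using Serialization

# Placeholder data standing in for the three real variables.
parts = "some short text"
wap = Dict("day" => [1, 2, 3])
was = [["WORD1", "WORD2"], [43, 54]]

file = joinpath(mktempdir(), "all-vars.dat")  # hypothetical file name

# Bundle the variables into one NamedTuple and write it in a single call.
serialize(file, (parts = parts, wap = wap, was = was))

# One call reads everything back; fields are accessed by name.
data = deserialize(file)
data.wap["day"]
```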
Oh, I am building a web app for text searching, in Julia and Genie. I wanted the fastest possible text processing, so I avoided Pascal, PHP and Python, and chose Julia.
Not searching in the Google sense, but contextual search: I have, let’s say, 10 books (10 UTF-8 text files) and I have to find all the places in these texts where some words are close to each other in the same context. To do this, I have to create an “index” of a text in some way, and then I look things up in this index. The first time I want to find something in some files, I create my index and do the search. Later searches on the same files will be faster because I previously saved the indexes, so there is no need to rebuild them each time I search.
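To make the idea concrete, here is one possible shape for such an index, purely as an assumption about what is meant: a dictionary from each word to the word positions where it occurs, plus a lookup that returns position pairs lying within a given window. The function names build_index and near are hypothetical, and the sketch ignores punctuation.

```julia
# Map each lowercase word to the positions (word numbers) where it occurs.
function build_index(text::AbstractString)
    index = Dict{String,Vector{Int}}()
    for (i, word) in enumerate(split(lowercase(text)))
        push!(get!(index, String(word), Int[]), i)
    end
    return index
end

# Return pairs (i, j) where `a` occurs at position i and `b` at position j,
# with at most `window` words between the two positions.
function near(index, a, b, window)
    pa = get(index, a, Int[])
    pb = get(index, b, Int[])
    return [(i, j) for i in pa for j in pb if abs(i - j) <= window]
end

index = build_index("the cat sat on the mat while the dog slept")
near(index, "cat", "mat", 5)   # → [(2, 6)]
```

This compares every position of one word with every position of the other, which is fine for a sketch; since the position lists are sorted, a real implementation could merge them instead.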
Ok, so the problem is: create the index of a text file only once (so I don’t mind if it takes time), and load that index every time I search in that file, instead of loading the original file and computing the index again. Saving happens once, but loading is frequent, and I would like it to be as quick as possible.
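This build-once, load-often pattern could be sketched as a small caching helper around the Serialization standard library. Everything named here is hypothetical: the helper cached, the ".index.dat" suffix, and count_words, which is only a stand-in for the real index builder. The cache is reused only while it is at least as new as the text file.

```julia
using Serialization

# Return the cached result for `source` if the cache file is at least as new
# as the source file; otherwise run `build` on the file's text and cache it.
function cached(build, source::AbstractString)
    cachefile = source * ".index.dat"   # hypothetical naming scheme
    if isfile(cachefile) && mtime(cachefile) >= mtime(source)
        return deserialize(cachefile)
    end
    result = build(read(source, String))
    serialize(cachefile, result)
    return result
end

# Stand-in for the real index builder: counts how often each word occurs.
function count_words(text)
    counts = Dict{String,Int}()
    for w in split(text)
        counts[String(w)] = get(counts, String(w), 0) + 1
    end
    return counts
end

textfile = joinpath(mktempdir(), "book.txt")   # hypothetical text file
write(textfile, "a b a")

first_run  = cached(count_words, textfile)   # builds the result and saves it
second_run = cached(count_words, textfile)   # loads the saved result
```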
My experience with JLD2.jl and BSON.jl: sometimes it works and sometimes it doesn’t. I.e., I get various errors when trying to load the files.
I see no patterns in when the errors are thrown, other than that I am rarely able to load large files (large being only about 30MB).
This even holds when the files only store “built-in” Julia objects (Dict{Symbol, Matrix{Float64}} for example). I have given up trying to load user-defined types.
Perhaps I am missing some alternatives that work, but from my experience and from the discussions here on Discourse about the various packages that load and save binary data, I have come to the conclusion that I should avoid binary formats, unless I am prepared to lose my data.
I really, really hope that this conclusion is wrong. If not, this strikes me as a major problem for the Julia language.