I am taking my first steps in JuliaDB, and I plan to follow https://juliadb.org/latest, including the tutorial. I want not only to access the data but also to reduce the loading time relative to .csv. (I have long wondered what a good store for financial securities data [with its sparse matrix organization] should be. I was just about to try out sqlite, but maybe JuliaDB is ideal for this.)
First, I cut a small piece of the (pricey) CRSP data:
"permno","yyyymmdd","prc","vol","ret","shrout","openprc","numtrd","retx","vwretd","ewretd","eom"
10000,19860108,-2.5,12800,-0.02439,3680,NA,NA,-0.02439,-0.020744,-0.005117,0
10000,19860109,-2.5,1400,0,3680,NA,NA,0,-0.011219,-0.011588,0
10000,19860110,-2.5,8500,0,3680,NA,NA,0,0.000083,0.003651,0
10000,19860113,-2.625,5450,0.05,3680,NA,NA,0.05,0.002749,0.002433,0
10000,19860114,-2.75,2075,0.047619,3680,NA,NA,0.047619,0.000366,0.004474,0
10000,19860115,-2.875,22490,0.045455,3680,NA,NA,0.045455,0.008206,0.007693,0
10000,19860116,-3,10900,0.043478,3680,NA,NA,0.043478,0.004702,0.00567,0
10000,19860117,-3,8470,0,3680,NA,NA,0,-0.001741,0.003297,0
10000,19860120,-3,1000,0,3680,NA,NA,0,-0.003735,-0.001355,0
Julia 1.0.2, JuliaDB 0.9.0. First, let's load a data sample and save it to disk in order to experiment with how much storage it will take and how fast it will be:
julia> using JuliaDB
julia> @time sample=loadtable("./sample.csv")
11.862888 seconds (42.01 M allocations: 2.037 GiB, 7.64% gc time)
Table with 9 rows, 12 columns:
permno yyyymmdd prc vol ret shrout openprc numtrd retx vwretd ewretd eom
────────────────────────────────────────────────────────────────────────────────────────────────────────
10000 19860108 -2.5 12800 -0.02439 3680 missing missing -0.02439 -0.020744 -0.005117 0
10000 19860109 -2.5 1400 0.0 3680 missing missing 0.0 -0.011219 -0.011588 0
10000 19860110 -2.5 8500 0.0 3680 missing missing 0.0 8.3e-5 0.003651 0
10000 19860113 -2.625 5450 0.05 3680 missing missing 0.05 0.002749 0.002433 0
10000 19860114 -2.75 2075 0.047619 3680 missing missing 0.047619 0.000366 0.004474 0
10000 19860115 -2.875 22490 0.045455 3680 missing missing 0.045455 0.008206 0.007693 0
10000 19860116 -3.0 10900 0.043478 3680 missing missing 0.043478 0.004702 0.00567 0
10000 19860117 -3.0 8470 0.0 3680 missing missing 0.0 -0.001741 0.003297 0
10000 19860120 -3.0 1000 0.0 3680 missing missing 0.0 -0.003735 -0.001355 0
julia> save( sample, "mysample.jdb" )
ERROR: DivideError: integer division error
Stacktrace:
[1] rem at ./int.jl:233 [inlined]
[2] padalign(::IOStream, ::Int64) at /Users/ivo/.julia/packages/MemPool/stadz/src/io.jl:13
[3] mmwrite(::Serialization.Serializer{IOStream}, ::Array{Missing,1}) at /Users/ivo/.julia/packages/MemPool/stadz/src/io.jl:38
...
- The load time for a 10-line csv file seems slow. I hope it is a fixed cost and not a variable cost.
- Is .jdb the recommended file extension?
- My first attempt was just to see how much storage overhead a JuliaDB db takes. Is save( object, filename ) the correct function? What did I do wrong? The error could be a bit better…
regards,
/iaw