I have just “finished” implementing an initial version of a JSON Lines reader (JSONLines.jl) and would like to ask for some feedback. It implements the specific style of JSON Lines file used by other implementations (e.g. ClickHouse), in which each line of the file holds one JSON object. The actual JSON parsing is done with the LazyJSON.jl library. The most interesting feature, in my opinion, is that an arbitrary number of lines can be skipped and only a subset of lines loaded, so memory is allocated only for the loaded lines. This is achieved by using mmap and scanning for line boundaries with very little allocation (skipping all lines results in 34 allocations totalling 3.28 KiB).
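To make the idea concrete, here is a rough sketch of the skip-and-select approach using only the `Mmap` standard library. This is not the actual JSONLines.jl code: the function name `read_selected_lines` and the keywords `skip` and `nrows` are made up for illustration, and the sketch returns the raw text of each selected line rather than parsed JSON.

```julia
using Mmap

# Hypothetical helper (not the JSONLines.jl API): return the raw text of up to
# `nrows` lines after skipping the first `skip` lines of a newline-delimited file.
function read_selected_lines(path::AbstractString; skip::Integer = 0,
                             nrows::Integer = typemax(Int))
    rows = String[]
    open(path, "r") do io
        # Map the file into memory; the bytes are paged in lazily by the OS.
        bytes = Mmap.mmap(io, Vector{UInt8})
        n = length(bytes)
        start = 1      # index of the first byte of the current line
        lineno = 0     # number of lines seen so far
        while start <= n && length(rows) < nrows
            nl = findnext(==(UInt8('\n')), bytes, start)
            stop = nl === nothing ? n : nl - 1
            lineno += 1
            # Skipped lines are only scanned for '\n'; nothing is copied for them.
            # Blank lines count towards `skip` but are never returned.
            if lineno > skip && stop >= start
                push!(rows, String(bytes[start:stop]))
            end
            start = stop + 2   # jump past the newline
        end
    end
    return rows
end
```

Each returned string could then be handed to a JSON parser such as LazyJSON.jl. Since skipped lines are never copied out of the mapped buffer, the allocations scale with the number of selected lines rather than with the file size.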
Any feedback is welcome! Please let me know if you have ideas for additional functionality (the obvious missing piece is a writer function). My next step will be to implement Tables.jl-compatible output instead of returning a DataFrame. Eventually I would like to register the package, so any tips in that direction are also welcome. Thanks!
P.S. Please let me know if this is the wrong category for the post.
EDIT: For updates, see here
EDIT2: The new package for registration is JSONLines.jl