Package interface best practice: More functions or more arguments?

danielw2904 · August 24, 2020, 1:15pm

For my package JSONLines.jl I am considering a refactoring that would provide three options for reading a JSONLines file:

1. Iterator over an mmapped file. Returns the mmapped file immediately and implements an iterator that produces the next row on each iteration (either parsed or as a raw Vector{UInt8}).
2. Index of an mmapped file. Mmaps the file and iterates over it once, saving the positions of the newlines so that rows can be accessed via getindex (either parsed or as a raw Vector{UInt8}).
3. Read and parse the whole file.

Would it be preferable to export three different functions, or one function with additional arguments specifying which version the user wants?

Tamas_Papp · August 24, 2020, 3:22pm

Separate functionality should go into separate functions. But you need at most 2.

You can have a function that returns an object that supports the iteration and abstract array protocols.

And then maybe a convenience function that just collects over this.
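
A minimal sketch of what this could look like, assuming the rows are returned as unparsed strings. The names LineIndex and readrows are hypothetical and are not part of JSONLines.jl; the byte buffer could equally be an mmapped array. Subtyping AbstractVector and defining size and getindex is enough to get iteration, length, and collect from Base's generic fallbacks.

```julia
# Hypothetical type for illustration only, not the JSONLines.jl API.
struct LineIndex <: AbstractVector{String}
    buf::Vector{UInt8}              # raw bytes (could be an mmapped array)
    ranges::Vector{UnitRange{Int}}  # byte range of each non-empty row
end

# Scan the buffer once and record where each non-empty row starts and ends.
function LineIndex(buf::Vector{UInt8})
    ranges = UnitRange{Int}[]
    start = 1
    for (i, b) in pairs(buf)
        if b == UInt8('\n')
            i > start && push!(ranges, start:i-1)  # skip empty lines
            start = i + 1
        end
    end
    # Last row when the file does not end with a newline.
    start <= length(buf) && push!(ranges, start:length(buf))
    return LineIndex(buf, ranges)
end

# The abstract array protocol; iteration follows from it.
Base.size(x::LineIndex) = (length(x.ranges),)
Base.IndexStyle(::Type{LineIndex}) = IndexLinear()
Base.getindex(x::LineIndex, i::Int) = String(x.buf[x.ranges[i]])

# Convenience function (hypothetical name) that just collects over the object.
readrows(buf::Vector{UInt8}) = collect(LineIndex(buf))
```

With this, `rows = LineIndex(bytes)` supports `rows[2]`, `length(rows)`, and `for row in rows`, while `readrows(bytes)` returns a plain Vector{String}. Parsing each row could be layered on top in getindex.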

danielw2904 · August 24, 2020, 4:09pm

Thanks for the input! The question is then in what order the operations should be performed. The “laziest” option would be to return the iterator and, if the user calls getindex, index the rows at that point and return the requested row. This would make reading the file fast but the first getindex unexpectedly slow. Alternatively, break it up into multiple explicit steps:

file = File("path/to.jsonl")
file[1]          # error: the rows have not been indexed yet
iterate(file, 1) # return first row
index!(file)     # scan the file once and record the row boundaries
file[1]          # return first row

In any case there are two costly operations: Indexing the rows and parsing the strings (rows). The main idea is to be able to defer both until needed.
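
To make the trade-off concrete, here is a rough sketch of the "laziest" variant, where iteration never builds the index and the first getindex does. LazyFile, nextrow, and index! are hypothetical names for illustration, rows are returned as unparsed strings, and the buffer stands in for an mmapped file.

```julia
# Hypothetical sketch, not the JSONLines.jl implementation.
mutable struct LazyFile
    buf::Vector{UInt8}                             # raw bytes (e.g. an mmapped file)
    ranges::Union{Nothing,Vector{UnitRange{Int}}}  # row index, built on demand
end
LazyFile(buf::Vector{UInt8}) = LazyFile(buf, nothing)

# Return (byte range of the next non-empty row, position after it), or nothing.
function nextrow(buf::Vector{UInt8}, pos::Int)
    n = length(buf)
    while pos <= n
        nl = findnext(==(UInt8('\n')), buf, pos)
        hi = nl === nothing ? n : nl - 1
        hi >= pos && return (pos:hi, hi + 2)
        pos = hi + 2                               # empty line, keep scanning
    end
    return nothing
end

# Iteration only needs the current byte position, so it never touches the index.
function Base.iterate(f::LazyFile, pos::Int=1)
    r = nextrow(f.buf, pos)
    return r === nothing ? nothing : (String(f.buf[r[1]]), r[2])
end
Base.IteratorSize(::Type{LazyFile}) = Base.SizeUnknown()
Base.eltype(::Type{LazyFile}) = String

# The costly scan: record the byte range of every row.
function index!(f::LazyFile)
    ranges = UnitRange{Int}[]
    pos = 1
    while (r = nextrow(f.buf, pos)) !== nothing
        push!(ranges, r[1])
        pos = r[2]
    end
    f.ranges = ranges
    return f
end

# Lazy variant: the first getindex pays for the scan.
function Base.getindex(f::LazyFile, i::Integer)
    f.ranges === nothing && index!(f)
    return String(f.buf[f.ranges[i]])
end
```

The explicit multi-step variant would differ only in getindex: instead of calling index! itself, it would throw an error when f.ranges is still nothing, so the cost of indexing is never hidden from the user.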

Tamas_Papp · August 25, 2020, 7:57am

I am only mildly familiar with the format, but if you need to find line breaks sequentially anyway, then a random access API makes little sense. Just support iteration.
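
As a rough illustration of an iteration-only interface, the following sketch builds on Base's eachline. The function name eachrow_jsonl is hypothetical, the file is streamed rather than mmapped, and parsing each row is left to the caller.

```julia
# Hypothetical helper, not part of JSONLines.jl: lazily yields every
# non-empty line of the file as a String, one at a time.
eachrow_jsonl(path::AbstractString) = Iterators.filter(!isempty, eachline(path))

# Usage: rows are read only as the loop advances.
# for row in eachrow_jsonl("path/to.jsonl")
#     println(row)
# end
```

A user who wants everything in memory can call collect on the result, which is the convenience function mentioned earlier in the thread.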
