Package interface best practice: More functions or more arguments?

For my package JSONLines.jl I am considering a refactoring that would provide three options for reading a JSONLines file:

  1. Iterator over an mmapped file. Returns the mmapped file immediately and implements an iterator that produces the next row on each iteration (rows can be parsed or returned as Vector{UInt8}).
  2. Index of an mmapped file. Mmaps the file and iterates over it once, saving the indices of the newlines so that rows can be accessed via getindex (rows can be parsed or returned as Vector{UInt8}).
  3. Read and parse the whole file.

Would it be preferable to export three different functions, or one function with additional arguments specifying which version the user wants?

Separate functionality should go into separate functions. But you need at most 2.

You can have a function that returns an object that supports the iteration and abstract array protocols.

And then maybe a convenience function that just collects over this.
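A rough sketch of what that could look like, assuming rows are returned as raw bytes. The names `RowVector`, `rowvector`, and `readrows` are made up for illustration and are not part of JSONLines.jl. Subtyping `AbstractVector` gives you iteration, `length`, and `collect` for free once `size` and `getindex` are defined.

```julia
using Mmap

# Hypothetical type: a read-only vector of rows backed by the file's bytes.
struct RowVector <: AbstractVector{Vector{UInt8}}
    bytes::Vector{UInt8}            # file contents, e.g. from Mmap.mmap
    ranges::Vector{UnitRange{Int}}  # byte range of each row, newline excluded
end

# Scan the bytes once and record where each non-empty row starts and ends.
function RowVector(bytes::Vector{UInt8})
    ranges = UnitRange{Int}[]
    start = 1
    for (i, b) in enumerate(bytes)
        if b == UInt8('\n')
            i > start && push!(ranges, start:i-1)  # skip empty lines
            start = i + 1
        end
    end
    # Last row if the file does not end with a newline.
    start <= length(bytes) && push!(ranges, start:length(bytes))
    return RowVector(bytes, ranges)
end

# Minimal AbstractArray interface; iteration comes with it.
Base.size(r::RowVector) = (length(r.ranges),)
Base.IndexStyle(::Type{RowVector}) = IndexLinear()
Base.getindex(r::RowVector, i::Int) = r.bytes[r.ranges[i]]  # copies the row

# Function 1: the lazy object (mmaps the file, indexes the rows).
rowvector(path::AbstractString) = RowVector(open(Mmap.mmap, path))

# Function 2: the convenience wrapper that just collects everything.
readrows(path::AbstractString) = [String(row) for row in rowvector(path)]
```

Parsing is left out on purpose: whatever JSON parser you use can be applied to each row on top of this.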


Thanks for the input! The question is then in what order the operations should be performed. The “laziest” option would be to return the iterator and, if the user calls getindex, index the rows at that point and return the requested row. This would make reading the file fast but the first getindex unexpectedly slow. Alternatively, break it up into explicit steps:

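file = File("path/to.jsonl")
file[1] # error: rows are not indexed yet
iterate(file, 1) # returns the first row (and the next state)
index!(file) # one pass over the file to locate the rows
file[1] # returns the first row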

In any case there are two costly operations: Indexing the rows and parsing the strings (rows). The main idea is to be able to defer both until needed.
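To make the deferral concrete, here is a minimal sketch of the explicit-steps variant. Everything in it is an assumption for illustration: the type is called `LazyFile` here, it takes the bytes directly rather than a path, the iteration state is the byte offset of the next row, and it assumes there are no blank lines. Rows come back unparsed, so the parsing cost is only paid when the caller asks for it.

```julia
# Hypothetical type: the row index stays `nothing` until index! is called.
mutable struct LazyFile
    bytes::Vector{UInt8}                 # e.g. the mmapped file
    starts::Union{Nothing,Vector{Int}}   # byte offset where each row starts
end
LazyFile(bytes::Vector{UInt8}) = LazyFile(bytes, nothing)

# Iteration needs no index: the state is the offset of the next row.
function Base.iterate(f::LazyFile, pos::Int = 1)
    n = length(f.bytes)
    pos > n && return nothing
    nl = findnext(==(UInt8('\n')), f.bytes, pos)
    lastbyte = nl === nothing ? n : nl - 1
    return (f.bytes[pos:lastbyte], lastbyte + 2)  # raw row, next offset
end
Base.IteratorSize(::Type{LazyFile}) = Base.SizeUnknown()  # row count unknown up front
Base.eltype(::Type{LazyFile}) = Vector{UInt8}

# The costly pass over the file, run only when the user asks for it.
function index!(f::LazyFile)
    starts = Int[]
    pos = 1
    while pos <= length(f.bytes)
        push!(starts, pos)
        nl = findnext(==(UInt8('\n')), f.bytes, pos)
        nl === nothing && break
        pos = nl + 1
    end
    f.starts = starts
    return f
end

# Random access works only after index! has been called.
function Base.getindex(f::LazyFile, i::Integer)
    starts = f.starts
    starts === nothing && error("rows are not indexed yet; call index!(file) first")
    return first(iterate(f, starts[i]))
end
```

The error in `getindex` is the design choice here: it avoids a surprisingly slow first lookup by forcing the user to call `index!` explicitly.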

I am only mildly familiar with the format, but if you need to find line breaks sequentially anyway, then a random access API makes little sense. Just support iteration.
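For what it is worth, an iteration-only interface can be very small. A sketch using `eachline` from Base instead of mmap (the name `rawrows` is made up, and skipping blank lines is my assumption):

```julia
# Hypothetical helper: lazily yields each non-empty line as a String.
# Lines are read one at a time as the iterator is consumed.
rawrows(path::AbstractString) = Iterators.filter(!isempty, eachline(path))
```

A user who wants everything in memory can call `collect(rawrows(path))`, and parsing can be layered on top with a generator over the rows.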
