DataStreams interface


#1

For DataStreams Sources and Sinks, do all the methods listed in the documentation (linked below) need to be implemented? For instance, for a source that is sequential, reset! is often meaningless.

http://juliadata.github.io/DataStreams.jl/stable/index.html#Data.Source-Interface-1


#2

All the interface methods should list whether they’re required or not; definitely open an issue if something’s not clear. As for Data.reset!, it certainly can be useful for sequential sources. Take CSV.Source, for example: it holds an internal IO object that represents the underlying CSV file. It also records datapos, the byte offset in the file where the actual table data starts, so Data.reset! is defined simply as seek(source.io, source.datapos).
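To make the seek-based reset concrete, here is a minimal sketch. ToySource and toy_reset! are made-up names for illustration, not the real CSV.Source or Data.reset!, and the single header line is an assumption:

```julia
# Hypothetical stand-in for a sequential source; NOT the real CSV.Source.
mutable struct ToySource{I<:IO}
    io::I          # underlying stream
    datapos::Int   # byte offset where the table data begins
end

# Build a source from an IO, assuming exactly one header line to skip.
function ToySource(io::IO)
    readline(io)                          # consume the header
    return ToySource(io, Int(position(io)))
end

# Rewind to the first data row, mirroring seek(source.io, source.datapos).
toy_reset!(s::ToySource) = (seek(s.io, s.datapos); nothing)

io = IOBuffer("a,b\n1,2\n3,4\n")
src = ToySource(io)
first_pass = readlines(src.io)    # read all data rows
toy_reset!(src)
second_pass = readlines(src.io)   # same rows again after the reset
```

The source is still read strictly front to back; the reset only moves the read position back to the remembered offset.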

Another case is SQLite.Source: the data must be accessed row by row, so it’s not RandomAccess, but it can still support Data.reset!, because that just involves moving the result cursor back to row 1 of the result set.


#3

Thanks. I’ll open an issue on GitHub. I didn’t mean to imply that reset! is never useful (though perhaps my words “often meaningless” were a bit strong). My point is more that it is sometimes not useful. I have in mind a true “streaming” data set, such as trade and quote data being broadcast over the network. There is usually no way to seek on this type of data. Anyway, further discussion can take place on GitHub.
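A rough sketch of what I mean, with a Channel standing in for a network feed (the prices are made up, and this is not DataStreams code):

```julia
# Hypothetical live feed: a Channel stands in for quotes arriving over the network.
feed = Channel{Float64}(3)
foreach(p -> put!(feed, p), (101.5, 101.7, 101.6))
close(feed)

first_pass = collect(feed)    # drains the channel
second_pass = collect(feed)   # nothing is left to replay
```

Once the values have been consumed they are gone, so there is nothing for a reset to go back to unless the source buffers everything itself.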


#4

I have another question that I wouldn’t call an “issue”. The documentation says: “Packages can have a single julia type implement both the Data.Source and Data.Sink interfaces, or two separate types can implement them separately.” My understanding is that such types should inherit from the abstract types, but Julia doesn’t have multiple inheritance, so I don’t understand how a type can inherit from both.

The example I have in mind is a type that takes a stream and filters it to return a lower bandwidth stream.
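Here is a sketch of that filter, written so that both roles are defined purely by methods rather than by subtyping. Thinner, push_value! and next_value! are hypothetical names, not the DataStreams interface, and whether DataStreams actually requires subtyping is exactly what I am unsure about:

```julia
# Hypothetical filter type; plays a sink-like and a source-like role via methods only.
mutable struct Thinner
    keep_every::Int          # forward every n-th value
    seen::Int                # values received so far
    kept::Vector{Float64}    # values available downstream
end
Thinner(n::Integer) = Thinner(n, 0, Float64[])

# Sink-like role: accept one incoming value (hypothetical method name).
function push_value!(t::Thinner, x::Real)
    t.seen += 1
    t.seen % t.keep_every == 0 && push!(t.kept, x)
    return t
end

# Source-like role: hand out the next retained value (hypothetical method name).
next_value!(t::Thinner) = popfirst!(t.kept)

t = Thinner(3)
foreach(x -> push_value!(t, x), 1.0:9.0)
out = [next_value!(t) for _ in 1:3]
```

If the interfaces are checked by which methods exist for a type, something like this works with no supertype at all; if they are checked by subtyping, I don’t see how one type could satisfy both.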