How is the data ecosystem right now for large datasets?

Of course you will always be stuck having to deal with dates and strings in some way, but you are still free to choose how to represent this data in memory or on disk. For example, in Julia, dates and times are backed by integers (try `DateTime().instant.periods.value`), and indeed, at the end of the day everything is an integer, but the usual approach is to keep them wrapped in `DateTime` objects while they sit in a dataframe. This is the approach I'm starting to question. Perhaps instead of storing these in a `Vector{DateTime}` we should store them as a `Vector{Int}` with metadata telling us to convert to `DateTime` only when appropriate.
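To make that concrete, here is a small sketch of the round trip in Julia: pulling the underlying `Int64` out of a `DateTime` and reconstructing the `DateTime` from it later. (`Dates.UTM` is the standard-library wrapper for a millisecond instant; everything else here is just illustration.)

```julia
using Dates

dt = DateTime(2024, 6, 1, 12, 30)

# The integer hiding inside: milliseconds since Julia's epoch.
ms = dt.instant.periods.value
@assert ms == Dates.value(dt)   # same thing via the public accessor

# The "Vector{Int} plus metadata" idea: store raw integers...
raw = [Dates.value(DateTime(2024, 6, d)) for d in 1:3]

# ...and convert back to DateTime only when a human needs to see them.
dts = DateTime.(Dates.UTM.(raw))
@assert dts[1] == DateTime(2024, 6, 1)
```

The point is that the conversion is cheap and lossless in both directions, so nothing is sacrificed by keeping the storage layer integer-only.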

This might sound silly (and I'm certainly not committed to this idea; I've just been tossing it around), but when one considers that ultimately all the data has to go into some sort of analysis that only understands integers and floats anyway, one wonders whether `DateTime` is appropriate as a wrapper for stored data or whether it is merely an interface for presenting data to humans. A similar argument can be made for strings, since they almost always represent objects that can be mapped to the integers.
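The string case can be sketched the same way. Below is a hypothetical minimal "string pool" (the `encode!` helper and the variable names are mine, not any particular library's API): the stored column is a `Vector{Int}` of codes, and the string labels live off to the side as metadata, consulted only for display.

```julia
# Label table (the "metadata") and the string-to-code lookup.
labels = String[]
code_of = Dict{String,Int}()

# Assign each distinct string the next integer code, reusing existing codes.
encode!(s::AbstractString) = get!(code_of, s) do
    push!(labels, s)
    length(labels)
end

# The column as it would be stored: integers only.
codes = [encode!(s) for s in ["red", "blue", "red", "green"]]
@assert codes == [1, 2, 1, 3]

# Decoding happens only at the human-facing boundary.
@assert labels[codes[3]] == "red"
```

This is essentially what pooled/categorical array types do under the hood; the question raised here is whether the integer view, rather than the string view, should be the primary one.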

Take the datestamps for events as an example: how would we have dealt with time if we had encountered it in HEP? It would be a float (or perhaps an integer, because of precision issues). There would never be any question about it, because everyone knows that time is represented by real numbers. Perhaps it would behoove us not to forget this fact even when someone hands us a date.
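The parenthetical about precision is worth making concrete. A `Float64` has a 53-bit significand, so millisecond-scale epoch timestamps survive a round trip through a float, but nanosecond-scale ones generally do not (the specific timestamp values below are arbitrary, chosen only to land in a plausible modern range):

```julia
# Millisecond timestamp (~2023 CE): well under 2^53, exactly representable.
ms = 1_700_000_000_001
@assert Float64(ms) == ms

# Nanosecond timestamp (~2023 CE): above 2^53, so Float64 must round it.
ns = 1_700_000_000_000_000_001
@assert Float64(ns) != ns
```

Which is why, at nanosecond resolution, "time is a real number" in principle still means "time is an integer" in storage.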