Dates.value doesn't work with missing values

I have a dataframe with some Dates in it, and I want a new column or vector with the number of days between those dates. So this works:

vcat([missing], [y - x for (x, y) in zip(df[:,:date][1:end], df[:,:date][2:end])])

The first value is missing because there’s no previous date. I can add that to my dataframe and all seems well.

Now I want to do arithmetic with those differences. It seems I need to use Dates.value to turn them into integers. I was expecting the missing value to be propagated and remain missing, but I get:

MethodError: no method matching value(::Missing)
etc etc etc

For now, I’m just wrapping this in a simple expression:

if ismissing(d) missing else Dates.value(d) end

but it seems like Dates.value should handle that. Is this just something that hasn’t been implemented, or is there something about the semantics of missing and such that I am, well, missing? Is there a more idiomatic way to handle this?

BTW, as a bit of context: I’m experimenting with using a Pluto notebook and dataframes as a replacement for spreadsheets. I track things like our monthly natural gas and electricity usage in a spreadsheet, and I’d like to see if a Pluto notebook might work better.

I’m populating my data with DataFrames and CSV, and it’s great: I keep the data in a string with dates like “2023-09-15” in it, and I get my date values with:

using CSV, DataFrames

file = CSV.File(IOBuffer(csv))  # csv is the string holding the data
df = DataFrame(file)

This is standard behaviour for most functions and what the passmissing function wrapper is for, i.e. you want

julia> passmissing(Dates.value).([now(), missing])
2-element Vector{Union{Missing, Int64}}:
 63830463604704
               missing
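
Applied to the day differences from the question, a minimal sketch might look like the following. It assumes the Missings package is installed (passmissing lives there, and DataFrames re-exports it); the gaps and usage numbers are made up for illustration.

```julia
using Dates
using Missings  # provides passmissing (assumed installed)

# Made-up stand-ins for the date-difference column: first entry has no previous date.
gaps = [missing, Day(31), Day(31)]

# passmissing wraps Dates.value so that a missing input yields missing
# instead of a MethodError; the dot broadcasts it over the vector.
days = passmissing(Dates.value).(gaps)

# Ordinary arithmetic already propagates missing, so no wrapper is needed here.
usage = [40.0, 62.0, 93.0]   # hypothetical meter readings per billing period
per_day = usage ./ days
```

Only Dates.value needs the wrapper; once the values are plain integers, division and the other arithmetic operators propagate missing on their own.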

Excellent! Thanks. I was sure there was something less awkward than my if-else expression.

Am I right that the syntax here is the typical way of applying a function to every element of a vector/array (I’m not clear on the distinction in Julia)? That is, if I have myfunc that takes a single number and I want to map it over [2,3,5,7], I would do

myfunc.([2,3,5,7])

I know I can use a list comprehension, and there’s the actual map, but it seems the above is common.

Yes, see e.g. More Dots: Syntactic Loop Fusion in Julia
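
As a quick sketch of the three spellings side by side (myfunc here is a hypothetical stand-in, not anything from the thread's data):

```julia
myfunc(x) = x^2 + 1   # hypothetical single-argument function

v = [2, 3, 5, 7]

dotted   = myfunc.(v)               # broadcast ("dot") call
mapped   = map(myfunc, v)           # explicit map
comp     = [myfunc(x) for x in v]   # comprehension

# Chained dot calls are fused into a single loop over v.
fused = myfunc.(v) .+ 1

# On the vector/array question: Vector{T} is just an alias for a 1-d Array.
same_type = Vector{Int} === Array{Int, 1}
```

All three produce the same vector; the dotted form mostly pays off when several dotted operations are chained, since they run as one loop without intermediate arrays.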

Can’t the arithmetic be done on the Date objects? For example,

julia> Date(2022,1,1)-Date(2021,1,1)
365 days

works.

As for the first value being missing, wouldn’t you want to drop the first record altogether in this case?

Thirdly,

[y - x for (x, y) in zip(df[:,:date][1:end], df[:,:date][2:end])]

can also be obtained using diff(df.date).
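
A small sketch of that equivalence, using a plain vector of made-up dates in place of df.date:

```julia
using Dates

# Made-up dates standing in for df.date
dates = [Date(2023, 7, 15), Date(2023, 8, 15), Date(2023, 9, 15)]

# The comprehension from the question (zip stops at the shorter input)
by_hand = [y - x for (x, y) in zip(dates[1:end], dates[2:end])]

# diff computes the same consecutive differences
by_diff = diff(dates)

# Prepending missing keeps the result aligned with the original rows
aligned = vcat([missing], by_diff)
```

The result of diff is one element shorter than its input, which is why the missing is prepended before storing it as a column.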

Yes, but the result doesn’t work with plain numbers: if I compute Date(2022,1,1)-Date(2021,1,1) and then try to divide by it, it fails:

julia> 123/(Date(2022,1,1)-Date(2021,1,1))
ERROR: MethodError: no method matching /(::Int64, ::Day)
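
For reference, a sketch of two ways to get a plain number out of the Day value so the division goes through, using only the Dates standard library:

```julia
using Dates

span = Date(2022, 1, 1) - Date(2021, 1, 1)   # a Day period, not a number

n_int   = Dates.value(span)   # integer count of days
n_float = span / Day(1)       # dividing two Day periods gives a plain Float64

rate = 123 / n_int            # now an ordinary Float64 division
```

Both approaches discard the unit, so it is up to the surrounding code to remember that the result is "per day".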

Dropping the first record would work, but I don’t want that, for two reasons. First, I don’t want to ignore data; if I have some data from my gas bill or whatever, I’d like that in my dataframe. Second, there are other values where this special case of “missing in the first row, and populated otherwise” doesn’t hold, so more generally I need good strategies, such as passmissing, to handle missing values.

Thanks for letting me know about diff.

Good point. Here is another way around this conundrum, which keeps the semantics and even adds more semantics:

julia> using Dates

julia> using Unitful

julia> 123 / Quantity(Date(2022,1,1)-Date(2021,1,1))
0.336986301369863 d^-1

If 123 above has a unit as well (such as electrical energy or cooking gas mass), then the result can even contain these units.

As for whether dropping the first record amounts to ignoring data, I’m not so sure. The calculation estimates rates of consumption, and using the initial row would mean somehow guessing a previous period for its data. That can be done for utilities and household consumption, and it can also be done explicitly by adding a guessed previous record (which would then become the dropped initial record). This may sound convoluted, but adding a missing often just means it has to be specially handled from then on, until it is eliminated.

But tastes differ, and many approaches are legitimate.