Thank you, it works now
I upgraded the MNWE above to an MWE.
The problem is that the values in both dataframes can be missing, not only in the second one.
By the way, is it possible to see the rows that break the cardinality assumption in the error message?
Yes, I see… You can use `by_pred(f_L, !isdisjoint, f_R)`, where `f_L` and `f_R` create these 0-or-Inf intervals.
This isn’t as efficient as a dedicated implementation would be, but it should already work.
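For numeric keys, the two interval-building functions might look like the sketch below. It assumes IntervalSets.jl for the `..` constructor and for an `isdisjoint` method on intervals; `to_interval` and `overlaps` are hypothetical names used only for illustration.

```julia
using IntervalSets

# Hypothetical helper: a missing key becomes the whole real line (overlaps anything),
# any other key becomes a zero-width interval containing only that value.
to_interval(x) = ismissing(x) ? (-Inf .. Inf) : (float(x) .. float(x))

# Two keys "match" when their intervals overlap.
overlaps(a, b) = !isdisjoint(to_interval(a), to_interval(b))

overlaps(1, 1)        # equal values overlap
overlaps(1, 2)        # different values do not
overlaps(missing, 2)  # a missing key overlaps every value
```

In the join itself this would presumably be passed as `by_pred(r -> to_interval(r.col_L), !isdisjoint, r -> to_interval(r.col_R))`, with `col_L` and `col_R` standing in for the real column names.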
The ideal interface seems to be `by_pred(:col_L, isequal_with_missing_wildcard, :col_R)`, probably with a nicer name :) This could directly generalize to `isequal_with_wildcard(nothing)` or `isequal_with_wildcard(NaN)` …
Btw, is a function such as `isequal_with_missing_wildcard` already defined somewhere?
Not that I know of. I defined this function myself:
equal_missing(x,y) = any(ismissing, (x,y)) ? true : x==y
and used it inside the predicate.
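A generalized version along the lines of the `isequal_with_wildcard` idea could be sketched in plain Julia as follows. The function is hypothetical (it exists neither in Base nor, as far as this thread establishes, in FlexiJoins); it uses `isequal` so that `missing`, `nothing` and `NaN` can all serve as the wildcard.

```julia
# Hypothetical helper: returns a two-argument predicate that treats `w` as a wildcard.
# `isequal` is used instead of `==` so that missing and NaN compare as expected.
isequal_with_wildcard(w) = (x, y) -> isequal(x, w) || isequal(y, w) || isequal(x, y)

equal_missing = isequal_with_wildcard(missing)
equal_nan = isequal_with_wildcard(NaN)

equal_missing(1, 1)        # same value on both sides
equal_missing(1, 2)        # different values, no wildcard
equal_missing(missing, 2)  # wildcard on the left side
equal_nan(3.0, NaN)        # NaN acting as the wildcard
```

Unlike the `==`-based one-liner, this never returns `missing`, so it is safe to use directly inside `filter` or an `if`.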
Ideally, there would be a keyword that modifies the behaviour of every `by_key` to do this kind of check. Writing a `by_pred` for every variable with missing values that you want to join on is too cumbersome.
I agree, maybe `by_key(f_L, f_R, isequal=isequal_with_wildcard)` …
It should be possible to see those rows in the error message, for convenience; it just isn’t implemented.
What I typically do after getting a cardinality error in `innerjoin((;L, R), by_..., cardinality=(1, 1))` is run `@p outerjoin((;L, R), by_..., groupby=:L) |> filter(length(_.R) != 1)` (plus the same in the other direction, `groupby=:R`) and explore the result.
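The idea behind that check can be illustrated without any package: group the right-hand rows under each left-hand row and keep the groups whose size is not exactly one. The sketch below uses made-up data and a simple `isequal` key match; it is only meant to show what the grouped outer join plus filter surfaces, not how FlexiJoins implements it.

```julia
# Made-up example tables, as vectors of named tuples.
L = [(id = 1,), (id = 2,), (id = 3,)]
R = [(id = 1,), (id = 1,), (id = 3,)]

# For each left row, collect all matching right rows (analogous to groupby=:L).
grouped = [(; L = l, R = filter(r -> isequal(l.id, r.id), R)) for l in L]

# Rows that break the (1, 1) cardinality assumption: zero or several matches.
offenders = filter(g -> length(g.R) != 1, grouped)

for g in offenders
    println("left row ", g.L, " has ", length(g.R), " matches")
end
```

Here the rows with `id = 1` (two matches) and `id = 2` (no match) are reported, while `id = 3` passes.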