Instead of smean or so, I’ve often thought that Julia could still use the question mark better. Currently, I think it’s only allowed for the ternary operator, but there it needs a space. So it could still be used as a unary operator. Or allowed as the ending for variable names in a future Julia version.
Maybe mean?(data) could be mean(skipmissing(data)) then. Or the missing-skipping version of mean. The thing is that this cannot be tried out in packages because current parsing rules don’t allow it.
Though I would prefer to define const sm = skipmissing and write the parentheses, or even a trailing |> sm. The tab completion \neg<TAB> doesn’t save typing anyway.
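For what it’s worth, the alias approach already works today. A minimal sketch, where `sm` is just the alias proposed above and the data vector is made up for illustration:

```julia
using Statistics  # stdlib, provides mean

const sm = skipmissing  # short alias, as suggested above

data = [1, missing, 3]  # made-up example data

a = mean(sm(data))      # explicit parentheses
b = data |> sm |> mean  # pipe style: mean(sm(data))

a == b  # both compute the mean of the non-missing values, 2.0
```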
I don’t think the parsing rules can be adapted to let mean? be shorthand for function composition without breaking existing code. We do have predicate operators, e.g. !isnan, but with the precedence as it is, !isnan(x) actually does !(isnan(x)), so !isnan needs parentheses whenever it is called directly rather than passed as an argument to a higher-order function. Better to put the character in front of data.
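To make the precedence point concrete, here is a small sketch using only Base functions (the variable names are made up for illustration):

```julia
x = NaN

r1 = !isnan(x)     # parsed as !(isnan(x)), so this is false
r2 = (!isnan)(x)   # calling the negated predicate directly needs parentheses

# As an argument to a higher-order function, no parentheses are needed:
kept = filter(!isnan, [1.0, NaN, 2.0])  # [1.0, 2.0]
```

Both r1 and r2 give the same answer here, but only because negation commutes with the call. For a general postfix ? meaning "compose with something", the two parses would differ.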
Right, that’s what I meant. The ? hasn’t been given any meaning yet because nobody could decide whether it should be something for nothing, for missing, or for both. But at least it offers a possibility that could be explored.
Yeah, I think it was considered as an option where
f?(x)
would be equivalent to
passmissing(f)(x)
In that case, it’s not really meant for reductions like mean, sum, and cor, which are the functions that seem to ruffle the most feathers in this thread. But perhaps some overloading of broadcasting could make
f?.(x)
equivalent to
f(skipmissing(x))
In fact, we could probably implement this already with a Unicode binary operator…
EDIT: Maybe two separate Unicode operators would be better. Not sure that broadcasting ? has the right semantics.
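A rough sketch of what two separate Unicode binary operators might look like. The symbols ⊘ and ⊙ are arbitrary picks from the operators the parser already accepts as infix. Neither has this meaning in Base or in any package I know of, so treat both definitions as hypothetical:

```julia
using Statistics  # stdlib, provides mean

# Hypothetical: `f ⊘ x` calls f on x with the missings skipped (collection style).
⊘(f, x) = f(skipmissing(x))

# Hypothetical: `f ⊙ x` returns missing if x is missing, else f(x) (scalar style).
⊙(f, x) = ismissing(x) ? missing : f(x)

m  = mean ⊘ [1, missing, 3]  # 2.0
s1 = sqrt ⊙ missing          # missing
s2 = sqrt ⊙ 4.0              # 2.0
```

Because these are binary operators, each one takes exactly one right-hand operand.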
EDIT #2: I wish we had more unary post-fix operators, like '. If you use a binary operator, then something like the following won’t work, because we don’t have frankentuples:
I think the rules would be something along the lines of
If f operates on scalars, then f?(x, y) would return missing if x or y are missing.
If f operates on collections, then f?(x) would omit missings from the collection x.
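Since nothing can tell the two cases apart automatically, the rules above can at least be sketched as two explicit higher-order functions. The names lift_scalar and lift_collection are made up for this illustration. The scalar rule mirrors what passmissing does, written out by hand here so the snippet only needs the standard library:

```julia
using Statistics  # stdlib, provides mean

# Hypothetical helper: scalar rule, return missing if any positional argument is missing.
lift_scalar(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

# Hypothetical helper: collection rule, drop missings before calling f.
lift_collection(f) = x -> f(skipmissing(x))

p1 = lift_scalar(+)(1, missing)               # missing
p2 = lift_scalar(+)(1, 2)                     # 3
c1 = lift_collection(mean)([1, missing, 3])   # 2.0
```

The caller has to pick the right wrapper for each function, which is exactly the decision a single f? syntax would have to make on its own.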
Of course, pinning these rules down and implementing them would be super hard. I wonder if a lack of strictness in the type hierarchy also contributes. Julia doesn’t have a strong notion of a collection that could help make those rules clearer.
But it’s something DataFramesMeta.jl can think more about and experiment with, particularly because in the context of DataFrames the collection-scalar distinction is well identified with ByRow and the only “collection” is vectors.
Parsing and lowering know nothing about what f does, so it needs to be a syntactic transformation. So, two operators would probably be needed. Something along these lines:
If someone was so inspired, perhaps they could use what I did in OverflowContexts.jl to swap out the integer operator definitions.
You’d write a macro that swaps out all Missing-related operators/methods for versions with the permissive behaviors. Perhaps @default_skipmissings or something like that. Then a single line at the top of your code would make everything work the way you want. It would take a beat to recompile, but if you’re doing something interactive, that is a fairly small one-time cost to avoid cluttering the code with guardrails for missings.
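As a much smaller stand-in for that idea, here is a sketch of a macro that rewrites calls inside a single expression instead of swapping definitions for a whole module. This is a different and simpler technique than the one described above. The macro name @skipmissings, the helper skipify, and the list of reductions are all made up for this example:

```julia
using Statistics  # stdlib, provides mean

# Illustrative list of single-argument reductions to rewrite.
const REDUCTIONS = (:mean, :sum, :maximum, :minimum)

# Leave anything that is not an expression (symbols, literals, line nodes) alone.
skipify(x) = x

# Walk the expression tree and turn f(x) into f(skipmissing(x)) for listed f.
function skipify(ex::Expr)
    args = map(skipify, ex.args)
    if ex.head == :call && length(args) == 2 && args[1] in REDUCTIONS
        return Expr(:call, args[1], Expr(:call, :skipmissing, args[2]))
    end
    return Expr(ex.head, args...)
end

macro skipmissings(ex)
    esc(skipify(ex))
end

data = [1, missing, 3]
r = @skipmissings sum(data) / mean(data)  # sum(skipmissing(data)) / mean(skipmissing(data))
```

Only calls written literally inside the macro are affected. Functions that call mean internally still see the missings, which is the gap a module-wide swap would try to close.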
We might want to use such important syntax for something more general rather than the domain-specific passmissing. For example, error handling, as is done in other languages. We need to be really careful when spending valuable syntax resources.
That’s been the argument in the relevant GitHub issue for years now, which has prevented ? from ever being added for any use whatsoever.
True. Perhaps there’s another symbol out there that we can use for the second unary post-fix operator… Although I’m not sure it would be a dealbreaker.
And that’s not bad, if after some time we get a solution that also incorporates handling exceptions and other errors! Agree with @jar1.
For missing handling specifically, though, a very simple solution suggested multiple times in this thread alone would already bring significant improvement to those struggling with the current situation. Just define smean or whatever functions in a missing-focused package!
One option with new syntax is to have the parser handle it gracefully, but not actually make it valid regular Julia syntax. Macros can then use that syntax for their domain specific use-case. That can sometimes strike a balance between using up scarce syntax options for a domain specific use-case and still enabling domain-specific short syntax options.
The handling of curly brackets is a nice prior example. They don’t do anything at the top level in regular Julia code at the moment, but macros can utilize them, which has been super useful in both Query and Vega/VegaLite, for example.
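A small sketch of the curly bracket precedent. The macro @names is made up for this illustration and is not taken from Query or VegaLite:

```julia
# Braces parse fine; they just have no meaning in regular code.
ex = Meta.parse("{a, b}")
ex.head  # :braces
ex.args  # Any[:a, :b]

# Hypothetical macro that gives braces a meaning: collect the names as strings.
macro names(ex)
    (ex isa Expr && ex.head == :braces) || error("expected a {...} expression")
    return Expr(:vect, map(string, ex.args)...)
end

cols = @names {a, b}  # ["a", "b"]
```

A postfix ? could be handled the same way: the parser accepts it, lowering rejects it outside a macro, and each package decides what it means inside its own macros.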
is not as universally useful as one might hope. It works for the univariate reduction functions (mean, sum, etc) and for cor and var, but it generally doesn’t work for other multivariate function calls, like