I’m interested in a current overview of simple expressions that often come up, but which tend to be more verbose or difficult than they could theoretically be. These things are scattered across many threads, but I think an overview would be cool. From a language design point of view, it interests me what trade-offs are made and what becomes easy or difficult in the process. Please share your own, without deviating too much into discussing each one, since that would make the examples less visible.
I’ll start:
Map returning multiple arrays
I often need something like this hypothetical multimap function:
ones, twos = multimap(1:100) do i
    (1, 2)
end
Instead, I think you have to mess around with zip, or create one array first and then deconstruct it, or something like that.
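As a rough sketch of the "create one array first and then deconstruct it" route, the hypothetical multimap could be written on top of map. The function is not part of Base, and this version assumes a non-empty input and that every call returns a tuple of the same length:

```julia
# Hypothetical helper, not part of Base: map `f` over `xs`, then split the
# resulting array of tuples into one array per tuple position.
function multimap(f, xs)
    results = map(f, xs)               # array of equally sized tuples
    n = length(first(results))         # assumes `xs` is non-empty
    return ntuple(k -> getindex.(results, k), n)
end

# The do-block form from the example works because `f` is the first argument.
firsts, doubles = multimap(1:100) do i
    (1, 2i)
end
```

This still allocates the intermediate array of tuples, which is exactly the overhead a built-in version could avoid.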
Conditional application of functions
I often do this
result = some_condition ? some_function(value) : value
and if there are two or more steps involved it’s annoying to repeat value. I don’t even have a suggestion for a syntax that would be nice, but the idea is always “some value with f applied, but only if some condition holds”, and this comes up constantly.
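Without new syntax, one possible workaround is a tiny helper function. The name applyif is made up here and nothing like it exists in Base; it is only a sketch of the "apply f only if the condition holds" idea:

```julia
# Hypothetical helper: apply `f` to `x` only when `cond` is true,
# otherwise pass `x` through unchanged.
applyif(f, cond::Bool, x) = cond ? f(x) : x

a = applyif(abs, true, -3)     # condition holds, so abs is applied
b = applyif(abs, false, -3)    # condition fails, value is returned as is

# With several steps, a do-block avoids repeating the value.
c = applyif(true, 4) do v
    w = v + 1
    w * 2
end
```

The value is written only once, at the cost of the condition appearing before it in the call.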
Map with skipped results
I like map but often I wish I could easily exclude some return values. I guess you can do something with reduce or just filter afterwards, but my intuitive wish would be to use continue like in a for-loop.
filtered_results = map(values) do v
    # computations...
    # then skip values where some condition doesn't hold
    some_condition && continue
    return result
end
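Since continue is not valid inside a do-block, one way to approximate this today is to return nothing as a "skip" marker and drop those entries afterwards. The helper name skipmap is hypothetical, and the sketch assumes nothing is never a legitimate result:

```julia
# Hypothetical helper: like `map`, but drops every `nothing` that `f` returns.
skipmap(f, xs) = [y for y in (f(x) for x in xs) if y !== nothing]

squares_of_evens = skipmap(1:10) do v
    isodd(v) && return nothing    # plays the role of `continue`
    return v^2
end
```

Because the filtering happens inside the comprehension, no intermediate array containing the nothing entries is built.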
I am not sure about this — a lot of seemingly “simple” and “obvious” constructs are not implemented because they are actually difficult to implement or integrate into Julia.
I often want to search for target values in a collection, as with searchsorted etc., but the collection is not sorted, so I have to use findall, findfirst, etc., which do not directly accept the target value (except for true).
When this is done repeatedly in the same code, I find myself writing a wrapper like:
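The wrapper itself is not shown in the post. Purely as an illustration, wrappers of this kind could look like the following, where the function names are made up and only findfirst, findall, and isequal come from Base:

```julia
# Hypothetical wrappers that take the target value directly.
findfirst_value(target, collection) = findfirst(isequal(target), collection)
findall_value(target, collection) = findall(isequal(target), collection)

xs = [3, 1, 4, 1, 5]
first_hit = findfirst_value(1, xs)   # index of the first 1
all_hits = findall_value(1, xs)      # indices of every 1
no_hit = findfirst_value(9, xs)      # `nothing` when the target is absent
```

Without a wrapper, the same thing is spelled findfirst(isequal(target), collection) or findfirst(==(target), collection) at each call site.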
The multimap issue is something I spent a lot of time on. I often return named tuples now, and I need to unpack them afterwards. Maybe the unzip solution isn’t so bad though:
unzip(xs) = ntuple(i -> [x[i] for x in xs], length(first(xs)))
a, b = (i -> (1, 2)).(1:100) |> unzip
About findfirst, I find it annoying that it doesn’t accept generators:
findfirst(x>1 for x in 1:5) #wrong
findfirst([x>1 for x in 1:5]) #right :(
It’s also hard to remember which functions accept generators and which do not (maybe there’s a good reason behind it, but it seems a bit arbitrary).
Why would it be supposed to accept generators? x>1 for x in 1:5 creates only one argument (an iterator), and findfirst needs two (a function and an iterator).
That works, but I think the one-argument form (taking a collection as input) is more common and convenient, especially since you’ll often use comprehensions in that kind of code, so it’s very natural to copy the inside of the comprehension and put it in another function.
Maybe the issue is that generators are not enumerable in the right way, e.g. maximum works but not argmax.
For generators that produce Bools, this can be implemented in a very straightforward way.
# This would actually work for any iterator, except that it
# violates arbitrary indexing, for which `pairs` is used instead
function Base.findfirst(gen::Base.Generator)
    T = Base.@default_eltype(gen)
    @assert T == Bool "findfirst can't iterate over a generator of non-booleans. Got return type $T"
    for (i, x) in enumerate(gen)
        x && return i
    end
    return nothing  # no true element found
end
The reason this doesn’t already “just work” is that generators don’t implement pairs (or keys), which is what the find* methods rely on. Since generators don’t have getindex defined on them anyhow, I don’t think it’s a problem to use enumerate.
Actually, defining the following in Base makes findfirst and findlast work as expected, without the above definition:
Base.pairs(gen::Base.Generator) = enumerate(gen)
julia> findlast(x>2 for x in 1:5)
5
julia> findfirst(x>2 for x in 1:5)
3
findnext is harder, since it requires indexing, but it can likely be done as well.
Another one is the dimensionality issue when taking sum or mean. Writing dropdims(sum(x, dims=2), dims=2) is so cumbersome. I wish sum would just take a kwarg to drop dimensions instead.
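Until something like that exists, a small helper can hide the repetition. The name reducedrop is hypothetical; the sketch only assumes that the reduction accepts a dims keyword, as sum and maximum do:

```julia
# Hypothetical helper: reduce over `dims` with `f`, then drop those dimensions.
reducedrop(f, x; dims) = dropdims(f(x; dims=dims); dims=dims)

x = [1 2 3; 4 5 6]
row_sums = reducedrop(sum, x; dims=2)        # one sum per row, as a vector
col_maxima = reducedrop(maximum, x; dims=1)  # one maximum per column, as a vector
```

The same helper should work with mean from the Statistics standard library, since it takes the same dims keyword.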