map returns a new object of the same type as the one being mapped over: mapping a vector returns a vector, and mapping a string returns a string. Since the elements of a string are Chars, the function must return a Char.
By calling collect, you instead turn the string into a vector of Chars, so the function can return anything, because a vector can hold anything.
Not sure what version of Julia you are using, but with the improved error messages in more recent versions I get:
ERROR: ArgumentError: map(f, s::AbstractString)
requires f to return AbstractChar; try
map(f, collect(s)) or a comprehension instead
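To make that concrete, here is a minimal sketch of the failing call and the two workarounds the error message suggests. The variable names are only for illustration.

```julia
s = "abc"

# uppercase returns a Char for every Char, so map can build a String.
upper = map(uppercase, s)

# Int(c) is not an AbstractChar, so it cannot be stored in a String: map throws.
err = try
    map(c -> Int(c), s)
catch e
    e
end

# The two workarounds named in the error message.
codes_collect = map(c -> Int(c), collect(s))   # map over a Vector{Char}
codes_comprehension = [Int(c) for c in s]      # comprehension over the string
```

Both workarounds produce a Vector{Int}; only the call on the string itself raises the ArgumentError.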
I believe the motivation for this is so that the types returned by map can be reasoned about, i.e. if f: A → B, and t is an iterable of type T with element type A, then map(f, t) takes a T{A} to a T{B}.
This isn’t possible for arbitrary functions applied to strings, as strings by definition can only be collections of Chars.
As you can see, map returns a collection of the return values, even for arbitrary iterables. For some functions like println, that’s probably not what you want though, since it (needlessly) ends up allocating a Vector full of nothing.
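A small sketch of that behaviour, using Iterators.filter as a stand-in for an arbitrary iterable (my choice for illustration), and foreach as the usual alternative when only the side effect matters:

```julia
# A lazy iterable that cannot be indexed and has no known length.
evens = Iterators.filter(iseven, 1:10)

# map falls back to collecting the return values into a Vector.
squares = map(x -> x^2, evens)

# println returns nothing, so this allocates a Vector{Nothing}.
printed = map(println, evens)

# foreach calls the function for its side effects only and returns nothing.
result = foreach(println, evens)
```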
As I understand it, the answer is that the semantics of map is inconsistent: Ideally, the purpose of map is to transform a collection into a different collection of the same type. This is also how I understand the docstring:
map(f, c...) -> collection
Transform collection c by applying f to each element.
This allows - in principle - mapping a string to a string without an intermediate array, mapping a UnitRange{Int} to another range without allocations, and other such optimizations. In some cases, this can be more efficient than lazy mapping.
The inconsistency lies in that Julia will fall back to returning a Vector. I think that’s bad design, because it means that adding any new specialization to map that preserves the intended semantics is a breaking change. IMO that fallback should never have been present.
If you want a more generic mapper, use Iterators.map.
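For illustration, a minimal sketch of how Iterators.map differs from map. This assumes Julia 1.6 or later, where Iterators.map is available; the call counter is a hypothetical helper that only exists to show when f actually runs.

```julia
calls = Ref(0)                 # counts how often f is actually called
f(x) = (calls[] += 1; 2x)

lazy = Iterators.map(f, 1:5)   # builds a lazy wrapper, computes nothing yet
calls_before = calls[]

doubled = collect(lazy)        # f runs once per element here
calls_after = calls[]
```

Iterators.map makes no promise about a container at all: you get a lazy iterator, and you decide whether and how to materialize it.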
It doesn’t, so this is somewhat up to interpretation. But I’d say that if the intended semantics were that it could change the collection type, it would not say “Transform collection c by applying f to each element”, since then it would not just be a transformation by applying f to the individual elements; it would also transform the container itself.
Also, if that was not the intended semantics, it’s quite weird that map has specialized methods for Tuple, String, BitArray and maybe more, which preserve the collection type, as well as methods for Set and Dict that throw an error instead of returning an Array. I mean, surely you will agree that the semantics of “returns an Array, except for this set of types” is weird.
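A quick sketch of a few of those specialized cases next to the generic fallback. The inputs are arbitrary examples.

```julia
t = map(x -> x + 1, (1, 2, 3))          # Tuple in, Tuple out
s = map(c -> c + 1, "abc")              # String in, String out (Char + Int is a Char)
b = map(!, BitVector([true, false]))    # BitVector in, BitVector out

# No specialized method applies to a lazy generator, so the result is a Vector.
g = map(x -> x + 1, (x for x in 1:3))
```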
It’s of course also possible that the intended semantics is that it makes no promises about the collection type, and could change.
This peculiarity is specific to AbstractString only. It is because it was desirable for map to return an AbstractString when the input was one, to avoid having to always do String(map(f, str)) when working with strings. Since the user can still ask for a different iterable (e.g. a Vector{Char}, with collect), this was/is fine.
See how the container type changes in these simple examples without issue.
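A minimal sketch of such examples (the inputs are my own picks, not necessarily the ones originally posted):

```julia
# UnitRange{Int} in, Vector{Float64} out.
r = map(x -> x / 2, 1:4)

# BitVector in, Vector{Int} out, because the results are no longer Bool.
b = map(x -> x + 1, BitVector([true, false]))

# A zip iterator in, Vector{Int} out.
z = map(p -> p[1] + p[2], zip(1:3, 4:6))
```

None of these calls throws; only the AbstractString method insists on getting its own element type back.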
Arguably (although I’m not arguing this), improvements that have taken place in the language since then make it so that you could detect if f(::Char) returns Char or not-Char, and generate parts of the function body based on that in a type stable way.
Not quite, no.
Opaque closures are not really for that, though some experiments have used them for that.
I would not suggest that normal users use them at all right now.
They have other weird behaviors, like not participating in world ages.
What I mean is that String has a specialized method for map, and that Iterators.filter does not. Thus, Iterators.filter is a more generic example for use in map than String. I didn’t mean that Iterators.filter is a generic iterator, just that it’s an example for something that can’t be indexed like String or Array.
That depends on the use case: collect eagerly allocates a new array, which is not always possible or desirable. For example, for a Channel (which can be iterated over), you wouldn’t want to call collect because the Channel may not yet have all data available. If you call collect to create an array, you’ll effectively block your entire computation until the Channel is closed, which could happen hours later (or sometimes never!).
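A small sketch of the difference, assuming Julia 1.6 or later for Iterators.map. The producer here is a toy that finishes immediately, so collect returns; with a long-running producer the same collect call would wait until the Channel is closed.

```julia
# The producer task fills the channel; the channel closes when the task ends.
ch = Channel{Int}(4) do c
    for i in 1:3
        put!(c, i)
    end
end

# Lazy: wraps the channel without taking anything from it.
doubled = Iterators.map(x -> 2x, ch)

# Waits only for the first item, not for the channel to close.
first_item = first(doubled)

# collect (like map) keeps taking items until the channel is closed.
# The channel is stateful, so the item consumed above is not seen again.
rest = collect(doubled)
```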