Why does `map` require `collect` for general iterables?

Why does map require collect for general iterable things?

julia> foreach(println, "あいうえお")
あ
い
う
え
お

julia> map(println, "あいうえお");
あ
ERROR: ArgumentError: map(f, s::AbstractString) requires f to return AbstractChar; try map(f, collect(s)) or a comprehension instead
Stacktrace:
 [1] map(f::typeof(println), s::String)
   @ Base ./strings/basic.jl:655
 [2] top-level scope
   @ REPL[12]:1

julia> map(println, collect("あいうえお"));
あ
い
う
え
お

You probably want to use foreach instead.

map returns a new object of the same type as the one being mapped over. Mapping a vector returns a vector. Mapping a string returns a string. Since the elements of a string are Chars, the function must return a Char.

By calling collect you instead turn the string into a vector of chars, and so the function can return anything because you can put anything in a vector.
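A minimal sketch of that difference, assuming a recent Julia 1.x. The sample string and the anonymous functions are only for illustration:

```julia
s = "abc"

# f returns a Char, so map can rebuild a String from the results.
upper = map(uppercase, s)

# f returns a String (Char * Char concatenates), so collect into a Vector{Char} first.
doubled = map(c -> c * c, collect(s))

# A comprehension iterates the string directly and always builds a Vector.
codes = [Int(c) for c in s]
```

The comprehension is the form the error message suggests as an alternative to collect, and it avoids the intermediate Vector{Char}.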

So does map (map(f, collection_input) -> collection_output) expect f to be f: T → T for a general collection_input?

Not sure what version of Julia you are using but with the improved error messages in more recent versions I get

ERROR: ArgumentError: map(f, s::AbstractString)
requires f to return AbstractChar; try
map(f, collect(s)) or a comprehension instead

I believe the motivation for this is so that the types returned by map can be reasoned about, i.e. if f: A → B, and t is an iterable of type T with element type A, then map(f, t): T{A} → T{B}.

This isn’t possible for arbitrary functions applied to strings as strings by definition can only be collections of Chars.

Not in general, no:

julia> map(x -> 1/x, Int[1 2 3])
1×3 Matrix{Float64}:
 1.0  0.5  0.333333

It looks like f is required to return an AbstractChar when mapping over an AbstractString, though, since map generally tries to return an AbstractString again. I’m not quite sure why this is; the error was originally added in the commit "better error for map(f,String) where f doesn't return Char" (JuliaLang/julia@f08ba8d on GitHub).

EDIT: I’ve opened an issue to see if this can be relaxed: "`map(f, ::String)` needs `f` to return an `AbstractChar`" (Issue #54580, JuliaLang/julia on GitHub).

Also, perhaps this is a better example for mapping over arbitrary iterables:

julia> i = Iterators.filter(iseven, 1:20)
Base.Iterators.Filter{typeof(iseven), UnitRange{Int64}}(iseven, 1:20)

julia> map(println, i)
2
4
6
8
10
12
14
16
18
20
10-element Vector{Nothing}:
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing
 nothing

As you can see, map returns a collection of the return values, even for arbitrary iterables. For some functions like println, that’s probably not what you want though, since it (needlessly) ends up allocating a Vector full of nothing.
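To make the contrast with foreach concrete, here is a small sketch on the same filtered iterator. The `seen` vector is just an illustrative stand-in for a side effect such as printing:

```julia
evens = Iterators.filter(iseven, 1:20)

seen = Int[]
# foreach runs the function only for its side effect and returns nothing.
r1 = foreach(x -> push!(seen, x), evens)

# map also runs the function, but collects every return value into a Vector.
r2 = map(x -> nothing, evens)
```

Here `r1` is `nothing`, while `r2` is a 10-element vector of `nothing` values that has to be allocated even though it is never used.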

I am only exploring how many ways there are to iterate, so the vector of nothings is not a problem for me.

this is a better example for mapping over arbitrary iterables:

I’m not sure what you mean. Do you mean using Iterators.filter?

In general, is there no rule about what type of f is allowed in map?

Something like Iterators.filter is a more general example than String, is what I’m saying :slight_smile:

Yes, in general you can use any function that takes a single argument.

There’s no rule because Julia is not a language with function types (i.e. we don’t dispatch on the input and output types of a function at all).
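A rough illustration of what that means in practice: a method can dispatch on which function it receives, since every function has its own type, but not on that function's argument or return types. The name `describe` below is hypothetical and exists only for this sketch:

```julia
# Dispatch sees only the type of the function object itself.
describe(f::Function) = "some function"
describe(f::typeof(println)) = "println specifically"

a = describe(sin)               # falls back to the generic method
b = describe(println)           # picks the method for this one function
c = describe(x -> string(x))    # an anonymous function is still just a Function
```

There is no way to write a method here that matches "any function from Char to Char", which is why map cannot select its behavior from the signature of f.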

As I understand it, the answer is that the semantics of map are inconsistent. Ideally, the purpose of map is to transform a collection into a different collection of the same type. This is also how I understand the docstring:

map(f, c...) -> collection
Transform collection c by applying f to each element

This allows - in principle - mapping a string to a string without an intermediate array, mapping a UnitRange{Int} to another range without allocations, and other such optimizations. In some cases, this can be more efficient than lazy mapping.

The inconsistency lies in the fact that Julia falls back to returning a Vector. I think that’s bad design, because it means that adding any new specialization to map that preserves the intended semantics is a breaking change. IMO that fallback should never have been present.

If you want a more generic mapper, use Iterators.map.
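A short sketch of Iterators.map on a string, assuming Julia 1.6 or later (where I believe Iterators.map was introduced). The string and lambda are illustrative:

```julia
s = "あいう"

# Lazy: nothing is computed yet and no output container has been chosen.
lazy = Iterators.map(c -> c * c, s)

# The caller decides what to build from the lazy iterator.
as_vector = collect(lazy)
as_string = join(lazy)
```

Because the result is lazy, f is free to return any type, and the choice of output container is left entirely to the consumer.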

where does the docstring say of the same type?

It doesn’t, so this is somewhat up to interpretation. But I’d say that if it was the intended semantics that it could change the collection type, it would not say Transform collection c by applying f to each element, since then, it would not just be a transformation by applying f to the individual elements, it would also transform the container itself.

Also, if that was not the intended semantics, it’s quite weird that map has specialized methods for Set, Dict, Tuple, String, BitArray and maybe more, which preserve the collection type. I mean, surely you will agree that semantics of “returns an Array, except for this set of types” are weird.

It’s of course also possible that the intended semantics is that it makes no promises about the collection type, and could change.

I think you’re reading too much into the wording:

Many uses and tests in Base rely on map changing the container type (here, Range to Vector).

That docstring was written a loong time ago, I wouldn’t interpret it as particularly intended by anyone.

Although the tests were also from a long time ago, so who’s to say

This peculiarity is specific to AbstractString only. It is because it was desirable for map to return an AbstractString when the input was one, to avoid having to always do String(map(f, str)) when working with strings. Since the user can still ask for a different iterable (e.g. a Vector{Char}, with collect), this was/is fine.
See how the container type changes in these simple examples without issue.

julia> map(x -> x + 1, BitVector([1, 0, 1])')
1×3 adjoint(::Vector{Int64}) with eltype Int64:
 2  1  2

julia> map(x -> x + 1.0, [1, 2, 3]')
1×3 adjoint(::Vector{Float64}) with eltype Float64:
 2.0  3.0  4.0

julia> map(x -> x + 'a', [1, 2, 3])
3-element Vector{Char}:
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)

The other two exceptions are Dict and Set, because there is not yet any consensus on what the return type should be. See the issue "map for Dict" (Issue #5794, JuliaLang/julia on GitHub) for some history on those two cases. The String method has been around since the very beginning, so it definitely won’t be changing.
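A sketch of how these cases behave, assuming current Julia 1.x, where as far as I know map on a Set or a Dict throws an error rather than picking a return type. The `throws` helper is hypothetical and only records whether a call failed:

```julia
t = map(x -> x + 1, (1, 2, 3))   # Tuple in, Tuple out
v = map(x -> x + 1, 1:3)         # range in, Vector out

# Hypothetical helper: returns true if calling f() throws.
function throws(f)
    try
        f()
        return false
    catch
        return true
    end
end

set_fails  = throws(() -> map(x -> x + 1, Set([1, 2, 3])))
dict_fails = throws(() -> map(identity, Dict(:a => 1)))

# Generators passed to a constructor state the output container explicitly.
s2 = Set(x + 1 for x in Set([1, 2, 3]))
d2 = Dict(k => val + 1 for (k, val) in Dict(:a => 1))
```

Writing the constructor around a generator sidesteps the question of what map should return, because the caller names the container.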

Arguably (although I’m not arguing this), improvements that have taken place in the language since then make it so that you could detect if f(::Char) returns Char or not-Char, and generate parts of the function body based on that in a type stable way.
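For comparison, a much simpler version of that idea can be sketched with a runtime check. This is not the type-stable, inference-driven approach described above: the hypothetical helper `map_chars` (not part of Base) decides after the fact, so its return type depends on what f produced:

```julia
# Hypothetical helper, not part of Base.
function map_chars(f, s::AbstractString)
    results = map(f, collect(s))
    # Runtime decision: rebuild a String only when the results are characters.
    return eltype(results) <: AbstractChar ? join(results) : results
end

as_string = map_chars(uppercase, "abc")     # f returns Char
as_vector = map_chars(c -> c * c, "abc")    # f returns String
```

The price of this convenience is that callers can no longer predict the return type from the input type alone.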

You’re correct for sure; however, a relevant feature that’s been experimental for a few years already is the opaque closure:

julia> oc = Base.Experimental.@opaque (x::Int) -> sqrt(x)
(::Int64)::Float64->◌

julia> typeof(oc)
Core.OpaqueClosure{Tuple{Int64}, Float64}

Not quite, no.
Opaque closures are not really for that,
though some experiments have used them for that.
I would not suggest that normal users use them at all right now.
They have other weird behaviors, like not participating in world ages.

What I wondered about is that Iterators.filter makes an iterator which can be used directly by map, but it is not generic, and collect seems superior.

julia> using BenchmarkTools

julia> @btime map(x -> x * x, Iterators.filter(x -> true, "あいうえお"))
  524.471 ns (19 allocations: 472 bytes)
5-element Vector{String}:
 "ああ"
 "いい"
 "うう"
 "ええ"
 "おお"

julia> @btime map(x -> x * x, collect("あいうえお"))
  308.947 ns (17 allocations: 456 bytes)
5-element Vector{String}:
 "ああ"
 "いい"
 "うう"
 "ええ"
 "おお" 

What I mean is that String has a specialized method for map, and that Iterators.filter does not. Thus, Iterators.filter is a more generic example for use in map than String. I didn’t mean that Iterators.filter is a generic iterator, just that it’s an example for something that can’t be indexed like String or Array.

That depends on the use case: collect eagerly allocates a new array, which is not always possible or desirable. For example, for a Channel (which can be iterated over), you wouldn’t want to call collect because the Channel may not yet have all data available. If you call collect to create an array, you’ll effectively block your entire computation until the Channel is closed, which could happen hours later (or sometimes never!).
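A small sketch of iterating a Channel directly. The producer here is tiny and finishes immediately, which is an assumption made so the example terminates; a real producer might run for hours:

```julia
# The producer task feeds the channel; the channel closes when the do-block returns.
ch = Channel{Int}(2) do c
    for i in 1:3
        put!(c, i)
    end
end

squares = Int[]
for x in ch               # handles each value as soon as it becomes available
    push!(squares, x^2)
end
```

With a slow producer, the loop body would still run on each value as it arrives, whereas collect(ch) would return nothing at all until the channel is closed.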

OK, I understand. Your example is one case of an arbitrary iterator (e.g. not random-access, not in memory, not fully available at once).
