Map vs list comprehension

What is the difference between map(v -> f(v), lst) and [f(v) for v in lst]?
I have seen this discussion but the focus was on speed and memory allocation.

Conceptually, there is no difference. Note that you can simply write

map(f, lst)

if f is a previously-defined function.
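
As a small sketch of that equivalence (the names f and lst below are placeholders chosen for illustration, not anything from a library), the three spellings should give the same result on a Vector:

```julia
# Hypothetical function and input, used only for illustration.
f(v) = v^2
lst = [1, 2, 3]

via_lambda = map(v -> f(v), lst)         # anonymous function wrapping f
via_name = map(f, lst)                   # pass f directly
via_comprehension = [f(v) for v in lst]  # comprehension

println(via_lambda == via_name == via_comprehension)
```

For longer function bodies, the do-block form map(lst) do v ... end is another spelling of the same call.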

Why are there two versions?

They are not quite the same. For example, with list comprehension you can additionally filter elements:

[x for x in 1:10 if x % 2 == 0]

But map may be shorter and more convenient, e.g.:

[f(input) for input in inputs]
map(f, inputs)
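
To make the filtering point concrete, here is a rough sketch of the same computation written both ways. The squaring step is an arbitrary choice for illustration:

```julia
# Comprehension: transform and filter in one expression.
evens_squared = [x^2 for x in 1:10 if x % 2 == 0]

# map has no filter clause, so the filtering becomes a separate step.
same_with_map = map(x -> x^2, filter(iseven, 1:10))

println(evens_squared == same_with_map)
```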

In addition, map (mostly) preserves the type of a collection, while a list comprehension doesn’t. Compare:

map(x -> x + 1, Set([1, 2, 3]))
[x + 1 for x in Set([1, 2, 3])]
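
What map does with a Set has varied between Julia releases, so it is worth checking on the version you run. A sketch of the same contrast that should not depend on the version uses a Tuple instead (that substitution is mine, not part of the original comparison):

```julia
t = (1, 2, 3)

mapped = map(x -> x + 1, t)         # map keeps the Tuple container
comprehended = [x + 1 for x in t]   # a comprehension always builds an Array

println(typeof(mapped))
println(typeof(comprehended))
```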

I am still having a hard time with all this (Julia v0.5).

Comprehension on an Array{Tuple} is fine

julia> a=[(1,2),(3,4)];
julia> [x for (x,y) in a]
2-element Array{Int64,1}:

but not map

julia> map((x,y)->x, a)
ERROR: MethodError: no method matching (::##3#4)(::Tuple{Int64,Int64})

even though

(x,y) = a[1]

is fine.

Also filter raises an error on

julia> filter((x,y)->x==3, a)
ERROR: MethodError: no method matching (::##5#6)(::Tuple{Int64,Int64})

because a is not an associative collection (for those, two arguments are passed to the function, as specified in the manual). Indeed, filter works here:

julia> filter((x,y)->x==3, Dict(a))
Dict{Int64,Int64} with 1 entry:

Then again neither one is valid:

julia> foreach((x,y)->println(x), a)
ERROR: MethodError: no method matching (::##9#10)(::Pair{Int64,Int64})
julia> foreach((x,y)->println(x), Dict(a))
ERROR: MethodError: no method matching (::##11#12)(::Pair{Int64,Int64})

Instead

julia> for (x,y) in a
       println(x)
       end

is OK.

Then I come across this:

julia> filter((x,y)->begin println(typeof(x)); x[1]==3; end, Dict(a))
Int64
Int64
Dict{Int64,Int64} with 1 entry:

but it should be an error because Int64 has no getindex.

I am very confused, but there must have been a good reason to have it this way and I cannot see it. How can I picture all this in a more systematic way?

(And all this because of this post.)

julia> a = Int64(6); a[1]
6

Rather unexpected…

I suspect (but don’t actually know) that one reason for Int64 to provide a getindex implementation is to make broadcast work in a general way.

The key difference between loops/comprehensions and the anonymous functions used in filter, map etc. seems to be the implicit tuple destructuring that only happens in the former case (which makes sense for dispatch).

Imho the outlier here is filter for associative iterables, also see
https://github.com/JuliaLang/julia/issues/17886

So apart from that, the system appears to be consistent: In loops, comprehensions and assignments you get automatic tuple destructuring if you want it, and otherwise you don’t.
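
For anonymous functions, the destructuring can be requested explicitly. This is a minimal sketch assuming Julia 0.6 or later, where argument destructuring in function signatures is available. The variable names are illustrative only:

```julia
a = [(1, 2), (3, 4)]

# The extra parentheses and trailing comma declare ONE argument that is
# destructured into x and y, rather than a two-argument function.
firsts = map(((x, y),) -> x, a)
kept = filter(((x, y),) -> x == 3, a)

# Indexing the single tuple argument is the version-independent alternative.
firsts_by_index = map(t -> t[1], a)

# Iterating a Dict yields Pairs, which destructure the same way.
d = Dict(1 => 2, 3 => 4)
keys_seen = Int[]
foreach(((k, v),) -> push!(keys_seen, k), d)

println(firsts)
println(kept)
```

With this spelling, map, filter and foreach all receive a one-argument function, which matches how each element is actually passed.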

No, broadcast doesn’t need this (in 0.6, it works on arbitrary “scalar” types that don’t have getindex).

I think that largely this is the Matlab legacy; in Matlab, numbers are “really” 1x1 matrices internally, and it is quite common to write functions that are supposed to work on either scalars or arrays of numbers in order to vectorize. To simplify the process of writing such generic scalar/vector code, you can access numbers as if they were 0-dimensional arrays in Julia.
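
A short sketch of what "numbers behave like 0-dimensional arrays" means in practice, assuming a Julia 1.x session (only Base functions are used):

```julia
x = 6

println(x[1])        # indexing a scalar returns the scalar itself
println(length(x))   # a number reports length 1
println(size(x))     # and an empty size tuple, like a 0-dimensional array
println(ndims(x))

# Numbers are also iterable, yielding themselves once.
total = sum(v for v in x)
println(total)
```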

I think that a lot of the need for this should be gone now with 0.5’s dot-call syntax: in the cases where you would previously have written a generic vector/scalar function, you should now just write the scalar function f(x), and then apply it to arrays A with f.(A). This is not only easier, it is also faster because it can fuse with other elementwise operations and the result can be assigned in-place with .=.
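
A rough sketch of that pattern, with a made-up scalar function f and input A (both are assumptions for illustration):

```julia
# Write the function for scalars only.
f(x) = 2x + 1

A = [1.0, 2.0, 3.0]

# Apply it elementwise with dot-call syntax.
B = f.(A)

# Several dotted operations in one expression can fuse into a single loop,
# and .= writes the result into an existing array.
out = similar(A)
out .= f.(A) .+ sqrt.(A)

println(B)
println(out)
```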

See also: make numbers non-iterable? · Issue #7903 · JuliaLang/julia · GitHub

Oh, nice explanation, thank you. The implicit tuple destructuring is the bit I was missing.

Should be:

map((elem) -> elem[1], object)

since the elements of the object are tuples and you want to select the first element of the tuple.

Note that there is a function first:

julia 0.6> first((1,2))
1

So you can just write

julia 0.6> a = [(1, 2), (3, 4)]
2-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (3, 4)

julia 0.6> first.(a)
2-element Array{Int64,1}:
 1
 3

Is there a reason for this behavior?

By definition, a list comprehension builds a list. You could possibly have a kind of “collection comprehension” that tries to preserve the collection type. But I can see little value in it and a number of hard design choices, e.g. what syntax this feature should have, how to do type dispatching (which is a solved issue for map in Julia), how to handle filtering in general collections (i.e. [x for x in xs if condition(x)] for lists), etc.
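
Not quite the feature described above, but a related one that already exists: a generator expression can be passed to a collection constructor, which gives the comprehension syntax (including the if clause) with an explicitly chosen output type. A sketch, assuming Julia 0.5 or later and using made-up data:

```julia
s = Set([1, 2, 3])

# Generator passed to the Set constructor: the result is a Set.
shifted = Set(x + 1 for x in s)
odd_only = Set(x for x in s if isodd(x))

# The same idea for a Dict, destructuring each key/value pair.
d = Dict("a" => 1, "b" => 2)
squared = Dict(k => v^2 for (k, v) in d)

println(shifted)
println(squared)
```

The output type is stated by the caller here rather than inferred from the input, which sidesteps the dispatch questions raised above.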

It might make more sense to call it an array comprehension then, since it understands shape.

julia> A = rand(2,3)
2×3 Array{Float64,2}:
 0.05249   0.251237  0.911031
 0.461673  0.73201   0.854654

julia> [a^2 for a in A]
2×3 Array{Float64,2}:
 0.0027552  0.0631202  0.829977
 0.213142   0.535838   0.730434

We don’t call them list comprehensions nor do we call the data structure lists – that’s Python terminology. Julia’s random access n-dimensional data type is an array and the comprehensions that construct them are array comprehensions.
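
A small sketch of how the shape of an array comprehension follows from the way its iterators are written (the multiplication table is just an illustrative choice):

```julia
# Comma-separated iterators produce an n-dimensional array.
table = [i * j for i in 1:2, j in 1:3]

# Nested `for` clauses produce a flat Vector instead.
flat = [i * j for i in 1:2 for j in 1:3]

# Adding a filter also drops the shape, since the length is not known upfront.
filtered = [a^2 for a in table if a > 2]

println(size(table))
println(size(flat))
println(size(filtered))
```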

map does not preserve the type of an Array in v0.6.

Julia v0.6

julia> typeof(map(identity,Any[1,2,3]))
Array{Int64,1}

Julia v0.5

julia> typeof(map(identity,Any[1,2,3]))
Array{Any,1}

Operating on an Array, map in v0.6 appears to return an array of the least common (non-proper) supertype of the elements.

This is one of the thousand cuts Symata.jl has suffered under v0.6. (Not that I’m complaining, I knew the API was in flux.)

Would pre-allocation solve that problem? In that case, it is explicit that you are preserving the type:

 x = Any[1,2,3]
 y = similar(x)
 map!(identity,y,x)

Yes, preallocation solves the problem, or my problem, at any rate. This was relatively easy to fix once I discovered the origin of the bad behavior.
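
For reference, a sketch of the options side by side, assuming Julia 0.6 or later. The typed comprehension in the last step is an extra alternative, not something suggested in the posts above:

```julia
x = Any[1, 2, 3]

# map picks the element type from the values it actually produced.
narrowed = map(identity, x)

# Preallocating with similar keeps the element type of x.
y = similar(x)
map!(identity, y, x)

# A typed comprehension states the element type up front.
typed = Any[v for v in x]

println(eltype(narrowed))
println(eltype(y))
println(eltype(typed))
```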

How is that bad behavior though? Do you want map to treat identity as a special case?