What is the difference between map(v -> f(v), lst) and [f(v) for v in lst]?
I have seen this discussion but the focus was on speed and memory allocation.
Conceptually, there is no difference. Note that you can simply write map(f, lst) if f is a previously-defined function.
Why are there two versions?
They are not quite the same. For example, with a list comprehension you can additionally filter elements:
[x for x in 1:10 if x % 2 == 0]
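For comparison, the same filtering can be spelled with map by filtering first. This is a minimal sketch assuming a recent Julia 1.x; the variable names are made up for illustration.

```julia
# Comprehension with a filter clause: keep the even numbers, then square them.
evens_squared = [x^2 for x in 1:10 if x % 2 == 0]

# Map-based spelling of the same thing: filter first, then map over the result.
evens_squared_map = map(x -> x^2, filter(iseven, 1:10))
```

The comprehension does both steps in one expression, while the map version needs a separate filter call.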
But map may be shorter and more convenient, e.g.:
[f(input) for input in inputs]
map(f, inputs)
In addition, map (mostly) preserves the type of a collection, while a list comprehension doesn’t. Compare:
map(x -> x + 1, Set([1, 2, 3]))
[x + 1 for x in Set([1, 2, 3])]
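A tuple shows the same contrast. The sketch below assumes a recent Julia 1.x, and the variable names are illustrative only.

```julia
t = (1, 2, 3)

# map over a tuple returns a tuple.
mapped = map(x -> x + 1, t)

# A comprehension over the same tuple always builds an Array.
comprehended = [x + 1 for x in t]
```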
I am still having a hard time with all this (Julia v0.5).
Comprehension on an Array{Tuple} is fine
julia> a=[(1,2),(3,4)];
julia> [x for (x,y) in a]
2-element Array{Int64,1}:
but not map
julia> map((x,y)->x, a)
ERROR: MethodError: no method matching (::##3#4)(::Tuple{Int64,Int64})
even though (x,y) = a[1] is fine.
Also filter raises an error on
julia> filter((x,y)->x==3, a)
ERROR: MethodError: no method matching (::##5#6)(::Tuple{Int64,Int64})
because a is not an associative collection (two arguments are passed to the function only in that case, as specified in the manual). Indeed, filter works here
julia> filter((x,y)->x==3, Dict(a))
Dict{Int64,Int64} with 1 entry:
Then again neither one is valid:
julia> foreach((x,y)->println(x), a)
ERROR: MethodError: no method matching (::##9#10)(::Pair{Int64,Int64})
julia> foreach((x,y)->println(x), Dict(a))
ERROR: MethodError: no method matching (::##11#12)(::Pair{Int64,Int64})
Instead
julia> for (x,y) in a
println(x)
end
is OK.
Then I came across this:
julia> filter((x,y)->begin println(typeof(x)); x[1]==3; end, Dict(a))
Int64
Int64
Dict{Int64,Int64} with 1 entry:
but it should be an error because Int64 has no getindex.
I am very confused, but there must have been a good reason to have it this way and I cannot see it. How can I picture all this in a more systematic way?
(And all this because of this post.)
julia> a = Int64(6); a[1]
6
Rather unexpected…
I suspect (but don’t actually know) that one reason for Int64 to provide a getindex implementation is to make broadcast work in a general way.
The key difference between loops/comprehensions and the anonymous functions used in filter, map etc. seems to be the implicit tuple destructuring that only happens in the former case (which makes sense for dispatch).
Imho the outlier here is filter for associative iterables; see also
https://github.com/JuliaLang/julia/issues/17886
So apart from that, the system appears to be consistent: In loops, comprehensions and assignments you get automatic tuple destructuring if you want it, and otherwise you don’t.
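To make the destructuring point concrete, here is a hedged sketch of ways to get the first component of each tuple with map, filter and foreach. It assumes Julia 1.x, where an anonymous function can destructure a single tuple argument; that syntax is not available in the 0.5 release discussed here. The variable names are illustrative.

```julia
a = [(1, 2), (3, 4)]

# Index into the tuple explicitly: one argument, no destructuring needed.
firsts_indexed = map(t -> t[1], a)

# Destructure in the argument list. The extra parentheses and trailing comma
# declare ONE argument that is unpacked into x and y.
firsts_destructured = map(((x, y),) -> x, a)

# The same pattern works for filter and foreach.
threes = filter(((x, y),) -> x == 3, a)

seconds = Int[]
foreach(((x, y),) -> push!(seconds, y), a)
```

Writing (x, y) -> x without the extra parentheses declares a two-argument function, which is why map fails with a MethodError when it passes a single tuple.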
No, broadcast doesn’t need this (in 0.6, it works on arbitrary “scalar” types that don’t have getindex).
I think that largely this is the Matlab legacy; in Matlab, numbers are “really” 1x1 matrices internally, and it is quite common to write functions that are supposed to work on either scalars or arrays of numbers in order to vectorize. To simplify the process of writing such generic scalar/vector code, you can access numbers as if they were 0-dimensional arrays in Julia.
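As a rough illustration of that generic scalar/vector style, the sketch below uses a made-up function, sum_of_squares, that indexes its argument without caring whether it is a number or an array. It assumes a recent Julia 1.x, where numbers still support length and indexing with 1.

```julia
# Hypothetical Matlab-style function: works on a scalar or on an array,
# because a number can be indexed like a one-element collection.
function sum_of_squares(xs)
    total = zero(eltype(xs))
    for i in 1:length(xs)
        total += xs[i]^2
    end
    return total
end

scalar_result = sum_of_squares(3)          # treats 3 as a single element
vector_result = sum_of_squares([1, 2, 3])  # loops over the three elements

n = 6
n_first = n[1]     # indexing a number with 1 returns the number itself
n_dims = ndims(n)  # a number reports zero dimensions
```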
I think that a lot of the need for this should be gone now with 0.5’s dot-call syntax: in the cases where you would previously have written a generic vector/scalar function, you should now just write the scalar function f(x), and then apply it to arrays A with f.(A). This is not only easier, it is also faster because it can fuse with other elementwise operations and the result can be assigned in-place with .=.
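A small sketch of that pattern, assuming a recent Julia 1.x; the function f and the arrays are invented for the example.

```julia
# Write only the scalar version of the function.
f(x) = 3x^2 + 1

A = [1.0, 2.0, 3.0]

# Apply it elementwise with the dot-call syntax.
B = f.(A)

# Combine several elementwise operations and write the result in place
# into a preallocated array with .=
out = similar(A)
out .= f.(A) .+ sqrt.(A)
```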
See also: make numbers non-iterable? (JuliaLang/julia issue #7903 on GitHub)
Oh, nice explanation, thank you. The implicit tuple destructuring is the bit I was missing.
Should be:
map((elem) -> elem[1], object)
since the elements of the object are tuples and you want to select the first element of the tuple.
Note that there is a function first:
julia 0.6> first((1,2))
1
So you can just write
julia 0.6> a = [(1, 2), (3, 4)]
2-element Array{Tuple{Int64,Int64},1}:
(1, 2)
(3, 4)
julia 0.6> first.(a)
2-element Array{Int64,1}:
1
3
Is there a reason for this behavior?
By definition, list comprehension builds a list. You could possibly have a kind of “collection comprehension” that tries to preserve collection type. But I can see little value in it and a number of hard design choices, e.g. what syntax this feature should have, how to do type dispatching (which is a solved issue for map in Julia), how to handle filtering in general collections (i.e. [x for x in xs if condition(x)] for lists), etc.
It might make more sense to call it array comprehension then, since it understands shape.
julia> A = rand(2,3)
2×3 Array{Float64,2}:
0.05249 0.251237 0.911031
0.461673 0.73201 0.854654
julia> [a^2 for a in A]
2×3 Array{Float64,2}:
0.0027552 0.0631202 0.829977
0.213142 0.535838 0.730434
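The shape handling can be sketched with fixed values instead of rand, so the results are reproducible. This assumes a recent Julia 1.x; the names are illustrative.

```julia
# Two iteration variables separated by a comma produce a 2×3 matrix.
grid = [10i + j for i in 1:2, j in 1:3]

# A comprehension over a matrix keeps the matrix shape.
M = [1 2 3; 4 5 6]
squares = [m^2 for m in M]

# Adding a filter clause drops the shape: the result is a flat vector,
# with elements visited in column-major order.
big = [m for m in M if m > 2]
```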
We don’t call them list comprehensions nor do we call the data structure lists – that’s Python terminology. Julia’s random access n-dimensional data type is an array and the comprehensions that construct them are array comprehensions.
map does not preserve the type of an Array in v0.6.
Julia v0.6
julia> typeof(map(identity,Any[1,2,3]))
Array{Int64,1}
Julia v0.5
julia> typeof(map(identity,Any[1,2,3]))
Array{Any,1}
Operating on an Array, map in v0.6 appears to return an array of the least common (non-proper) supertype of the elements.
This is one of the thousand cuts Symata.jl has suffered under v0.6. (Not that I’m complaining, I knew the API was in flux.)
Would pre-allocation solve that problem? In that case, it is explicit that you are preserving the type:
x = Any[1,2,3]
y = similar(x)
map!(identity,y,x)
Yes, preallocation solves the problem, or my problem, at any rate. This was relatively easy to fix once I discovered the origin of the bad behavior.
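For reference, a hedged sketch of the options for keeping the Any element type, assuming a recent Julia 1.x, where map still narrows the element type as described for v0.6. The typed comprehension is an alternative that was not mentioned in the discussion.

```julia
x = Any[1, 2, 3]

# Plain map picks the element type from the values it produces.
narrowed = map(identity, x)

# Preallocating with similar and using map! keeps the element type of x.
y = similar(x)
map!(identity, y, x)

# A typed comprehension also fixes the element type explicitly.
typed = Any[v for v in x]
```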
How is that bad behavior, though? Do you want map to treat identity as a special case?