Can I get this syntax to work using zip?

Hello!

Suppose I have two points defined as:

p1 = (1.0,1.0)
p2 = (1.0,1.025)

Then I want to calculate the pair-wise distance between the two as such:

for (i,j) in zip(p1,p2)
    diff = i - j
end

And I want the output to be a tuple the same size as p1 and p2, with in this case values of:

(0.0,0.025)

I am aware this simple example can be done using broadcasting (.-). I am specifically asking if this could work as I want using zip :slight_smile:

Kind regards

The obvious way to write it with zip doesn’t in fact return a tuple, because zip returns a lazy iterator rather than a tuple. You could fix that by defining your own tuple-returning zip, or by just using the fact that map works like zip on multiple arguments:

julia> map(zip((1.0, 2.0), (3.0, 5.0))) do (i,j)
         diff = i - j
       end
2-element Vector{Float64}:
 -2.0
 -3.0

julia> _zip(ts::Tuple...) = ntuple(i -> map(t -> t[i], ts), minimum(length, ts));

julia> map(_zip((1.0, 2.0), (3.0, 5.0))) do (i,j)
          i - j  # diff
       end
(-2.0, -3.0)

julia> map((1.0, 2.0), (3.0, 5.0)) do i,j
         i - j
       end
(-2.0, -3.0)

Thanks, that works for me! It is perhaps better to just use map in this case, yes :slight_smile:

I have one question though, which is pretty basic, how would I assign the end result of map to something? Like I would usually do for a = 1 etc., I can’t seem to get it to work when using map.

If I put what you have written in a function, though, it spits out what I want:

function pairwise_dist(p1,p2)
    map(p1,p2) do i,j
        diff = i - j
    end
    
end

So I am just asking to learn

Kind regards

Writing diff = i - j inside the function body here is just a note to yourself, probably not best practice. To assign what the whole map expression returns, you need a = map(...) even if there’s a do involved. (Every expression in Julia returns something, so you will also see things like b = if x<0 ... else ... end.)
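To make the point above concrete, here is a minimal sketch using the points from the question. The names d, x and b are only illustrative.

```julia
p1 = (1.0, 1.0)
p2 = (1.0, 1.025)

# The do-block is just the first argument to map, so the whole call
# is an ordinary expression whose value can be assigned.
d = map(p1, p2) do i, j
    i - j   # the last expression is what the anonymous function returns
end

# if/else is an expression too, so its value can be bound to a name.
x = -3
b = if x < 0
    "negative"
else
    "non-negative"
end

println(d)
println(b)
```

Because both inputs are tuples, d is a tuple as well, which is the shape the original question asked for.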

I renamed diff to d and I see, it worked - thanks!

The following looks nice too:

dif = Tuple(i-j for (i,j) in zip(p1,p2))

Thanks! map was more performant for me, so will stick to that, but that shows it is possible.

Kind regards


Input:

a = (1.0, 2.0)
b = (3.0, 5.0)
b .- a

Output:

(2.0, 3.0)

Input:

using BenchmarkTools
@btime $(Ref(b))[] .- $(Ref(a))[]

Output:

  1.400 ns (0 allocations: 0 bytes)
(2.0, 3.0)

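Note OP’s premise: they already know the broadcasting version and are asking specifically whether this can be done with zip.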

I’m sorry for missing it.

We can type-stably create a tuple using map and zip by doing the following, but it will be less efficient because it goes through a Vector.

using BenchmarkTools

f(a::NTuple{N}, b::NTuple{N}) where N =
    NTuple{N}(map(((i, j),) -> j - i, zip(a, b)))

a = (1.0, 2.0)
b = (3.0, 5.0)
@btime f($a, $b)

Output:

 30.885 ns (1 allocation: 96 bytes)
(2.0, 3.0)

The following similar code using zip is more efficient (but more complex than b .- a).

g(a::NTuple{N}, b::NTuple{N}) where N =
    NTuple{N}(j - i for (i, j) in zip(a, b))

@btime g($(Ref(a))[], $(Ref(b))[])

Output:

  1.300 ns (0 allocations: 0 bytes)
(2.0, 3.0)

Using @code_native, I have found that the following three functions generate the same native code for a = (1.0, 2.0) and b = (2.0, 5.0):

F(a, b) = b .- a

g(a::NTuple{N}, b::NTuple{N}) where N =
    NTuple{N}(j - i for (i, j) in zip(a, b))

h(a::NTuple{N}, b::NTuple{N}) where N =
    ntuple(i -> b[i] - a[i], N)

Conclusion: Using g(a, b), we can do exactly the same as b .- a using zip.


I’m not sure if I missed something but this naive thing should be really fast:

p1 = (1.0,1.0)
p2 = (1.0,1.025)
pairwise_dist(p1,p2) = ntuple(i -> p1[i]-p2[i], length(p1))

@benchmark pairwise_dist($p1,$p2)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.001 ns … 0.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.001 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.028 ns ± 0.044 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▂
  0.001 ns       Histogram: frequency by time        0.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Thanks for your answer. You have to benchmark like this:

@btime pairwise_dist($Ref(p1)[],$Ref(p2)[])
  163.787 ns (5 allocations: 160 bytes)
(0.0, -0.02499999999999991)

Your benchmark is currently unrealistically fast, because the compiler has realized that you do not use the value for anything (that is what I have been told, at least).
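One detail worth checking in the @btime line above: it writes $Ref(p1)[], whereas the earlier posts write $(Ref(a))[]. The two parse differently, which can be seen without running any benchmark. The sketch below uses only Meta.parse from Base; the variable names are illustrative. It probably explains the 5 allocations reported above, since only the function Ref ends up interpolated and p1, p2 stay non-constant globals inside the timed expression, but that last part is an inference rather than something measured here.

```julia
# Parse both spellings without evaluating them, to see what `$` binds to.
loose = Meta.parse("\$Ref(p1)[]")     # spelling used in the @btime line above
tight = Meta.parse("\$(Ref(p1))[]")   # spelling used earlier in the thread

# Both are indexing expressions (`...[]`); the difference is what sits under `$`.
interp_loose = loose.args[1].args[1]  # function position of the call `(...)(p1)`
interp_tight = tight.args[1]          # the whole object being indexed

println(interp_loose)  # only `Ref` is interpolated
println(interp_tight)  # the complete `Ref(p1)` is interpolated
```

With the parenthesized form, the Ref is built once during benchmark setup and only the dereference and the function call are timed.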

Kind regards


I’m not sure if Ref is very important here, but my function is as fast as the a .- b thing, so it doesn’t really matter.

pairwise_dist(p1,p2) = ntuple(i -> p1[i]-p2[i], length(p1))

function test_Pairs1(arr)
    s = 0.0
    for ps in arr 
        p1, p2 = ps
        s += sum(pairwise_dist(p1,p2))
    end 
    s 
end

function test_Pairs2(arr)
    s = 0.0
    for ps in arr 
        p1, p2 = ps
        s += sum(p1 .- p2)
    end 
    s 
end

arr = [( (rand(),rand()), (rand(),rand()) ) for i in 1:10^6]

@benchmark test_Pairs1($arr)
@benchmark test_Pairs2($arr)

@benchmark test_Pairs1($arr)
BenchmarkTools.Trial: 2314 samples with 1 evaluation.
 Range (min … max):  1.894 ms …   3.209 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.113 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.144 ms ± 114.584 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▁▆██▆▅▄▃▄▁▁
  ▂▁▂▂▂▂▁▂▁▂▂▅████████████▅▄▄▄▃▃▄▃▃▃▃▃▃▃▃▃▂▂▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂ ▄
  1.89 ms         Histogram: frequency by time        2.61 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

@benchmark test_Pairs2($arr)
BenchmarkTools.Trial: 2318 samples with 1 evaluation.
 Range (min … max):  1.901 ms …   3.543 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.111 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.140 ms ± 115.493 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▄▆█▇██▅▅▅▂▄▁
  ▂▁▁▂▁▁▁▁▁▂▅████████████▅▅▄▃▃▃▄▃▃▃▃▃▃▂▃▃▃▃▂▃▂▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂ ▄
  1.9 ms          Histogram: frequency by time        2.62 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

@Seif_Shebl, your function looks similar to @genkuroki’s, no?

The fast functions here should all compile down to the same thing. While nothing actually happens in 0.001 ns, getting that does tell you that it’s reduced to something very simple.

julia> @code_typed pairwise_dist_ntuple(p1, p2)
CodeInfo(
1 ─ %1 = Base.getfield(p1, 1, true)::Float64
│   %2 = Base.getfield(p2, 1, true)::Float64
│   %3 = Base.sub_float(%1, %2)::Float64
│   %4 = Base.getfield(p1, 2, true)::Float64
│   %5 = Base.getfield(p2, 2, true)::Float64
│   %6 = Base.sub_float(%4, %5)::Float64
│   %7 = Core.tuple(%3, %6)::Tuple{Float64, Float64}
└──      return %7
) => Tuple{Float64, Float64}

julia> @code_typed pairwise_dist_map(p1, p2)
CodeInfo(
1 ─ %1 = Base.getfield(p1, 1, true)::Float64
│   %2 = Base.getfield(p2, 1, true)::Float64
│   %3 = Base.sub_float(%1, %2)::Float64
│   %4 = Base.getfield(p1, 2, true)::Float64
│   %5 = Base.getfield(p2, 2, true)::Float64
│   %6 = Base.sub_float(%4, %5)::Float64
│   %7 = Core.tuple(%3, %6)::Tuple{Float64, Float64}
└──      return %7
) => Tuple{Float64, Float64}

julia> @code_typed g(p1, p2)  # NTuple(generator)
CodeInfo(
1 ─ %1 = Base.getfield(a, 1, true)::Float64
│   %2 = Base.getfield(b, 1, true)::Float64
│   %3 = Base.sub_float(%2, %1)::Float64
│   %4 = Base.getfield(a, 2, true)::Float64
│   %5 = Base.getfield(b, 2, true)::Float64
│   %6 = Base.sub_float(%5, %4)::Float64
│   %7 = Core.tuple(%3, %6)::Tuple{Float64, Float64}
└──      return %7
) => Tuple{Float64, Float64}
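
The names pairwise_dist_ntuple and pairwise_dist_map are not defined anywhere in the thread. Presumably they wrap the ntuple and map versions shown earlier. A minimal sketch under that assumption, checking that both return a tuple and agree with plain broadcasting:

```julia
# Hypothetical definitions: the thread never shows these two names being defined,
# so these are guesses based on the ntuple and map versions posted earlier.
pairwise_dist_ntuple(p1, p2) = ntuple(i -> p1[i] - p2[i], length(p1))
pairwise_dist_map(p1, p2) = map((i, j) -> i - j, p1, p2)

p1 = (1.0, 1.0)
p2 = (1.0, 1.025)

r1 = pairwise_dist_ntuple(p1, p2)
r2 = pairwise_dist_map(p1, p2)
r3 = p1 .- p2   # broadcasting over tuples also returns a tuple

# All three should be the same Tuple{Float64, Float64}.
println(r1 == r2 && r2 == r3)
```

To compare the generated code yourself, @code_typed pairwise_dist_map(p1, p2) can be run in the REPL against each definition.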