Half Vectorization



Is there an algorithm that can do half vectorization, i.e. the vech operation (here), or generate elimination and duplication matrices (here)? I’m trying to recover only the unique elements of a covariance matrix in vector form.

Vector of upper triangle

Sure, the algorithm is: write a loop. (Or rather, two nested loops are probably the easiest way to implement vech.)
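As a minimal sketch of the nested-loop idea (assuming a square input and column-major order over the lower triangle; `vech_loop` is a hypothetical name, and a preallocated version appears further down the thread):

```julia
# Naive vech: collect the lower triangle column by column.
# For column j, only rows i >= j lie on or below the diagonal.
function vech_loop(A)
    m = size(A, 1)
    v = eltype(A)[]
    for j in 1:m        # outer loop over columns
        for i in j:m    # inner loop over rows on/below the diagonal
            push!(v, A[i, j])
        end
    end
    return v
end
```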


If compact code is more important than performance you can do A[tril(trues(A))]. Otherwise the loop solution can easily be implemented by a comprehension:
[A[i, j] for i = 1:size(A, 1), j = 1:size(A, 2) if i >= j]


Actually, A[tril(trues(A))] is significantly faster than the comprehension on my machine.

Explicit loops are even better, about 50x faster than this comprehension on my machine:

vech0(A) = A[tril(trues(A))]
vech1(A) = [A[i, j] for i = 1:size(A, 1), j = 1:size(A, 2) if i >= j]

function vech(A::AbstractMatrix{T}) where T
    m = LinAlg.checksquare(A)
    v = Vector{T}((m*(m+1))>>1)
    k = 0
    for j = 1:m, i = j:m
        @inbounds v[k += 1] = A[i,j]
    end
    return v
end

B = rand(1000,1000); B = B + B';
using BenchmarkTools, Compat
@btime vech0($B); @btime vech1($B); @btime vech($B);


  893.148 μs (16 allocations: 4.06 MiB)
  21.599 ms (24 allocations: 5.00 MiB)
  385.942 μs (2 allocations: 3.82 MiB)

on my machine with Julia 0.6.

I knew loops would be faster than comprehensions, but I’m honestly surprised that the difference is so large. The comprehension version does multiple allocations (comprehensions with a filter, like i >= j, have to grow the array as they go along), loops over the entire array, and is not type-stable according to @code_warntype. Still, it seems like filtered comprehensions could be faster…
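Roughly speaking, a filtered comprehension behaves like a push!-based loop over the full m×n index space, which accounts for the extra allocations (a sketch of the behavior, not the exact lowering; `vech_push` is a hypothetical name):

```julia
# Approximately what [A[i, j] for i = 1:m, j = 1:n if i >= j] does:
# the result size is unknown in advance, so the array grows as it goes.
function vech_push(A)
    m, n = size(A)
    v = Vector{eltype(A)}()       # empty; length can't be predicted up front
    for j in 1:n, i in 1:m        # scans the entire index space
        if i >= j
            push!(v, A[i, j])     # may reallocate as the vector grows
        end
    end
    return v
end
```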


In Julia 0.7 on my machine, things are somewhat better:

  vech0:   1.200 ms (9 allocations: 4.06 MiB)
  vech1:   5.514 ms (23 allocations: 5.00 MiB)
  vech:    377.758 μs (2 allocations: 3.82 MiB)

Even so, the factor of 14 difference is larger than I would have initially thought.


For completeness,

i = ceil(Int, sqrt(2 * n + 0.25) - 0.5)
j = n - i * (i - 1)  ÷ 2

gives an enumeration of the lower triangle indices.
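Wrapped as a function, the formula maps a linear index n to the n-th lower-triangle entry, enumerating row by row (`tril_index` is a hypothetical name for illustration):

```julia
# Map a linear index n to the n-th lower-triangle entry (i, j),
# enumerating row by row: (1,1), (2,1), (2,2), (3,1), ...
function tril_index(n)
    i = ceil(Int, sqrt(2 * n + 0.25) - 0.5)   # row: largest i with i*(i-1)/2 < n
    j = n - i * (i - 1) ÷ 2                   # position within row i
    return (i, j)
end

# [tril_index(n) for n in 1:6] == [(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)]
```

Note that this row-by-row order differs from the column-major order used by the vech implementations above, though both visit exactly the m*(m+1)/2 triangular entries.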


Right, that’s what you get for guessing at performance with insufficient insight. It should have been obvious that it would take fairly advanced compiler optimizations to predict the size of the filtered comprehension correctly and avoid the growing and other overhead.


Thank you all! These work great!