Slicing a Vector into a Matrix using views

Hi,

I loaded data from a simple file input.txt:

 1
 2
 3
 4
 5
 6
 7
 8
 9

into a vector with:

data = CSV.File("input.txt", ntasks=1, header=false).Column1

So it’s equivalent to:

julia> data = [1,2,3,4,5,6,7,8,9]
9-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
 7
 8
 9

Now I would like to transform it into a fixed number of groups, e.g. r = 3, stored in a Matrix this way:

1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9

I could fill a new Matrix with loops, but for performance reasons I'd prefer not to copy the values, so I guess using @view would be better.
I don't know what a clever way to do that would be.
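For reference, the loop-and-copy version I'd prefer to avoid would look roughly like this (a minimal sketch):

r = 3
n = length(data)
# eager version: allocates a new Matrix and copies every value
M = Matrix{eltype(data)}(undef, r, n - r + 1)
for j in 1:n-r+1, i in 1:r
    M[i, j] = data[i + j - 1]
end
M   # 3×7 Matrix with the layout shown above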

Making your own AbstractArray types in Julia is beautifully easy:

struct Shift1Matrix{T, V<:AbstractVector{T}} <: AbstractMatrix{T}
    v::V # data
    sz1::Int # height
end

# size: sz1 rows, and one column for every length-sz1 window of v
Base.size(x::Shift1Matrix) = (x.sz1, length(x.v) - x.sz1 + 1)
# entry (i, j) is element i + j - 1 of the underlying vector
Base.getindex(x::Shift1Matrix, i::Int, j::Int) = x.v[i + j - 1]

julia> Shift1Matrix(1:9, 3)
3×7 Shift1Matrix{Int64, UnitRange{Int64}}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9
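
As a quick sketch (assuming the data vector from the original post): the wrapper only stores a reference to the vector, so nothing is copied and mutations of the parent are visible immediately:

julia> A = Shift1Matrix(data, 3);

julia> data[1] = 100;

julia> A[1, 1]   # reads straight from data
100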

For those of us not as smart as mikmoore, an alternative:

using LazyStack
using ShiftedArrays: circshift as SA_circshift

data = [1,2,3,4,5,6,7,8,9]
n = length(data)
r = 3
# each lazily shifted copy, truncated to its first n-r+1 entries, becomes one row after the transpose
m = transpose(lazystack(view(SA_circshift(data, i), 1:n-r+1) for i in 0:-1:-r+1))

I wish I knew Julia better…

Thanks everyone!

l = length(data)
rsp(l, r) = [i:r+i-1 for i in 1:l-r+1]   # index ranges of the length-r windows
v = rsp(l, 3)

stack(view.([data], v))   # one view per window, stacked into a 3×7 matrix
julia> using BenchmarkTools, LazyStack

julia> @btime LazyStack.stack(view.($[data],$v))
  47.017 ns (1 allocation: 336 bytes)
3×7 stack(::Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}) with eltype Int64:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

@rocco_sprmnt21, it seems simpler to write like this:

using LazyStack
data = collect(1:9)
n = length(data)
r = 3
lazystack(view(data, i:r+i-1) for i in 1:n-r+1)

# result:
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

There doesn't seem to be a lazystack function on my system.
However, these are the results I get with the following variations:

julia> @btime Base.stack(view.($[data],$v))
  116.936 ns (2 allocations: 560 bytes)
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

julia> @btime LazyStack.stack(view.($[data],$v))
  69.176 ns (1 allocation: 336 bytes)
3×7 stack(::Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}) with eltype Int64:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

julia> @btime LazyStack.stack(view($data, i:r+i-1) for i in 1:n-r+1)
  2.760 μs (38 allocations: 1.31 KiB)
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

julia> using IterTools

julia> @btime Base.stack(partition($data,3,1))
  44.332 ns (1 allocation: 224 bytes)
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9


LazyStack.stack does not work with this type of iterator:

julia> @btime LazyStack.stack(partition($data,3,1))
ERROR: MethodError: no method matching stack(::IterTools.Partition{Vector{Int64}, 3, 1})
You may have intended to import Base.stack

I have used LazyStack v0.1.3, and lazystack() is really the first example provided on the package page: GitHub - mcabbott/LazyStack.jl: 🥞

But now I am confused, because this page shows v0.1.1 as the current version… @mcabbott?

Anyway, your solution using Base stack() and IterTools.jl partition() seems to be the winning combo.

It is certainly very flexible (you can choose both the number of rows and the shift as parameters) and easy to use.
Maybe it would be even better if lazystack could put together the pieces of the partition.
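
For example, here is a small sketch (reusing the same data) that varies the window length and the shift independently:

using IterTools

data = [1,2,3,4,5,6,7,8,9]
# windows of length 4, advancing the start index by 2 each time
Base.stack(partition(data, 4, 2))
# 4×3 Matrix{Int64}:
#  1  3  5
#  2  4  6
#  3  5  7
#  4  6  8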

Honestly, I wouldn’t be surprised if such views performed worse than copying the data. Copies are surprisingly cheap. And accessing views — especially views that use non-range indices — can be surprisingly expensive. The tradeoff will depend upon how you use the view.

Profile!
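
For instance, a minimal benchmark sketch (assuming the Shift1Matrix type defined earlier in this thread, and a longer vector so the difference is visible; actual timings will vary):

using BenchmarkTools

x = rand(10_000)
r = 3
lazy  = Shift1Matrix(x, r)                # zero-copy wrapper
eager = x[(1:r) .+ (0:length(x)-r)']      # eager 3×9998 copy

@btime sum($lazy)    # every access goes through getindex
@btime sum($eager)   # dense Matrix, contiguous memory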

It’s also worth noting that you can just index directly with a matrix of indices. Or broadcast. And with broadcasting, you can possibly fuse with downstream operations, too.

julia> data = [1,2,3,4,5,6,7,8,9];

julia> r = 3;

julia> data[(begin:begin+r-1) .+ (0:end-r)']
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

julia> getindex.((data,), (1:r) .+ (0:9-r)')
3×7 Matrix{Int64}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9
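
And a sketch of the fusion point (the idx name is mine): because the gather is itself a broadcast, it fuses with whatever you apply next, so only a single output matrix is allocated:

julia> idx = (1:r) .+ (0:length(data)-r)';

julia> 2 .* getindex.((data,), idx)   # gather and scale in one fused loop
3×7 Matrix{Int64}:
 2  4   6   8  10  12  14
 4  6   8  10  12  14  16
 6  8  10  12  14  16  18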

There’s also:

julia> using ToeplitzMatrices, FFTW

julia> Hankel(1:9, (3,7))
3×7 Hankel{Int64, UnitRange{Int64}, Tuple{Int64, Int64}}:
 1  2  3  4  5  6  7
 2  3  4  5  6  7  8
 3  4  5  6  7  8  9

Not only does this provide a memory-efficient representation, but it also provides some specialized linear algebra routines.
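
For instance (a quick sketch): the wrapper stores only the 9 defining values, yet it participates in ordinary linear algebra:

using ToeplitzMatrices

H = Hankel(1:9, (3, 7))
H * ones(7)   # ≈ [28, 35, 42], the row sums of the sliding windows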

Why do you need FFTW?

You don’t. I quickly browsed the readme for ToeplitzMatrices.jl and it looked like it was useful, but on closer inspection that only applies to circulant matrices. Will edit my post.

On even closer inspection: multiplying by a large Toeplitz (or Hankel) matrix is done using an FFT. So even though FFTW is not needed for the small example, it is likely to be useful in practice.
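
A rough sketch of that point (sizes chosen arbitrarily): for a long signal the same construction scales, and with FFTW available the matrix-vector product can take the FFT-based code path instead of an O(m·n) loop:

using ToeplitzMatrices, FFTW

x = randn(100_000)
r = 512
H = Hankel(x, (r, length(x) - r + 1))   # 512 × 99489, never materialized
y = H * randn(size(H, 2))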