How to force single column array?

Hey guys, I want to make an array with only a single column, i.e. something like a 4x1 Array{Int32,1}.

Whenever I use zeros or undef, I get something like this:

zeros(Int32,4,1)
4×1 Array{Int32,2}:
 0
 0
 0
 0

Can I force the output to be a one-dimensional array? Please note that I know it is possible to do:

zeros(Int32,(4,))
4-element Array{Int32,1}:
 0
 0
 0
 0

But I cannot use this notation, since I am reading data where, depending on the data type, each property may have 3 or 1 columns, and I do not know the number of elements (rows) in advance. So I need another way.

Hope this makes sense; otherwise I can try to clarify further.

Kind regards

If you want to convert a vector to a single-column matrix, consider something like

julia> reshape(1:3, :, 1)
3×1 reshape(::UnitRange{Int64}, 3, 1) with eltype Int64:
 1
 2
 3
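A small sketch of the distinction this thread keeps circling around: a Vector is 1-D, a 4×1 Matrix is 2-D, and reshape moves between them. For a plain Array the reshaped result shares memory with the input, so nothing is copied:

```julia
v = zeros(Int32, 4)       # 4-element Vector{Int32}: one dimension
m = zeros(Int32, 4, 1)    # 4×1 Matrix{Int32}: two dimensions
@assert ndims(v) == 1 && ndims(m) == 2

# reshape turns the vector into a single-column matrix; `:` lets Julia infer the row count
m2 = reshape(v, :, 1)
@assert size(m2) == (4, 1)

# m2 shares memory with v, so a write through one is visible in the other
m2[2, 1] = 7
@assert v[2] == 7
```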

So it seems like there is no other way to do it? Do you happen to know why Julia chose to do it like this?

There are a lot of other ways to do this, of course (e.g. you can always write your own code for it). This is just a simple one that does not copy.

I am not sure I understand your question. There may be some context I am missing here — if this solution is not what you are looking for, please clarify.

It was regarding why I had to write “(4,)” and not “(4,1)” to get a 4 by 1 array. I am trying to preallocate an array where I know the number of columns beforehand, but I do not know the number of rows before reading the data. In Matlab I could do this easily with “zeros(4,1)”, but in Julia I can’t do it the same way, so I am looking for another way to do it.

I just hoped there was a way to force the zeros function in Julia to behave as I want.

Kind regards

I’m not quite sure what the difficulty is. You can get the same output as Matlab with zeros(4,1):

julia> zeros(4,1)
4×1 Array{Float64,2}:
 0.0
 0.0
 0.0
 0.0

or with Matlab

>> zeros(4,1)

ans =

     0
     0
     0
     0
julia> zeros(4, 1)
4×1 Array{Float64,2}:
 0.0
 0.0
 0.0
 0.0

:confused:


Haha, I must be terrible at explaining today!

@dawbarton I want it to be an Array{Float64,1}

I figured out my own solution:

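function dimMaker(nRow, nCol)
    if nCol == 1
        dim = (nRow,)          # one column: 1-D shape, so zeros(T, dim) gives a Vector
    else
        dim = (nRow, nCol)     # e.g. 3 columns: 2-D shape, so zeros(T, dim) gives a Matrix
    end
    return dim
end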

Then I use dim to make the zeros array.
Thanks for being patient with me.

Kind regards
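A quick usage sketch of the dimMaker idea, restated as a one-liner so the snippet runs on its own. The behaviour for column counts other than 1 (always a 2-D shape) is an assumption, since the post only handles 1 and 3 columns, and the row count is hypothetical:

```julia
# One-line version of dimMaker; any nCol other than 1 yields a 2-D shape (assumption)
dimMaker(nRow, nCol) = nCol == 1 ? (nRow,) : (nRow, nCol)

nRow = 4                                 # hypothetical row count, known only after reading
dens = zeros(Int32, dimMaker(nRow, 1))   # 4-element Vector{Int32}
vel  = zeros(Int32, dimMaker(nRow, 3))   # 4×3 Matrix{Int32}

@assert dens isa Vector{Int32}
@assert vel isa Matrix{Int32} && size(vel) == (4, 3)
```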

It is confusing that you are using the term ‘4x1 array’ when talking about a vector. A 4x1 array is a matrix. If you want a flat vector, that is not a 4x1 array, but rather a ‘length 4 vector’. It’s easier if you use the words ‘matrix’ and ‘vector’ instead of calling everything ‘arrays’.

Firstly, note that you can construct a zero-valued vector like this:

zeros(4)

You don’t need zeros((4,)).

Now, you have discovered one way of accomplishing what you wanted, but it would be easier if we knew what your inputs are. Where does nCol come from? If you have an input array x (either vector or matrix) then you can write:

zeros(size(x))

or

zero(x)
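To make the difference between these two suggestions concrete: zeros(size(x)) copies only the shape and always produces Float64 elements, whereas zero(x) copies both the shape and the element type. The inputs below are hypothetical:

```julia
x = Int32[1 2 3; 4 5 6]    # hypothetical 2×3 Matrix{Int32} input
y = [1.5, 2.5]             # hypothetical 2-element Vector{Float64} input

# zeros(size(x)) matches the shape but always uses Float64
a = zeros(size(x))
@assert size(a) == (2, 3) && eltype(a) == Float64

# zero(x) matches both the shape and the element type
b = zero(x)
@assert b isa Matrix{Int32} && all(iszero, b)

# Both work unchanged for a vector input
@assert zero(y) isa Vector{Float64} && size(zeros(size(y))) == (2,)

# zeros(eltype(x), size(x)) is the explicit spelling of zero(x) for arrays
@assert zeros(eltype(x), size(x)) == b
```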

If, on the other hand, x is always a matrix but sometimes has just one column, then you have to use some sort of if condition. Converting a 4x1 matrix to a vector is most easily done with

vec(x)
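A sketch of that "some sort of if condition", using a hypothetical helper name (asdata). Note that it is type-unstable (its return type depends on a runtime value), which is one reason to consider keeping the matrix instead. For a plain Array, vec does not copy:

```julia
# Hypothetical helper: flatten only when the matrix has a single column
asdata(x::AbstractMatrix) = size(x, 2) == 1 ? vec(x) : x

m1 = zeros(Float32, 4, 1)
m3 = zeros(Float32, 4, 3)
@assert asdata(m1) isa Vector{Float32} && length(asdata(m1)) == 4
@assert asdata(m3) === m3    # multi-column matrices are returned unchanged

# vec shares memory with a plain Array: writing to the vector changes the matrix
v = vec(m1)
v[1] = 5f0
@assert m1[1, 1] == 5f0
```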

You should however consider whether this is the right thing to do. Perhaps using a 4x1 matrix is more appropriate in your case.

Edit: Perhaps your confusion stems from Matlab. Matlab does not have a separate vector type; everything is (at least) a 2-D matrix, so a length-4 vector can only be represented as a 4x1 (or 1x4) matrix. Even scalars are 1x1 matrices, unlike in Julia, where Matrix, Vector and Number are different types.
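The type distinction can be checked directly in Julia; in particular, Array{Float64,1} (the type asked about earlier in the thread) is exactly Vector{Float64}:

```julia
@assert zeros(4) isa Vector{Float64}                  # a length-4 vector (1-D)
@assert zeros(4, 1) isa Matrix{Float64}               # a 4×1 matrix (2-D)
@assert Vector{Float64} === Array{Float64,1}          # Vector is an alias for 1-D Array
@assert 1.0 isa Number && !(1.0 isa AbstractArray)    # a scalar is not a 1×1 matrix
@assert zeros(4) != zeros(4, 1)                       # same values, different shapes
```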


I would also add that you might want to consider why you want a Vector in some circumstances but a Matrix in others. In the past I’ve found that making this distinction can make life unnecessarily hard, particularly when trying to write generic code. For example, I’ve ended up with bits of code like

# get the 3rd data point
if x isa Vector
    y = x[3]
else
    y = x[3, :]
end

when actually it would have been much easier to keep it as an N×1 Matrix and have the indexing work the same for both.
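A sketch of that uniform indexing, assuming every property is stored with one row per data point (the arrays and the getpoint helper are hypothetical):

```julia
dens = reshape(Float64[1000, 1001, 1002, 1003], :, 1)  # hypothetical N×1 property (N = 4)
pos  = reshape(collect(1.0:12.0), 4, 3)                 # hypothetical N×3 property

# The same expression fetches the 3rd data point from either array, no isa check needed
getpoint(x::AbstractMatrix, i) = x[i, :]

@assert getpoint(dens, 3) == [1002.0]
@assert getpoint(pos, 3) == [3.0, 7.0, 11.0]   # column-major fill: row 3 is 3, 7, 11
```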


That is right, I’ve been used to calling everything arrays, but I will try to remember the distinction in the future. And you are absolutely right, I could just do:

zeros(Float64,5)

And then get what I wanted. Thanks for making me aware of it.

@dawbarton I am working on post-processing of particle simulations, so let us say I have N particles. These particles have properties like position, velocity, density and so on. Position and velocity are described in Cartesian coordinates, so they need an Nx3 matrix, while density only needs a vector (or an Nx1 matrix). You are right that this makes generic code a bit harder to write; currently I get around it using multiple dispatch like this:

function _transferDataBi4(ft::IOStream, arrayVal::AbstractMatrix)
    typ = eltype(arrayVal)
    sz = size(arrayVal)
    if !eof(ft)
        for i = 1:sz[1]
            for k = 1:sz[2]
                @inbounds arrayVal[i,k] = read(ft, typ)
            end
        end
    end
end

function _transferDataBi4(ft::IOStream, arrayVal::AbstractVector)
    typ = eltype(arrayVal)
    sz = length(arrayVal)
    for i = 1:sz
        if !eof(ft)
            @inbounds arrayVal[i] = read(ft, typ)
        end
    end
end

So depending on whether my array is a matrix or a vector, _transferDataBi4 dispatches to the right method to fill the preallocated array when reading data from binary files.

I might be a bit “stuck in my ways”; I just like having the arrays exactly as they should be mathematically.

Kind regards

You should be able to merge those two methods into one by using eachindex and a single loop.
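One possible merged version, sketched with a hypothetical name (transfer!) and ::IO instead of ::IOStream so it also accepts an in-memory IOBuffer for testing. Note that eachindex visits a matrix in memory (column-major) order, so the data fills column by column:

```julia
# One method for vectors and matrices alike
function transfer!(ft::IO, arrayVal::AbstractArray)
    T = eltype(arrayVal)
    for i in eachindex(arrayVal)   # memory order: column by column for a matrix
        eof(ft) && break           # stop early if the stream runs out of data
        arrayVal[i] = read(ft, T)
    end
    return arrayVal
end

# Matrix: six Int32 values land column by column
io = IOBuffer()
write(io, Int32[1, 2, 3, 4, 5, 6])
seekstart(io)
M = transfer!(io, zeros(Int32, 2, 3))
@assert M == Int32[1 3 5; 2 4 6]

# Vector: only two values available, so the third element keeps its zero
io2 = IOBuffer()
write(io2, Float32[1.5, 2.5])
seekstart(io2)
v = transfer!(io2, zeros(Float32, 3))
@assert v == Float32[1.5, 2.5, 0]
```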


Thanks, I am trying it now but I am having some difficulties. Suppose I have a hypothetical matrix like this:

A = [a b c; d e f; g h i]

When I use eachindex naively, it fills a d g, then b e h, and then c f i, while I want it to fill a b c, then d e f, and then g h i. Can you point me in the right direction?

EDIT: maybe it does not make sense to do it this way, since Julia stores arrays in column-major order?

Kind regards
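For reference, a sketch of the usual way around the column-major issue. Assuming the file stores x, y, z for each particle in turn (an assumption about the format), reading into a 3×N matrix puts each particle in its own column, which matches Julia's memory order; permutedims then produces an N×3 copy if one is really needed. The data here is made up:

```julia
N = 4
io = IOBuffer()
write(io, Float32.(1:3N))   # made-up file: x y z of particle 1, then particle 2, ...
seekstart(io)

# Read all values at once into a 3×N matrix: column j holds particle j's (x, y, z)
buf = Matrix{Float32}(undef, 3, N)
read!(io, buf)
@assert buf[:, 2] == Float32[4, 5, 6]

# If an N×3 layout is needed, permutedims makes a copy (transpose(buf) would be lazy)
P = permutedims(buf)
@assert size(P) == (N, 3) && P[2, :] == Float32[4, 5, 6]
```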

You should also consider representing each particle as a static vector (SVector) inside a vector of particles. That way you are working with vectors for everything: Vector{Float64} for the densities and Vector{SVector{3, Float64}} for the (3D) positions. Static vectors also have the convenience of sharing the memory layout of a 3×N matrix, so you can reinterpret to convert between the two representations without making copies. (They are also fast, usually much faster than slicing, if you are doing any particle position calculations.)
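A sketch of the reinterpret conversion mentioned above. It needs the StaticArrays package, and reinterpret(reshape, ...) requires Julia 1.6 or newer; the positions are made up:

```julia
using StaticArrays

# Made-up positions of N = 4 particles, one SVector{3,Float64} per particle
pos = [SVector(1.0, 2.0, 3.0), SVector(4.0, 5.0, 6.0),
       SVector(7.0, 8.0, 9.0), SVector(10.0, 11.0, 12.0)]

# View the same memory as a 3×N matrix (column j = particle j), without copying
M = reinterpret(reshape, Float64, pos)
@assert size(M) == (3, 4)
@assert M[:, 2] == [4.0, 5.0, 6.0]

# The other direction: a 3×N Matrix{Float64} viewed as N static vectors
A = rand(3, 5)
P = reinterpret(reshape, SVector{3,Float64}, A)
@assert length(P) == 5
@assert P[2] == SVector(A[1, 2], A[2, 2], A[3, 2])
```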

Edit: in this case your code would become

using StaticArrays

# Generic fallback: read a single scalar (e.g. a Float32) from the stream
_readel(ft::IOStream, typ) = read(ft, typ)
# 3-element static vector: read its three components one after another
_readel(ft::IOStream, typ::Type{<:SVector{3, T}}) where T = SVector(_readel(ft, T), _readel(ft, T), _readel(ft, T))

function _transferDataBi4(ft::IOStream, arrayVal::AbstractVector)
    typ = eltype(arrayVal)
    for i in eachindex(arrayVal)
        if !eof(ft)
            arrayVal[i] = _readel(ft, typ)
        end
    end
end

(Or something like this; I haven’t tested it!)


Thanks for the suggestion! I read through the StaticArrays manual, and as far as I could understand, it is best suited for arrays with fewer than about 100 elements? If that is the case it is not viable for this project, but I will keep it in mind for the future. Once again, thanks.

Kind regards

You misunderstand. The limit is about 100 elements per static vector. Here each one would only have 3 elements, which is extremely efficient. It doesn’t matter how many of those 3-element SVectors you push into the outer vector.

You should definitely try this approach! It will almost certainly be much faster than your current code, and also make more sense than a matrix.
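A quick check of the point about sizes (needs the StaticArrays package): each SVector{3,Float32} is a 12-byte plain-bits value, and the outer Vector stores them contiguously no matter how long it is:

```julia
using StaticArrays

# Each particle position is one small, fixed-size static vector...
@assert isbitstype(SVector{3,Float32})
@assert sizeof(SVector{3,Float32}) == 3 * sizeof(Float32)   # 12 bytes, no per-element overhead

# ...and the outer Vector can be as long as you like (1 million particles here)
positions = zeros(SVector{3,Float32}, 1_000_000)
@assert length(positions) == 1_000_000
@assert sizeof(positions) == 12 * 1_000_000                 # stored inline, contiguously
```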


Thanks for clearing it up; I will see if it improves my speeds then. I am reading simulation data from an external particle simulator program in which the number of particles sometimes varies over time and sometimes does not, but I will give it a shot.

It doesn’t matter if the number of particles vary, as long as the number of spatial dimensions is constant.
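A sketch of handling a varying particle count (needs StaticArrays; the per-step counts are hypothetical): one Vector{SVector{3,Float32}} can be reused and resized at each time step:

```julia
using StaticArrays

counts = [5, 8, 3]             # hypothetical particle counts at three time steps
vel = SVector{3,Float32}[]     # empty Vector{SVector{3,Float32}}, reused across steps

for n in counts
    resize!(vel, n)                           # grow or shrink to this step's particle count
    fill!(vel, zero(SVector{3,Float32}))      # stand-in for reading this step's data
    @assert length(vel) == n
end
```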


Thanks for clearing up my confusion, really learning a lot here.

I am trying to implement the code @dawbarton wrote in his reply above.

So it works great for the density property, giving the expected result:

a[1]
125751-element Array{Float32,1}:
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
 1000.0
...

This makes sense because it runs on the same plain array format as before. I am having trouble with the SVector format, since I do not know how to preallocate it properly. My confusion stems from the fact that in the end I want an N by 3 matrix, but the example code I am using only handles vectors. Currently I am preallocating like this:

j[i] = zeros(catTypeBi4[typ], dim)

where catTypeBi4[typ] is either Float32 or Int32, while dim is (N,) in the case of rhop or (N,3) in the case of velocity. I really want to do this the static array way, so feedback would be much appreciated; I will keep trying in the meantime. This is the error I currently get:

 @time readBi4Array(Vel)
ERROR: MethodError: no method matching _transferDataBi4(::IOStream, ::Array{Float32,2})
Closest candidates are:
  _transferDataBi4(::IOStream, ::AbstractArray{T,1} where T) at REPL[66]:2
Stacktrace:
 [1] readBi4Array(::Cat, ::Bool) at .\REPL[70]:26
 [2] readBi4Array(::Cat) at .\REPL[70]:2
 [3] top-level scope at util.jl:156

Kind regards

Here is an example:

using StaticArrays
using BenchmarkTools

_readel(ft::IO, typ) = read(ft, typ)
@inline _readel(ft::IO, typ::Type{<:SVector{3, T}}) where T = SVector{3, T}(_readel(ft, T), _readel(ft, T), _readel(ft, T)) 

function _transferDataBi4(ft::IO, arrayVal::AbstractVector)
    typ = eltype(arrayVal)
    if !eof(ft)
        for i in eachindex(arrayVal)
            arrayVal[i] = _readel(ft, typ)
        end
    end
end

typeMaker(typ, nCol) = nCol == 1 ? typ : SVector{nCol, typ}

nRow = 1_000_000
data = IOBuffer(rand(UInt8, 3*sizeof(Int32)*nRow))

T = typeMaker(Int32, 3)  # typeMaker(catTypeBi4[typ], nCol)
partvec = zeros(T, nRow)

@benchmark _transferDataBi4(d, $partvec) setup=(d=seekstart(data)) evals=1

T = typeMaker(Int32, 1)  # typeMaker(catTypeBi4[typ], nCol)
partvec = zeros(T, nRow)

@benchmark _transferDataBi4(d, $partvec) setup=(d=seekstart(data)) evals=1

Note that I had to add an @inline to increase the performance; I’d do this very sparingly (if at all).
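To check the example's logic on known data rather than random bytes, here is the same code with a small IOBuffer input (needs StaticArrays; the @inline and the benchmarks are left out). The last assertion shows one way to get the N×3 matrix asked about earlier without copying the data, via reinterpret (Julia 1.6+) and a lazy transpose:

```julia
using StaticArrays

_readel(ft::IO, typ) = read(ft, typ)
_readel(ft::IO, typ::Type{<:SVector{3,T}}) where T =
    SVector{3,T}(_readel(ft, T), _readel(ft, T), _readel(ft, T))

function _transferDataBi4(ft::IO, arrayVal::AbstractVector)
    typ = eltype(arrayVal)
    if !eof(ft)
        for i in eachindex(arrayVal)
            arrayVal[i] = _readel(ft, typ)
        end
    end
end

typeMaker(typ, nCol) = nCol == 1 ? typ : SVector{nCol, typ}

# Known input: two particles, (1, 2, 3) and (4, 5, 6)
io = IOBuffer(); write(io, Int32[1, 2, 3, 4, 5, 6]); seekstart(io)
vel = zeros(typeMaker(Int32, 3), 2)
_transferDataBi4(io, vel)
@assert vel == [SVector{3,Int32}(1, 2, 3), SVector{3,Int32}(4, 5, 6)]

# The same bytes read as a single column: six scalars
seekstart(io)
dens = zeros(typeMaker(Int32, 1), 6)
_transferDataBi4(io, dens)
@assert dens == Int32[1, 2, 3, 4, 5, 6]

# An N×3 view of the particle vector for code that expects a matrix
@assert transpose(reinterpret(reshape, Int32, vel)) == Int32[1 2 3; 4 5 6]
```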
