Shuffling only non-missing values

While implementing permutation functions in PopGen.jl, I ran into a situation where I need to shuffle values within some parameters (like genotypes across loci, or across loci across populations), permuting only the values that are not missing (i.e. the missing values must stay in their original positions). For what it's worth, all of this happens in the context of a DataFrames.jl dataframe, using `for i in groupby(df, :field)`. I originally had a function that was a bit heavy to do that:

```julia
using Random

function strict_shuffle!(x::T) where T <: AbstractArray
    # get indices of where original missing are
    miss_idx = findall(ismissing, x)
    # make separate array for non-missing values
    no_miss = skipmissing(x) |> collect |> Vector{Union{Missing, eltype(x)}}
    shuffle!(no_miss)
    # re-add missing to original positions
    for i in miss_idx
        insert!(no_miss, i, missing)
    end
    # note: despite the `!`, this returns a new vector and leaves `x` unchanged
    return no_miss
end
```
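The re-insertion step works because `findall` returns indices in ascending order: each `insert!` shifts the later elements one slot to the right, and the next index in `miss_idx` already accounts for that shift. A small deterministic check of just that step (no shuffle, so the result is predictable), using a made-up example vector:

```julia
# Re-insertion step of the insert!-based approach, checked in isolation.
x = [1, missing, 2, missing, missing, 3]

miss_idx = findall(ismissing, x)                                # [2, 4, 5], ascending
no_miss  = Vector{Union{Missing, Int}}(collect(skipmissing(x)))  # [1, 2, 3]

# Insert left to right: [1, m, 2, 3] -> [1, m, 2, m, 3] -> [1, m, 2, m, m, 3]
for i in miss_idx
    insert!(no_miss, i, missing)
end

# `==` returns `missing` when missing values are involved, so compare with isequal
isequal(no_miss, x)   # true
```

Each `insert!` shifts the trailing elements, so it costs time proportional to the vector length; with many missing values this adds up, which is part of why the function feels heavy.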

But after some trial and error, I came up with this simple one-liner that does what I need and has a much lighter footprint:

```julia
using Random

function strict_shuffle!(x::T) where T <: AbstractArray
    # shuffle! mutates the view, i.e. only the non-missing entries of `x`
    shuffle!(@view x[.!ismissing.(x)])
end
```
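A quick way to convince yourself the view-based one-liner behaves as intended is to check the invariants after shuffling: the missing entries are in the same positions and the non-missing values are the same multiset. The sketch below slightly modifies the one-liner to `return x` (an assumption for convenience; the original returns the view that `shuffle!` returns), and uses a hypothetical `groups` vector as a plain-Julia stand-in for the `groupby(df, :field)` loop, without DataFrames.jl:

```julia
using Random

# Variant of the one-liner that returns the parent array instead of the view.
function strict_shuffle!(x::AbstractArray)
    shuffle!(@view x[.!ismissing.(x)])
    return x
end

# 1. A single vector: missing entries keep their positions.
x = [1, missing, 2, 3, missing, 4]
miss_before = findall(ismissing, x)
strict_shuffle!(x)
findall(ismissing, x) == miss_before            # true
sort(collect(skipmissing(x))) == [1, 2, 3, 4]   # true: same values, possibly reordered

# 2. Shuffle within each group. `groups` is a hypothetical grouping column;
#    the view of a view still writes through to `vals`.
vals   = [1, missing, 2, 3, 4, missing, 5, 6]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
for g in unique(groups)
    strict_shuffle!(@view vals[groups .== g])
end
findall(ismissing, vals) == [2, 6]                  # true
sort(collect(skipmissing(vals[1:4]))) == [1, 2, 3]  # true: values stay in group "a"
```

Building the index with `findall(!ismissing, x)` instead of the broadcast mask should behave the same; the mask version just materializes a `BitVector` first.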

My concern is that this may violate the docstring for `@view`:

>     @view A[inds...]
>
> Creates a SubArray from an indexing expression. This can only be applied directly to a reference expression (e.g. `@view A[1,2:end]`), and should not be used as the target of an assignment (e.g. `@view(A[1,2:end]) = ...`). See also `@views` to switch an entire block of code to use views for slicing.

Slack user dpsanders suggested I post this on Discourse to get some feedback on whether it is legal or appropriate. Thank you!

Pavel

Hmmm, I am not sure the documentation is referring to your case. I say that mostly for two reasons:

  1. I have written code that made extensive use of `shuffle!` over `@view` (it was an ad hoc heuristic for 2D guillotine cutting), and the code worked as expected, XD.
  2. What I think the documentation is trying to prevent is using the view itself as a variable/binding/left-hand-side expression, as you can see below:
```julia
julia> a = collect(1:10);

julia> (@view a[1:5]) = collect(6:10)
ERROR: syntax: invalid assignment location "true && typeof(Base.view)()(a, 1:5)"
...

julia> (@view a[1:5]) .= collect(6:10)
5-element view(::Array{Int64,1}, 1:5) with eltype Int64:
  6
  7
  8
  9
 10

julia> a
10-element Array{Int64,1}:
  6
  7
  8
  9
 10
  6
  7
  8
  9
 10
```

The advice is just: do not confuse a view object with a binding/'variable name'; replace the view's elements, not the view itself.
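The same distinction shows up when a view is stored in a name. The sketch below (the names `a` and `v` are illustrative) contrasts mutating the viewed elements, which writes through to the parent, with rebinding the name, which leaves the parent alone. `shuffle!` on a view falls in the first category, like `.=`:

```julia
a = collect(1:10)

v = @view a[1:5]    # `v` is a name bound to a view of a[1:5]

v .= 0              # mutates the viewed elements: writes through to `a`
# a is now [0, 0, 0, 0, 0, 6, 7, 8, 9, 10]

v = collect(6:10)   # rebinds the name `v` to a brand-new Vector;
                    # neither the old view nor `a` is touched
# a is still [0, 0, 0, 0, 0, 6, 7, 8, 9, 10]
```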


Ah, thank you for the information!