While trying to implement permutation functions in PopGen.jl, I came across a situation where I need to shuffle values within some parameters (like genotypes across loci, or across loci across populations) while permuting only the values that are not missing (i.e. the missing values must remain in their original positions). For what it's worth, all of this happens in the context of a DataFrames.jl dataframe, iterating with `for i in groupby(df, :field)`. I originally had a function that was a bit heavy:
```julia
using Random

function strict_shuffle!(x::T) where T <: AbstractArray
    # get the indices of the original missing values
    miss_idx = findall(ismissing, x)
    # make a separate array for the non-missing values
    no_miss = skipmissing(x) |> collect |> Vector{Union{Missing, eltype(x)}}
    shuffle!(no_miss)
    # re-add missing at the original positions; miss_idx is ascending,
    # so each insert! lands at the index it came from
    for i in miss_idx
        insert!(no_miss, i, missing)
    end
    # write the result back so the function mutates x, as the `!` implies
    x .= no_miss
    return x
end
```
but after some trial and error I came up with this simple one-liner that does what I need, and has a much lighter footprint:
```julia
using Random

function strict_shuffle!(x::T) where T <: AbstractArray
    # shuffle, in place, only the entries of x that are not missing
    shuffle!(@view x[.!ismissing.(x)])
end
```
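A quick sanity check of the one-liner, written as a small sketch: it shuffles a short vector in place and checks that the `missing` entries stay where they were while the non-missing values are only permuted. The second half is a Base-only stand-in for the `groupby(df, :field)` loop, shuffling within each population label; the `pops`/`genos` vectors and the `shuffle_within_groups!` helper are hypothetical and not part of PopGen.jl or DataFrames.jl.

```julia
using Random

# The one-liner from above: shuffle, in place, only the non-missing entries.
strict_shuffle!(x::AbstractArray) = shuffle!(@view x[.!ismissing.(x)])

x = Union{Missing,Int}[1, missing, 2, 3, missing, 4]
before = copy(x)
strict_shuffle!(x)

# missing entries keep their original positions...
@assert findall(ismissing, x) == findall(ismissing, before)
# ...and the non-missing values are a permutation of the originals
@assert sort(collect(skipmissing(x))) == sort(collect(skipmissing(before)))

# Hypothetical Base-only analogue of `for i in groupby(df, :field)`:
# shuffle values only among entries that share the same group label.
# The nested views write straight through to `vals`.
function shuffle_within_groups!(vals::AbstractVector, groups::AbstractVector)
    for g in unique(groups)
        strict_shuffle!(@view vals[groups .== g])
    end
    return vals
end

pops  = ["a", "a", "a", "b", "b", "b"]
genos = Union{Missing,Int}[1, missing, 2, 10, 20, missing]
shuffle_within_groups!(genos, pops)
```

Because every step goes through a view, no intermediate copy of the non-missing values is built and written back, which is where the lighter footprint comes from.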
My concern is that this may violate the docstring for `@view`:
> `@view A[inds...]`
>
> Creates a `SubArray` from an indexing expression. This can only be applied directly to a reference expression (e.g. `@view A[1,2:end]`), and should not be used as the target of an assignment (e.g. `@view(A[1,2:end]) = ...`). See also `@views` to switch an entire block of code to use views for slicing.

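As I read it, the warning is about the assignment syntax `@view(A[1,2:end]) = ...`, where the macro call itself sits on the left of `=`; as far as I can tell Julia rejects that form as an invalid assignment. The sketch below contrasts it (left commented out) with binding a view and mutating through it, which is what handing the view to `shuffle!` does. `A` and `v` are just illustrative names, and `reverse!` stands in for `shuffle!` so the result is deterministic.

```julia
A = collect(1:6)

# The form the docstring warns against: the macro call is the assignment
# target. Left commented out because it is not meant to work.
# @view(A[2:3]) = [20, 30]

# Binding a view to a name and mutating through it writes into A itself.
v = @view A[2:3]
v .= [20, 30]
@assert A == [1, 20, 30, 4, 5, 6]

# Passing a view straight to a mutating function works the same way.
reverse!(@view A[4:6])
@assert A == [1, 20, 30, 6, 5, 4]
```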
The Slack community user dpsanders suggested I post this on Discourse to get feedback on whether it is legal or appropriate. Thank you!
Pavel