Where “Idp_vec” is the vector holdning the index value of each particle starting from 0 to N, while “Rhop_vec” holds the corresponding density for each particle. So for an example for the first element I would have Idp = 0 and Rhop = 1007.7992.

Now comes the tricky part. Sometimes I might only want to look at particles from the range of Idp of 0 to 1500. Since the Idp_vec is clearly unstructured I have two options:

Sort Idp_vec from lowest to highest and then enforce same sorting on Rhop_vec

Find the specific indices directly and correlate with Rhop_vec

I have no clue which one is fastest / most efficient, but I have a hard time figuring out how to for an example implement option 1 efficiently. Currently what I am doing is the same as in the documentation, using sortperm to save an array of the indices after sorting so I can transform the Rhop_vec array correspondingly, but wondered if there is a smarter way.

For the second option, I don’t know if it is viable, but I think it might be more efficient since I do not have to make any temporary arrays?

I am finishing up my own solution, but would like to hear about your past experiences.

There’s certainly a O(N) solution, if the number of particle is known in advance:
create a new vector Rhop_aux = Vector{Float32}(undef, maximum(Idp_vec))
fill it as Rhop_aux[Idp[i]+1] = Rhop[i]
now, Rhop_aux[i] is the Rhop whose corresponding Idp is i-1.

A sorted index with a binary search for the lookup would work, but if you want to do the search often using the same data, sorting the actual values might be faster. Maybe it can be sorted in a tree structure for faster lookup.

function readBi4Body(Body,typ)
start = getfield(Body, :beg)+1 #First idp
move = start+getfield(Body, :count)-1 #Number of particles from first idp
@time idp_vec = readBi4Array(Idp)
@time val_vec = readBi4Array(typ)
j = similar(val_vec)
for i = 1:length(pq)
j[i] = val_vec[i][sortperm(idp_vec[i])][start:move]
end
return j
end

Ignore the start, move fields. Basically I am reading my idp data and my rhop data (in val_vec), creating a new array similar to for an example rhop_vec in j and then looping through to make sure that the values of val_vec move according to the sortperm of idp_vec and I am getting results as I want.

Regarding speed it is taking me 0.65 seconds to load data in (using readBi4Array) and then about 0.1 seconds for the sorting I asked about here as seen in the @time:

@time k = readBi4Body(Bodies[2][1],Points);
0.146138 seconds (6.30 k allocations: 119.800 MiB, 29.87% gc time)
0.532087 seconds (8.85 k allocations: 370.412 MiB, 40.51% gc time)
0.756527 seconds (157.40 k allocations: 509.116 MiB, 34.27% gc time)

I might test Vasily’s solution in the future but for now I think this is the best way for me.

Thank you once again for your suggestions, hope they might help others in the future as well.

I would say that this is the best solution in most cases.

If this is a bottleneck in your code, what is optimal depends on how many times you do lookups for one pair of vectors, ie whether making them faster is worth the cost of sorting.

It takes about 3 seconds in total to go through 1 GB of files (earlier benchmarks were a bit wrong), so for now it is not a performance bottleneck, but testing bigger data sets might show another picture. Thanks for your comment.

In theory it should be possible to wrap your two arrays into a light structure, pass it to sort! and get both Arrays sorted at the same time. You would just need to overload get/setindex an <. I gave it a quick try but I cannot add a method to isnan for some reason. I’m also not sure that solution would be optimal (we might have some copying going on).

struct CoSort{T,N} <: AbstractArray{T,N}
x::Array{T,N}
y::Array{T,N}
end
Base.size(x::CoSort) = size(x.x)
Base.getindex(x::CoSort,i) = (x.x[i], x.y[i])
function Base.setindex!(x::CoSort, v, i)
x.x[i] = v[1]
x.y[i] = v[2]
end
Base.Sort.isnan(o::Base.Order.Ordering, x::Tuple{T,T}) where T = isnan(x[1]) || isnan(x[2])
x = CoSort(rand(3), rand(3))
sort!(x)
>ERROR: MethodError: no method matching isnan(::Base.Order.ForwardOrdering, ::Tuple{Float64,Float64})

Sorting both arrays at the same time, seems like a waste when I can sort one, and provide its indices to the other array, or did I misunderstand your procedure?

My idea is that the second array would just passively get sorted as the first one gets sorted (< would depend only on the first array). But it’s true that if elements gets moved several times it might be slower than using indices, so it’s probably a bad idea for some sorting algorithms.

aux_vec = []
while length(aux_vec) != length(vec_to_be_sorted)
greater_value = Inf
position = 1
for v in vec_to_be_sorted
if greater_value > based_vec[v] && !(v ∈ aux_vec )
greater_value = based_vec[v]
position = v
end
end
push!(aux_vec, position)
end
vec_to_be_sorted = aux_vec