DataFrames version of R's which() function

TL;DR: Is there a Julia version of R’s which() function?

I’m rewriting some code from R to Julia. I’ve gotten comfortable with the DataFrames subset function, but in addition to returning rows I occasionally need to access rows relative to a specific row that meets a given condition. My data records a series of simulated aircraft flights. They always start at a given position but vary in end point and duration. To find the beginning of each run I subset the data as follows:

@subset(df, :XPOSITION .== df.XPOSITION[1])

I’d like to be able to get the last row of each run as well. In R I would use the which() function, which returns a vector of the indices in the original data frame that match the given condition. Since I know that the end positions abut the start positions, I can shift the correct elements of the start position vector to get the end position vector.

Is there a Julia equivalent to R’s which()?

You can do

@subset(df, :XPOSITION .== :XPOSITION[1])

which works just fine (and will be faster)

I think you may want to use a grouped data frame. This might not be the most elegant solution; maybe someone else has a better answer.

julia> df = DataFrame(flight = [1, 1, 1, 1, 2, 2, 2, 2], location = [1, 2, 3, 2, 43, 44, 43, 42])
8×2 DataFrame
 Row │ flight  location
     │ Int64   Int64
─────┼──────────────────
   1 │      1         1
   2 │      1         2
   3 │      1         3
   4 │      1         2
   5 │      2        43
   6 │      2        44
   7 │      2        43
   8 │      2        42

julia> @chain df begin
           groupby(:flight)
           combine(_) do sdf
               sdf[[1, nrow(sdf)], :]
           end
       end
4×2 DataFrame
 Row │ flight  location
     │ Int64   Int64
─────┼──────────────────
   1 │      1         1
   2 │      1         2
   3 │      2        43
   4 │      2        42

If each data frame is one flight, it’s just

df[[1, nrow(df)], :]

I can’t think of a great way to write it with DataFramesMeta.jl verbs, to be honest. Maybe someone else will think of something.

To just get the indices, I would recommend using @with from DataFramesMeta.jl

@with df :XPOSITION .== :XPOSITION[1]

It would also help to see some of the R code you use.

So at the moment my solution is

row = 1:length(df.XPOSITION)
df.row = row
startRows = @subset(df, :XPOSITION .== :XPOSITION[1])
startIndex = startRows.row
endIndex = vcat(startIndex[2:end] .- 1, last(df.row))

This works but adding in a column of row indices when my data is 6.2+ million rows long seems like a really inefficient way of getting the indices I need.

Unfortunately, the data does not have the individual flights labeled. That is what I’m going to use the start and end index lists to do.

I’m not sure what this last line is doing. But it sounds like you need the function findall. You could do, for example

@with df findall(==(:XPOSITION[1]), :XPOSITION)
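
For readers coming from R, here is a minimal sketch of how findall relates to which(), using only Base Julia. The vector xposition is a made-up stand-in for df.XPOSITION, not the poster's data. A broadcast comparison on its own gives a Boolean mask rather than integer positions; findall is what turns it into indices.

```julia
# Hypothetical stand-in for df.XPOSITION: three flights that all start at 0.0
xposition = [0.0, 1.5, 3.0, 0.0, 2.0, 0.0, 1.0, 2.5, 4.0]

# Broadcasting the comparison gives a Boolean mask with one entry per row
mask = xposition .== xposition[1]

# findall returns the positions of the true entries, much like R's which()
start_index = findall(mask)

# Passing a predicate instead skips the intermediate mask
start_index_pred = findall(==(xposition[1]), xposition)
```

Both calls should give the same vector of row numbers. The predicate form is the one used in the @with call above.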

Yes, I think @rwalters31 is looking for findall.

@pdeffebach & @bkamins findall is the function I was hunting for. Thanks!

To get the index of the last row of a flight, I use the fact that it is the index one prior to the start of the next flight. For example, if startIndex = [1,10,17,25], then the end index for flight 1 would be 9 (the index prior to the start of flight 2), and so forth, so in this case endIndex = [9,16,24,nrow(df)].
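
The shift described here can be sketched in Base Julia as follows. The start indices are the ones from the example; n = 30 is an assumed row count standing in for nrow(df), and the flight labeling via a running sum is one possible approach, not necessarily what the poster ended up using.

```julia
# Start rows from the example; n is a hypothetical stand-in for nrow(df)
start_index = [1, 10, 17, 25]
n = 30

# Each flight ends one row before the next one starts; the last flight ends at row n
end_index = vcat(start_index[2:end] .- 1, n)

# One way to label every row with a flight number:
# mark the start rows, then take a running count of the marks
is_start = falses(n)
is_start[start_index] .= true
flight = cumsum(is_start)
```

With a flight column like this in the data frame, groupby(df, :flight) would make the grouped approach suggested earlier in the thread available, and no separate column of row numbers is needed.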