Regarding filtering/creating dataframe

Alec_Loudenback · March 12, 2019, 9:48pm

Hello,

I have a dataframe where I want to filter and grab a subset of rows. In this example, I want to grab the rows for each of :B where :C is the smallest for :B:

df.A = rand(10)
df.B = repeat(["A","B","C","D","E"],2)
df.C = [1,2,3,1,2,3,1,2,3,1]
df

	A	B	C
	Float64	String	Int64
1	0.376752	A	1
2	0.416886	B	2
3	0.371288	C	3
4	0.645706	D	1
5	0.925196	E	2
6	0.708123	A	3
7	0.118586	B	1
8	0.723375	C	2
9	0.243351	D	3
10	0.775513	E	1

And I want:

	A	B	C
	Float64	String	Int64
1	0.376752	A	1
2	0.118586	B	1
3	0.723375	C	2
4	0.645706	D	1
5	0.775513	E	1

I tried creating an empty DataFrame to push!() to, but you can’t do that without specifying the column types first. My MVE is simple here, but I didn’t find it documented how I could create a new DataFrame based on the structure of an existing one without copying and deleting all the rows.

What would be a recommended technique here?

ElOceanografo · March 12, 2019, 10:19pm

Here’s one way using the basic split-apply-combine methods for DataFrames:

df2 = by(df, :B) do subdf
    i = argmin(subdf.C)
    (A = subdf.A[i], C = subdf.C[i])
end

This rearranges the column order, but you can set it back by doing df2 = df2[[:A, :B, :C]] (if it matters to you).

davidanthoff · March 13, 2019, 3:05am

Here is the Query.jl way of doing this:

df |> @groupby(_.B) |> @map(_[argmin(_.C)]) |> DataFrame

Note that the _ in the @map stands for each group. Each group here is a table, so first we extract the C column from that table with _.C, find the index of the smallest element in that column with argmin, and then index into the table to extract that full row with _[row_number]. So for each group @map returns one row, and then in turn constitutes a new table, which we materialize into a DataFrame.

If we had this the @map part could be written as @map(minimum(_, i->i.C)), which would be a bit more elegant.

nalimilan · March 13, 2019, 2:42pm

You can use similar(df, 0) to create a data frame with the same columns as df. With this PR it will even be possible to push rows to an empty data frame and have new columns be created automatically.

But in general the solutions proposed in other comments are probably better.

Topic		Replies	Views
Dataframe Filter New to Julia question , dataframes	6	5403	March 26, 2022
DataFrame Filtering New to Julia question , dataframes	10	898	September 20, 2023
DataFramesMeta custom filter: by groups of A, apply filter on B Data	4	1408	May 27, 2019
How to sample rows from a dataframe? New to Julia	2	2818	January 18, 2019
What is the recommended way to filter rows of a Dataframe? Performance dataframes	6	218	July 23, 2024

Regarding filtering/creating dataframe

Related topics