Slow deletion loop on huge DataFrame

Kryppi · November 28, 2022, 10:44pm

I want to delete certain rows from an ordered DataFrame of size (1.5 million, 14). Once the conditions for deleting a row are met, the code that checks whether the next row is also up for deletion (i.e. whether the string in [x,1] is identical) looks like this:

Code

while df[k-l,1]==df[k-l+1,1]
deleteat!(df,k-l+1)
l=l+1;
k=k+1;
if k==upperBoundary-1
break
end
end
deleteat!(df,k-l)
k+=1;
l+=1;
end

I checked with println that this part takes about 1 s (was faster for me than looking up the @time syntax again, my bad).
This isn’t scalable.
Suggestions?

Edit: My indents are gone. Sorry

kristoffer.carlsson · November 28, 2022, 11:03pm

Try calculating all the indices you want to delete and do the deletion in one call.
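A minimal sketch of that idea, assuming DataFrames.jl 1.3 or later (where `deleteat!` is defined for a `DataFrame`). The toy table, the column names `id` and `value`, and the `flagged` condition are hypothetical stand-ins, since the original post does not show the real deletion condition. The loop only marks rows; nothing is removed until the single `deleteat!` call at the end.

```julia
using DataFrames

# Toy stand-in for the real 1.5-million-row table; column names are made up.
df = DataFrame(id = ["a", "a", "b", "c", "c", "c", "d"],
               value = [1, 2, 3, -1, 5, 6, 7])

# Hypothetical deletion condition: a negative `value` flags the row.
flagged(df, i) = df.value[i] < 0

function rows_to_delete(df)
    n = nrow(df)
    drop = falses(n)              # one Bool per row, nothing marked yet
    i = 1
    while i <= n
        if flagged(df, i)
            # Mark the flagged row and every following row with the same id.
            j = i
            while j < n && df.id[j + 1] == df.id[i]
                j += 1
            end
            drop[i:j] .= true
            i = j + 1             # continue after the marked run
        else
            i += 1
        end
    end
    return findall(drop)          # sorted, unique row indices
end

idx = rows_to_delete(df)
deleteat!(df, idx)                # one call instead of one call per row
```

The likely reason this helps: deleting a single row shifts every later element in each of the 14 columns, so calling `deleteat!` inside the loop repeats that work for every deleted row, while a single call with all indices compacts each column once.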

Kryppi · November 29, 2022, 9:49pm

The job was done within seconds.
