Dealing with NaN's

rafael.guerra · April 27, 2021, 4:17pm

@sijo, the example provided had only 10 rows, for which I get these timings:

julia> @btime filter(AsTable(:) => row->!any(isnan(x) for x in row), $df);
  5.400 μs (48 allocations: 4.34 KiB)

julia> @btime filter(row->!any(isnan(x) for x in row), $df);
  3.575 μs (41 allocations: 2.17 KiB)

julia> @btime filter((:) => (values...)->!any(isnan.(values)), $df);
  11.100 μs (59 allocations: 2.73 KiB)

sijo · April 27, 2021, 4:59pm

But for such small data frames, does performance really matter?

I just realized the data frame has abstract columns (Real). Here are timings with Float64 columns, for many rows:

julia> df = DataFrame(x=rand((-1.0,1.0,NaN),1000000), y=rand((-2.0,2.0,NaN),1000000), z=rand((-3.0,3.0,NaN),1000000));

julia> @btime filter(AsTable(:) => row->!any(isnan(x) for x in row), $df);
  16.694 ms (52 allocations: 32.06 MiB)

julia> @btime filter(row->!any(isnan(x) for x in row), $df);
  449.876 ms (10556281 allocations: 202.47 MiB)

julia> @btime filter((:) => (values...)->!any(isnan.(values)), $df);
  4.534 ms (29 allocations: 9.17 MiB)
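On the abstract-column point: a minimal sketch of how one might check the column element types and convert them to Float64 before filtering. The small `df` below is hypothetical (it only mimics a frame with `Real` columns, it is not the original 10-row example), and `df64` and `result` are illustrative names.

```julia
using DataFrames

# Hypothetical frame with abstractly typed (Real) columns
df = DataFrame(x = Real[1, NaN, 3.0], y = Real[2.0, 5, NaN])

# Inspect the element type of every column; both are Real here
coltypes = eltype.(eachcol(df))

# Convert each column to a concrete Float64 vector (returns a new data frame)
df64 = mapcols(col -> Float64.(col), df)

# Same row-wise filter as in the timings above; only the first row has no NaN
result = filter(row -> !any(isnan(x) for x in row), df64)
```

Whether the conversion pays off probably depends on the size of the data, as the timings in this thread suggest.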

And for a smaller dataset with many columns (here the compilation time is much longer):

julia> df = DataFrame([rand((-1.0,1.0,NaN)) for _ in 1:10, _ in 1:1000], :auto);

julia> @btime filter(AsTable(:) => row->!any(isnan(x) for x in row), $df);
  1.299 ms (12607 allocations: 1.16 MiB)

julia> @btime filter(row->!any(isnan(x) for x in row), $df);
  190.694 μs (2081 allocations: 145.98 KiB)

julia> @btime filter((:) => (values...)->!any(isnan.(values)), $df);
[compilation takes too long, I didn't wait]
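With that many columns, one option that avoids both row iteration and splatting is to build a Boolean mask column by column. This is only a sketch and was not benchmarked in this thread, so treat it as an assumption that it helps; `keep` and `clean` are illustrative names.

```julia
using DataFrames

# Same shape as above: 10 rows, 1000 Float64 columns
df = DataFrame([rand((-1.0, 1.0, NaN)) for _ in 1:10, _ in 1:1000], :auto)

# Start by keeping every row, then clear the flag for rows with a NaN in any column
keep = trues(nrow(df))
for col in eachcol(df)
    keep .&= .!isnan.(col)
end

# Index with the mask to get the rows that contain no NaN
clean = df[keep, :]
```

Since each column is handled as a whole vector, nothing here needs to be specialized on the number of columns.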
