What’s the best way to replace all NaN’s in a DataFrame with zero? I can write a nested for-loop and check every cell but I thought there may be a simpler way to do that…

Eg

```
using DataFrames
replace_nan(v) = map(x -> isnan(x) ? zero(x) : x, v)
df = DataFrame(a = [NaN, 2.0, 3.0], b = [4.0, 5.0, NaN])
df2 = map(replace_nan, eachcol(df))
```

Note that this version allocates new columns, so you may want to use `map!`

instead.

This only maps the last row… seems unintuitive to use `last`

here. Are you using a different version of Julia or DataFrame?

```
julia> df
3×2 DataFrames.DataFrame
│ Row │ a │ b │
├─────┼─────┼─────┤
│ 1 │ NaN │ 4.0 │
│ 2 │ 2.0 │ 5.0 │
│ 3 │ 3.0 │ NaN │
julia> map(replace_nan ∘ last, eachcol(df))
1×2 DataFrames.DataFrame
│ Row │ a │ b │
├─────┼─────┼─────┤
│ 1 │ 3.0 │ 0.0 │
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> Pkg.installed("DataFrames")
v"0.11.5"
```

I missed the fact that `DataFrames`

defined a `map`

for `DFColumnIterator`

. So the version

```
using DataFrames
df = DataFrame(a = [NaN, 2.0, 3.0], b = [4.0, 5.0, NaN])
replace_nan(v::AbstractVector) = map(x -> isnan(x) ? zero(x) : x, v)
replace_nan!(v::AbstractVector) = map!(x -> isnan(x) ? zero(x) : x, v, v)
map(replace_nan, eachcol(df))
map(replace_nan!, eachcol(df))
```

works as is. Sorry for the confusion.

FYI, this only works if every column contains Floats. Otherwise `isnan`

will throw an error.

EDIT: To be a bit more helpful, the general idea of the `map`

function is what I use, but I actually loop over each column and check that it is `Vector{<:AbstractFloat}`

first before I apply the `NaN`

to 0 (or `missing`

, in my case) conversion.

Not quite,

```
julia> methods(isnan)
# 6 methods for generic function "isnan":
isnan(x::BigFloat) in Base.MPFR at mpfr.jl:828
isnan(x::Float16) in Base at float.jl:522
isnan(x::AbstractFloat) in Base at float.jl:521
isnan(x::Real) in Base at float.jl:523
isnan(z::Complex) in Base at complex.jl:118
isnan(x::AbstractArray{T,N} where N) where T<:Number in Base at deprecated.jl:56
```

This is where a function like R’s `dplyr::mutate_if()`

will be awesome someday in Julia. I imagine that someday one of us will build that into Query.jl or DataFramesMeta.jl. Until then, the alternatives are not bad at all.

Sorry, a subtype of `Number`

I guess. Still, `String`

columns would cause an error here. I suppose the anonymous function could have an additional logical layer that checks for proper element type first, and that would make this a robust option.

I agree that a `mutate_if`

function is important, but with `DataFramesMeta`

you can also use the `@byrow!`

macro to get similar results.

Thhis function is what I use for this:

```
function checkForNotANumber(x::Any)
(!isa(x,Integer) && !isa(x,Real)) || isnan(x)
end
```

Note that `Integer <: Real`

, so checking for the latter is sufficient.

Thinking about this discussion,

```
# FIXME get letter of marque for type piracy
Base.map(f, df::AbstractDataFrame) = map(col -> map(f, col), eachcol(df))
```

would solve this problem 90% of the time (when I don’t want to do something different for columns). Eg

```
map(x -> x isa Real && isnan(x) ? zero(x) : x, df)
```

For `missing`

(replace `ismissing()`

with `isnan()`

for `NaN`

s) you can use also this trick, that makes all `missing`

in numeric columns equal to 0, and all `missing`

in strinng columns equal to `""`

:

```
[df[ismissing.(df[!,i]), i] .= 0 for i in names(df) if Base.nonmissingtype(eltype(df[!,i])) <: Number]
[df[ismissing.(df[!,i]), i] .= "" for i in names(df) if Base.nonmissingtype(eltype(df[!,i])) <: String]
```

In recent versions you can now do this:

```
mapcols(col -> replace!(col, NaN=>0), df) # In-place
```

or

```
ifelse.(isnan.(df), 0, df)
```

Yes, also it will work in place: `df .= ifelse.(isnan.(df), 0, df)`

.

Sorry for a late question related to this topic. In my case, it is not NaN but negative values. I have a time series with positive and negative entries. I need to transform the negative values into zero. I do not want to eliminate them; I need to keep them as observations, but with a value of zero. Help will be very much appreciated. Thanks.

This will do it

```
julia> df = DataFrame(a = randn(5), b = randn(5))
5×2 DataFrame
Row │ a b
│ Float64 Float64
─────┼──────────────────────
1 │ 0.8805 0.667461
2 │ 0.17179 -0.618585
3 │ -0.667805 -0.32467
4 │ -0.517509 -0.321862
5 │ 1.64746 -0.344586
julia> mapcols(t -> ifelse.(t .< 0, 0, t), df)
5×2 DataFrame
Row │ a b
│ Real Real
─────┼───────────────────
1 │ 0.8805 0.667461
2 │ 0.17179 0
3 │ 0 0
4 │ 0 0
5 │ 1.64746 0
```

@pdeffebach Thank you very much. After two hours of stumbling by me, it took you just a minute to do it. Thanks.

or even shorter:

```
julia> df = DataFrame(a = randn(5), b = randn(5))
5×2 DataFrame
Row │ a b
│ Float64 Float64
─────┼───────────────────────
1 │ -0.386397 0.392352
2 │ -0.476617 -0.0270584
3 │ -0.218456 -0.224436
4 │ -1.17403 -0.520317
5 │ -1.70785 0.0390936
julia> ifelse.(df .< 0, 0, df)
5×2 DataFrame
Row │ a b
│ Int64 Float64
─────┼──────────────────
1 │ 0 0.392352
2 │ 0 0.0
3 │ 0 0.0
4 │ 0 0.0
5 │ 0 0.0390936
```