coalesce function not working on n x n DataFrames with n >= 2


y = a 6x6 DataFrame

I may be wrong, but when I tried to use the coalesce function on a 6x6 DataFrame I got the error: no method matching iterate.
coalesce.(y, 0) ==> Error
But if I try coalesce.(y[i], 0) for each column i, it works.
Any help?
Thank you

It might help if you could read PSA: make it easier to help you, and then post an MWE (minimal working example); it is currently unclear what you are trying to do.

From the REPL:

help?> coalesce
search: coalesce

  coalesce(x, y...)

  Return the first value in the arguments which is not equal to missing, if
  any. Otherwise return missing.


  julia> coalesce(missing, 1)
  1

  julia> coalesce(1, missing)
  1

  julia> coalesce(nothing, 1)  # returns `nothing`

  julia> coalesce(missing, missing)
  missing
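A minimal Base-Julia sketch (no DataFrames involved) of what the working per-column call does: broadcasting coalesce over a single vector applies it to each element, so every missing becomes 0. The vector v is a made-up stand-in for one DataFrame column.

```julia
# Made-up column with missing values; its element type is Union{Missing, Int64}
v = [1, missing, 3, missing]

# The dot broadcasts coalesce over each element: missing -> 0, other values unchanged
filled = coalesce.(v, 0)

println(filled)          # [1, 0, 3, 0]
println(eltype(filled))  # Int64 on a 64-bit machine
```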

Broadcasting over a DataFrame doesn’t go through every cell; it goes through each row. None of the rows are missing, so coalesce didn’t do anything. Also, it’s not mutating. Not sure if there’s a better way, but I often do:

for n in names(df)
    df[!, n] = coalesce.(df[!, n], 0)  # replace the whole column with a coalesced copy
end
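The same column-by-column loop can be tried in plain Base Julia. This is only a sketch: cols is a hypothetical stand-in for a table, typed as AbstractVector[] so each slot can hold a column of any element type. Because each column is replaced by a new coalesced vector rather than mutated, the new columns end up with element type Int.

```julia
# Hypothetical stand-in for a table: a vector of columns of arbitrary vector types
cols = AbstractVector[[1, missing, 3], [missing, 5, 6]]

# Same pattern as the loop: swap each column for a coalesced copy
for i in eachindex(cols)
    cols[i] = coalesce.(cols[i], 0)
end

println(cols[1])  # [1, 0, 3]
println(cols[2])  # [0, 5, 6]
```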

Could you clarify what operation you are trying to perform? Are you looking to replace each missing in a DataFrame with 0?

It has been suggested several times to add an argument to disallowmissing! to do that. It should be easy to do.

Sorry if I wasn’t clear enough.
I have a DataFrame y of size 6x6 (see picture).

To replace missing values with 0 in this DataFrame, I tried coalesce.(y, 0), but I got an error message. I would have thought that coalesce.(y, 0) would replace every missing value in the DataFrame, but it only works on a single DataFrame column (a vector), not on an n x n DataFrame.
The only way I found is: [y[i] = coalesce.(y[i], 0) for i in 1:size(y,2)]
which gives me 0 instead of missing.
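For comparison, in Base Julia a Matrix is a two-dimensional container, so broadcasting coalesce over it does visit every cell, which is the behaviour expected here from coalesce.(y, 0). The matrix m is made up for illustration; how a DataFrame itself responds to broadcasting depends on the DataFrames.jl version, so check the documentation for the version you use.

```julia
# Made-up 2x3 matrix with missing values; element type Union{Missing, Int64}
m = [1 missing 3;
     missing 5 missing]

# Broadcasting over a Matrix applies coalesce cell by cell
filled = coalesce.(m, 0)

println(filled)  # [1 0 3; 0 5 0]
```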

Did you try what I suggested?


Your way works, but I think it also allocates a vector. As I said, broadcasting over a DataFrame applies the function to rows, not cells.


Even simpler and more efficient (note this is an in-place operation, but I understand this is what you want) is:

replace!.(eachcol(df, false), Ref(missing=>0))

And eachcol(df, false) will simply be replaced by eachcol(df) in the near future, once the deprecation period for the old behaviour of eachcol ends.
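The broadcasted replace! call can be tried in Base Julia on a plain vector of column vectors, used here as a hypothetical stand-in for the columns of a data frame. Wrapping the pair in Ref makes broadcasting pass missing => 0 as one argument to every replace! call instead of iterating over it.

```julia
# Hypothetical stand-in for the columns of a data frame
cols = [[1, missing, 3], [missing, 5, 6]]

# replace! mutates each column in place; Ref(...) keeps the Pair from being broadcast over
replace!.(cols, Ref(missing => 0))

# After the call, no missing values remain in either column
println(any(c -> any(ismissing, c), cols))  # false
```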

Actually, AFAIK you currently cannot broadcast over an AbstractDataFrame. I am not sure whether it will be supported in the future, but if it is, it will most probably be row-wise, as @kevbonham indicates.

For now, you have the eachcol and eachrow methods, which support broadcasting column-wise and row-wise, respectively.

Ah, this is a good solution. I keep forgetting about Ref() for stuff like this.

Note that doing an in-place operation for this isn’t necessarily a good idea since it will keep the Union{T,Missing} element type even if there are no missing values, which can hurt performance and doesn’t indicate that missing values are not supposed to be present.
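A small Base-Julia sketch of the element-type point, on a made-up column: replace! swaps the values in place but the vector keeps its Union{Missing, Int} element type, whereas coalesce. builds a new vector with plain Int elements. Converting to Vector{Int} after the in-place step is one Base way to narrow the type once no missing values remain.

```julia
# Made-up column with a missing value
col = [1, missing, 3]

# In place: values change, but the element type still allows missing
replace!(col, missing => 0)
println(eltype(col))       # Union{Missing, Int64} on a 64-bit machine

# Out of place: coalesce. returns a new vector with a concrete element type
fresh = coalesce.([1, missing, 3], 0)
println(eltype(fresh))     # Int64 on a 64-bit machine

# Narrowing after the in-place step (works only because no missing remain)
narrowed = convert(Vector{Int}, col)
println(eltype(narrowed))  # Int64 on a 64-bit machine
```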
