Coalesce function not working with data frames n x n with n>=2


#1

Hi,

y=dataframe 6x6

I may be wrong, but when I try to use the coalesce function on a 6x6 dataframe I get the error: no method matching iterate.
coalesce.(y,0) ==> Error
But if I try coalesce.(y[i],0) for each i, it works.
Any help?
Thank you


#2

It might help if you could read PSA: make it easier to help you, and then post a MWE - it is currently unclear what you are trying to do.

From the REPL:

help?> coalesce
search: coalesce

  coalesce(x, y...)

  Return the first value in the arguments which is not equal to missing, if
  any. Otherwise return missing.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> coalesce(missing, 1)
  1

  julia> coalesce(1, missing)
  1

  julia> coalesce(nothing, 1)  # returns `nothing`

  julia> coalesce(missing, missing)
  missing

#3

Broadcasting on a dataframe doesn’t go through every cell, it goes through each row. None of the rows are missing, so coalesce doesn’t do anything. Also, it’s not mutating. Not sure if there’s a better way, but I often do

for n in names(df)
    df[n] = coalesce.(df[n], 0)
end
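
The loop above relies on DataFrames.jl. As a minimal sketch of the same column-by-column idea using only Base Julia, the example below uses a `Dict` of vectors as a hypothetical stand-in for the data frame (the names `cols`, `:a` and `:b` are illustrative assumptions, not part of the original post):

```julia
# Hypothetical stand-in for a data frame: column names mapped to vectors.
cols = Dict{Symbol,Vector}(
    :a => [1, missing, 3],
    :b => [missing, 2, missing],
)

# coalesce.(column, 0) builds a new vector in which every `missing`
# has been replaced by 0; the new vector is then stored back under
# the same column name.
for name in collect(keys(cols))
    cols[name] = coalesce.(cols[name], 0)
end
```

Because `coalesce.` returns a new vector rather than mutating its argument, the assignment back into the container is what makes the change stick.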

#4

Could you clarify what operation you are trying to perform? Are you looking to replace each missing in a DataFrame with 0?


#5

It’s been suggested several times to add an argument to disallowmissing! to do that. Should be easy to do.


#6

Sorry if I wasn’t clear enough.
I have a two-dimensional dataframe y (its size is 6x6, see picture).


To replace the missing values with 0 in this data frame, I tried coalesce.(y, 0) but got an error message. The only way I found to do this is: [y[i] = coalesce.(y[i], 0) for i in 1:size(y,2)]
I would have thought that coalesce.(y, 0) would replace every missing value in the DataFrame, but that does not work; it only works for a single DataFrame column, not for an n x n DataFrame.
With [y[i] = coalesce.(y[i], 0) for i in 1:size(y,2)] I get 0 instead of the missings.


#7

Did you try what I suggested?

Your way works, but I think it also allocates a vector. As I said, broadcasting over a dataframe applies the function to rows, not cells.


#8

Even simpler and more efficient (note this is an in-place operation, but I understand this is what you want) is:

replace!.(eachcol(df, false), Ref(missing=>0))

And eachcol(df, false) will simply be replaced by eachcol(df) in the near future, when the deprecation period for the old behavior of eachcol ends.
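
To show what the broadcasted `replace!` with `Ref` does without needing DataFrames.jl, here is a minimal Base-only sketch. It assumes a plain vector of column vectors (the hypothetical `cols`) in place of `eachcol(df, false)`:

```julia
# Hypothetical stand-in for the columns of a data frame.
cols = [
    Union{Missing,Int}[1, missing, 3],
    Union{Missing,Int}[missing, 5, 6],
]

# Broadcast replace! over the columns. Ref(...) wraps the pair so that
# broadcasting treats it as a single value passed unchanged to every call.
replace!.(cols, Ref(missing => 0))
```

Each column is modified in place, so no assignment back into `cols` is needed.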


#9

Actually, AFAIK you currently cannot broadcast over an AbstractDataFrame. It is not clear to me whether it will be supported in the future, but if it is, it will most probably be row-wise, as @kevbonham indicates.

For now you have the eachcol and eachrow methods, and their results support broadcasting column-wise and row-wise respectively.


#10

Ah, this is a good solution. I keep forgetting about Ref() for stuff like this.


#11

Note that doing an in-place operation for this isn’t necessarily a good idea since it will keep the Union{T,Missing} element type even if there are no missing values, which can hurt performance and doesn’t indicate that missing values are not supposed to be present.
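
The element-type difference can be checked on a single vector with Base Julia alone. This is a small sketch with hypothetical names (`v`, `filled`), not code from the thread:

```julia
v = Union{Missing,Int}[1, missing, 3]

# Non-mutating: broadcasting coalesce builds a new vector whose
# element type no longer includes Missing.
filled = coalesce.(v, 0)

# In-place: the values change, but the vector keeps its declared
# element type, Union{Missing,Int}.
replace!(v, missing => 0)

println(eltype(filled))
println(eltype(v))
```

Both vectors end up holding the same values; only the element type differs.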