Coalesce function not working with data frames n x n with n>=2

Erisme · February 10, 2019, 3:49pm

Hi,

y is a 6x6 DataFrame.

I may be wrong, but I tried to use the coalesce function on a 6x6 DataFrame and got the error: no method matching iterate.
coalesce.(y, 0) ==> Error
But if I try coalesce.(y[i], 0) for each i, it works.
Any help?
Thank you

nilshg · February 10, 2019, 4:29pm

It might help if you could read "Please read: make it easier to help you" and then post an MWE - it is currently unclear what you are trying to do.

From the REPL:

help?> coalesce
search: coalesce

  coalesce(x, y...)

  Return the first value in the arguments which is not equal to missing, if
  any. Otherwise return missing.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> coalesce(missing, 1)
  1

  julia> coalesce(1, missing)
  1

  julia> coalesce(nothing, 1)  # returns `nothing`

  julia> coalesce(missing, missing)
  missing

kevbonham · February 10, 2019, 5:39pm

Broadcasting on a dataframe doesn’t go through every cell, it goes through each row. None of the rows are missing, so coalesce didn’t do anything. Also, it’s not mutating. Not sure if there’s a better way, but I often do

for n in names(df)
    df[n] = coalesce.(df[n], 0)
end
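
For readers on newer DataFrames.jl releases, where indexing a column as df[n] is, as far as I know, no longer supported, a minimal sketch of the same loop could look like the following. It assumes DataFrames.jl 1.x; the data frame and its column names are made up purely for illustration.

```julia
using DataFrames

# Hypothetical data, only for illustration
df = DataFrame(a = [1, missing, 3], b = [missing, 2, missing])

for n in names(df)
    # df[!, n] refers to the column itself (no copy);
    # assigning to it replaces the column with the new vector
    df[!, n] = coalesce.(df[!, n], 0)
end
```

Because coalesce.(...) builds a new vector for each column, the replaced columns no longer allow missing values.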

pdeffebach · February 10, 2019, 5:48pm

Could you clarify what operation you are trying to perform? Are you looking to replace each missing in a DataFrame with 0?

nalimilan · February 10, 2019, 9:03pm

It’s been suggested several times to add an argument to disallowmissing! to do that. It should be easy to do.

Erisme · February 11, 2019, 3:18am

Sorry if I wasn’t clear enough.
I have a multi-dimensional DataFrame y (size 6x6, see picture).

To replace the missing values with 0 in this DataFrame, I tried coalesce.(y, 0), but I got an error message. The only way I found to do this is: [y[i] = coalesce.(y[i], 0) for i in 1:size(y,2)]
I would have thought that coalesce.(y, 0) would replace every missing value in the DataFrame, but it does not work: it only works on a single DataFrame column, not on an n x n DataFrame.
With [y[i] = coalesce.(y[i], 0) for i in 1:size(y,2)] I get 0 instead of the missings.

kevbonham · February 11, 2019, 6:26pm

Did you try what I suggested?

Your way works, but I think it also allocates a vector. As I said, broadcasting over a dataframe applies the function to rows, not cells.

bkamins · February 11, 2019, 9:07pm

Even simpler and more efficient (note this is an in-place operation, but I understand this is what you want) is:

replace!.(eachcol(df, false), Ref(missing=>0))

And eachcol(df, false) will simply be replaced by eachcol(df) in the near future, when the deprecation period for the old behavior of eachcol ends.
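
As a rough sketch of what that later form might look like once eachcol(df) is the supported spelling (assuming DataFrames.jl 1.x; the data is hypothetical and only for illustration):

```julia
using DataFrames

# Hypothetical data, only for illustration
df = DataFrame(a = [1, missing, 3], b = [missing, 2, missing])

# Broadcast replace! over the columns; Ref(...) makes the pair
# behave as a single scalar argument instead of being iterated
replace!.(eachcol(df), Ref(missing => 0))
```

Since replace! mutates each column vector in place, df is updated without allocating new columns, and the columns keep their original element type.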

bkamins · February 11, 2019, 9:21pm

Actually, AFAIK you cannot currently broadcast over an AbstractDataFrame. I am not sure whether it will be supported in the future, but if it is, it will most probably be row-wise, as @kevbonham indicates.

For now you have the eachcol and eachrow methods, which support broadcasting col-wise and row-wise respectively.

kevbonham · February 12, 2019, 8:18pm

Ah, this is a good solution. I keep forgetting about Ref() for stuff like this.

nalimilan · February 12, 2019, 8:21pm

Note that doing an in-place operation for this isn’t necessarily a good idea since it will keep the Union{T,Missing} element type even if there are no missing values, which can hurt performance and doesn’t indicate that missing values are not supposed to be present.
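
The element-type difference can be seen on a plain vector, without DataFrames. This is only an illustrative sketch with made-up values:

```julia
x = [1, missing, 3]          # element type allows missing
y = coalesce.(x, 0)          # new vector built by broadcasting
replace!(x, missing => 0)    # same values, but modified in place

# Both now hold 1, 0, 3, but only y has a narrowed element type
@show eltype(x)              # still a Union that includes Missing
@show eltype(y)              # plain integer type
```

So the broadcasted coalesce gives a column that can no longer hold missing, while the in-place replace! leaves the Union element type untouched.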
