What is the current best practice for population formula for cov?


#1

Going back to some old code that uses cov(mat, corrected=false), I am welcomed with a massive block of warnings telling me to use covm instead. Looking at the source on master, it seems that corrected is no longer a keyword but a positional argument, so I guess in Julia 0.6 I would change this call to cov(mat, false)? What do people currently do? I am tempted to just write my own function that uses the population formula instead. But I keep doing this these days, and I worry that I have far too large a personal library of such API functions.

What are people currently using for population formulas now that keywords are gone from all the basic stats functions? Is there a package?

Thanks so much


#2

The deprecation warning is slightly misleading, as it appears to handle only the case where you pass a mean keyword argument. Just pass corrected as a positional argument. There is another change, though, which requires you to write cov(mat, 1, false) to replace the previous cov(mat, corrected=false). As it’s undocumented, I’ve filed an issue about it.
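For reference, the population formula differs from the default only in dividing by n instead of n - 1. Below is a minimal sketch of a hand-written version, assuming variables are stored in columns (dimension 1 indexes observations). The name popcov is hypothetical and not part of Base; it is only meant as a cross-check for whatever cov(mat, 1, false) returns on your version.

```julia
# Hypothetical helper, not part of Base: population covariance of the
# columns of X (rows are observations, columns are variables).
function popcov(X::AbstractMatrix)
    n, p = size(X)
    # Column means, computed without relying on version-specific keywords.
    mu = [sum(X[:, j]) / n for j in 1:p]
    C = zeros(p, p)
    for j in 1:p, k in 1:p
        s = 0.0
        for i in 1:n
            s += (X[i, j] - mu[j]) * (X[i, k] - mu[k])
        end
        # Population formula: divide by n, not n - 1.
        C[j, k] = s / n
    end
    return C
end
```

This is written with plain loops so that it does not depend on how keywords or dimension arguments are spelled in a given Julia version; it is not tuned for speed.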


#3

Thanks so much for the clarification. I have tried to understand the arguments for removing the keywords, but they are a little lost on me (it is not clear to me how the type instability occurs in cases like this). Having to call cov(mat, false) or, as it looks like will be necessary, cov(mat[:, 1], mat[:, 2], false) feels a bit obscure to me. Are keywords largely being deprecated in Base? Should users’ code follow this? Thanks again.


#4

I don’t think type stability was the main issue with the old API. The main goal was consistency with other functions like mean and var (cf. this issue).

Keyword arguments are fine, but they currently impose a small performance penalty, which becomes significant for operations that are fast to compute. In some cases they can also introduce type instabilities, though you can often fix this with some care. I’d personally prefer the keyword argument version for corrected, but for now it would be too slow.

Anyway, you won’t need to call cov(mat[:, 1], mat[:, 2], false), just cov(mat, 1, false).


#5

Okay, that makes more sense. Thanks again for all the clarification; it makes some of the decisions much clearer. Doesn’t the issue you filed about the missing documentation suggest that passing a matrix might be deprecated in the future?


#6

No, you’ll just have to specify the dimension along which variables are stored.