[RFC] GenderInference.jl

It’s possible we have different definitions of these terms, but there seems to be an important distinction: bias is a property of data, while racism and other forms of discrimination may result from how that data is used. There may be datasets so biased that discriminatory outcomes are inevitable, but the terms are not synonymous. And note, I am perfectly comfortable with a definition of racism that includes not only overt acts and intentions but also structural and systemic factors.

I have personally found it valuable, thanks for engaging! I hope this isn’t the end of the dialogue - making progress requires this kind of open dialogue. I co-sign @Tamas_Papp’s point 100%, and also want to add that it could very well be the case that the limitations of a dataset or methodology are such that useful information cannot be gleaned from it, or that the harms of using the dataset outweigh the benefits. But that doesn’t sound like what you’re saying - it sounds like you’re saying that any science that isn’t 100% representative is immoral.


> it sounds like you’re saying that any science that isn’t 100% representative is immoral.

I am certainly not saying that. What I am saying is that it is not ethical to choose a methodology that explicitly encodes a view of gender as a binary label inferable from surface features like birth names, and that it is furthermore not ethical to train such a model of gender on racially biased data, knowing that it will reproduce those biases. To my mind, abstracting the specifics of this into generalities about all models, all datasets, or the scientific process writ large would not be an especially productive way to think about the social context of choosing such a methodology.


This isn’t a fully general problem, but it’s not limited solely to gender binarization, either. Any attempt to do quantitative social science based on an external data source involving protected classes (race, gender, religious beliefs, …) involves projecting the mess of real-world, individual identities into categorical constructions. Those categories will, of course, reproduce the biases in the society from which they’re drawn. It’s unethical to ignore those biases entirely, but you can still draw useful inferences (in the sense of “all models are wrong, but some are useful”) from biased data, especially if you’re able to audit the model and quantify (and acknowledge) the uncertainty caused by imperfect assumptions.
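To make the auditing point concrete, here is a minimal sketch in Julia (the language of the package under discussion) of what such an audit and uncertainty accounting might look like. Everything here - `AuditRecord`, `p_female`, the group categories - is a hypothetical stand-in for illustration, not code from GenderInference.jl:

```julia
using Statistics

# Hypothetical audit set: ground-truth labels plus a demographic group,
# used only to measure where the classifier's errors concentrate.
struct AuditRecord
    name::String
    true_label::Symbol   # :female or :male in this simplified sketch
    group::Symbol        # a coarse demographic category for stratification
end

# Stand-in for the model under audit; a real classifier would be trained,
# but crucially it returns a probability, not a hard binary label.
p_female(name::String) = name in ("Maria", "Aisha", "Mei") ? 0.9 : 0.2

# Audit: error rates stratified by group, so that biases inherited from
# the training data show up instead of being averaged away.
function audit(records::Vector{AuditRecord})
    for g in unique(r.group for r in records)
        sub = filter(r -> r.group == g, records)
        err = mean((p_female(r.name) >= 0.5) != (r.true_label == :female)
                   for r in sub)
        println("group = $g: n = $(length(sub)), error rate = $(round(err, digits = 2))")
    end
end

# Propagate per-name uncertainty into the aggregate: the expected number
# of women in a sample, with a standard deviation, rather than a hard count.
function estimated_female_count(names::Vector{String})
    ps = p_female.(names)
    return sum(ps), sqrt(sum(p * (1 - p) for p in ps))
end
```

The point of the per-group error rates is that a classifier can look accurate on average while failing badly on names common in underrepresented groups; stratifying the audit makes that bias measurable. Likewise, reporting the estimated count with a spread acknowledges the uncertainty from imperfect assumptions instead of hiding it behind hard binary labels.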
