[RFC] GenderInference.jl

Well, I think it’s sensible just to provide a model over the census data as it is, that’s the package’s purpose. I didn’t look at that part of the code too closely but it looks like that’s what you’re doing already. :slight_smile: If the model is specific to the US census data, maybe it makes sense to name the package in a way that makes that obvious.

Yeah, I thought about that. Really don’t like fraction though. Makes me think of eg 3/4. I thought about pfemale or propfemale for “proportion”, but not really a fan…

That’s the way it is now (it’s social security rather than census), but the goal is to expand beyond that. Maybe BirthNameGenders.jl?

But in that case it should return 100.0 instead of 1.0, wouldn’t you say? If I get percentsomething() returning 1.0, I will for sure think it’s 1%.


Ah, right - social security data, not census. Thanks for the correction!

Or perhaps NameGenderDemographics.jl, or something sort of like that? I dunno, it’s hard to come up with something that’s both generic and obvious. In any case, I’m glad that you see my point and are giving it some thought. :slight_smile:


For sure! Probably the best thing to do is ask some gender queer people rather than trying to speculate about what they would find objectionable. I only know one trans person (to my knowledge). Maybe Twitter?

So after a little more thought, I’ve sort of found myself having settled on a stronger position than I had at first: basically, I think that it is categorically impossible for an ethically sound model like this to exist at all.

Here’s a brief article on this specific use case from an HCI researcher who I admire: https://ironholds.org/names-gender/ I find their arguments very convincing, so I think at this point I would urge those in the thread to consider this carefully, and excuse myself from the discussion.


Thanks for sharing! Definitely a useful perspective to keep in mind, though I don’t entirely agree.

One last thought:

From my point of view, it seems that when you did get the opinion of a person like you described, one who additionally is actually a researcher whose area of expertise is precisely what we are discussing, you quickly dismissed them and decided that you know better.

(edited, since I think I was a little combative originally… sorry about that)

Is there no room between fully agreeing and quickly dismissing?


Doesn’t need to be. I’d be happy to discuss over DM or email or another forum if you’d rather not continue the conversation here.

Not a quick dismissal, I take this perspective seriously. Also don’t think I know better, I’m just not entirely convinced that this is everywhere and always a bad idea, and would like to hear more perspectives. I’ve reached out to my trans friend (who also works with a lot of trans people).

Also, recall that we were discussing the name of the package, not whether the method itself was ethical.

I am pretty convinced that this is not a good way to measure diversity at a conference or in a company, which seems to be that author’s main thrust. I’m interested in using it to measure gender representation in publications (see here for an example), where it is not feasible to survey or otherwise determine the gender of hundreds of thousands of authors, so the work just wouldn’t be done without this or similar methods. We are also up front about many of the limitations of this method that are mentioned in the blog post.

I do think I will include a link to that post in the README, so users are at least made aware of that perspective.

It’s a fraught subject. Makes sense that one or both of us would get combative. No hard feelings :slightly_smiling_face:


Alright - I’m back to this. When I put register() inside __init__ (see here - init currently commented out), I get

ERROR: LoadError: LoadError: LoadError: KeyError: key "US Census - names" not found

And apparently I can’t do a global const declaration inside the __init__ function.

I’ve removed precompilation on that branch though - any chance you could test to see if the same infinite loop occurs?
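For what it’s worth, the usual Julia workaround for that last problem is to create the `const` binding at module top level (e.g. as an uninitialized `Ref` or an empty container) and then fill it in from `__init__`. A minimal sketch with made-up names, not the package’s actual code:

```julia
module NameData

# `const` bindings must be created at module top level, but their *contents*
# can be filled in at load time. An uninitialized Ref works well for this.
const NAME_PROBS = Ref{Dict{String, Float64}}()

function __init__()
    # Any register()/download logic would go here; for this sketch we just
    # populate the Ref so lookups work once the module is loaded.
    NAME_PROBS[] = Dict("sam" => 0.40, "sally" => 0.99)
end

# Returns `missing` for names absent from the data
lookup(name::AbstractString) = get(NAME_PROBS[], lowercase(name), missing)

end # module
```

This keeps the binding available at precompile time while deferring anything that depends on runtime state (downloads, registration) until the module is actually loaded.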

Can I clarify something? I should still use a dictionary for the names to access these arrays, right?

Yes, I think that makes most sense, to allow for constant time lookups by name.
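To make that concrete: a `Dict` keyed on the normalized name, mapping to a per-year count vector, gives average-case constant-time access. The names and layout here are illustrative, not the package’s actual API:

```julia
# Hypothetical layout: each name maps to a vector of counts, one per year.
const YEARS = 1880:2020

counts = Dict{String, Vector{Int}}()

# get! inserts (and returns) a zeroed vector the first time a name is seen,
# so both insertion and lookup stay O(1) on average.
function add_count!(d::Dict, name::AbstractString, year::Int, n::Int)
    v = get!(d, lowercase(name), zeros(Int, length(YEARS)))
    v[year - first(YEARS) + 1] += n
    return v
end

add_count!(counts, "Sam", 1990, 1234)
counts["sam"][1990 - first(YEARS) + 1]  # 1234
```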

I’m reviving this thread based on comments from here. This revival will be focused on the ethics of the approach rather than the technical aspects. For those uninterested in this aspect of the discussion, please feel free to unwatch/unsubscribe (and also feel free to DM me if you don’t know how to do that).

I don’t think it comes down to different readings. Rather, I think the author is responding to situations (like conferences) where a survey would be just as easy, yield better/more accurate data, and not have all of the problems this approach has. I agree that the author would likely extend the argument to other use-cases, but doesn’t really address them, and as a consequence, I find it less persuasive.

For example, two of the three primary objections don’t really seem to apply to the use-case I’m primarily interested in, or at least the case wasn’t made. Reason (1), that it’s “inaccurate,” isn’t applicable because (a) no measurement in science is perfect, and one can assess and account for error; (b) the author erroneously talks about the databases as coming only from the top 100 names, but most that I’ve seen come from things like social security records that include all names (I think as long as they show up at least 2 times in a given year); (c) I can show how it compares (favorably) to other more laborious methods such as manually looking at social media profiles for self-stated gender; (d) the biases w/r/t things like Asian names being underrepresented, or impossible to infer from especially after romanization, are real and should be acknowledged, but even if we expect gender trends in science publishing to be radically different between those that are represented in the database and those that aren’t, pointing out disparities only in the communities that are represented still seems worthwhile.

The third objection, that it’s usually unnecessary, only addresses the conference situation; no solution for my use case is offered. Yes, surveys of academics are possible, but far more costly and time consuming, and they suffer from their own problems of bias. In any case, I would be unable to do this due to time and budget constraints. So one could argue that the work isn’t worth doing, given the other objections. Or one could argue that there’s a better way to do it, given time and budget constraints. But these arguments weren’t made.

The final objection, that it’s morally horrifying, I just find unconvincing. Quoting from the piece here:

The voids in these datasets don’t cover everyone evenly. Rather, they often fall straight down lines of race, culture and ethnicity.

Totally agree.

Accordingly, names that are largely unique to non-white groups are far more likely to be excluded from a top-N dataset than common names used by white people, for the simple reason that there are fewer non-white people.

This has a glimmer of truth, but not for the reasons stated. As I said above, most of the software I’ve seen (and this package) uses datasets that are far more inclusive. There are lots of non-white babies born in the US, and so lots of non-white names included in the datasets. Add to that, there are plenty of non-white babies named David or Sara.

That said, there are clearly biases. I’ve already mentioned a general problem with Asian names, especially when given romanized spelling, but there’s also the problem of non-ASCII characters being excluded or paved over in a way that obscures/changes the implied gender, plus the fact that the entire continents of Africa and South America tend not to publish such datasets.

All stipulated.

Congratulations: your methodology is racially biased.

So, this is true, but that’s different than saying it’s racist.

The result is, invariably, that you end up with a model that underrepresents people of colour, be they from European/North American contexts or elsewhere. Both are vital, non-excludable populations to consider in even the most half-hearted inclusion initiative.

I agree with all of this, but all models are wrong. Some are useful. This author seems to be arguing that, unless you can get a census-like count, there’s no value in assessing the gender makeup of anything, ever. From earlier in the post:

there’s considerable variation in the data: ambiguous names that you can at best probabilistically tie to a binary gender. “Sam” could be of any gender or none.

Again, stipulated. This point is written as if it’s some kind of scandal. In some situations, a probabilistic model can provide valuable, if imperfect insight. At least, I think so. I’m open to being persuaded.

The issue of erasure of trans/non-binary people is the part I’m most conflicted about, though I find this person’s arguments really unconvincing, eg

Frankly, claiming that birth name maps immutably to gender in the first place is the kind of essentialist TERFy nonsense that has no place in inclusion efforts.

This feels like fighting a straw man - I would certainly never claim such a thing. I won’t go through them all, but many other points strike me the same way - as if my algorithm would say that Sam is 80% likely to be male, so whenever I meet someone named Sam I’ll treat them like a stereotype of a man and refuse any other information.

There’s no argument that addresses what to do if I have a dataset of 100 Sams, 100 Sallys, 100 Roberts and 100 Yishans. Should I say that, because the Sams and the Yishans are ambiguous, and some of the Roberts and Sallys might be trans or non-binary, I have absolutely no information?

But even getting past all of that, it could be possible that simply trying to study the inclusion of women, without addressing trans people, is too exclusionary. Does this mean that we can’t talk about the gender pay gap without also figuring out whether there’s a trans pay gap? Can we talk about how women take on more childcare responsibilities in general, and how this is being exacerbated by COVID with children being out of school, if we can’t also assess the impacts of childcare responsibilities on non-binary people?

I 100% believe that trans and non-binary people have very different experiences, are generally more marginalized, and deserve to be acknowledged and taken seriously. And the same for POC. If I were on a committee that oversees grant proposals, I would absolutely look to fund efforts trying to assess their contributions in science publishing, which could obviously not be done with this method. But none of that seems to indicate that trying to address the role of women (even if it mostly only applies to cis women) is bad.

I don’t mind at all :slightly_smiling_face:. You’re right - this was not the best way to refer to this person. In the heat of the moment, I was responding to someone that I felt was accusing me of the equivalent of “I have a friend who’s black, so I can’t be racist.” I don’t think that having a non-binary friend means that I can’t have bias, but I sought out that person’s opinion because they are non-binary and also spend their life and career in trans-activism. My response was intemperate, to be sure, but I don’t think the author of that piece necessarily had more credibility than my friend.

This is an excellent paper! I think I will need to re-read it a couple of times (currently on vacation, so didn’t do a deep dive), but I think that there’s a lot of stuff that’s relevant to my research and to this package.

I will note though that these authors took a similar approach in one of their papers, and are not arguing against the practice categorically. They offer a number of suggestions to make such research more cognizant of these biases, to be sure - and I think this is super valuable.


Congratulations: your methodology is racially biased.

So, this is true, but that’s different than saying it’s racist…

A distinction without a difference. I guess all I can say at this point is that while I hope these exchanges have been valuable, I also hope that they haven’t inadvertently contributed to a little bit of precedent for the Julia forums being a place where the upsides of self-consciously trans-exclusionary and racially biased research methodologies can be discussed openly.

I hope that the Julia forums will remain a place where all kinds of research methodologies can be discussed openly, even if they are imperfect from a particular point of view.

Working with data always involves trade-offs and assumptions, some of which are prone to biases. Sometimes it is difficult to do any better though (until someone comes up with a better methodology, or data), in which case it is important to be aware of these biases.

Stifling discussion about them would have the opposite effect.


It’s possible we have different definitions of these terms, but it seems to me like there is an important distinction. Bias is a property of data; racism and other forms of discrimination (it seems to me) may result from how data is used. There may be datasets that are so biased that discriminatory outcomes are inevitable, but the terms are not synonymous. And note, I am perfectly comfortable with the definition of racism that includes not only overt acts and intentions, but also structural and systemic factors.

I have personally found it valuable, thanks for engaging! I hope this isn’t the end of the dialogue. I think making progress requires this kind of open dialogue - I co-sign @Tamas_Papp’s point 100%, and also want to add that it could very well be the case that the limitations of a dataset or methodology are such that useful information cannot be gleaned from it, or that the harms of using the dataset outweigh the benefits. But it doesn’t sound like that is what you’re saying - it sounds like you’re saying that any science that isn’t 100% representative is immoral.


it sounds like you’re saying that any science that isn’t 100% representative is immoral.

I am certainly not saying that. What I am saying is that it is not ethical to choose a methodology that explicitly encodes a view of gender as a binary label that is inferrable from surface features like birth names, and that it is not ethical to furthermore train such a model of gender on racially biased data, knowing that it will reproduce those biases. To my mind, abstracting the specifics of this into generalities about all models or all datasets or the scientific process writ large would not be an especially productive way to think about the social context of choosing such a methodology.


This isn’t a fully general problem, but it’s not limited solely to gender binarization, either. Any attempt to do quantitative social science based on an external data source involving protected classes (race, gender, religious beliefs, …) involves projecting the mess of real-world, individual identities into categorical constructions. Those categories will, of course, reproduce the biases in the society from which they’re drawn. It’s unethical to ignore those biases entirely, but you can still draw useful inferences (in the sense of “all models are wrong, but some are useful”) from biased data, especially if you’re able to audit the model and quantify (and acknowledge) the uncertainty caused by imperfect assumptions.
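On the “quantify the uncertainty” point, one simple (and admittedly crude) way to do that with name-based probabilities is to propagate them rather than thresholding each name to a hard label. Treating each author as an independent Bernoulli draw with probability `pᵢ` is an assumption, and it only captures sampling-style variance, not the dataset’s structural biases, but it at least avoids overstating precision. A sketch with made-up numbers:

```julia
# Made-up per-author probabilities P(female | name); not real data.
p = [0.99, 0.98, 0.40, 0.05, 0.51]

# Expected number of women, and (under an independence assumption) the
# standard error of that count: Var = Σ pᵢ(1 − pᵢ) for Bernoulli draws.
expected = sum(p)
se = sqrt(sum(p .* (1 .- p)))

println("expected women: $(expected) ± $(round(2se, digits = 2)) (~95%)")
```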