I made a little package, GenderInference.jl, that’s similar to Gender in R and SexMachine in Python. Basically, you give it a first name, and it guesses the gender:
julia> using GenderInference
julia> gender("Kevin")
:male
You can also look at a specific year or range of years, and get raw counts and percentages:
julia> gendercount("stefan", 1976:2002)
(female = 42, male = 11217)
julia> percentfemale("jeff")
0.002147423092289253
I haven’t registered it yet, but I’d love thoughts on the API, and especially on how I handle cases where data isn’t available. I’ve only got data for 1880-2017, so what happens when you ask for dates outside that range? I’ve opted to use missing in most cases:
julia> gendercount("jane", 1880:1900)
(female = 7348, male = 0)
julia> gendercount("jane", 1879:1900)
(female = missing, male = missing)
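For anyone who wants to poke at that design choice, here is a minimal sketch of the out-of-range behaviour shown above. It is not the package's actual code: `DATA_YEARS`, `TOY_COUNTS`, and `sketch_gendercount` are hypothetical names, and the counts in the toy table are made up. The assumption is that a single uncovered year makes the whole result unknown, which matches the `1879:1900` example.

```julia
# Hypothetical stand-ins; none of these names come from GenderInference.jl.
const DATA_YEARS = 1880:2017

# Toy table: (lowercased name, year) => (female, male) counts. Numbers are made up.
const TOY_COUNTS = Dict(
    ("jane", 1880) => (female = 10, male = 0),
    ("jane", 1881) => (female = 12, male = 0),
)

function sketch_gendercount(name::AbstractString, years)
    # Any requested year outside the covered range makes the whole answer unknown.
    all(in(DATA_YEARS), years) || return (female = missing, male = missing)

    female, male = 0, 0
    for y in years
        # A name with no entry in a covered year contributes zero.
        c = get(TOY_COUNTS, (lowercase(name), y), (female = 0, male = 0))
        female += c.female
        male += c.male
    end
    return (female = female, male = male)
end

sketch_gendercount("jane", 1880:1881)  # both years covered, counts are summed
sketch_gendercount("jane", 1879:1881)  # 1879 is not covered, so both fields are missing
```

An alternative would be to sum only the covered years and ignore the rest, but that silently hides the gap, whereas `missing` forces the caller to notice it.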
If you ask for a name that doesn’t have any entries, but in a year that’s in range, gendercount() gives zeros, but the percent{female/male}() functions give missing. Does that make sense/seem intuitive?
julia> gendercount("Viral")
(female = 0, male = 63)
julia> gendercount("Viral", 1980)
(female = 0, male = 0)
julia> percentmale("Viral")
1.0
julia> percentmale("Viral", 1980)
missing
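The reasoning behind zeros for counts but missing for percentages can be sketched roughly as follows. This is an illustration only, assuming the percentage is just male / (female + male); `sketch_percentmale` is a hypothetical helper, not a function from the package.

```julia
# Hypothetical helper, not part of GenderInference.jl.
function sketch_percentmale(counts::NamedTuple)
    total = counts.female + counts.male
    # Propagate unknown counts (e.g. a year outside the data range).
    ismissing(total) && return missing
    # No recorded entries: the ratio would be 0/0, so report missing instead of NaN.
    total == 0 && return missing
    return counts.male / total
end

sketch_percentmale((female = 0, male = 63))  # all recorded entries are male
sketch_percentmale((female = 0, male = 0))   # no entries, so no percentage
```

A count of zero is a real observation (nobody with that name was recorded), while a percentage of zero entries is undefined, so the two cases arguably deserve different return values.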
Since I’ve never studied data structures and algorithms, there’s probably a lot left to be desired in how I’m building, storing, and accessing the data, so any comments or suggestions there would also be most welcome. Thanks!