What steps should the Julia community take to bring Julia to the next level of popularity?

At work, I mostly do probabilistic record linkage, or PRL (also known as entity resolution, fuzzy merging, merging messy datasets, etc.), using either the R package fastLink or the Python package splink.

I would like to see a Julia package for PRL because it could likely be much faster. As far as I know, no Julia package for this exists. Some development in 2020 was discussed at https://discourse.julialang.org/t/entity-resolution-duplicate-data-in-julia/33860. In that thread, I mentioned two packages: SpineBasedRecordLinkage.jl, which only does deterministic record linkage, and BayesianRecordLinkage.jl, which unfortunately is no longer being developed because its developer graduated and moved on to other work.

I am not aware of any progress in Julia since then, but there is plenty of ongoing development in R (fastLink) and Python (splink). PRL is a big use case. It is relatively easy in R and Python but seemingly difficult in Julia (and in popular commercial statistical software such as Stata and SAS). A PRL package in Julia could be a “killer” package for Julia.
