I’m glad to announce Impostor.jl, a versatile package for generating synthetic tabular data, written in Julia.
Designed around Julia’s multiple-dispatch paradigm with simplicity in mind, Impostor exports several generator and utility functions through its API, giving the user a wide variety of options when generating synthetic data. Another key feature is its ability to respect relations between columns (for example, keeping a city consistent with its state and country) when generating data via templates. Check out the documentation for more details on this topic, as well as on concepts, conventions, and how data generation is handled under the hood.
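As a rough illustration of what keeping related columns consistent can mean, here is a minimal sketch in plain Julia. It is not how Impostor is implemented. The `LOCATIONS` table and the `sample_locations` function are hypothetical names made up for this example. The idea is simply to draw whole rows from a table of valid combinations instead of sampling each column on its own.

```julia
using Random

# Hypothetical lookup table: each entry is one valid
# (country_code, state, city) combination.
const LOCATIONS = [
    (country_code = "BRA", state = "Paraná",     city = "Curitiba"),
    (country_code = "BRA", state = "São Paulo",  city = "Sorocaba"),
    (country_code = "USA", state = "Illinois",   city = "Chicago"),
    (country_code = "USA", state = "California", city = "Los Angeles"),
]

# Draw n whole rows at once, so the three columns always agree with each other.
# Sampling each column independently could pair "Chicago" with "BRA".
sample_locations(n::Integer; rng = Random.default_rng()) = rand(rng, LOCATIONS, n)

rows = sample_locations(4)
for row in rows
    println(row.country_code, " | ", row.state, " | ", row.city)
end
```

Sampling from valid combinations is only one possible design. The documentation describes how the package itself handles this.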
A couple of usage examples are presented below:
using Impostor
using DataFrames
credit_card_number(; formatted = true)
# "4767-6731-1326-5309"
surname(4; locale = ["pt_BR"])
# 4-element Vector{String}:
# "Feranndes"
# "Pereira"
# "Camargo"
# "Pereira"
firstname(["M"], 4)
# 4-element Vector{String}:
# "Charles"
# "Zacharias"
# "Paul"
# "Charles"
city(["BRA", "USA"], 4; level=:country_code)
# 4-element Vector{String}:
# "Curitiba"
# "Los Angeles"
# "São Paulo"
# "Rio de Janeiro"
address(["BRA", "USA", "BRA", "USA"]; level = :country_code)
# 4-element Vector{String}:
# "Avenida Paulo Lombardi 1834, Ba" ⋯ 25 bytes ⋯ "84-514, Porto Alegre-RS, Brasil"
# "Abgail Smith Alley, Los Angeles" ⋯ 42 bytes ⋯ "ornia, United States of America"
# "Avenida Tomas Lins 4324, (Apto " ⋯ 23 bytes ⋯ "orocaba - 89457-346, SP, Brasil"
# "South-side Street 1st Floor, Li" ⋯ 52 bytes ⋯ "as-AR, United States of America"
my_custom_template = ImpostorTemplate([:firstname, :surname, :country_code, :state, :city]);
my_custom_template(4, DataFrame; locale = ["pt_BR", "en_US"])
# 4×5 DataFrame
#  Row │ firstname  surname   country_code  state           city
#      │ String     String    String3       String15        String15
# ─────┼───────────────────────────────────────────────────────────────────
#    1 │ Mary       Collins   BRA           Rio de Janeiro  Rio de Janeiro
#    2 │ Kate       Cornell   USA           Illinois        Chicago
#    3 │ Carl       Fraser    BRA           Paraná          Curitiba
#    4 │ Milly      da Silva  USA           California      Los Angeles
template_string = "I know firstname surname, this person is a(n) occupation";
render_template(template_string)
# "I know Charles Jameson, this person is a(n) Mathematician"
println("My new car plate is $(render_alphanumeric("^^^-####"))")
# My new car plate is TXP-9236
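To give an intuition for the pattern-based example above, here is a minimal sketch of a pattern renderer in plain Julia. It is not Impostor's implementation. The function name `render_pattern_sketch` is hypothetical, and the meaning of the placeholders (`^` for an uppercase letter, `#` for a digit) is an assumption inferred only from the output shown above; the documentation has the actual rules.

```julia
using Random

# Assumed convention, inferred from the example output only:
#   '^' -> random uppercase letter, '#' -> random digit,
#   any other character is copied through unchanged.
function render_pattern_sketch(pattern::AbstractString; rng = Random.default_rng())
    return map(pattern) do c
        if c == '^'
            rand(rng, 'A':'Z')
        elseif c == '#'
            rand(rng, '0':'9')
        else
            c
        end
    end
end

plate = render_pattern_sketch("^^^-####")
println("My new car plate is $plate")
```

Since `map` over a string returns a string when the function returns characters, the result has the same length and layout as the pattern, with only the placeholders replaced.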
Impostor.jl is currently in an MVP state, so to speak. While many features are planned for the near future, I’d like to get feedback from the community first, to understand where efforts should be focused. If you have any questions, suggestions, or general feedback, feel free to open an issue in the repository or reply to this post.
Thanks!
Enzo