[ANN] StringCases.jl

I’d like to announce a new string case package, StringCases.jl (GitHub: acxz/StringCases.jl).

A Julia package for string cases that allows users to

  • define their own string case
  • validate strings on string cases
  • convert between string cases

It comes with a variety of common prebuilt string cases; see the src/commonstringcases.jl file.

The secret sauce of this package is an API that lets users expressively specify any string case they can think of. See the example below for how a string case can be specified.

While there are myriad string case libraries, few let users define, customize, or extend string cases. This library is one of them. It also provides a larger feature set out of the box, including Unicode support and splitting a string on either a delimiter or a matching pattern. Built-in pattern support covers splitting on case changes, acronyms, numbers, and combinations thereof.
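To make the two splitting styles concrete, here is a minimal sketch that uses only Base Julia, not the StringCases.jl API. The regex and the variable names are assumptions chosen for illustration; they are not what the package uses internally.

```julia
# Delimiter-based splitting: tokens are separated by an explicit character.
delimiter_tokens = split("ask_best_price", "_")

# Pattern-based splitting: tokens are found by a regex that detects a case
# change. This illustrative pattern matches an optional uppercase letter
# followed by one or more lowercase letters, using Unicode property classes.
case_change_pattern = r"\p{Lu}?\p{Ll}+"
pattern_tokens = [m.match for m in eachmatch(case_change_pattern, "askBestPrice")]

println(delimiter_tokens)
println(pattern_tokens)
```

Both approaches recover the same three words here; a string case definition essentially records which approach to use and how each token should be cased.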

A descriptive example is below:

using StringCases

# Let's define a pattern string case for camel case with acronyms, camelCaseACRO

# specify the casing of all letters other than the first letter of each token
# and the first letter of the string
tokencase = lowercase;

# specify the casing of first letters of each token
tokencasefirst = uppercase;

# specify the casing of the first letter of the string
strcasefirst = lowercase;

# Options for all the cases include:
# lowercase
# uppercase
# titlecase
# StringCases.anycase

# Acronym (i.e. opposite casing of the tokencase) specifier to determine if
# acronyms exist in all/start/end of a token
# Options include:
# StringCases.acro_all_of_token: e.g. Speed30MPH
# StringCases.acro_start_of_token: e.g. Speed30Mph
# StringCases.acro_end_of_token: e.g. Speed30mpH
# StringCases.acro_none_of_token: e.g. Speed30mph
acronymintoken = StringCases.acro_all_of_token;

# Split new tokens on numbers
splitonnumber = false;

camel_case_acro = PatternStringCase(
    "camelCaseACRO",
    tokencase,
    tokencasefirst,
    strcasefirst,
    acronymintoken,
    splitonnumber,
);

# Convert a string from our newly defined string case, camel_case_acro, to a
# common string case already defined in StringCases.jl (StringCases.PASCAL_CASE)
# For more string cases provided out of the box, take a look at the
# `src/commonstringcases.jl` file

StringCases.convert("stringCasesFTW!", camel_case_acro, StringCases.PASCAL_CASE)
# Output: "StringCasesFtw!"

# Now let's define a delimiter string case with hyphens, -, while preserving the
# original casing of the string via StringCases.anycase
# this is useful for keeping the acronym around
camel_train_any_case =
    DelimiterStringCase("camel-Train-ANY-Case", StringCases.anycase, uppercase, lowercase, "-");

StringCases.convert("stringCasesFTW!", camel_case_acro, camel_train_any_case)
# Output: "string-Cases-FTW!"

# Let's say you have your own regex to pattern match tokens with
# Source: https://stackoverflow.com/a/70164741
wordpat = r"
^[a-z]+ |                  #match initial lower case part
[A-Z][a-z]+ |              #match Words Like This
\d*([A-Z](?=[A-Z]|$))+ |   #match ABBREV 30MW
\d+                        #match 1234 (numbers without units)
"x;

# You can use it like so:
my_pattern_case =
    PatternStringCase("myCamelCaseACRO123", lowercase, uppercase, lowercase, wordpat);

StringCases.convert("askBest30MWPrice", my_pattern_case, StringCases.SNAKE_CASE)
# Output: "ask_best_30mw_price"

# However, this pattern can already be specified by this library
# so you don't have to create your own regex
camel_case_acro_num = PatternStringCase(
    "camelCaseACRO123",
    lowercase,
    uppercase,
    lowercase,
    StringCases.acro_all_of_token,
    true,
);

StringCases.convert("askBest30MWPrice", camel_case_acro_num, StringCases.SNAKE_CASE)
# Output: "ask_best_30mw_price"

# Moreover, the built-in pattern handles Unicode, whereas the custom regex only
# matches Latin letters and therefore loses the token that starts with Π
StringCases.convert("askBest30MWΠrice", my_pattern_case, StringCases.SNAKE_CASE)
# Output: "ask_best_30mw"

# Notice that the uppercase Greek letter Π denotes the start of a new token
# and is also lowercased as required by the snake case convention
StringCases.convert("askBest30MWΠrice", camel_case_acro_num, StringCases.SNAKE_CASE)
# Output: "ask_best_30mw_πrice"

# Say you don't know the string case of an input string, but you can extract the
# tokens of the input string, e.g. using Base.split or Base.eachmatch
# In such a situation you can use the StringCases.join function to join a
# sequence of tokens that conforms to the specified string case

# Let's create a regex delimiter to split on one or more (+) characters in the
# Unicode punctuation category (\p{P})
dlm = r"\p{P}+";

tokens = split("string.Cases-_FTW!", dlm, keepempty = false)
# Output:
# 3-element Vector{SubString{String}}:
#  "string"
#  "Cases"
#  "FTW"

StringCases.join(tokens, StringCases.SNAKE_CASE)
# Output: "string_cases_ftw"

# If you don't want to convert a string, but just want to extract the tokens
# from a given string based on an already defined StringCase, feel free to use
# the StringCases.split function. This can be useful if you don't want to
# write your own regex, such as the one for camel case with acronyms and
# number splitting in our example above.

StringCases.split("askBest30MWPrice", camel_case_acro_num)
# Output:
# 4-element Vector{SubString{String}}:
#  "ask"
#  "Best"
#  "30MW"
#  "Price"

# Validating a string against a StringCase is done with StringCases.isvalid
StringCases.isvalid("String.Cases-_FTW!", StringCases.KEBAB_CASE)
# Output: false

# After converting the string to kebab case
StringCases.convert("String.Cases-_FTW!", StringCases.TRAIN_CASE, StringCases.KEBAB_CASE)
# Output: "string.cases-_ftw!"

# We now have a valid kebab case string
StringCases.isvalid("string.cases-_ftw!", StringCases.KEBAB_CASE)
# Output: true
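As a side note on the Π example above, the following minimal sketch uses only Base Julia (no StringCases.jl calls) to show why an ASCII-only character class misses the Greek capital letter while a Unicode property class does not. The variable names are hypothetical and purely illustrative.

```julia
# An ASCII range only covers the Latin letters A through Z.
ascii_upper = r"[A-Z]"

# The Unicode property class \p{Lu} covers uppercase letters in any script.
unicode_upper = r"\p{Lu}"

ascii_match = occursin(ascii_upper, "Π")      # Π is outside the A-Z range
unicode_match = occursin(unicode_upper, "Π")  # Π is an uppercase letter

# Base case functions are Unicode-aware, so lowercasing works on Π as well.
lowered = lowercase("Πrice")

println((ascii_match, unicode_match, lowered))
```

This is the reason the hand-written regex loses the last token, while a pattern built on Unicode-aware casing keeps it.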

Note: I understand that a StringCases.jl package is already registered, and I am open to renaming suggestions or to incorporating this package’s feature set into the registered package. For more info see this older Discourse post.
This package has a much larger feature set for string cases than the registered package, and I believe the resolution, whatever it may be, warrants a conversation.
