[ANN] SuffixAutomata.jl for fast, online updating and querying of generic collections

myersm0 · September 24, 2025, 1:32pm

I’d like to announce a small, dependency-less new package, SuffixAutomata.jl. You can read about the algorithm here if you’re not familiar with it.

The big draws of this algorithm are:

allows fast online updates
very fast O(m) checking for the presence of a pattern/substring in the automaton, where m is the length of the search pattern/substring. So even if you have millions or billions of characters or tokens in your text (or whatever sort of collection your automaton represents), checking for the existence of a small pattern or substring is almost instantaneous.

This algorithm is traditionally used for strings, but it doesn’t need to be. A draw of my implementation is that a SuffixAutomaton can operate on a Vector{T} where T is any iterable type. For example, I’m going to be using it for pattern detection in streaming audio data (which is discretized using another package that I have under development, SymbolicDiscretizers.jl).

In coming days and weeks I will address some shortcomings:

make it more suitable for working with big out-of-core and streaming data
the current findall implementation is pretty fast, but not super fast O(m) like occursin is; I’ll see what I can do to improve that
allow optimized representation for small, fixed vocabularies

A quick toy demo (lcs stands for longest common substring):

a = SuffixAutomaton("ababc")
typeof(a)                    # SuffixAutomaton{Char}
eltype(a)                    # Char

occursin("ab", a)            # true
findall("ab", a)             # [1:2, 3:4]
lcs("abcdef", a)             # ("abc", 1)

append!(a, "def")
lcs("abcdef", a)             # ("abcdef", 1)

Topic		Replies	Views
Suffix Tree/Suffix Arrays General Usage	0	49	September 13, 2024
[ANN] SymbolicApproximators.jl: methods for symbolic time series discretization Package Announcements discretization , time-series	4	264	October 3, 2025
Subvector pattern matching General Usage	12	601	December 30, 2022
Finding patterns in vectors General Usage array , pattern-matching , vector	9	1057	June 20, 2022
Text Mining: Detect Strings: Word Lookup in a Large Corpus of Phrases Using a Large Dictionary Performance question	27	2324	December 15, 2021

[ANN] SuffixAutomata.jl for fast, online updating and querying of generic collections

Related topics