Announcement: Automa v1.0 beta preview

I’m very pleased to announce the release preview of Automa.jl version 1.0.0-beta1. I’ve put several hundred hours into this over the last two years.

Version 1 brings many exciting changes compared to version 0.8, which you can read more about below. However, the release is also thoroughly breaking, completely changing the interface of Automa. Therefore, if you’re a current or potential user of Automa.jl, I’m interested in getting feedback before version 1.0 is released and the API is locked in.

I’m especially interested in feedback from Kenta Sato (@bicycle1885), Kevin Bonham (@kevbonham), Pavel Dimens, Ciaran O’Mara (@CiaranOMara) and other BioJulia stakeholders.

The PR is not yet merged to Automa master, so to test it out, run:
]add https://github.com/jakobnissen/Automa.jl#v1.0.0-beta1

You might want to begin by reading the revamped documentation.

What is Automa?

Automa is a regex-to-Julia compiler. It is used to create lexers and parsers which are faster and more rigid than handmade ones.
It’s a backbone of the BioJulia ecosystem with 102 transitive dependents.
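
To give a flavour of what "regex-to-Julia" means, here is a minimal sketch that turns a regex into a validator function. It assumes the v1 names re"...", rep, rep1 and generate_buffer_validator as I understand them from the new documentation; the function name validate_fasta and the toy FASTA-like format are made up for illustration, and details may still change during the beta.

```julia
using Automa

# Toy FASTA-like format (an illustrative assumption, not a real FASTA grammar):
# each record is a '>' header line followed by one or more sequence lines.
fasta_regex = let
    header  = re"[a-z]+"
    seqline = re"[ACGT]+"
    record  = re">" * header * re"\n" * rep1(seqline * re"\n")
    rep(record)
end

# Generate and evaluate a function `validate_fasta(data)`.
# As documented for v1, it should return `nothing` when the whole input
# matches the regex, and an integer position otherwise.
eval(generate_buffer_validator(:validate_fasta, fasta_regex))

validate_fasta(">hello\nACGT\nTTGA\n")  # input matches the regex
validate_fasta(">hello\nACGX\n")        # 'X' is not a valid sequence byte
```

The regex is compiled to plain Julia code at definition time, so the validator itself is an ordinary Julia function.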

Why does it need such a breaking release?

While Automa is a great package, it has unfortunately been difficult to understand, learn, and use. As a result, it has seen less usage and development than its potential warrants. In particular, I’ve heard from several potential users that they wanted to use Automa, but found it too perplexing. Therefore, the main goal of v1 has been to overhaul the API to make the package easier to learn and to use.

Version 0.8 also contained a few compiler bugs which downstream users could easily accidentally depend on, and which therefore could not be fixed in a minor release.

Even though the change is disruptive, I believe users of Automa will appreciate the several improvements and bugfixes that a breaking change allows.

Major changes

The entire API has been completely overhauled, so it might not be informative to give a list of API changes. Instead, I recommend users read the new (quite thorough) documentation of the release.

API definition

  • Automa now has a distinct separation of API vs internals. All API is exported and documented, so users only have to type using Automa to get all relevant functions and macros. No more const re = Automa.RegExp and such.
  • Users now don’t have to access sub-modules, or the fields of objects, which are now considered internal.
  • A few details that previously looked like implementation details, but which users were nonetheless forced to rely on, are now explicitly documented.

API changes

  • Every single user-facing function has been changed. Code using Automa is now less verbose and less error-prone. See the documentation for details and a comprehensive tutorial to Automa.

Misc changes

  • The goto-generator now uses SIMD by default on x86_64 machines for improved performance
  • Tokenizers are now much more efficient and easier to create. They also have more well-defined behaviour in the face of ambiguous tokenization.
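
As a rough illustration of the new tokenizer interface, the sketch below defines three token kinds and splits a string into tokens. It assumes the v1 functions make_tokenizer and tokenize as I understand them from the documentation; the Token enum and its member names (errortoken, word, number, space) are hypothetical and chosen only for this example.

```julia
using Automa

# Hypothetical token kinds for this example. The first member is the
# token emitted for input that matches none of the regexes.
@enum Token errortoken word number space

# Generate the tokenizer code from (error token, [token => regex, ...])
# and evaluate it, which defines the iteration methods used by `tokenize`.
make_tokenizer((errortoken, [
    word   => re"[a-z]+",
    number => re"[0-9]+",
    space  => re" +",
])) |> eval

# Each element is a (start, length, token) tuple.
toks = collect(tokenize(Token, "abc 123"))
kinds = [t[3] for t in toks]
```

The regexes here do not overlap, so the example sidesteps the question of how ambiguous tokenization is resolved; see the documentation for those rules.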

Improved errors and debugging

  • Generated code now throws an informative error if the input data does not match the given regex. The error can be disabled if the user wants to handle the mismatch in another fashion. This greatly improves the default error messages for parsers/lexers created with Automa.
  • If the actions in the actions dict do not match a machine’s actions, an informative error message is now thrown. In v0.8, any extra actions were silently skipped.
  • Added debug functionality execute_debug for easy debugging of machines.
  • NFAs now contain fewer nodes, making them easier to display when debugging.
  • If a pseudomacro (like @escape) is used in the wrong context, an informative error message now occurs at compile time.
  • Compiling a regex that cannot match any input now throws an error. Such a regex is easy to write by accident.
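
To make the first point above concrete, here is a minimal sketch of a small parser whose generated code rejects malformed input by default. It assumes the v1 functions onexit!, compile and generate_code, and that the generated code reads its input from a variable named data, as I understand the default from the documentation. The function count_lines and the action name :line are hypothetical. The exact error type and message are deliberately not shown.

```julia
using Automa

# A "line" is one or more lowercase letters; the :line action runs
# when the machine leaves a line, i.e. when it reads the newline.
line = onexit!(re"[a-z]+", :line)
machine = compile(rep(line * re"\n"))

# Map each action name to the Julia code it should execute.
actions = Dict(:line => :(n += 1))

# Splice the generated code into a function body. The argument is
# named `data` because that is the variable the generated code reads.
@eval function count_lines(data)
    n = 0
    $(generate_code(machine, actions))
    return n
end

count_lines("ab\ncd\n")  # both lines match the regex

# Uppercase letters are not part of the regex, so this input is rejected.
rejected = try
    count_lines("ab\nCD\n")
    false
catch
    true
end
```

If you prefer to handle mismatches yourself, the documentation describes how to generate the code without the default error.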

Bugfixes

  • In v0.8, it was possible to compile a regex with unresolvable sets of actions, which would result in miscompilations. This now throws an informative error at compile time.
  • In v0.8, it was possible to add an action to the “final” byte of a looping regex like “a+”, despite such a regex not having a definite final byte. This now raises a compile time error.
  • In v0.8, some generated code did not respect the variables in your CodeGenContext which could result in miscompilations. This has now been fixed.
  • In v0.8, some generated code created variables not contained in the CodeGenContext, which could lead to miscompilation if user-injected code used variables of the same name. This has also been fixed.