LegacyStrings.UTF8String

Core.String is a utf8 string written in C. What is the difference between it and LegacyStrings.UTF8String?

(Actually, many of the String functions are written in Julia.)

LegacyStrings.UTF8String is obsolete code and probably shouldn’t be used these days. IIRC, the biggest difference is that LegacyStrings.UTF8String is much less forgiving of invalid UTF-8 data, whereas String can contain any possible byte sequence and tries to support as many operations as possible on such data (which required a change to Char, cleverly devised by @StefanKarpinski).
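To illustrate the String side of that difference, here is a minimal sketch, assuming Julia 1.x. The byte values and variable names are made up for the example, and it does not show how LegacyStrings.UTF8String behaves on the same input.

```julia
# Build a String from raw bytes: 'a', a byte that is never valid in UTF-8, 'b'.
bytes = UInt8[0x61, 0xff, 0x62]
s = String(copy(bytes))   # copy, because String(::Vector{UInt8}) takes ownership of its argument

# The string is stored as-is; validity is something you can ask about afterwards.
valid = isvalid(s)

# Iteration still works: the stray byte comes out as its own (invalid) Char.
chars = collect(s)
bad_char_valid = isvalid(chars[2])

# Joining the Chars back together reproduces the original bytes.
roundtrip = join(chars)
same_bytes = codeunits(roundtrip) == bytes
```

If you need to reject bad input yourself, checking `isvalid(s)` up front is one option.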

In early Julia, we had separate UTF8String and ASCIIString types, and when the strings were overhauled (Julia#24999) we shoveled all of the old code into the LegacyStrings package so that people could continue using code based on the old types.
