Julia subsumes Icon: well done

Based on an offhand comment that @StefanKarpinski made in some post, I formed a (probably poorly supported) belief that the developers were re-evaluating Julia’s string functionality for Julia 2.0.

That’s an area of interest for me: in my (ahem) 42 years of programming, I have always ended up doing a lot of string parsing, usually to clean up, reformat, or extract information from data. Even in assembly language, I was processing strings. I’m not a professional programmer (I’m an engineer and statistician).

Historically, SNOBOL was considered a language with good tools for string handling. The late Ralph Griswold later improved on it with the Icon programming language. Back in the day, I read the Icon book and used the language a little.

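I did some research to see whether Icon had anything I would want to add to Julia 2.0. Icon’s major feature in this area is “string scanning”: an expression of the form `subject ? expr` makes a string the current subject with a cursor position, which functions such as tab() and move() advance while functions such as upto() and find() locate positions to move to. The journal article “String Scanning in the Icon programming language” provides a concise description of the feature, or you can always read the docs.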

However, as far as I can tell, Julia already has equivalent functionality: iterate over the result of split(), and inside the loop use the Iteration utilities (Base.Iterators). Plus, in Julia, all of this works with Unicode strings.
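As a rough illustration of that pattern (a sketch, not code from the post; the input line and its key=value format are made up), the snippet below splits a line into whitespace-separated fields, stops at a comment marker with Iterators.takewhile, and keeps only key=value tokens with Iterators.filter. That is roughly the job an Icon scanning expression would do with tab() and upto().

```julia
# Hypothetical input: whitespace-separated key=value fields, a comment
# introduced by "#", and some non-ASCII text.
line = "name=Zoë  age=42 city=Zürich # trailing comment"

# With no delimiter, split() breaks on whitespace and drops empty fields.
# The pieces are SubStrings: views into `line`, not copies.
fields = split(line)

# Lazily take fields up to the comment marker...
tokens = Iterators.takewhile(f -> !startswith(f, "#"), fields)

# ...and keep only the ones that look like key=value.
kv = Dict{String,String}()
for tok in Iterators.filter(t -> occursin("=", t), tokens)
    key, val = split(tok, '='; limit = 2)   # split at the first '=' only
    kv[key] = val                           # converted to String on insert
end
```

After the loop, `kv` maps "name", "age", and "city" to "Zoë", "42", and "Zürich". The non-ASCII characters pass through untouched, and no text is copied until the values are stored in the String-typed Dict.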

Well done, developers!


Also, string performance is on track to improve quite a bit in 1.8. JuliaLang/julia pull request #41247, “allocate small strings in 8-byte-aligned pools” by JeffBezanson, is already merged and reduces the memory of small strings by 8 bytes. Further improvements will likely include some combination of inlined small strings, pooled storage for strings, and/or removing string type tags in some circumstances.


Very early on, when we gave talks about Julia, we sometimes did a bit about how text processing used to be a “programming niche” like numerical computing. There were specialized text-processing languages like SNOBOL, Icon, Awk, and Perl 1 through Perl 4. Then at some point people figured out how to do text processing well in general-purpose languages, and you had languages like Perl 5, Python, and Ruby, which are definitely general purpose but also good at text processing. That killed off the niche: today it would be strange to believe that text processing and general-purpose programming are somehow at odds, or that a language good at text processing could not possibly be fit for writing a web server or implementing a GUI program. However, that was the apparent state of affairs in the 70s and 80s.

Replace “text processing” with “numerical programming” and I think you have the situation we were in up until the 2000s: many people believed that numerical/technical computing was a special niche and that languages that are good for it (like Matlab, R, or Mathematica) could not possibly be good for general-purpose tasks as well. And indeed, those languages are rather awkward to write web servers in. Of course, Python makes an interesting case here: it’s clearly general purpose and people are using it for numerical computing, but it does seem to sacrifice notational and behavioral comfort for the numerical stuff in order to be more general. Some people still believe that tradeoff is inherent. And in Python, numbers are still very much special-cased and built into the language, not something you can make more of yourself: you cannot, for example, create a new, efficient integer type.

In some ways, Julia is an experiment to prove that numerical computing is not a niche: you can design a completely general-purpose programming language in which types like integers and floats are not treated specially (aside from having literal syntax), and in which an end user can implement types that are just as efficient and integrate just as smoothly as the “built-in” types in terms of conversion, promotion, indexing, etc. It’s the answer to the question: what if we made numerical types like numbers and arrays not special at all, and instead designed a language so that those kinds of things (numbers: small, immutable, efficient; arrays: complex, highly polymorphic) can be implemented in the language itself?
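To make the conversion/promotion point concrete, here is a minimal sketch of a user-defined number type. `Mod7` is a hypothetical type invented for illustration, not something in Base. Two short methods, `convert` and `promote_rule`, are enough for Base’s generic fallbacks to handle mixed arithmetic and array literals the same way they do for built-in types.

```julia
# Hypothetical user-defined number type: integers modulo 7.
# It is an immutable struct wrapping one Int, so it is a plain bits
# type, just like a built-in Int.
struct Mod7 <: Number
    val::Int
    Mod7(x::Integer) = new(mod(x, 7))   # normalize to 0:6
end

# Arithmetic on two Mod7 values.
Base.:+(a::Mod7, b::Mod7) = Mod7(a.val + b.val)
Base.:*(a::Mod7, b::Mod7) = Mod7(a.val * b.val)

# Conversion and promotion: Base's generic fallbacks, such as
# +(x::Number, y::Number) = +(promote(x, y)...), use these two
# methods to handle mixed Mod7/Integer expressions.
Base.convert(::Type{Mod7}, x::Integer) = Mod7(x)
Base.promote_rule(::Type{Mod7}, ::Type{<:Integer}) = Mod7

a = Mod7(5) + 3          # 3 is promoted to Mod7(3); 8 mod 7 gives val == 1
b = 4 * Mod7(2)          # promotion works in either argument order; val == 1
v = [Mod7(3), 4, 5]      # the array literal promotes to Vector{Mod7}
```

Nothing here is special-cased by the compiler. Because the struct holds a single Int, `isbitstype(Mod7)` is true, and a `Vector{Mod7}` stores its elements inline, just as a `Vector{Int}` does.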

Anyway, it’s very nice to hear that you feel Julia does text processing as well as languages like Icon that were specifically designed for that purpose ❤️. I’d love to improve the performance of text processing even more, and there are some ideas about how to do that which have been kicked around for a long time, but it hasn’t made it to the top of the compiler team’s priorities just yet. Some day!


Here’s another example. We do very large-scale performance testing and need to do a lot of log postprocessing, which used to be done mainly in Perl. I took one of the longest-running scripts and rewrote it in Julia. The Julia version runs 10 times faster than the Perl version did and is considerably more capable. Admittedly, I also made algorithmic improvements, but it certainly shows that Julia is more than capable of handling jobs that used to be the domain of specialized text-processing languages.
