It was kind of a joke and kind of not. On the one hand, Julia isn’t really a shell, so it’s a joke. On the other hand, what is a shell but a way to call various pieces of code? The point is that calling executables from a shell and calling functions from a REPL are not radically different ways to operate, so it’s not so weird for people who work in a REPL-oriented language to favor the latter over the former.
That said, I think it’s perfectly reasonable to benchmark CLI execution time. It might make sense to measure both function execution time and command execution time. After all, there’s no real reason this particular functionality is only useful in command-line form: you could just as easily build a web app that does the same thing, and then you’d want a function that computes the result, not a command-line tool. Of course, if you have a function, then wrapping it in a command-line tool is pretty trivial, yet you can’t have the tool without the function, so in some sense the function version is the fundamental one.
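To make the "function first, tool second" point concrete, here is a minimal sketch. The names `count_words` and `main` are hypothetical stand-ins for illustration, not the functionality actually being benchmarked in this thread.

```julia
# Hypothetical core computation; stands in for whatever the tool really does.
function count_words(text::AbstractString)
    # split with no delimiter splits on whitespace and drops empty pieces
    return length(split(text))
end

# Thin command-line wrapper: check arguments, call the function, print the result.
# Returns an exit code so the caller decides whether to exit the process.
function main(args::Vector{String})
    if isempty(args)
        println(stderr, "usage: julia count_words.jl TEXT...")
        return 1
    end
    println(count_words(join(args, " ")))
    return 0
end

# In a script file, the entry point would be a single line:
#     exit(main(ARGS))
```

With this split, the function can be timed from the REPL (for example with `@elapsed count_words("a b c")`), while timing the script from a shell also includes process startup and compilation. A web app would call `count_words` directly and never touch `main`.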