"Unget" char from stream

Hello there, I was wondering whether the Julia IO API provided a built-in facility for ungetting characters from an ::IO stream in a way analogous to C’s ungetc() function.

If not I am aware I can implement it by caching them in some stack and popping the stack instead of reading from the actual stream, but it would be good to know if such facilities are provided.
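For reference, the stack-based fallback described above might look roughly like this. It is only a sketch: `PushbackReader`, `unget!`, `readchar` and `atend` are hypothetical names made up for illustration, not part of Julia's Base.

```julia
# Hypothetical wrapper type; not part of Julia's Base.
struct PushbackReader{T<:IO}
    io::T
    stack::Vector{Char}   # pushed-back characters, last in, first out
end

PushbackReader(io::IO) = PushbackReader(io, Char[])

# Return a character to the reader; it will be the next one read.
unget!(r::PushbackReader, c::Char) = (push!(r.stack, c); r)

# Pop from the stack first, then fall back to the underlying stream.
readchar(r::PushbackReader) = isempty(r.stack) ? read(r.io, Char) : pop!(r.stack)

# End of input only when both the stack and the stream are exhausted.
atend(r::PushbackReader) = isempty(r.stack) && eof(r.io)

r = PushbackReader(IOBuffer("abc"))
c = readchar(r)    # consumes 'a' from the stream
unget!(r, c)       # push it back
readchar(r)        # yields 'a' again, this time from the stack
```

Since the stack holds arbitrary `Char` values, the pushed-back character does not have to be the one that was just read.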

From what I can tell, in Julia one typically uses peek and/or seek and/or mark and reset for situations where one might otherwise use ungetc. Did you have a specific application in mind?

(I’ve changed the title to fix the typo Unger → Unget.)

Agreed with @stevengj - usually, instead of directly reading a character (if desired), you’d peek first to avoid the sort of push-back associated with ungetc.
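A small sketch of the two approaches mentioned so far, using an in-memory `IOBuffer` as a stand-in for a real file (the input string is made up for illustration). Note that the two-argument `peek(io, Char)` form needs Julia 1.5 or later.

```julia
io = IOBuffer("let x")

# peek returns the next value without advancing the stream.
c = peek(io, Char)
@assert c == 'l'
@assert read(io, Char) == 'l'   # the read still sees the same character

# mark/reset can undo more than one read.
mark(io)
word = String(read(io, 2))      # consumes the two bytes "et"
reset(io)                       # back to the marked position
@assert read(io, Char) == 'e'
```

One thing to keep in mind: a stream holds a single mark, so a later `mark` call replaces an earlier one.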

Thanks for the suggestion. peek() is indeed an acceptable solution in my case (I’m tokenizing some file) and will probably be what I’ll end up using. The motivations for using ungetc() as opposed to peek() in C, however, are not satisfied here. They are not necessary in my case, but may be in others, which warrants interest.
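For the tokenizing case, a peek-based loop could be sketched along these lines. The token rules here (runs of digits, runs of letters, single-character punctuation) and the names `take_while` and `tokenize` are assumptions made up for illustration, not taken from my actual code.

```julia
# Hypothetical helper: read a run of characters satisfying `pred`,
# stopping before the first character that does not.
function take_while(pred, io::IO)
    buf = IOBuffer()
    while !eof(io) && pred(peek(io, Char))
        write(buf, read(io, Char))
    end
    return String(take!(buf))
end

# Hypothetical tokenizer: decides what to do by peeking, so nothing
# ever has to be pushed back onto the stream.
function tokenize(io::IO)
    tokens = String[]
    while !eof(io)
        c = peek(io, Char)
        if isspace(c)
            read(io, Char)                          # skip whitespace
        elseif isdigit(c)
            push!(tokens, take_while(isdigit, io))
        elseif isletter(c)
            push!(tokens, take_while(isletter, io))
        else
            push!(tokens, string(read(io, Char)))   # single-character token
        end
    end
    return tokens
end

tokenize(IOBuffer("foo = 42+y"))   # ["foo", "=", "42", "+", "y"]
```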

The first is that ungetc reduces the number of system calls. This is because usual implementations don’t actually put the char back onto the stream, but rather onto a side stack; when data is read from the stream, the stack is popped first until it’s empty, and only then are the system calls (read()) resumed. If a file is large enough and contains a lot of peek situations, peek calls will lead to the same character being read again, whereas a char that was previously ungetc’d will not be read through a system call the next time. I am not entirely sure what kind of benchmark results this yields, but I might try this out both in C and Julia to see if my argument holds water.

The second motivation is that ungetc is not simply an alternative to peek, because the char that is pushed back need not be the actual char that came from the previous read operation. You can push back anything you like, which is not at all what I intend to do in my problem, but may be for others.

The default IO objects already buffer read data and read in batches, so there isn’t necessarily a read syscall just because you’ve written read in Julia. (You can check src/support/ios.[c/h] if you want to find out more about the default IOStream you get when opening a file; you can find this by doing @edit peek(open("file.txt"), UInt8).) Moreover, the underlying implementation also has an ios_ungetc defined. It isn’t exposed on the top level in general, most likely because that doesn’t translate well to anything other than single bytes, and even that isn’t guaranteed to actually work on all systems for more than one byte. E.g. on my system, only one pushback is guaranteed per the man page:

ungetc() pushes c back to stream, cast to unsigned char, where it is available for subsequent read operations. Pushed-back characters will be returned in reverse order; only one pushback is guaranteed.


May I suggest the excellent Parsers.jl package? I think if you’re doing tokenization/parsing, you’ll see great benefits.

That looks very enticing, I’ll check it out!

It would be easy enough to expose this (just change ios.h to export it with JL_DLLEXPORT and then write a thin wrapper), but it would only work for IOStream objects (what you get when you open files). For several other stream types Julia uses libuv, which doesn’t implement ungetc as far as I can tell, though we support a buffering layer on top of libuv that might be used to provide this functionality.

That’s true, but I’m unclear on the usefulness of that functionality.

See also the discussion around the implementation of mark/reset in Julia (way back in 2014) — the main motivation was that it was so much more general than peek and ungetc (and subsequently allowed peek to be implemented for multi-byte objects): https://github.com/JuliaLang/julia/pull/3656
