Programming paradigm for handles

sambitdash · October 26, 2017, 5:56pm

Many times you expose opaque structures for objects that you do not expect the library user to understand or modify. These are typically the object-oriented proxy pattern or similar structures. For example, in Windows all kernel objects are wrapped in a handle that the API consumer does not care much about; they just call the API with the handle.

When I was developing a PDF reader library, I realized many of my reference structures could be handles. It’s a reader, so technically I am not modifying the underlying files. Following the Julia convention of marking mutating functions with a trailing ! (the value_modify!() pattern) does not really make sense in the public API: from the user's side, you are not modifying the parameters. But when you pass an internal interface or pointer, technically you are modifying the parameter.

Has there been any earlier attempt in Julia to create handles to internal objects? A very simple mechanism would be to keep a dictionary of internal objects and use the key as the handle. Is there a macro or a similar conceptual paradigm that has already been developed or debated?
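To make the dictionary idea concrete, here is a minimal sketch of what I have in mind, written for Julia 1.x. All names in it (InternalDoc, OBJECTS, open_doc, doc_filename, close_doc) are hypothetical and not part of PDFIO; it only illustrates a plain integer key acting as the handle.

```julia
# Hypothetical internal object that the user should never see directly.
mutable struct InternalDoc
    filename::String
    pages_loaded::Int
end

# Registry of live objects, keyed by an integer that serves as the handle.
const OBJECTS = Dict{Int,InternalDoc}()
const LAST_KEY = Ref(0)

# Create the internal object and hand back only its key.
function open_doc(filename::AbstractString)
    LAST_KEY[] += 1
    key = LAST_KEY[]
    OBJECTS[key] = InternalDoc(String(filename), 0)
    return key
end

# Every public function resolves the key through the registry.
# An unknown or closed key raises a KeyError.
doc_filename(key::Int) = OBJECTS[key].filename

# Drop the internal object; the key becomes invalid afterwards.
function close_doc(key::Int)
    delete!(OBJECTS, key)
    return nothing
end
```

The caller only ever holds an Int, so nothing about InternalDoc leaks out, at the cost of having to close handles explicitly because the registry keeps the objects alive.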

PDFIO Library Link: API Structure and Design · PDFIO

regards,

Sambit

ssfrr · October 26, 2017, 6:53pm

I’m not sure what the downside is to just using a normal Julia object. Can you give a more concrete example?

One issue with using raw Integers as handles is that you can’t use dispatch, so you’d need to make sure you’re not using any of the same methods that mean anything for normal Ints.

Of course you could have a custom type that just wraps an Int, which is used as an internal lookup or a handle for some C library you’re wrapping. In that case I think whether you bundle the data directly or only include the handle would be orthogonal to how you define your methods, in the sense that for:

x = MyObj()
mutating_operation!(x)
y = nonmutating_operation(x)

The user shouldn’t care how the MyObj is represented internally.
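As a rough sketch of the wrapper idea (Julia 1.x; DocHandle, PageHandle and describe are made-up names, not from any library), giving each kind of handle its own type lets dispatch tell them apart, and a bare Int is simply not accepted:

```julia
# Hypothetical handle types; each one just wraps an integer id.
struct DocHandle
    id::Int
end

struct PageHandle
    id::Int
end

# Dispatch selects the method from the handle type, not from the integer inside.
describe(h::DocHandle) = "document #$(h.id)"
describe(h::PageHandle) = "page #$(h.id)"
```

Here describe(DocHandle(1)) and describe(PageHandle(1)) hit different methods, while describe(1) throws a MethodError, so the handle functions cannot collide with anything defined for ordinary Ints.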

sambitdash · October 26, 2017, 7:14pm

@ssfrr Here is a concrete example:

doc = pdDocOpen("filename.pdf")
info = pdDocGetInfo(doc)
page = pdDocGetPage(doc, 1)
pdPageExtractText(STDOUT, page)

Now let’s look at line 1: doc is a handle to the file.

In line 2, the info data is requested from the doc object. If you think purely in terms of Julia conventions, pdDocGetInfo should have been pdDocGetInfo!, as it reads internal data structures from the PDF file and changes the state of the doc mutable struct. But from an end user's standpoint you are merely querying the object, so the operation is not really mutating.

The same holds for line 3: reading a page loads some internal state into the doc object, and the page object may be retained in the doc as a loaded page to make later page loads faster. But for an end user these are all operations on opaque objects.
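A simplified sketch of that situation, written for Julia 1.x, looks like this. The types and functions (LoadedPage, CachedDoc, load_page, getpage) are hypothetical stand-ins, not PDFIO's actual structures; the point is only that the getter updates a cache inside the document while returning the same answer on every call.

```julia
# Hypothetical page type for illustration.
struct LoadedPage
    number::Int
    text::String
end

# Hypothetical document type holding internal, user-invisible state.
mutable struct CachedDoc
    filename::String
    cache::Dict{Int,LoadedPage}   # pages that have already been loaded
    loads::Int                    # how many times a page was actually parsed
end

CachedDoc(filename::AbstractString) =
    CachedDoc(String(filename), Dict{Int,LoadedPage}(), 0)

# Stand-in for the expensive step of parsing a page from the file.
function load_page(doc::CachedDoc, n::Integer)
    doc.loads += 1
    return LoadedPage(n, "text of page $n")
end

# Returns the cached page if present; otherwise loads it and stores it.
# The internal state of `doc` changes, but the result for a given `n` does not.
getpage(doc::CachedDoc, n::Integer) =
    get!(() -> load_page(doc, n), doc.cache, n)
```

Calling getpage(doc, 1) twice parses the page only once, and the caller cannot tell the difference between the two calls except by timing.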

Using Int as a handle for all object types can interfere with dispatch. A proxy pattern, by contrast, would keep the object hierarchy intact and only make the object opaque to the end user. Such proxy objects could be put inside a Dict that refers to the actual implementation objects. I am not expecting the Julia type system to support exactly this, as it’s not an object-oriented programming language per se. But I am wondering whether such encapsulation has been thought through for cases where data encapsulation is natural to the problem definition.

ssfrr · October 26, 2017, 8:15pm

I think if I were designing this API I’d write it as:

doc = PDFDoc("filename.pdf")
# or you could plug into FileIO
doc = load("filename.pdf")

info = getinfo(doc)
page = getpage(doc, 1) # or pages(doc)[1], which enables `for page in pages(doc)`
text = gettext(page)
println(text)

# if there's a strong use-case for streaming the text directly to an IO stream:
gettext(STDOUT, page)

There’s a general rule here that if you find yourself prefixing all your functions with the same string, you’re probably not using dispatch as effectively as you could be.
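To illustrate the rule, here is a minimal sketch (Julia 1.x) of how the same short function names can be shared across types. Document and Page are toy types invented for this example, with the page text held in memory rather than parsed from a real PDF.

```julia
# Toy stand-ins for a real document and page; purely illustrative.
struct Document
    pages::Vector{String}
end

struct Page
    text::String
end

# Accessors named after what they return, with no type prefix.
pages(doc::Document) = [Page(t) for t in doc.pages]
getpage(doc::Document, n::Integer) = Page(doc.pages[n])

# One name, several methods: the argument types pick the behavior.
gettext(page::Page) = page.text
gettext(doc::Document) = join((gettext(p) for p in pages(doc)), "\n")
gettext(io::IO, page::Page) = print(io, gettext(page))
```

With this layout gettext works on a page, on a whole document, or streams to an IO, and none of the names need to carry the type in a prefix.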

I’m on the fence about whether info, page/pages and text are better than the getX variants above. Probably info is the one that is trickiest as it conflicts with Base.info which means something very different.

As far as whether an operation is mutating, I’d encourage you to think about it from the user’s perspective: if they call getinfo twice, does it return the same thing both times? If so, then I’d argue it’s not really mutating from the user’s perspective, so it doesn’t need a !. There are even some methods like read that mutate their argument in an observable way, but in that case read is the variant that returns the data and read! reads it into a given array (mutating it).
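A small sketch of the read / read! distinction using an in-memory IOBuffer (Julia 1.x; the byte values are arbitrary):

```julia
# An in-memory stream holding four bytes.
io = IOBuffer(UInt8[1, 2, 3, 4])

# `read` advances the stream position, but it returns freshly allocated
# data, so it carries no `!`.
first_two = read(io, 2)

# `read!` fills an array supplied by the caller, so the mutation of that
# argument is the whole point of the call.
buf = zeros(UInt8, 2)
read!(io, buf)
```

After this runs, first_two holds the first two bytes and buf has been overwritten with the next two, even though both calls moved the stream forward.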

Ultimately the ! is a convention that should be used to clarify, and requires a little judgement on where it’s most appropriate.

As far as data hiding, generally in Julia code folks don’t work too hard to prevent users from accessing internal state, and if something is not specifically documented as part of the public API, then it’s the user’s problem if some future change breaks their code. There has been a tremendous amount of discussion about this on the forums and in GitHub issues, so you can find a lot of information there on the various pros and cons.

ssfrr · October 26, 2017, 8:17pm

Oh, perhaps rather than info(doc) you could use metadata(doc), to avoid the Base.info conflict.

sambitdash · October 26, 2017, 8:56pm

@ssfrr Thanks a lot for your thoughtful ideas on the PDF APIs. I guess if PDFIO is to be made part of FileIO, some of the interfaces could be changed to be more Julia-like. PDF APIs have traditionally been structured in layers, such as the Cos, PD and Common layers, and following that design has some benefits: it makes the overall file format easier to understand from an API standpoint when you refer to the PDF specification or look at similar PDF libraries elsewhere. Similarly, the gettext under discussion here covers only page content. Text can also live in a comment on a PDF page, in embedded JavaScript, in a file attachment, and in many other places such as metadata.

Your comment on the ! convention for read vs. read! is very relevant. I guess as long as a function is obviously not mutating, one should not have to be explicit about changes to internal object state by ending its name with a !.

regards,

Sambit

sambitdash · October 26, 2017, 9:06pm

Unfortunately, in the PDF specification, document metadata has a special meaning distinct from DocInfo. The two may overlap in the information they hold but have different objectives: one is stored in a PDF internal data structure, a dictionary, and the other is in XML. With a standard file format that has been around for years, we may not have the flexibility to choose our own terminology.