Rationale & operations for built-in `HTML` type

cce · November 22, 2020, 4:05pm

I’m curious about the scope of the built-in HTML type. It’s nice that something like this exists, to indicate that a given string value is intended to be valid HTML. However, it doesn’t seem that general operations on this type have been implemented. Some standard operations give you an error, others give you quite unexpected results.

julia> html"A" * html"B"
ERROR: MethodError: no method matching *(::HTML{String}, ::HTML{String})
julia> join([html"A", html"B"])
"HTML{String}(\"A\")HTML{String}(\"B\")"
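
A minimal sketch of where these two results come from, assuming (as the `methodswith` output later in the thread suggests) that `show` for `text/html` is the only method defined on `HTML`. `join` writes each element with `print`, which falls back to the `repr`-style `show`. The `.content` field used in the workaround is the wrapper's single field.

```julia
a = html"A"
b = html"B"

# Plain printing has no HTML-specific method on the Julia version discussed
# here, so it produces the repr-style text that join emitted above.
plain = string(a)

# Rendering as text/html uses the dedicated show method and yields the markup.
rendered = sprint(show, MIME("text/html"), a)

# No * method exists for HTML, so concatenate the wrapped strings
# and wrap the result again.
ab = HTML(a.content * b.content)

# join has the same workaround: join the contents, then wrap once.
joined = HTML(join(h.content for h in (a, b)))
```

This only works when every value wraps a string; an `HTML` built from several arguments wraps a function instead.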

Is there an articulated design rationale for the HTML data type?

Tamas_Papp · November 23, 2020, 12:02pm

My understanding is that it is just a thin wrapper for content that is already HTML, e.g. for this show method.

If you want to construct HTML output from Julia, look at the various packages that do that (depending on what you want to do — perhaps provide specifics).

cce · November 23, 2020, 2:10pm

That’s not my concern. My concern is about the behavior of HTML objects once they are constructed. What operations are valid on them, etc. For example, the constructor doesn’t recognize even its own type. For strings, nesting of strings in the constructor has relatively intuitive behavior…

julia> string("X", " ", string("Y"))
"X Y"

I’m not sure why HTML behaves this way, but it isn’t parallel to string: the source for the constructor shows that it doesn’t recognize its own type, whereas string handles nested strings.

julia> display("text/html", HTML("X", " ", HTML("Y")))
X HTML{String}("Y")
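
As far as I can tell from the Base source, `HTML(xs...)` wraps a function that calls `print` on each argument in turn, which would explain why a nested `HTML` shows up in its `repr` form. A rough standalone equivalent, where `plain_concat` and `render` are hypothetical names and not part of Base:

```julia
# Hypothetical stand-in for what HTML(xs...) appears to do:
# store a function that prints every argument, with no MIME type involved.
plain_concat(xs...) = HTML(io -> foreach(x -> print(io, x), xs))

# Helper for this example: render a value as text/html into a String.
render(h) = sprint(show, MIME("text/html"), h)

builtin = render(HTML("X", " ", HTML("Y")))
sketch  = render(plain_concat("X", " ", HTML("Y")))

# One workaround on current Julia: unwrap nested values by hand.
unwrapped = render(HTML("X", " ", HTML("Y").content))
```

If the reading of the source is right, `builtin` and `sketch` are the same string, and only `unwrapped` contains the nested markup itself.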

My guess is that this type was declared in a moment, but never really explored and developed. If we want interoperability among tools that construct HTML, then we should probably provide reasonable semantics for HTML objects. As Julia becomes more used outside of core number crunching for web application development (or even notebooks, like Pluto.jl), working smartly with hypertext data seems important.

Importantly, what is the stomach and path to providing better semantics for HTML data type? Or, given how undeveloped it is, should it even be part of the core system? I mean, HTML is active with no other action when I start Julia… it’s a built-in. Not using it when rendering to HTML seems anti-social, yet, as noted, even basic operations simply don’t work as you’d expect.

Tamas_Papp · November 23, 2020, 2:22pm

Pretty much show, I guess (as documented, BTW).

julia> methodswith(HTML)
[1] show(io::IO, ::MIME{Symbol("text/html")}, h::HTML) in Base.Docs at docs/utils.jl:34

I think that those should be in packages, not Base. Also, I am not sure what kind of interoperability you are after.

Sorry, I could not parse that (English is not my native language).

Maybe just use it for emitting a final HTML that is rendered as is — for constructing HTML, use one of the existing packages or just build up from strings.

cce · November 23, 2020, 2:32pm

@Tamas_Papp I’m not suggesting that tools that construct HTML should be in Base. However, if one wishes tools that construct HTML to be composable (e.g. a plurality of libraries and not a singular framework), they need a unit of composition.

In my opinion, the HTML type should support the basic operations that are supported by string (construction, concatenation, and join) or HTML shouldn’t be in Base. What is the pathway to providing sane support for basic operations? This could be done in a package and then, after a while, Julia could incorporate those improvements. Even so, the constructor is a problem; how could a library fix that?

Tamas_Papp · November 23, 2020, 2:36pm

I disagree: again, it is just a wrapper type. You construct HTML code any way you prefer, and then wrap it in HTML. From that point on, show knows what to do with it; that’s the sole purpose of this type.

IMO constructing HTML as strings is not ideal for nontrivial output, but if you insist, you can use strings and then just wrap the end result.
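
A minimal sketch of that approach, with made-up data: build the markup as ordinary strings and wrap once at the end. Nothing here escapes special characters, so the strings are assumed to be safe markup already.

```julia
# Hypothetical data for illustration.
items = ["alpha", "beta"]

# Build the fragment with ordinary string operations.
body = join("<li>$(item)</li>" for item in items)

# Wrap only the final result so that show can emit it as text/html.
page = HTML("<ul>" * body * "</ul>")

output = sprint(show, MIME("text/html"), page)
```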

Tools that actually construct HTML usually use their own structural representation — I am not sure how you want them to be composable. That’s an interesting question (that could benefit from an example), but possibly orthogonal to the one in this topic.

cce · November 23, 2020, 2:37pm

Then the HTML constructor should only accept a single String and not have intelligence built in.

So, when mixing libraries from various sources, we may wish each component to be able to produce an HTML representation, and using HTML for this purpose seems reasonable (since it’s obvious and it’s a built-in).

The HTML constructor takes any number of objects for its argument, but it doesn’t ask those objects to render themselves for text/html (this would be smart), instead, it renders them without a MIME type. As a result, it doesn’t even handle itself, let alone any object that supports HTML rendering. The constructor should have only accepted String, or, objects that can be rendered as text/html… taking the repr of an object and treating it as its hypertext representation is misconceived.
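
A sketch of the alternative described here, under the assumption that strings are taken as markup and everything else must be able to render itself as `text/html`. `html_parts` is a hypothetical helper, not something Base provides.

```julia
# Hypothetical helper, not part of Base: accept strings and anything
# that can be shown as text/html, and reject everything else.
function html_parts(parts...)
    io = IOBuffer()
    for part in parts
        if part isa AbstractString
            print(io, part)                      # strings are taken as markup
        elseif showable(MIME("text/html"), part)
            show(io, MIME("text/html"), part)    # ask the object to render itself
        else
            throw(ArgumentError("no text/html rendering for $(typeof(part))"))
        end
    end
    return HTML(String(take!(io)))
end

flat   = html_parts("X", " ", HTML("Y"))
nested = html_parts("<div>", html_parts("<span>", "text", "</span>"), "</div>")
```

Rendering eagerly into a `String` means a bad argument fails at construction time rather than later, when the value is shown.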

Tamas_Papp · November 23, 2020, 2:51pm

If you read the docs, you will see that the HTML(xs...) constructor is just for convenience. Similar examples exist in Base.

I am not sure I see the motivation for that — a HTML value should be something that is the end result (to be shown), not further input to something else.

Possibly (then it should be AbstractString though, not only String). Personally, I find it innocuous — callers should be responsible for providing valid HTML to HTML. User-facing methods to show will just error anyway if HTML output is not defined for show for a particular type, so this is not really harmful.

cce · November 23, 2020, 3:31pm

Thank you @Tamas_Papp. I’ve made a ticket, #38540, for the suggestion to limit the constructor to an AbstractString. I’d prefer a higher-level discussion of what it means to be an HTML object, rather than having it be defined by its implementation.

The “final” output of one program is often the input for another. Having ways to compose libraries that produce hypertext fragments without having to know about each other seems important. It also seems that the HTML datatype should answer this casting call.

Tamas_Papp · November 23, 2020, 3:38pm

Not really for show, though.

In the abstract, yes. But (again) a concrete example would really help this discussion.

cce · November 23, 2020, 3:59pm

Ok. So, I’m using DataKnots.jl to build cohort and other clinical queries within an application that emulates OHDSI ATLAS, only written in Julia and not SQL+Java+Javascript+R. This application will probably build on top of Pluto.jl. Critically, we’re trying to do this as a set of bottom-up libraries and conventions, rather than as a top-down framework.

As a result, I need a way for various approaches to composing hypertext fragments to work together. Some of these presentation fragments may be generated by Hyperscript.jl or HAML.jl or really any other tooling. Moreover, those fragments would be combined as leaf nodes within a query, so that the query itself is composing the page. So I need a unit of composition that these libraries could support without having to know about my applications. That is, if I have a query with results rendered using HAML, I need them to also mix with fragments produced by Hyperscript.

In this regard, the HTML data type seems a lovely “unit of composition”: page composition shouldn’t care what hypertext generation library was used, so long as the inputs are all HTML fragments. Yet even naive use of HTML fragments breaks. I’ve given simple examples above that show the usage problems with the constructor, with join, and the like. These all have equivalent manifestations at higher levels when attempting to use HTML as the wrapper that signals content is “ready for display in a browser”.

Pointedly, HTML seems to lack any intent or definition other than its implementation. If HTML is meant to mean that “the content is ready for presentation to a browser”, one would expect this invariant: for any x, if isa(x, HTML) then HTML(x) should produce x. It’s also problematic to take an arbitrary value x and make it “ready for presentation to the browser” by printing it without specifying the text/html MIME type; that seems fundamentally unwise. The current constructor looks like a stop-gap, undoubtedly a useful one at some point in Julia’s history. However, to move the ecosystem forward, we need a unit of composition for HTML fragments.
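
The invariant can be written down as a small normalizing function. This is only a sketch: `as_html` is a hypothetical name, and the built-in constructor does not behave this way.

```julia
# Hypothetical normalizer, not part of Base.
as_html(x::HTML) = x                      # already markup: return it unchanged
as_html(x::AbstractString) = HTML(x)      # the caller vouches that x is HTML

h = as_html("<b>bold</b>")
again = as_html(h)                        # applying it twice changes nothing
```

A library that accepts fragments could call such a function on its inputs without caring which tool produced them.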

Tamas_Papp · November 23, 2020, 4:11pm

I think you just need to define your own API (or type) that does what you want, perhaps in a barebones package that can then be built on by packages that extend your ecosystem. I don’t think that Base.HTML was meant for this, and because of this I don’t think that punning on it for your project is a good idea.

cce · December 15, 2020, 2:39pm

So. I understand HTML better now. I also think it could be improved. Currently, it doesn’t nest. With two print methods, it could be made to nest nicely.

print(io::IO, h::HTML) = print(io, h.content)
print(io::IO, h::HTML{<:Function}) = h.content(io)

https://github.com/JuliaLang/julia/issues/38889

With this change, you get…

julia> display("text/html",  HTML(HTML("<tag/>")))
<tag/>

julia> display("text/html",  HTML("<div>", HTML("<span>text</span>"), "</div>"))
<div><span>text</span></div>

julia> display("text/html", HTML(HTML("<span>", "text", "</span>")))
<span>text</span>

julia> display("text/html", HTML("<div>", HTML("<span>", "text", "</span>"), "</div>"))
<div><span>text</span></div>
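
To try the two methods in a session outside Base they need the `Base.` prefix, as in the sketch below. Defining methods on a Base function for a Base type from user code is type piracy, so this is only for experimenting and not something to ship in a package.

```julia
# Experimental only: these extend Base.print for a type owned by Base.
Base.print(io::IO, h::HTML) = print(io, h.content)
Base.print(io::IO, h::HTML{<:Function}) = h.content(io)

# Helper for this example: render a value as text/html into a String.
render(h) = sprint(show, MIME("text/html"), h)

wrapped = render(HTML(HTML("<tag/>")))
nested  = render(HTML("<div>", HTML("<span>", "text", "</span>"), "</div>"))

# With print defined, join concatenates the markup as well.
joined = join([html"A", html"B"])
```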
