<!DOCTYPE html example [
<!ENTITY myheader SYSTEM "myheader.html">
]>
....
&myheader;
SGML is complex, so various efforts were made to simplify HTML, and that's one of the capabilities that was dropped along the way.
Strict documents, reusable types, microformats, etc. would have put search into the hands of the masses rather than kept it in Google's unique domain.
The web would have been more composable and P2P. We'd have been able to slurp first-class article content, comments, contact details, factual information, addresses, etc., and build a wealth of tooling.
Google / WhatWG wanted easy to author pages (~="sloppy markup, nonstandard docs") because nobody else could "organize the web" like them if it was disorganized by default.
By the late 2010s, Google's need for the web started to wane. They began embedding lifted facts directly in the search results, tried to push AMP to keep us from going to websites, etc.
Google's decisions and technologies have been designed to keep us in their funnel. Web tech has been nudged and mutated to accomplish that. It's especially easy to see when the tides change.
In addition, we used XSLT quite a bit too. It is nice being able to open your XML data files in a web browser and having it nicely formatted without any external software. All you needed was a link to the style sheet.
Elements had to be used in their pure form, and CSS was for all visual presentation.
It really helped me understand and be better at web development - getting the tick from the XHTML validator was always an achievement for complicated webpages.
The "strict markup" part can be (and always could be) had using SGML, which is just a superset of XML that also supports HTML empty elements, tag inference, attribute shortforms, etc. HTML was invented as an SGML vocabulary in the first place.
Agree, though, that Google derailed any meaningful standardization effort for the reasons you stated. Actually, it started already with CSS and the idiocy of piling yet another item-value syntax on top of SGML/HTML, when it already has attributes for formatting. The "semantic HTML" postulate is kind of just an after-the-fact justification for insane CSS complexity, which could grow because it wasn't part of HTML proper and so escaped the scrutiny that comes with introducing new elements or attributes.
People wanted to write and publish. Only a small portion of people/institutions would have had the resources or appetite to tag factual information on their pages. Most people would have ignored the semantic taxonomies (or just wouldn't have published at all). I guess a small and insular semantic web is better than no semantic web, but I doubt there was a scenario where the web would have been as rich as it actually became while also being rigidly organized.
In my experience trying to work with Wikidata taxonomies, it can be a total mess when it's crowdsourced, and if you go to an "expert"-derived taxonomy there are all kinds of other problems with coverage, meaning, democracy.
I've had a few flirtations with the semantic web going back to 2007 and long ago came to the personal conclusion that unfortunately AI is the only viable approach.
It was such a huge improvement. For some reason rather than just tolerating old tag-soup mess while forging the way for a brighter future, we went "nah, let's embrace the mess". WTF.
It was so cool to be able to apply XML tools to the Web and have it actually work. Like getting a big present for Christmas. That was promptly thrown in a dumpster.
You could still use e.g. hReview today, but nobody does. In the end the problem of microformats was that "I want my content to be used outside my web property" is something nobody wants, beyond search engines that are supposed to drive traffic to you.
The fediverse is the only chance of reviving that concept because it basically keeps attribution around.
So any kind of purely algorithmic, metadata based retrieval algorithm would very quickly return almost pure garbage. What makes actual search engines work is the constant human work to change the algorithm in response to the people who are gaming it. Which goes against the idea of the semantic web somewhat, and completely against the idea of a local-first web search engine for the masses.
Here's what people should know.
1) The failure of XHTML was very much a multi-vendor, industry-wide affair; the problem was that the syntax of XML was stricter than the syntax of HTML, and the web was already littered with broken HTML that the browser vendors all had to implement layers of quirk handling to parse. There was simply no clear user payoff for moving to the stricter parsing rules of XML, and there was basically no vendor who wanted to do the work. To my memory, Google does not really stand out here; like all the other vendors, they largely avoided working on what was frequently referred to as a science project.
2) In subsequent years, Google has actually delivered a semantic web of sorts: https://developers.google.com/search/docs/appearance/structu...
A few things stand out as interesting. First of all, the old semantic web never had a business case. JSON-LD structured data does: Google will parse your structured data and use it to inform the various snippets, factoids, previews and interactive widgets they show all over their search engine and other web properties. As a result, JSON-LD has taken off massively. Millions of websites have adopted it. The data is there in the document; it is just in a JSON-LD section. If you work in SEO you know all about this. It seems to be quite rare that anyone on Hacker News is aware of it, however.
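To make "the data is there in the document" concrete, here is a minimal sketch of how a consumer might read those JSON-LD sections out of a page. The function name `extractJsonLd` and the regex approach are assumptions for illustration only; a real crawler (Google's included) would use a proper HTML parser rather than a regular expression.

```javascript
// Minimal sketch: collect every JSON-LD block embedded in an HTML string.
// Assumption: a regular expression is good enough for well-behaved markup;
// real crawlers use a full HTML parser instead.
function extractJsonLd(html) {
  const pattern =
    /<script\b[^>]*\btype\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Skip a malformed block rather than discarding the whole page.
    }
  }
  return results;
}

// Usage with a hypothetical page carrying a schema.org Article description.
const page = `<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "HTML includes"}
</script>
</head><body>...</body></html>`;

console.log(extractJsonLd(page)[0]["@type"]); // Article
```

Keeping the data in one self-contained JSON block is what makes this extraction so simple compared with walking attributes scattered through the markup.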
Second interesting thing: why did we end up with the semantic data being in JSON in a separate section of the file? I don't know. I think everyone just found that interleaving it within the HTML was not that useful. For the legacy reasons discussed earlier, HTML is a mess. It's difficult to parse. It's overloaded with a lot of stuff. JSON is the more modern thing. It seems reasonable to me that we ended up with this implementation. Note that Google does have some level of support for other semantic data, like RDFa, which is embedded directly in the HTML as attributes; it is not popular.
Which brings us to the third interesting thing: the JSON-LD schemas Google uses are standards, or at least... standard-y. The W3C is involved. To my knowledge, Google, Yahoo, Yandex and Microsoft have made the largest contributions. You can read all about it on schema.org.
TL;DR - XHTML was not a practical technology and no browser or tool vendor wanted to support it. We eventually got the semantic web anyway!
There was a push to prevent browsers from being too lenient with the syntax, in order to avoid the problem that sloppy HTML produced (inconsistent rendering across browsers).
Google does support multiple semantic web standards: RDFa, JSON-LD and, I believe, microdata as well.
JSON-LD is much simpler to extract and parse; however, it makes a site's HTML bigger because information gets duplicated, compared to RDFa, where values can be inlined.
And anyway, even if Google had nefarious intentions and even if they managed to steer the standardization, one has also to concede that all search engines before Google were encumbered by too much structure, too rigid approaches. When you were looking for a book in a computerized library at that point it was standard to be sat in front of a search form with many, many fields; one for the author's name, one for the title and so forth, and searching was not only a pain, it was also very hard to do for a user without prior training. Google had demonstrated it could deliver far better results with a single short form field filled out by naive users that just plonked down three or five words that were on their mind et voila. They made it plausible that instead of imposing a structure onto data at creation time maybe it's more effective to discover associations in the data at search time (well, at indexing time really).
As for the strictness of documents, I'm not sure what it would give us that we don't get with sloppy documents. OK, web browsers could refuse to display a web page if any one image tag is missing the required `alt` attribute. So what happens then: will web authors duly include alt="picture of a cat" for each picture of a cat? Maybe, to a degree, but the other 80% of alt attributes will just contain some useless drivel to appease the browser. I'm actually more in favor of strict documents than I used to be, but on the other hand we (I mean web browsers) have become quite good at reconstructing usable HTML documents from less-than-perfect sources, and the reconstructed source is also a strictly validating source. So I doubt this is the missing piece; I think the semantic web failed because the idea was never strong, clear, compelling, well-defined and rewarding enough to catch on with enough people.
If we're honest, we still don't know, 25 years later, what 'semantic' means after all.
The <object> tag appears to include/embed other html pages.
An embedded HTML page:
<object data="snippet.html" width="500" height="200"></object>
Like iframe, it "includes" a full subdocument as a block element, which isn't quite what the OP is hinting at.
It was really shit. Browser navigation cues disappear, minor errors will fuck up the entire thing by navigating fixed element frames instead of contents, design flexibility disappears (even as consistent styling requires more efforts), frames don’t content-size so will clip and show scroll bars all over, debugging is absolute ass, …
And it increases resource use.
A bit of vanilla JavaScript with WebComponents is a few lines:
https://gomakethings.com/html-includes-with-web-components/
Edit: “t” was supposed to be the object tag.
You seem to have a rather original definition of "pure HTML".
An HTML-only option that exists is using object. Replying to what the OP missed, in case others might find it suitable.
If a tiny bit of vanilla JavaScript can be tolerated, WebComponents appear to offer a broadly standardized approach that is not framework dependent.
I’d probably explore WebComponents, but wanting the height of JavaScript without JavaScript..
The OP doesn’t need to hint.
Some might argue react is over abstracted or over engineered to do the same.
Interpretation and preference is different than if it’s possible
Yeah, we’ve been solving this over and over in different ways. For those saying that iframes are good enough, they’re not. Iframes don’t expand to fit content. And server side solutions require a server. Why not have a simple client side method for this? I think it’s a valid question. Now that we’re fixing a lot of the irritation in web development, it seems worth considering.
My dialup ISP back then didn't disable using .htaccess files in the web space they provided to end users. That meant I could turn on server-side includes! Later I figured out how to enable CGI. (I even went so far as to code rudimentary webshells in Perl just so I could explore the webserver box...)
A small 10KB lib that augments HTML with the essential good stuff (like dynamic imports of static HTML)
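<script>
// Swap the calling <script> element for the HTML fetched from `url`.
// document.currentScript is the <script> currently executing (the one that
// calls includeHTML), so it must be captured before any async work starts.
function includeHTML(url) {
  const s = document.currentScript
  fetch(url)
    .then(r => {
      if (!r.ok) throw new Error(`${url}: HTTP ${r.status}`)
      return r.text()
    })
    .then(h => {
      // Note: <script> tags inside the fetched HTML will not execute.
      s.insertAdjacentHTML('beforebegin', h)
      s.remove()
    })
    .catch(err => console.error('includeHTML failed:', err))
}
</script>
... <script>
includeHTML('/footer.html')
</script>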
The `script` element is replaced with the HTML from `/footer.html`.
There are many examples of HTMX (since it is self-contained and tiny) being used alongside existing frameworks.
Of course, for some of us, since HTMX brings dynamic UX to back-end frameworks, it is a way of life: https://harcstack.org (warning: Raku code may hurt your eyes)
Depending on the specific objection to Javascript, this may or may not matter:
1. You object to any/all JS on a page? Yeah, then this won't work for you.
2. You object to having to write JS just to get client-side includes? This should mostly work for you.
It all depends on what the actual objection is.
$ curl --location --silent "https://unpkg.com/htmx.org@2.0.4" | wc -c
50917
$ curl --location --silent "https://unpkg.com/htmx.org@2.0.4" | gzip --best --stdout | wc -c
16314
Actually, that was part of the original plan - https://caniuse.com/iframe-seamless
It worked rather like a reverse shadow DOM, allowing CSS from the parent document to leak into the child, removing borders and other visual chrome that would make it distinguishable from the host, except you still had to use fixed CSS layouts and resize it with JS.
This helps the creator, but not the consumer, right? That is, if I visit 100 of your static documents created with a template engine, then I'll still be downloading some identical content 100 times.
That doesn't seem like a significant problem at all, on the consumer side.
What is this identical content across 100 different pages? Page header, footer, sidebar? The text content of those should be small relative to the unique page content, so who cares?
Usually most of the weight is images, scripts and CSS, and those don't need to be duplicated.
If the common text content is large for some reason, put the small dynamic part in an iframe, or swap it out with javascript.
If anyone has a genuine example of a site where redundant HTML content across multiple pages caused significant bloat, I'd be interested to hear about it.
To give you a concrete example, consider caching (or, equivalently, compiling) web pages. Maybe you have 100 articles, which share a common header and footer. If you make a change to the header, then all 100 articles have to be uncached/rebuilt. Why? Because somebody did not remove the duplication when they had the chance :-)
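A rough sketch of the bookkeeping this forces on a build or caching layer: given which fragments each page pulls in, find everything that has to be rebuilt or uncached when one fragment changes. The `includes` map and the `pagesToRebuild` name are hypothetical; a real static site generator tracks this in its own way.

```javascript
// Hypothetical sketch: given which fragments each page includes, find every
// page (or fragment) whose built/cached output is stale after one change.
// The `includes` map is an assumed input, e.g. produced by a build tool.
function pagesToRebuild(includes, changedFragment) {
  // Invert the graph: fragment -> pages/fragments that include it.
  const includedBy = new Map();
  for (const [page, deps] of Object.entries(includes)) {
    for (const dep of deps) {
      if (!includedBy.has(dep)) includedBy.set(dep, []);
      includedBy.get(dep).push(page);
    }
  }
  // Walk upwards so nested includes (nav inside header) are handled too.
  const affected = new Set();
  const stack = [changedFragment];
  while (stack.length > 0) {
    const current = stack.pop();
    for (const parent of includedBy.get(current) || []) {
      if (!affected.has(parent)) {
        affected.add(parent);
        stack.push(parent);
      }
    }
  }
  return [...affected].sort();
}

const includes = {
  "article1.html": ["header.html", "footer.html"],
  "article2.html": ["header.html", "footer.html"],
  "about.html": ["footer.html"],
  "header.html": ["nav.html"],
};

console.log(pagesToRebuild(includes, "nav.html").join(", "));
// article1.html, article2.html, header.html
```

With 100 articles sharing a header, a one-line header change marks all 100 as stale, which is exactly the cost the comment describes.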
On the other hand it means less work for the client, which is a pretty big deal on mobile.
XSLT did exactly what HTML includes could do and more. The user agent could cache stylesheets or if it wanted override a linked stylesheet (like with CSS) and transform the raw data any way it wanted.
While it evaluated the XSLT server-side, it was a really neat and simple approach.
https://developer.mozilla.org/en-US/docs/Web/API/Service_Wor...
I still remember the script I wrote to replace thousands (literally) of slightly different headers and footers in some large websites of the 90s. How liberating to finally have that.
We’ve built an industry around solving this problem. What if, for some basic web publishing use cases, we could replace a complex web framework with one new tag?
I actually did that replacement, with a few enhancements (maybe 100 lines of code, total?). It's pending on arXiv at the moment. In about two days it will be done and I'll post a Show HN here.
<div src="foo.txt"></div>
> XHTML 2 takes a completely different approach, by taking the premise that all images have a long description and treating the image and the text as equivalents. In XHTML 2 any element may have a @src attribute, which specifies a resource (such as an image) to load instead of the element.
Like writing a line of js?
If internally this gets optimized to a simple I/O operation (which it should) then why add the JS indirection in the first place?
I do it in a way that doesn't stop the renderer.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<frameset cols="1000, *">
<frame src="FRAMESET_navigation.html" name="navigation">
<frame src="FRAMESET_home.html" name="in">
</frameset>
</html>
The thing that always bugged me about frames is that they are too clever. I don't want to reload only the frame HTML when I right-click and reload. Sure, the idea was to cache those separately, but come on: frames and caching are meant to solve two different problems, and by mashing them together they somewhat sucked at solving either.
To me, includes for HTML should work in the dumbest way possible. And that means: take the text from the include, paste it where the include was, and give the browser the resulting text.
If you want to cache a nav section separately because it appears the same on every page, let's add a cache attribute that solves the problem independently:
<nav cache-id="deadbeefnav666">
<some-content></etc>
</nav>
To tell the browser it should load the inner HTML or the src of that element from cache if it has it.
Now you could convince me that the include should allow for more, but it being dumb is a feature, not a bug.
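One possible reading of the proposed `cache-id` semantics, sketched as plain JavaScript rather than browser behavior: the first request for a given id loads the fragment, and later requests with the same id reuse it. `createFragmentCache` and the injected `loadFragment` callback are hypothetical names; no browser implements such an attribute.

```javascript
// Hypothetical semantics for the proposed cache-id attribute: a fragment is
// loaded once per cache-id and reused on every later page that asks for it.
// `loadFragment` stands in for whatever actually fetches the src; it is an
// assumption of this sketch, not a browser API.
function createFragmentCache(loadFragment) {
  const cache = new Map();
  return function resolve(cacheId, src) {
    if (cache.has(cacheId)) {
      return cache.get(cacheId); // cache hit: no new load
    }
    const html = loadFragment(src);
    cache.set(cacheId, html);
    return html;
  };
}

let loads = 0;
const resolve = createFragmentCache((src) => {
  loads += 1;
  return `<nav>loaded from ${src}</nav>`;
});

const first = resolve("deadbeefnav666", "/nav.html"); // first page: loads /nav.html
const second = resolve("deadbeefnav666", "/nav.html"); // next page: from cache
console.log(loads); // 1
```

Because the id alone is the cache key, changing the nav would require a new id (for example a content hash, which the `deadbeef...` value hints at); otherwise stale fragments would keep being served.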
I’m sure some purist argument has driven this somewhere.
.container .style { … }
Where the container is basically the whole guest document, but you still want those rules to apply… Maybe you want the guest text to appear in the same font as the host document but still want colors and font weights to apply. Maybe you want to make the colors muted to be consistent with the host document, maybe the background of the host document is different and the guest text doesn't contrast enough anymore, etc.
All sorts of data could be linked together to display or remix by user agents.
Markdown can't do most of those, so it makes more sense why it doesn't have includes, but I'd still argue it definitely should. I generally dislike LaTeX, but about the only thing I liked about it when writing my thesis was that I could have each chapter in its own file and just include all of them in the main file.
As I wrote that, I realized there could be cumulative layout shift, so that’s an argument against. To avoid that, the browser would have to download all transcluded content before rendering. In the past, this would have been a dealbreaker, but maybe it’s more feasible now with http multiplexing.
https://docs.asciidoctor.org/asciidoc/latest/directives/incl...
I'm not defending it, because when I started web development this was one of the first problems I ran into as well -- how the heck do you include a common header.
But the original concept of HTML was standalone documents, not websites with reusable components like headers and footers and navbars.
That being said, I still don't understand why then the frames monstrosity was invented, rather than a basic include. To save on bandwidth or something?
A link in a sidebar frame would open a link in the "editor" frame which loaded a page with a normal HTML form. Submitting the form reloaded it in that same frame. Often the form would have multiple submit buttons, one to save edits in progress and another to submit the completed form and move to the next step. The current app state was maintained server side and validation was often handled there save for some basic formatting client side JavaScript could handle.
This setup allowed even the most primitive frame-supporting browsers to use CRUD web apps. IIRC early web frameworks like WebObjects leaned into that model of web app.
They were horrible -- you'd hit the back button and only one of the frames would go back and then the app would be in an inconsistent state... it was a mess!
I don't love JavaScript monstrosities but XHR and dynamic HTML were a vast improvement over HTML forms and frame/iframe abuse.
Try some other architecture though and all bets were off.
Amazon's web store looked and worked mostly the same as it does now, people were very impressed with MapQuest, etc.
Applications like that can feel really fast, almost desktop application fast, if you are running them on a powerful desktop computer and viewing them on another computer or tablet over a LAN
Lots of gateways between systems.
It’s made to pull in external resources (as opposed to other document formats like PDF).
Scripts, stylesheets, images, objects, favicons, etc. HTML is thematically similar.
The only one that is presentational is stylesheets.
Something in that tenet does not compute with me.
> HTML Imports are a way to include and reuse HTML documents in other HTML documents
There were plans for <template> tag support and everything.
If I remember correctly, Google implemented the proposed spec in Blink but everyone else balked for various reasons. Mozilla was concerned with the complexity of the implementation and its security implications, as well as the overlap with ES6 modules. Without vendor support, the proposal was officially discontinued.
The thing is that all those are non-reasons that don't really explain anything: low demand is hard to believe if this feature has been requested for 20 years straight and there are all kinds of shim implementations using scripts, backend engines, etc. (And low demand didn't stop other features that the vendors were interested in for their own reasons.)
Vendor refusal also doesn't explain why they refused it, even to the point of rolling back implementations that already existed.
So I'd be interested to understand the "various reasons" in more detail.
"Security implications" also seem odd as you already are perfectly able to import HTML cross origin using script tags. Why is importing a script that does document.write() fine, but a HTML tag that does exactly the same thing hugely problematic?
(I understand the security concern that you wouldn't want to allow something like "<import src=google.com>" and get an instant clone of the Google homepage. But that issue seems trivially solvable with CORS.)
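A simplified sketch of the CORS gate the comment suggests for a hypothetical HTML import: same-origin fragments are allowed, and cross-origin ones only if the response opted in via `Access-Control-Allow-Origin`. The `mayImport` function is illustrative only; it ignores credentials, preflight requests and the rest of the real CORS algorithm.

```javascript
// Simplified sketch of a CORS-style check for a hypothetical HTML import.
// Same-origin imports are allowed; cross-origin imports only when the
// response's Access-Control-Allow-Origin header is "*" or names our origin.
// Credentials, preflights and other real CORS details are ignored here.
function mayImport(documentUrl, resourceUrl, allowOriginHeader) {
  const docOrigin = new URL(documentUrl).origin;
  // Resolve relative resource URLs against the importing document.
  const resOrigin = new URL(resourceUrl, documentUrl).origin;
  if (docOrigin === resOrigin) return true;
  if (allowOriginHeader == null) return false;
  const allowed = allowOriginHeader.trim();
  return allowed === "*" || allowed === docOrigin;
}

console.log(mayImport("https://example.com/a.html", "/header.html", null)); // true
console.log(mayImport("https://example.com/a.html", "https://google.com/", null)); // false
console.log(
  mayImport("https://example.com/a.html", "https://cdn.example.net/x.html", "https://example.com")
); // true
```

Under this rule, the "instant clone of the Google homepage" case fails simply because google.com would not send a header naming the importing origin.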
There are various specs/semantics you can choose, which prescribe the implementation & required cross-cutting complexity. Security is only relevant in some of them.
To give you some idea:
- HTML load ordering is a pretty deeply held assumption. People understand JS can change those assumptions (document.write). Adding an obscure HTML tag that does so is going to be an endless parade of bugs & edge cases.
- To keep top-to-bottom fast we could define preload semantics (Dropping the linear req-reply, define client-cache update policy when the template changes, etc). Is that added complexity truly simpler than having the server combine templates?
- <iframe> exists
In other words, to do the simplest thing 75% of people want, requires a few lines of code. Either client side or server side.
To fit the other 25% (even to 'deny' it) is endlessly complex in ways few if any can oversee.
See https://github.com/whatwg/html/issues/2791#issuecomment-3112... for details.
https://web.archive.org/web/19970630074729fw_/http://develop...
https://web.archive.org/web/19970630094813fw_/http://develop...
https://en.wikipedia.org/wiki/Transclusion
It was part of Project Xanadu, and originally considered to be an important feature of hypertext.
Notably, mediawiki uses transclusion extensively. It sometimes feels like the wiki is the truest form of hypertext.
In Xanadu you could transclude just an excerpt from one document into another document.
If you wanted to do this with HTML you need an answer for the CSS. In any particular case you can solve it, making judgements about which attributes should be consistent between the host document, the guest document and the guest-embedded-in-host. The general case, however, is unclear.
For a straightforward <include ...> tag the guest document is engineered to live inside the CSS environment (descendant of the 3rd div child of a p that has class ".rodney") that the host puts it in.
Another straightforward answer is the Shadow DOM which, for the most part, lets the guest style itself without affecting the rest of the document. I think in that case the host can still put some styles in to patch the guest.
it never quite took off
There was a lot of criticism for frames [1] but still they were successfully deployed for useful stuff like Java API documentation [2].
In my opinion the whole thing didn't stay mostly because of too little flexibility for designers: framesets were probably good enough for useful information pages but didn't account for all the designers' needs, with their bulky scrollbars and limited number of subspaces on the screen. Today it is too late to revive them because framesets as-is probably wouldn't work well on mobile...
[1] <https://www.nngroup.com/articles/why-frames-suck-most-of-the...> - I love how much of it is not applicable anymore and all of these problems mentioned with frames are present in today's web in an even nastier way?
[2] <https://www.eeng.dcu.ie/~ee553/ee402notes/html/figures/JavaD...>
Of course "back then" this was an important feature and one of the reasons for getting rid of frames :)
As the article says, the problem is a solved one. The "includes" issue is how every web design student learns about PHP. In most CMSes, "includes" become "template partials" and are one of the first things explained in the documentation.
There really isn't any need to make includes available through just HTML. HTML is a presentation format and doesn't do anything interesting without CSS and JS anyway.
That's not an argument that client-side includes shouldn't happen. In fact HTML already has worse versions of this via frames and iframes. A client-side equivalent of a server-side include fits naturally into what people do with HTML.
Some content is already loaded asynchronously such as images, content below the fold etc.
> HTML is really just a markup syntax, not a programming language
flamebait detected :) It's a declarative language, interpreted by each browser engine separately.
<html src="/some/page.html">, <div src="/some/div.html">, <span src="/some/span.html">, etc.
Or create a new tag that's a noun like fragment, page, document, subdoc or something.
Surely that's no less markup than svg, img, script, video, iframe, and what not.
Nor is this something unique to SGML. XML is also a "markup language", yet XInclude is a thing.
touchay!!
> considered to be server-side
Good point! Wouldn't fetching a template partial happen the same way (like fetching an image?)
I always assumed it stood for my initials.
It's "solved" only in the sense that you need to use a programming language on the server to "solve" it. If all you are doing is static pages, it's most definitely not solved.
That's "using a programming language to solve the problem", isn't it?
> It's way cheaper as you do the work once, instead of every client being much slower because they have to do additional requests.
What work do client-side includes have to do other than fetching the page (which will get cached anyway)? It's less work to have a `<include-remote ...>` builtin than even a simple Makefile on the server.
Fetching another resource is expensive. It's another round trip, and depending on many factors it could be another second to load the page. And if the HTML includes other nested HTML then it can be much slower.
This is the exact thing we try to avoid when building websites that perform well. You want as few chained requests as possible, and you want the browser to be aware of them as soon as possible, with the correct priority. That way the browser can get the important stuff needed to display content fast.
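A back-of-the-envelope model of why chained requests hurt: assume each fetch costs one round trip, siblings are fetched in parallel, and a nested include cannot be requested until its parent has arrived. Under those assumptions (and an acyclic include tree), the page's load time is set by its deepest chain. The tree and the `criticalPathMs` helper are hypothetical.

```javascript
// Back-of-the-envelope model (an assumption, not a measurement): each fetch
// costs one round trip `rttMs`, siblings are fetched in parallel, and a
// nested include can only be requested after its parent has arrived.
// The include tree must be acyclic.
function criticalPathMs(tree, name, rttMs) {
  const children = tree[name] || [];
  let slowestChild = 0;
  for (const child of children) {
    slowestChild = Math.max(slowestChild, criticalPathMs(tree, child, rttMs));
  }
  return rttMs + slowestChild;
}

const tree = {
  "page.html": ["header.html", "footer.html"],
  "header.html": ["nav.html"],
};

console.log(criticalPathMs(tree, "page.html", 100)); // 300
```

With everything inlined at build time, the same page costs a single round trip (100 in this model), which is the gap the comment is pointing at.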
Including HTML client side for templating is just wasteful, slow and dumb from a technical standpoint.
Every client would have to do another request for each include. It would literally be many thousands of times slower (or worse) than doing it locally, where the templates can be in memory as you pre-render the pages. You also save a ton of CPU cycles and bandwidth by not serving more files with additional overhead like headers.
Yeah, it's not. I'm doing client side includes and the includes get cached by the browser. I'm sure I would have noticed if my pages went from 1s to display to 1000s to display.
If you have a site/webapp with (say) twenty pages, that's still only two extra requests in total, one for the header and one for the footer, since the browser caches them after the first page.
An additional request is another round trip. That can be very slow. Average TTFB on the internet in the US is ~0.7 seconds.
It's much faster to send it as part of the same request as you then don't have to wait for the browser to discover it, request it, wait for the response and then add it.
A build process does not have to be complicated, at all. If you can write HTML then using something that can simply read the HTML includes you wish existed and swap it with the specified filename is trivial.
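A minimal sketch of such a build step, assuming a hypothetical `<include src="..."></include>` (or self-closing `<include src="..."/>`) tag and an in-memory `files` map standing in for the file system. It expands includes recursively and refuses cycles, which already touches on the dependency questions this approach raises.

```javascript
// Minimal build-step sketch: replace a hypothetical <include src="..."> tag
// with the referenced file's contents, recursively, refusing cycles.
// `files` (name -> text) stands in for reading from disk.
function expandIncludes(name, files, stack = []) {
  if (stack.includes(name)) {
    throw new Error(`include cycle: ${[...stack, name].join(" -> ")}`);
  }
  const source = files[name];
  if (source === undefined) {
    throw new Error(`missing include: ${name}`);
  }
  // Matches <include src="x"></include> and <include src="x"/>.
  const includeTag = /<include\s+src="([^"]+)"\s*(?:\/>|><\/include>)/g;
  return source.replace(includeTag, (_tag, src) =>
    expandIncludes(src, files, [...stack, name])
  );
}

const files = {
  "index.html": '<body><include src="header.html"></include><p>Hi</p></body>',
  "header.html": '<header><include src="nav.html"/></header>',
  "nav.html": "<nav>Home</nav>",
};

console.log(expandIncludes("index.html", files));
// <body><header><nav>Home</nav></header><p>Hi</p></body>
```

Because the replacement is a function, `$` sequences in included files are copied literally instead of being treated as regex replacement patterns.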
Ofc, the idea has many other issues, like how to handle dependencies of the included HTML, how to handle conflicts, what path to use and many more.
One drawback though would be that one indeed would have to maintain dependencies, which would be error prone beyond simply adding headers and footers... I wonder if one could (ab)use CPP [1] and its -M option to do that.
HTML is a markup language that identifies the functional role of bits of text. In that sense, it is there to provide information about how to present the text, and is thus a presentation format.
It is also a document description language, because almost all document description languages are also a presentation format.
Exactly! Include makes perfect sense on server-side.
But client-side include means that the client should be able to modify original DOM at unknown moment of time. Options are
1. at HTML parse time (before even DOM is generated). This requires synchronous request to server for the inclusion. Not desirable.
2. after DOM creation: <include src=""> (or whatever) needs to appear in the DOM, the chunk is loaded asynchronously, and then the <include> DOM element (sic!) needs to be replaced (or how?) by the external fragment. This disables any existing DOM structure validation mechanism.
Having said that...
I've implemented <include> in my Sciter engine using strategy #1. It works there as HTML in Sciter usually comes from local app resources / file system where price of issuing additional "get chunk" request is negligible.
> the problem is a solved one
is a sure-fire way to know that a problem is not solved
If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
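The ambiguity can be shown with the WHATWG `URL` API (available globally in browsers and in Node.js 10+): the same relative link resolves to different targets depending on whether the fragment's own URL or the host page's URL is used as the base. The example.com URLs are placeholders.

```javascript
// The same relative link resolves differently depending on the base URL.
const hostPage = "https://example.com/main.html";
const includedFragment = "https://example.com/child/include1.html";
const link = "include2.html";

// Base = the fragment's own URL (what the fragment's author likely meant).
const relativeToFragment = new URL(link, includedFragment).href;
console.log(relativeToFragment); // https://example.com/child/include2.html

// Base = the host document (how HTML pasted into the page resolves today).
const relativeToHost = new URL(link, hostPage).href;
console.log(relativeToHost); // https://example.com/include2.html
```

An include mechanism would have to pick one of these rules, or rewrite URLs inside the fragment, and either choice breaks some authors' expectations.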
You could do the opposite: you can have article1.html, article2.html, article3.html etc., each including header.html, footer.html, navi.html. OK, that works, but now you've made it so that making a global change to the structure of your articles requires editing all articles. In other words, if you want to add comments.html to every article you have to edit all articles, and you're back to wanting to generate pages from articles based on some template, at which point you don't need the browser to support include.
I also suspect there would be other issues, like the header wanting to know the title, or the footer wanting a next/prev link, which would now require some way to communicate this info between includes, and you're basically back to generating the pages, with include not being a solution.
I think if you work through the issues you'll find an HTML include would be practically useless for most use cases.
> If main.html includes child/include1.html and child/include1.html has a link src="include2.html" then when the user clicks the link where does it go? If it goes to "include2.html", which by the name was meant to be included, then that page is going to be missing everything else. If it goes to main.html, how does it specify this time, use include2.html, not include1.html?
There are two distinct use cases here: snippet reuse and embeddable self-contained islands. But the latter is already handled by iframes (the behavior being your latter case). So we only need to do the former.
No, they are a can of worms and decades of arguments and incompatibilities and versioning
> But the latter is already handled by iframes
iframes don't handle this case because the page can not adjust to the iframe's content. There have been proposals to fix this but they always run into issues.
https://github.com/domenic/cooperatively-sized-iframes/issue...
If a user clicked a link with href="include.css", then it'll be rubbish.
It would be good for static data: images, CSS, and static HTML content.
Client side include feature for HTML
The term "include" names an XML feature, and it's that feature the article is hoping for. HTML had an alternative approach that came into existence before XML: frames. Frames did much more than XML includes, so HTML never gained that feature. Frames fell out of favor due to misuse, security, accessibility, and a variety of other concerns.
I still like to use them occasionally but it incurs a "compilation" step to evaluate them prior to handing the result of this compilation to the users/browsers.
<?xml version="1.0" encoding="UTF-8"?>
<!-- fizz.xml -->
<?xml-stylesheet type="text/xsl" href="style.xslt"?>
<fizz>Fizz<buzz/></fizz>
<?xml version="1.0" encoding="UTF-8"?>
<!-- style.xslt -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Replace each <buzz/> with the text content of buzz.xml; the built-in
templates pass everything else (here, the text "Fizz") through. -->
<xsl:template match="buzz">
<xsl:value-of select="document('buzz.xml')"/>
</xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="UTF-8"?>
<!-- buzz.xml -->
<buzz>Buzz</buzz>
You can even use XSLT for HTML5 output, if you're careful. But YMMV with which XML processors will support stylesheets. XML includes are blocking because XSL support hasn't been updated for 25 years, but there's no reason why we couldn't have them async by now if resources were devoted to this instead of WebUSB etc.
You'd better not jinx it: XSL support seems like just the sort of thing browser devs would want to tear out in the name of reducing attack surface. They already dislike the better-known SVG and never add any new features to it. I often worry that the status quo persists only because they haven't really thought about it in the last 20 years.
I’ve used XSLT in anger: I used it to build Excel worksheets (in XML format) using libXSLT. I found it very verbose and hard to read. And XPath is pretty torturous.
I wish I could have used Javascript. I wish Office objects were halfway as easy to compose as the DOM. I know a lot of people hate on Javascript and the DOM, but it’s way easier to work with than the alternatives.
-- index.html
<html>
<body>
<script src="header.js"></script>
<main>
<h1>Hello includes</h1>
</main>
<script src="footer.js"></script>
</body>
</html>
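-- header.js
// Replace the <script> element that loaded this file with the header markup.
// document.currentScript is only set for classic (non-module) scripts.
document.currentScript.outerHTML = `
<header>
<h1>Header</h1>
</header>`
-- footer.js
// Same technique for the footer.
document.currentScript.outerHTML = `
<footer>
<h1>Footer</h1>
</footer>`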
Scripts replace their own tags with HTML, leaving clean markup in the DOM. Not pretty, but it works on the client.
<script src="/include/footer.html">
For /footer.html
But then you might as well use server-side includes.
I personally used this to great success on a couple of Premier League football club websites around the mid 2000s.
In server terms, the overhead of tracking one download is going to be less than the overhead of tracking the download of multiple components.
And for client-side caching to be of any use, a visitor would need to view more than one page, and the harsh reality is that many sessions are only one page long, e.g. news sites, blogs, etc.
Nodes can be addressed individually, but a document is the unit of transmission and also contains metadata. You can combine nodes as you like, but you can't really combine two already packed and annotated documents of nodes.
So I would say it is more due to semantics. I think there was also the idea of requesting arbitrary sets of nodes, but that was never developed, and with the shift away from the semantic document it didn't make sense anymore.
Maybe a single tag that points at a URL to load if someone attempts to load the chunk directly.
More or less, but manipulating the nodes requires JavaScript, which some people would like to avoid.
There are features that would be good for the latter that have been removed. For example, if you need to embed HTML code examples, you can use the <xmp> tag, which means you don't need to escape anything. Sadly, the HTML5 spec has marked the <xmp> tag obsolete even though it's the only way to make this work. All browsers still support it anyway, but if it is ever removed, you will always have to encode the examples.
HTML spec developers should be more careful to consider people hand-coding HTML when designing specifications, or at least avoid decisions that require JavaScript for something it shouldn't be needed for.
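For reference, this is roughly the escaping you have to do by hand without `<xmp>`. A minimal sketch; `escapeHtml` is a hypothetical helper name, not a browser API. In element content `&` and `<` are the characters that really matter, but escaping quotes as well makes the same function usable inside attribute values.

```javascript
// Minimal HTML escaping for showing markup as text inside <pre><code>.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const example = '<a href="/?a=1&b=2">link</a>';
console.log(`<pre><code>${escapeHtml(example)}</code></pre>`);
// <pre><code>&lt;a href=&quot;/?a=1&amp;b=2&quot;&gt;link&lt;/a&gt;</code></pre>
```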
- JS for functionality via the custom elements API
- HTML for layout via <template> tags
- CSS for aesthetics via <style> tags
Not for just quickly and simply inserting the contents of header.html at a specific location in the DOM.
This feature was billed as #includes for the web [1]. No, it acts nothing like an #include. TBH I don't see why ES modules are a "replacement" here.
Personally I would like to see something like these imports come back, as a way to reuse HTML structure across pages, BUT purely declaratively (no JS needed).
#includes of partially formed HTML (i.e., header.html has a <body> open tag and footer.html has the closing tag) aren't very DOM-compatible.
[1] https://web.archive.org/web/20181121181125/https://www.html5...
I would have loved for there to be a JSON-based format, or perhaps YAML, as an alternative to the XML-based stuff we have today.
How well supported is XSLT in modern browsers? What would be the drawbacks of using this approach for a modern website?
It's so bad that if you want to discuss marking up hypertext (i.e., putting notes on top of existing read-only text files, etc.), you'll have to Google the word "annotation" to even start to get close.
Along with C macros, case sensitivity, null-terminated strings, unauthenticated email, and ambient-authority operating systems, HTML is one of the major mistakes of computing.
We should have had the Memex at least a decade ago, and we've got this crap instead. 8(
<ul>
<li><a href="about.html" target="display">about</a></li>
<li><a href="contact.html" target="display">contact</a></li>
</ul>
<iframe src="about.html" name="display"></iframe>
The important part is that the target iframe must have a `name` attribute (it is not identified by `id`). I guess this is a legacy of framesets and frames. (Of course, this has all the issues of framesets, such as deep linking, accessibility, etc.)
You have to give an iframe a specific height in pixels. There is no way to say “make this iframe the height its content wants to be” (like normal HTML does).
This leads to two options:
- your page has nested vertical scroll bars (awful UX)
- you have to write JavaScript inside and outside the frame to constantly measure and communicate how tall the frame wants to be.
Or you could just not use frames.
That may be a fairly specific use case though, and largely it still works great today. I've done a few side projects with XSLT and web components for interactivity, worked great.
Here we go, looks like it's 17 years old now:
> The only combination that fails to render these entities correctly is Firefox/XSLT.
Which is one good reason not to adopt XSLT to implement HTML includes. You just don't know what snags you'll hit upon but you can be sure you'll be on your own.
> Bug 98168 (doe) Opened 24 years ago Updated 21 days ago
Well it does look like someone's still mulling over whether and how to fix it... 24 years later...
Debugging is also pretty painful, or I at least haven't found a good dev setup for it.
That said, I'm happy to reach for XSLT when it makes sense. It's pretty amazing what can be done with such an old tech; for the core use case of props and templates to HTML, you really don't need React.
Yes, but with regard to HTML it hasn't been solved in a standard way; it's been solved in hundreds, if not thousands, of non-standard ways. The point of the article is that having one standard way could remove a lot of complexity from the ecosystem, as ES6 imports did.
What does this mean? This is a pure HTML solution, not just "technically" but in reality. (And before iframes there were frames and framesets.) Just because the author doesn't like them doesn't make them non-existent.
An iframe is a window into another webpage, and is bounded as such both visually and in terms of DOM interfaces. A simple example would be that an iframe header can't have drop-down menus that overlap content from the page hosting it.
They are categorically not the same DX/UX as SSI et al. and it's absolutely bizarre to me that there's so many comments making this complaint.
They would be a lot more useful if we could write e.g. <iframe src=abc.html height=auto width=100> so the height of the iframe element is set by the abc.html document instead of the parent document.
Instead, you need postMessage to send the body size to the parent frame, which then needs to listen for messages and resize the iframe element.
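A minimal sketch of that postMessage handshake, assuming a hypothetical message shape `{ type: "frame-height", height }` and hypothetical origins `https://host.example` and `https://frame.example`. The validation is kept in pure functions so it can run outside a browser; the commented wiring shows where it would plug in.

```javascript
// Hypothetical message type shared by the framed page and its host.
const MESSAGE_TYPE = "frame-height";

// Child side: build the message the framed document posts to its parent.
function makeHeightMessage(height) {
  return { type: MESSAGE_TYPE, height: Math.ceil(height) };
}

// Parent side: accept only well-formed messages from the expected origin.
// Returns the height in pixels, or null if the message should be ignored.
function readHeightMessage(event, expectedOrigin) {
  if (event.origin !== expectedOrigin) return null;
  const data = event.data;
  if (!data || data.type !== MESSAGE_TYPE) return null;
  if (!Number.isFinite(data.height) || data.height < 0) return null;
  return data.height;
}

// Browser wiring (not executed here):
// In the frame:
//   new ResizeObserver(() => parent.postMessage(
//     makeHeightMessage(document.documentElement.scrollHeight), "https://host.example"
//   )).observe(document.documentElement);
// In the host:
//   window.addEventListener("message", (event) => {
//     if (event.source !== iframe.contentWindow) return;
//     const h = readHeightMessage(event, "https://frame.example");
//     if (h !== null) iframe.style.height = `${h}px`;
//   });
```

Checking both `event.origin` and `event.source` is what keeps an unrelated page from resizing the frame, which is part of why this ends up fiddlier than a plain `height=auto` would be.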
https://www.w3.org/TR/WD-html40-970708/struct/includes.html#...
The "extremely awkward" aspect they complain about is a side effect of needing to handle that case.
You could add some nicer way to include content for the same domain, but I suspect having two highly similar HTML features would be fairly awkward in practice, as you'd have to create a whole new set of security rules for it.
The logic is performed elsewhere. If you were to have includes directly in HTML, it means that browsers must implement logic for HTML. So it is not 'just' a parser anymore.
Imagine, for example, that I create an infinite loop of includes: who is responsible for limiting me? How do we ensure that all other browsers implement it the same way?
What happens if I perform an injection from another website? Then we start having to write CORS policy management. (iframes were bad for this)
Now imagine using Javascript I inject an include somewhere, should the website reload in some way? So we have a dynamic DOM in HTML?
Client-side includes are not "processing". HTML already has frames and iframes which do this, just in a worse way, so we'd be better off.
(yes in my view I interpret includes as a basic procedure)
[1] http://www.info.ucl.ac.be/people/PVR/paradigmsDIAGRAMeng201....
We can probably copy the specs for <frameset> and deal with it the same way:
https://www.w3.org/TR/WD-frames-970331#:~:text=Infinite%20Re...
Any frame that attempts to assign as its SRC a URL used by any of its ancestors is treated as if it has no SRC URL at all (basically a blank frame).
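The frameset rule quoted above translates directly into an include expander: track the chain of URLs currently being expanded, and treat any include that points back at an ancestor as blank. A minimal sketch, assuming SSI-style `<!--#include virtual="..." -->` directives and an in-memory `Map` of hypothetical files standing in for HTTP fetches. Only ancestors are blocked, so the same fragment may still appear more than once side by side.

```javascript
// Expand includes recursively. `files` maps a path to its contents.
// `ancestors` is the chain of paths currently being expanded; an include that
// points at one of them is replaced with nothing, mirroring the frameset rule.
function expand(path, files, ancestors = []) {
  if (ancestors.includes(path)) return ""; // recursive include: treat as blank
  const source = files.get(path);
  if (source === undefined) return "";      // missing file: also blank in this sketch
  const chain = [...ancestors, path];
  // Matches directives such as <!--#include virtual="header.html" -->.
  return source.replace(/<!--#include virtual="([^"]+)"\s*-->/g,
    (_, child) => expand(child, files, chain));
}

// Hypothetical site where a.html and b.html include each other.
const files = new Map([
  ["page.html", '<!--#include virtual="a.html" --><p>page</p>'],
  ["a.html", '<p>a</p><!--#include virtual="b.html" -->'],
  ["b.html", '<p>b</p><!--#include virtual="a.html" -->'],
]);

console.log(expand("page.html", files)); // <p>a</p><p>b</p><p>page</p>
```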
> How to ensure that all other browsers implement it in the same way?
Browsers that don't implement the specs will eventually break:
* the actual existence of frames (although those are deprecated)
* iframes (which are not deprecated, so seemingly doing declarative inclusion of HTML in HTML was not what was wrong with frames)
* imports in CSS, which share some of the same problems / concerns as HTML imports
* the existence of JavaScript with its ability to change anything on the page, including the ability to issue HTTP requests and be written arbitrarily obfuscated ways.
In the end, I settled on using a Caddy directive to do it. It still feels like a tacked on solution, but this is about as pure as I can get to just automatically "pasting" in the code, as described in the article.
I’ve always just styled the link to the current page differently, not disabled it, which you can do with an id on the page and a line of CSS.
Includes are a standard part of many document systems. Headers and footers are a perfect example - if I update a document I certainly don't want to update the document revision number on every single page! It also allows you to add navigation between documents in a way that is easy to maintain.
LaTeX can do it. Microsoft Word can do it (in a typically horrible Microsoftian way). Why not HTML?
Interesting, my brain is not this way: I want to send a minimum number of files per link requested. I don't care if I include the same text because the web is generally slow and it's generally caused by a zillion files sent and a ton of JS.
Also I miss framesets - with that a proper sidebar navigation was easily possible.
I’m not saying my first website was impressive — but as a programmer there’s no way I was copying and pasting the same header / footer stuff into each page and quickly found “shtml” and used that as much as possible.
Then used the integrated FTP support in whatever editor it was (“HTML-kit” I think it was called?) - to upload it straight to prod. Like a true professional cowboy.
I wish I would have advocated more for it though. I think it would be pretty easy to add using a new attribute on <script> since the parser already pauses there, so making something like <script transclude={url}> would likely not be too difficult.
From what I remember, the main problem was that it broke URLs: you could only link to the initial state of the page, and navigating around the site wouldn't update the address bar - so deep linking wasn’t possible (early JavaScript SPA frameworks had the same issue, BTW). Another related problem was that each subframe had to be a full HTML document, so they did have their own individual URLs. These would get indexed by search engines, and users could end up on isolated subframe documents without the surrounding context the site creator intended - like just the footer, or the article content without any navigation.
SSI is still a thing: I use it on my personal website. It isn't really part of the HTML, though: it's a server-dependent extension to HTML. It's supported by Apache and nginx, but not by every server, so you have to have control over the server stack, not just access to the documents.
Originally, iframes were the solution, like the post mentions. By the time iframes became unfashionable, nobody was writing HTML with their bare hands anymore. Since then, people use a myriad of other tools and, as also mentioned, they all have a way to fix this.
So the only group who would benefit from a better iframe is the group of people who don't use any tools and write their HTML with their bare hands in 2025. That is an astonishingly small group. Even if you use a script to convert markdown files to blog posts, you already fall outside of it.
No-one needs it, so the iframe does not get reinvented.
[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
Flash's reputation was quite low at the time and people were ready to finally move on from plugins being required on the web. (Though the "battle" then shifted to open vs. closed codecs.)
https://miragecraft.com/blog/replacing-html-imports
At the end of the day it’s not something trivial to implement at the HTML spec/parser level.
For relative links, how should the page doing the import handle them?
Do nothing and let it break, convert to absolute links, or remap it as a new relative link?
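The three options correspond to resolving the link against different base URLs with the standard `URL` constructor. A minimal sketch with hypothetical URLs (a fragment at `/partials/nav.html` containing `href="about.html"`, included into `/blog/post.html`); `relativeTo` is a hypothetical, naive helper that only handles same-origin targets.

```javascript
// Hypothetical locations of the including page and the included fragment.
const pageUrl = "https://example.com/blog/post.html";
const fragmentUrl = "https://example.com/partials/nav.html";
const href = "about.html"; // relative link written inside the fragment

// Option 1, "do nothing": after inclusion the link resolves against the page.
const asIs = new URL(href, pageUrl).href;

// Option 2, "convert to absolute": resolve against the fragment's own URL.
const absolute = new URL(href, fragmentUrl).href;

// Option 3, "remap as a new relative link": express the absolute target
// relative to the page's directory. Cross-origin targets stay absolute.
function relativeTo(target, from) {
  const t = new URL(target);
  const f = new URL(from);
  if (t.origin !== f.origin) return t.href;
  const fromDirs = f.pathname.split("/").slice(0, -1); // drop the file name
  const toParts = t.pathname.split("/");
  let i = 0;
  while (i < fromDirs.length && i < toParts.length - 1 && fromDirs[i] === toParts[i]) i++;
  const up = "../".repeat(fromDirs.length - i);
  return up + toParts.slice(i).join("/") + t.search + t.hash;
}

console.log(asIs);                          // https://example.com/blog/about.html
console.log(absolute);                      // https://example.com/partials/about.html
console.log(relativeTo(absolute, pageUrl)); // ../partials/about.html
```

Option 1 silently points at a different page than the fragment's author intended, which is the breakage being described; options 2 and 3 both require the browser to rewrite the fragment's markup.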
Should the include be done synchronously or asynchronously?
The big benefit of traditional server-side includes is that they're synchronous, which simplifies logic for in-page JavaScript, but all browsers are trying to eliminate synchronous calls for speed, so it's hard to see them agreeing to add a new synchronous bottleneck.
Should it be CORS restricted? If it is then it blocks offline use (file:// protocol) which really kills its utility.
There are a lot of hurdles to it and it’s hard to get people to agree on the exact implementation, it might be best to leave it to JavaScript libraries.
I still feel like frames were great for their use case.
The first naysayer was @domenic: https://github.com/whatwg/html/issues/2791#issuecomment-3113...
> I don't think we should do this. The user experience is much better if such inclusion is done server-side ahead of time, instead of at runtime. Otherwise, you can emulate it with JavaScript, if you value developer convenience more than user experience.
The "user experience" problem he's describing is a performance problem, adding an extra round-trip to the server to fetch the included HTML. If you request "/blog/article.html", and the article includes "/blog/header.html", you'll have to do another request to the server to fetch the header.
It would also prevent streaming parsing and rendering, where the browser can parse and render HTML bit-by-bit as it streams in from the server.
Before you say, "so, what's the big deal with adding another round trip and breaking the streaming parser?" go ahead and read through the hundreds of comments on that thread. "What's the big deal" has not convinced browser devs for at least eight years, so, pick another argument.
I think there is a narrow opening, where some noble volunteer would spec out a streaming document-fragment parser.
It would involve a lot of complicated technical specification detail. I know a thing or two about browser implementation and specification writing, and designing a streaming document-fragment parser is far, far beyond my ken.
But, if you care about this, that's where you'd start. Good luck!
P.S. There is another option available to you: it is kinda possible to do client-side includes using a service worker. A service worker is a client-side proxy server that the browser will talk to when requesting documents; the service worker can fetch document fragments and merge them together (even streaming fragments!) with just a bit of JS.
But that option kinda sucks as a developer experience, because the service worker doesn't work the first time a user visits your site, so you'd have to implement server-side includes and also serve up document fragments, just for second-time visitors who already have the header cached.
Still, if all you want is to return a fast cached header while the body of your page loads, service workers are a fantastic solution to that problem.
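A minimal sketch of that service-worker approach, assuming hypothetical fragment URLs `/fragments/header.html` and `/fragments/footer.html`, a hypothetical `/fragments/<path>` convention for page bodies, and that the header and footer were put in the Cache API during the worker's install step (not shown). All three requests start at once, and `concatStreams` pipes their bodies out in order, so the cached header can reach the browser while the body is still downloading. As noted above, none of this applies on a first visit.

```javascript
// Concatenate byte streams (or promises of streams) into one stream, in order.
// Each part is piped through as it arrives, so no body is buffered whole.
function concatStreams(parts) {
  const { readable, writable } = new TransformStream();
  (async () => {
    for (const part of parts) {
      const stream = await part;
      // Keep the writable open between parts; close it once at the end.
      if (stream) await stream.pipeTo(writable, { preventClose: true });
    }
    await writable.close();
  })().catch((err) => writable.abort(err).catch(() => {}));
  return readable;
}

// Prefer a cached copy of a fragment, falling back to the network.
async function fragmentBody(path) {
  const response = (await caches.match(path)) ?? (await fetch(path));
  return response.body;
}

// Only register the handler when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    const url = new URL(event.request.url);
    // Only assemble same-origin page navigations; let everything else through.
    if (event.request.mode !== "navigate" || url.origin !== self.location.origin) return;
    const parts = [
      fragmentBody("/fragments/header.html"),
      fetch(`/fragments${url.pathname}`).then((r) => r.body), // hypothetical body URL
      fragmentBody("/fragments/footer.html"),
    ];
    event.respondWith(new Response(concatStreams(parts), {
      headers: { "Content-Type": "text/html; charset=utf-8" },
    }));
  });
}
```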
(and this https://harcstack.org)
It should not have gone away. It never did for me.
Also, this is kind of what 'frames' were and how they were used. Everything old is new again.
Github: https://github.com/franzenzenhofer/html-include-polyfill-ext...
<!--#include virtual="header.html" -->
Some content here
<!--#include virtual="footer.html" -->
[0]: https://www.gnu.org/software/emacs/manual/html_node/emacs/in...
An embedded HTML page:
<object data="snippet.html" width="500" height="200"></object>
[edit: I'm sure there are still some file:// workflows for docs - and yes this doesn't address that]
I have built several sites with pure HTML+CSS, sprinkled with some light SSI with Caddy, and it is rock solid and very performant!
It was not in the early spec, and it seems someone powerful wouldn't allow it later. So everyone else made workarounds in any way they could, which lessened the need quite a bit.
My current best workaround is the <object data=".."> tag, which has a few better defaults than iframe. If you put a link to the same stylesheet in the included file, it will match pretty well. Size it with width=100%; for height, you'll need to eyeball it or use JavaScript.
Or, Javascript can also hoist the elements to the document level if you really need. Sample code at this site: https://www.filamentgroup.com/lab/html-includes/
And it is a genuinely good question!
I think PD's answer feels the truest.
JS/CSS, with all their bureaucracy, are nothing compared to HTML, it seems. Maybe people don't find anything wrong with HTML, or if they do, they just reach for JS/CSS and try to fix HTML (ahem, frontend frameworks).
That being said, I have just regurgitated what PD said, and I give him full credit for that, but I am also genuinely confused, because I have heard that JS/CSS are bureaucratic too. I remember a Fireship video about types being added to JS, which I watched at least a year ago (I could be wrong), but I haven't heard anything about it since, and from my observation a lot of JS proposals are just stuck.
And yet HTML is bureaucratic to such a degree that the answer to why HTML doesn't have a feature is its bureaucracy. Maybe someone can explain the history of it and why?
This post does link to a technique (new to me) to extract iframe contents:
<iframe src="/example.html" onload="this.before((this.contentDocument.body||this.contentDocument).children[0]);this.remove()"></iframe>
You can do some really silly maneuvers with `window.postMessage` to communicate an expected size between the parent and frame on resize, but that's expensive and fiddly.
The host can then act as a server for the iframe client, even updating its state or DOM in response to a message from the iframe.
It wouldn't make sense to transclude the article about the United States in the article about Wyoming. (In fact, modern Wikipedia shows a pop-up bubble with a partial transclusion, and it would gain nothing from basic HTML transclusion.)
It's a simple idea. But of course modern HTML is not at all what HTML was designed to be, but that's the canonical answer.
The elders of HTML would just tell you to make an <a> link to whatever you wanted to transclude instead, be it a header, footer, table of contents, another encyclopedic article, or whatever. Because that's how HTML works, not the way you suggest.
Think of what would happen if it were the case: you would transclude page A, which transcludes page B, and so on with page C, possibly transcluding page B again, recursively. You would turn the user agent (browser) into a whole WWW crawler!
It's because HTML is pass by reference, not pass by copy.
Asking for things that the W3C had specced out in 2006 for XML tech is just not reasonable if it doesn't facilitate clicks.
> We’ve got <iframe>, which technically is a pure HTML solution, but
And then on the following paragraph..
> But none of the solutions is HTML
> None of these are a straightforward HTML tag
Not sure what the point is. Maybe just complaining
"We’ve got <iframe>, which technically is a pure HTML solution, but they are bad for overall performance, accessibility, and generally extremely awkward here"
If you disagree, and you think you are in the right, you probably have a somewhat good argument you can use in a reply.
The fact that you don't means my explanation makes sense.
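// <html-import href="header.html"></html-import> is replaced by the
// contents of the fetched document's <body>.
customElements.define('html-import', class extends HTMLElement {
  connectedCallback() {
    const href = this.getAttribute('href')
    // Named xhr/doc so the globals fetch and document aren't shadowed.
    const xhr = new XMLHttpRequest()
    xhr.responseType = 'document' // parse the response as an HTML Document
    xhr.addEventListener('load', () => {
      if (xhr.status < 200 || xhr.status >= 300 || !xhr.response) return
      const doc = xhr.response
      const root = doc.body ?? doc.documentElement
      // Spread the child nodes so a second <body> isn't nested in the page.
      // Note: <script> elements inserted this way do not execute.
      this.replaceWith(...root.childNodes)
    })
    xhr.open('GET', href)
    xhr.send()
  }
})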
// Custom element names must contain a hyphen, so plain "include" would throw.
customElements.define("html-include", class extends HTMLElement { connectedCallback() { fetch(this.getAttribute("href")).then(x => x.text()).then(x => this.outerHTML = x) } })