Norbert’s Corner

Building This Site

August 28, 2005

In creating this site, I learned quite a bit about web-related technologies such as XHTML, CSS, DOM, and Atom. I also developed a tool that keeps all pages up to date, including their calendars and tables of contents, and generates the Atom feed.


XHTML is HTML expressed in XML syntax. It seems to have become quite fashionable among web designers, but I actually have a good reason for using it: most of the pages on my site are generated by an XML-based tool, and that simply works better if its inputs and outputs are XML files. There’s been some controversy about improper use of XHTML, but since my pages are generated by a widely used XML library, I can be pretty sure that they’re well-formed XML, and I can serve them using the content type “application/xhtml+xml” to those browsers that accept it. Despite being XML files, my files do not have XML declarations (the “<?xml ?>” thing), both because these would prevent Internet Explorer from using standards mode and because the files are encoded in UTF-8 and therefore don’t need the declaration. Since all style information is provided in CSS, my pages can use the strict flavor of XHTML.
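The content type decision described above can be illustrated with a small sketch. This is not the site's actual server configuration, which the text does not describe; the class `XhtmlContentType` and its `choose` method are hypothetical names, and the rule used here (send XHTML only when the browser's Accept header names it explicitly) is an assumption about one reasonable policy.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not the site's actual server logic.
final class XhtmlContentType {
    static final String XHTML = "application/xhtml+xml";
    static final String HTML = "text/html";

    // Returns application/xhtml+xml only if the Accept header names it
    // explicitly with a non-zero q value; otherwise returns text/html.
    // Wildcard ranges are deliberately not treated as XHTML support.
    static String choose(String acceptHeader) {
        if (acceptHeader == null) {
            return HTML;
        }
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.trim().split(";");
            if (parts.length == 0
                    || !parts[0].trim().toLowerCase(Locale.ROOT).equals(XHTML)) {
                continue;
            }
            return qValue(parts) > 0.0 ? XHTML : HTML;
        }
        return HTML;
    }

    // Reads the q parameter of one media range; defaults to 1.0 when absent
    // and treats an unparsable value as 0.0.
    private static double qValue(String[] parts) {
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim().toLowerCase(Locale.ROOT);
            if (p.startsWith("q=")) {
                try {
                    return Double.parseDouble(p.substring(2).trim());
                } catch (NumberFormatException e) {
                    return 0.0;
                }
            }
        }
        return 1.0;
    }

    public static void main(String[] args) {
        System.out.println(choose("text/html,application/xhtml+xml,*/*;q=0.8"));
        System.out.println(choose("*/*"));
    }
}
```

Falling back to “text/html” for browsers that send only a wildcard is the conservative choice, since such a header says nothing about whether the browser can actually render XHTML.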


I’ve long been used to separating content from presentation – style sheets were part of the original design of the Literate Programming Workshop back in 1988. The idea of using style sheets for HTML therefore seemed very natural, although the “cascading” part of Cascading Style Sheets – the complex rules for determining which properties actually apply to any given piece of content – took some getting used to. The CSS Zen Garden provided inspiration. A print style sheet adjusts content for printing, in particular removing navigation aids that wouldn’t be usable in their printed form. I didn’t quite follow the fashion for completely table-less layouts – I like my readers to be able to resize windows and fonts and the layout to adapt, and achieving that with tables is a lot easier than with the subset of CSS that’s supported in currently used browsers. Speaking of that, I had to implement quite a few workarounds for deficiencies in Internet Explorer; as Safari and Firefox now provide better and free alternatives, I’m going to pay less attention to Explorer in the future.


The name of the Atom syndication format is unfortunate, but the idea is a good one: provide a means for a web site to advertise its updates in a form that can be easily consumed by aggregators and news trackers, and do so in a well-specified manner. RSS is better known, but comes in at least nine different and incompatible versions, and its specifications leave a few things to be desired. This site therefore provides only an Atom feed. There are none of those orange “XML” buttons because I can’t imagine that users want to see the gobbledygook that’s hiding behind them, and good browsers will discover the feed without them. Implementing the feed was pretty straightforward, except for one little issue: Atom knows only “published” and “updated” time stamps, and I often find the date of an event more interesting than the date of the day I write about the event (which can be quite a bit later). Fortunately, the Atom specification provides a loophole: it specifies the “published” item rather vaguely as “indicating an instant in time associated with an event early in the life cycle of the entry”, and the moment I observe something is undoubtedly early in the life cycle of an entry that I may later write about.
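The use of the two time stamps can be sketched with the DOM APIs. This is an illustration only, not the code of the tool described on this page: the class `AtomEntrySketch`, its methods, and the sample id and dates are hypothetical, and the entry omits elements a complete feed would need (such as an author and content).

```java
import java.time.Instant;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: builds a minimal Atom entry as a DOM subtree.
final class AtomEntrySketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Creates an empty, namespace-aware DOM document.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // "published" carries the time of the event itself;
    // "updated" carries the time the text was last written or edited.
    static Element buildEntry(Document doc, String id, String title,
                              Instant observed, Instant lastEdited) {
        Element entry = doc.createElementNS(ATOM_NS, "entry");
        appendText(doc, entry, "id", id);
        appendText(doc, entry, "title", title);
        appendText(doc, entry, "published", observed.toString());
        appendText(doc, entry, "updated", lastEdited.toString());
        return entry;
    }

    // Returns the text of the first child element with the given local name,
    // or null if the entry has no such element.
    static String textOf(Element entry, String localName) {
        Node node = entry.getElementsByTagNameNS(ATOM_NS, localName).item(0);
        return node == null ? null : node.getTextContent();
    }

    private static void appendText(Document doc, Element parent,
                                   String localName, String text) {
        Element child = doc.createElementNS(ATOM_NS, localName);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        Element entry = buildEntry(newDocument(),
                "tag:example.org,2005:entry-1", "Sample entry",
                Instant.parse("2005-08-20T18:30:00Z"),
                Instant.parse("2005-08-28T09:00:00Z"));
        System.out.println(textOf(entry, "published"));
        System.out.println(textOf(entry, "updated"));
    }
}
```

`Instant.toString()` produces ISO 8601 time stamps in UTC, which is the kind of date-time format Atom expects.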


All content for my site is encoded in UTF-8, a Unicode character encoding. Unicode has gone a long way towards including all characters used on planet Earth, so I can mix English with “nice typography”, Deutsch, 日本語, 中文, and any other language I might want to use, without taking the risk of data loss. Also, UTF-8 is one of only two character encodings that every XML processor must support (the other one is UTF-16, which isn’t quite as good as UTF-8 for storage in files and for transmission over networks).
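The storage comparison between UTF-8 and UTF-16 can be checked with a few lines of Java. This is a minimal sketch with hypothetical names (`Utf8SizeSketch` and its methods); non-ASCII characters are written as Unicode escapes so that the source compiles regardless of the compiler's default encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: compares encoded sizes and checks lossless round trips.
final class Utf8SizeSketch {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, without a byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True if encoding to UTF-8 and decoding again yields the same string.
    static boolean roundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String german = "Deutsch";
        String japanese = "\u65E5\u672C\u8A9E"; // the word for "Japanese"
        String markup = "<p>\u201Cnice typography\u201D " + japanese + "</p>";

        System.out.println(utf8Length(german) + " vs " + utf16Length(german));
        System.out.println(utf8Length(japanese) + " vs " + utf16Length(japanese));
        System.out.println(utf8Length(markup) + " vs " + utf16Length(markup));
        System.out.println(roundTrips(markup));
    }
}
```

ASCII characters, which make up all XML markup and most English text, take one byte each in UTF-8 and two in UTF-16, while the Japanese characters used here take three bytes in UTF-8 and two in UTF-16. For pages that are mostly markup and Latin-script text, UTF-8 therefore tends to come out smaller.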

Site Builder Tool

Dreamweaver is a pretty handy tool for writing web page content, but keeping the calendars and tables of contents on my web pages up to date with it, and generating the Atom feed, would be rather tedious. Programming is more fun, and so I wrote a little Java tool that does the work for me. Its inputs are XHTML files containing just the core content of the final pages, templates for the generated XHTML pages and the Atom feed, and a single XML site description file that describes the logical structure of the site, the dates associated with each page, external links, icons, and other relevant data. The tool reads all the files using the JAXP parser APIs, uses the Document Object Model APIs and other APIs such as the Calendar class to construct the final documents, and serializes the documents into XHTML and Atom files using the XML transformation APIs. The tool runs on a Mac; only the final XHTML and Atom files are sent to the web server. The only server-side logic is the determination of the appropriate content types for XHTML and Atom files.
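The parse, construct, and serialize steps can be sketched as follows. This is not the actual tool, whose source is not shown here: the class `PageBuilderSketch`, the convention of marking the insertion point in the template with an element whose `id` attribute is “content”, and the use of strings instead of files are all assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of a template-filling step built on JAXP and DOM.
final class PageBuilderSketch {
    // Parses the template and the core content, places the content inside
    // the template element whose id is "content", and serializes the result.
    static String build(String templateXml, String contentXml) {
        try {
            Document template = parse(templateXml);
            Document content = parse(contentXml);
            Element slot = findById(template.getDocumentElement(), "content");
            if (slot == null) {
                throw new IllegalArgumentException("template has no id=\"content\" element");
            }
            // Nodes belong to one document; importNode copies across documents.
            slot.appendChild(template.importNode(content.getDocumentElement(), true));

            // A Transformer without a style sheet performs an identity
            // transformation, which serves here as the serializer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(template), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first search for an element by its "id" attribute value.
    // getElementById is not used because it needs DTD or schema information.
    private static Element findById(Element root, String id) {
        if (id.equals(root.getAttribute("id"))) {
            return root;
        }
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                Element hit = findById((Element) n, id);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/1999/xhtml";
        String template = "<html xmlns=\"" + ns + "\"><body><div id=\"content\"/></body></html>";
        String content = "<p xmlns=\"" + ns + "\">Hello</p>";
        System.out.println(build(template, content));
    }
}
```

Setting `OMIT_XML_DECLARATION` to “yes” matches the choice described earlier of publishing the pages without an XML declaration.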