Main page 
 HTML PLAIN reference V1.1 
   1. Introduction 


1. Introduction

1.1 Problems with HTML

HTML has many advantages. It is very simple, making it easy to learn. It has many built-in style tags which are quite expressive. Linking is also readily achieved, in a simple, standard way. However, most of these advantages also have a downside, which makes the maintenance of large web pages a nightmare [Arb].

1.1.1 Limited structure

Due to its fixed set of tags, which are primarily concerned with formatting, the language is inherently limited. HTML is mainly concerned with appearance and content delivery, not with structure or content. It allows only very limited reuse and automation.

1.1.2 Limited reuse

HTML has no notion of a ``document template''. A template allows the designer to create a fancy page and then use that layout for all web pages. With HTML, however, this usually implies inserting the new set of tags into all documents. Because there are sometimes conflicts, these insertions often have to be done manually, which is very time-consuming.

While HTML has style sheets, these do not solve all problems. Style sheets are a clean way to specify the appearance of the text based on its content. However, they cannot deal with the other problems that HTML causes (see below).

1.1.3 Limited linking

Because of the single standard linking mechanism, it is trivial to link to other pages, whether external or within the site. If these pages move, though, all links have to be updated with the new URL! This is a very tedious task, which is not always easy to automate (a relative link to a document looks different depending on the directory of the linking page). There is no way to specify a link by a symbolic name, which would allow changing the link by changing the definition of that symbol.
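The directory dependence of relative links, and the way a symbolic name would hide it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `SITE_FILES`, its entries and the function `link_to` are hypothetical names invented here, not part of HTML PLAIN.

```python
import posixpath

# Hypothetical symbol table: symbolic name -> path relative to the site root.
SITE_FILES = {
    "home": "index.html",
    "faq": "support/faq.html",
}

def link_to(symbol, from_page):
    """Return the relative URL from the page `from_page` to the page named `symbol`."""
    target = "/" + SITE_FILES[symbol]
    start = "/" + posixpath.dirname(from_page)
    # Both paths are anchored at "/" so the result does not depend on the
    # current working directory of the script.
    return posixpath.relpath(target, start)

# The same target needs a different relative link in every directory.
print(link_to("faq", "index.html"))
print(link_to("faq", "products/tools/list.html"))
```

If the FAQ page moves, only its entry in the table changes; every link is recomputed from the symbolic name.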

Because links are hard-coded, it is usually not possible to move a file to a different location once its filename has been chosen. Sometimes moving files would help to restructure the page and keep an overview of all documents. Because fixing all links afterwards involves too much work, this approach is usually abandoned.

Linking to external pages is also a nightmare. If a page moves, all links need to be changed.

1.1.4 Limited automation

HTML is not expressive enough to allow automation. It is hard, if not impossible, to get information about the content of a document rather than just its appearance. There is no way at all to get metadata about a whole site encompassing a range of documents. Tasks like generating a site map or a list of links usually involve custom-made scripts, which only work with a particular site, and/or a lot of manual work.

1.2 Common tasks for webmasters

1.2.1 Layout changes

It is not too uncommon to change the layout of all pages now and then. Usually this involves a lot of re-tagging of each document. Even tasks as simple as changing the color turn out to be a nightmare. This is all because HTML specifies the appearance completely, without any layer of abstraction. Therefore it is very hard to change the layout once a page is made.

1.2.2 Linking

One problem is to keep all links current (see Section 1.1.3). However, even then, there are many repetitive tasks left: adding the ubiquitous ``Back'' link to a document on a higher level (or the main page), adding a set of links to related topics, or making a site map.

1.2.3 Setting image attributes

For a faster rendering of the page, every webmaster should specify the height and width of each image. Modern HTML editors usually do this. This is one of the problems that are not dependent on other documents, and therefore it can be solved easily.
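As a minimal sketch of why this task is easy to automate, the following Python code reads the dimensions from the header of a PNG file and emits a complete image tag. It is an illustration only: the function names are hypothetical, only PNG is handled (GIF and JPEG store their dimensions differently), and it is not taken from HTML PLAIN.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG image bytes."""
    # A PNG file starts with an 8-byte signature, followed by the IHDR chunk:
    # 4 bytes length, 4 bytes type "IHDR", then width and height as
    # big-endian 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])

def img_tag(src, data):
    """Build an image tag with explicit width and height options."""
    width, height = png_size(data)
    return '<img src="%s" width="%d" height="%d">' % (src, width, height)

# Only the first 24 bytes are needed, so a hand-built header suffices here.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 120, 80)
print(img_tag("logo.png", header))
```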

1.3 Approach to the problem

Usually the webmaster either uses a tool that makes his life a little easier, or some scripts of his own, which are adapted to a particular web page and can automate some tasks well. However, most of these tools have major limitations and often do not go beyond aspects of single documents. Therefore, I decided to create a tool of my own to solve these problems.

Obviously the functionality of HTML itself had to be extended. Defining custom tags alone could not solve all the problems - accessing metadata about the pages is still not possible then. Therefore it was necessary to introduce an extra layer hiding the location of the documents and other properties such as document size or ``content level'' (see Section 2.2.6). Because it is not possible to hard-code all features in the HTML preprocessor, the possibility to execute macro functions has been added.

1.3.1 Precompiling HTML documents

Because changing the documents on the web server itself (via CGI) would be too slow, it is necessary to preprocess documents before they are uploaded. The preprocessor takes a template (see Section 2.2), which specifies the general appearance of all documents, and applies that template to each document that has to be uploaded. Other settings are stored in a variable file (see Section 2.3), which works in the same way as a template does, although it is concerned with content, not appearance.

The preprocessor takes all documents within a source directory tree and parses them according to all given options. The output is then stored in the upload directory where the HTML pages should be uploaded from.
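The overall flow described above (walk the source tree, apply the template, write the result to the upload directory) might be sketched as follows. This is a simplified illustration, not the actual preprocessor: the placeholder `<!--BODY-->`, the function name and the restriction to `.html` files are assumptions made for this example, and real templates are specified in Section 2.2.

```python
import os

PLACEHOLDER = "<!--BODY-->"  # hypothetical marker, not HTML PLAIN syntax

def preprocess_tree(source_dir, output_dir, template):
    """Apply `template` to every .html file below source_dir and store the
    results under the same relative paths in output_dir."""
    written = []
    for folder, _subdirs, files in os.walk(source_dir):
        for name in files:
            if not name.endswith(".html"):
                continue
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(output_dir, rel)
            # Mirror the directory layout of the source tree.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(src, encoding="utf-8") as f:
                body = f.read()
            with open(dst, "w", encoding="utf-8") as f:
                f.write(template.replace(PLACEHOLDER, body))
            written.append(rel)
    return sorted(written)
```

Because the source tree is never modified, the run can be repeated after every change to the template, and only the upload directory has to be transferred to the server.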

1.4 Notations used

1.4.1 HTML

A ``tag'' or ``HTML tag'' is considered to be composed of the ``tag'' itself and its ``options'', i.e.

<tag ([switch]|[option="value"])*>

An ``entity'' is a special notation for some non-printable characters or string constants, and always has the form

&name;

A string within double quotes can denote a ``variable'' or ``variable name'', which is, if defined, replaced by another string, the ``content'' of the variable.
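To make the tag notation concrete, the following Python sketch splits a tag into its name, its switches and its option/value pairs. It only mirrors the notation given in this section under simplifying assumptions (names consist of word characters, values are always double-quoted); the function `parse_tag` is hypothetical and is not how HTML PLAIN parses its input.

```python
import re

# <tag ([switch]|[option="value"])*>
TAG_RE = re.compile(r'<(\w+)((?:\s+\w+(?:="[^"]*")?)*)\s*>')
OPTION_RE = re.compile(r'(\w+)(?:="([^"]*)")?')

def parse_tag(text):
    """Return (name, switches, options) for a single tag."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a tag: %r" % text)
    name, rest = match.group(1), match.group(2)
    switches, options = [], {}
    for item in OPTION_RE.finditer(rest):
        if item.group(2) is None:
            # No ="value" part: the item is a switch.
            switches.append(item.group(1))
        else:
            options[item.group(1)] = item.group(2)
    return name, switches, options

print(parse_tag('<img ismap src="map.gif" alt="">'))
```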

1.4.2 The software

The ``template'' is a special file which defines the appearance of all files that actually appear on the web page. The ``variable file'' is the file where the values of the variables are stored. The ``file table'' is an intermediate file which is generated in order to be able to dereference symbolic names later.

The ``documents'' are the HTML pages which actually contain the text that is later uploaded to the web server. These documents do not have to contain complete HTML code, since the template can supply much of what is required. The ``source'' or ``pages'' document tree contains the files that are actually edited (in the document ``root'' path). These are then processed and stored in the ``output'' or ``upload'' directory. The webmaster then has to upload these documents.

The ``preprocessor'' is the HTML preprocessor which takes the documents and produces the documents to be uploaded. The ``file table'' generator is the program which generates the file table containing all the symbolic names and their meaning; it has to be run each time a new page is added.

A short text in square brackets usually means a reference to the bibliography, with the exception of Chapter 4, where it sometimes also denotes key strokes, with the key in square brackets (e.g. [Return]). The difference is always obvious.
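The role of the file table can be illustrated with a short Python sketch that scans the source tree and records where each document lives. Everything specific here is an assumption made for the example: using the file name without its extension as the symbolic name, the tab-separated output format and the function names are hypothetical and do not describe the actual file table format.

```python
import os

def build_file_table(source_dir):
    """Map a symbolic name (assumed here: the file name without extension)
    to the path of the document relative to source_dir."""
    table = {}
    for folder, _subdirs, files in os.walk(source_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext != ".html":
                continue
            if stem in table:
                raise ValueError("duplicate symbolic name: " + stem)
            rel = os.path.relpath(os.path.join(folder, name), source_dir)
            table[stem] = rel.replace(os.sep, "/")
    return table

def write_file_table(table, path):
    """Store the table as one 'name<TAB>location' line per document."""
    with open(path, "w", encoding="utf-8") as f:
        for symbol in sorted(table):
            f.write("%s\t%s\n" % (symbol, table[symbol]))
```

A page that was added after the last run has no entry in the table, so its symbolic name cannot be dereferenced; this is why the generator has to be run again whenever a page is added.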

1.4.3 Typesetting

HTML tags and entities appear in a typewriter font, as do file and program names. Longer parts of HTML or Perl code appear in a separate paragraph, while single HTML tags are usually written on the same line as the other text. Variable names and program settings appear between double quotes (like ``title'').

1.5 Other HTML preprocessors

Because the need for such a tool is definitely considerable, there are quite a few HTML preprocessors around:

  • [Chpp] A generic preprocessor which can also be used for HTML.
  • [Fro] Frontier: inspired HTML PLAIN with its file management, and also some of its syntax.
  • [Meta] Meta HTML: A commercial tool, intended for SQL backed web pages.
  • [PHP] PHP3: A very widely used embedded scripting language, including a CGI module.
  • [QML] QML: A simple markup language consisting entirely of assignments.
  • [Wpp] Very similar to HTML PLAIN, but no file management.

1.6 Where to obtain HTML PLAIN

The latest version of HTML PLAIN can be obtained at the official HTML PLAIN web page at [HPL].
