HTML PLAIN reference V1.1
HTML has many advantages. It is very simple, making it easy to learn. It has
many built-in style tags which are quite expressive. Linking is also readily
achieved, in a simple, standard way. However, most of these advantages also
have a downside, which makes the maintenance of large web pages a nightmare [Arb].
Due to its fixed set of tags, which are primarily concerned with formatting,
the language is inherently limited. HTML is mainly concerned with appearance
and the delivery of content, but not with the structure or meaning of that
content. It allows only very limited reuse and automation.
HTML has no notion of a ``document template''. A template
allows the designer to create a fancy page and then use that layout for all
web pages. However, with HTML this usually implies inserting the new set of
tags into all documents. Because there are sometimes conflicts, these insertions
often have to be done manually, which is very time consuming.
While HTML has style sheets, these do not solve all problems.
Style sheets are a clean way to specify the appearance of the text based on
its content. However, they cannot deal with the other problems that HTML causes
(see below).
1.1.3 Limited linking
Because of the single standard linking mechanism, it is trivial to link to external
pages or to other pages within the site. If these pages move, though, all links have
to be updated with the new URL! This is a very tedious task, which is not always
easy to automate (a relative link to a document looks different depending on the
directory of the page that contains it). There is no way to specify a link by a
symbolic name, which would allow changing the link by changing the definition of
that symbol.
Because links are hard-coded, it is usually not practical to move a file to a
different location once its filename has been chosen. Moving files would sometimes
help to restructure the site and to keep an overview of all documents, but because
fixing all links afterwards involves too much work, this approach is usually abandoned.
Linking to external pages is also a nightmare. If a page moves, all links need
to be changed.
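The following minimal Python sketch illustrates the problem and the idea of
symbolic names. It is not the tool's actual implementation: the table
FILE_TABLE, the function link_to and all page names are made up for this
example. It shows that the relative link to one and the same page differs for
every directory, and that a symbolic name can hide this.

```python
import posixpath

# Hypothetical table: symbolic name -> path relative to the site root.
FILE_TABLE = {
    "home": "index.html",
    "faq": "support/faq.html",
}


def link_to(symbol, from_page):
    """Return the relative URL from `from_page` to the page named `symbol`."""
    target = FILE_TABLE[symbol]
    # Directory of the linking page; "." stands for the site root.
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(target, start)


# The same target needs a different relative link from each directory.
print(link_to("faq", "index.html"))            # support/faq.html
print(link_to("faq", "products/list.html"))    # ../support/faq.html
print(link_to("faq", "support/contact.html"))  # faq.html
```

If the page behind "faq" moves, only its entry in the table changes; every
link that was written with the symbolic name is recomputed on the next run.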
HTML is not expressive enough to allow automation. It is hard, if not impossible,
to get information about the content of a document rather than just its appearance.
There is no way at all to get metadata about a whole site encompassing a
range of documents. Tasks like generating a site map or a list of links usually
involve custom-made scripts, which only work with a particular site, and/or
a lot of manual work.
It is not too uncommon to change the layout of all pages now and then. Usually
this involves a lot of re-tagging of each document. Even tasks as simple as
changing the color turn out to be a nightmare. This is all because HTML specifies
the appearance completely, without any layer of abstraction. Therefore it is
very hard to change the layout once a page is made.
One problem is to keep all links current (see Section 1.1.3).
However, even then, there are many repetitive tasks left: adding the ubiquitous
``Back'' link to a document on a higher level (or the main page), adding a set of
links to related topics, or making a site map.
For a faster rendering of the page, every webmaster should specify the height
and width of each image. Modern HTML editors usually do this. This is one of
the problems that are not dependent on other documents, and therefore it can
be solved easily.
Usually the webmaster either uses a tool that makes his life a little easier,
or some scripts of his own, which are adapted to a particular web page and can
automate some tasks well. However, most of these tools have major limitations
and often do not go beyond aspects of single documents. Therefore, I decided
to create a tool of my own to solve these problems.
Obviously the functionality of HTML itself had to be extended. Defining custom
tags alone could not solve all the problems, since accessing metadata about the
pages would still not be possible. Therefore it was necessary to introduce an extra
layer hiding the location of the documents and other properties such as document
size or ``content level'' (see Section 2.2.6).
Because it is not possible to hard-code all features in the HTML preprocessor,
the possibility to execute macro functions has been added.
Because changing the documents on the web server itself (via CGI) would be too
slow, it is necessary to preprocess documents before they are uploaded. The
preprocessor takes a template (see Section 2.2), which specifies
the general appearance of all documents, and applies that template to each document
that has to be uploaded. Other settings are stored in a variable file (see Section
2.3), which works in the same way as templates do, although
it is concerned with content, not appearance.
The preprocessor takes all documents within a source directory tree and parses
them according to all given options. The output is then stored in the upload
directory where the HTML pages should be uploaded from.
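The processing described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the preprocessor itself:
the function names and the "%BODY%" placeholder are invented for this example,
and the real template syntax is the one specified in Section 2.2.

```python
from pathlib import Path


def apply_template(template, body):
    """Insert the document body into the template.

    The "%BODY%" placeholder is made up for this sketch; it is not the
    preprocessor's real template syntax.
    """
    return template.replace("%BODY%", body)


def preprocess_tree(source_dir, upload_dir, template):
    """Apply `template` to every .html file below `source_dir`.

    Each result is written to the same relative path below `upload_dir`.
    Returns the list of output paths that were written.
    """
    source_dir, upload_dir = Path(source_dir), Path(upload_dir)
    written = []
    for src in sorted(source_dir.rglob("*.html")):
        dst = upload_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(apply_template(template, src.read_text()))
        written.append(dst)
    return written
```

Because the source tree is never modified, the upload directory can be deleted
and regenerated at any time, for example after the template has changed.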
A ``tag'' or ``HTML tag'' is considered to be composed of the
``tag'' itself and its ``options'', i.e.

    <tag ([switch]|[option="value"])*>
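As an illustration of this notation, the following Python sketch splits a tag
into its name, its switches and its options. The function parse_tag and both
regular expressions are assumptions made for this example, not part of the
preprocessor, and the sketch does not handle a ">" inside a quoted value.

```python
import re

# "<", the tag name, everything up to the closing ">".
TAG_RE = re.compile(r'<\s*(\w+)([^>]*)>')
# Either option="value" or a bare switch.
OPTION_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"|(\w+)')


def parse_tag(text):
    """Split '<tag switch option="value">' into (name, switches, options)."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a tag: %r" % text)
    name, rest = match.group(1), match.group(2)
    switches, options = [], {}
    for option, value, switch in OPTION_RE.findall(rest):
        if option:
            options[option] = value
        else:
            switches.append(switch)
    return name, switches, options


print(parse_tag('<img src="logo.gif" width="40" ismap>'))
```

Here "ismap" is a switch, while "src" and "width" are options with a value.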
An ``entity'' is a special notation for some non-printable
characters or string constants, and always has the form &name;.
A string within double quotes can denote a ``variable'' or ``variable name'',
which is, if defined, replaced by another string, the ``content'' of the variable.
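A minimal Python sketch of this replacement rule follows. The function
expand_variables and the variable HOME are hypothetical, and the sketch
assumes that the surrounding quotes are kept so that the result is still a
valid option value; the actual behaviour is the one specified in Section 2.3.

```python
import re


def expand_variables(text, variables):
    """Replace every quoted string that names a defined variable."""
    def substitute(match):
        name = match.group(1)
        if name in variables:
            # Defined: put the content of the variable between the quotes.
            return '"' + variables[name] + '"'
        # Undefined: leave the quoted string untouched.
        return match.group(0)

    return re.sub(r'"([^"]*)"', substitute, text)


print(expand_variables('<a href="HOME" title="Start page">',
                       {"HOME": "index.html"}))
# <a href="index.html" title="Start page">
```

Quoted strings that do not name a defined variable, such as "Start page"
above, pass through unchanged.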
The ``template'' is a special file which defines the appearance
of all files that actually appear on the web page. The ``variable file''
is the file where the values of the variables are stored. The ``file table''
is an intermediate file which is generated in order to be able to dereference
symbolic names later.
The ``documents'' are the HTML pages which actually contain the text that
is later uploaded to the web server. These documents do not have to contain
complete HTML code, since the template can supply much of what is required.
The ``source'' or ``pages'' document tree contains
the files that are actually edited (in the document ``root'' path). These
are then processed and stored in the ``output'' or ``upload''
directory. The webmaster then has to upload these documents.
The ``preprocessor'' is the HTML preprocessor which takes
the documents and produces the documents to be uploaded. The ``file table''
generator is the program which generates the file table containing all the
symbolic names and their meaning; it has to be run each time a new page is added.
A short text in square brackets usually means a reference to the bibliography,
with the exception of Chapter 4, where it sometimes also denotes
key strokes, with the key in square brackets (e.g. [Return]). The difference
is always obvious.
HTML tags and entities appear in a