Kaye and Geoff's web page documentation 

Introduction

This is the page which collects together bits of HTML and a few miscellaneous subjects which have not been mentioned or covered in enough detail elsewhere. Many of the HTML tags described here are used to achieve formatting effects and so may be officially frowned on in favour of the use of style sheets, but we have some reservations about CSS. Not only are style sheets often implemented inconsistently and incompletely, but in many cases in-line tags are just easier to use and interpret.

This motley crew begins with a tag which has been mentioned in passing but not discussed in any detail...

The <font> tag

Text enclosed within <font>...</font> tags is modified as specified by the tag's attributes. These may be any combination of:

  • size="n" where n is an integer in the range 1..7 or a digit preceded by a plus or minus sign. The first type of value sets the size of the text from the smallest (1) to the largest (7). The second increases or decreases the text size by the amount indicated, relative to the standard size specified by the browser.

  • color="#hhhhhh" where hhhhhh is a six-digit hexadecimal number which is interpreted as three two-digit numbers, indicating the "amount" of red, green and blue respectively which are combined (additively) to create the colour. There is also a set of common colour names (such as "white") which can be used in place of the #hhhhhh.

  • face="f,f,f..." where f is the name of a font face or font family. Note that this attribute was not part of the HTML 3.2 standard and is deprecated in HTML 4 (along with the rest of the <font> tag), but it is widely supported and widely used.
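As a rough illustration of how the three attributes combine, here is a small sketch. The values are our own, chosen only for the example: in the colour #ff8000 the pairs ff, 80 and 00 give full red, about half green and no blue, which mixes to orange, and the font names are just assumptions about what a viewer might have installed.

```html
<!-- size="+1": one step larger than the standard size -->
<!-- color="#ff8000": red = ff (255), green = 80 (128), blue = 00 (0), i.e. orange -->
<!-- face: the browser uses the first font in the list that it can find -->
<font size="+1" color="#ff8000" face="Verdana, Arial, Helvetica">
  Slightly larger orange text
</font>
```

If none of the listed fonts is available, the text should still appear larger and orange, but in the browser's default font.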

The <font> tag must be used with care. Setting the text size to 1 can make it so small that some fonts on some computers "break up" because not enough pixels are allocated to completely draw every character. Using the <small>...</small> tag is much safer. When setting text colours make sure that the background does not "hide" the text because it is a similar shade or brightness. And avoid red text on green backgrounds or vice versa - nearly 10% of the world's males suffer from red-green colour blindness (and it usually looks awful anyway). Remember that you cannot know what fonts are available on any computer displaying your page, so you must not rely on the "face" attribute for any important effect. Browsers typically allow the viewer to override "face" and other font attributes, and if the browser cannot find any of the specified fonts then it uses its default.

Here are some examples showing the effect of the <font> tag with your current browser settings:

<font color="#0000dd">normal size blue text</font> normal size blue text
<font size="7">Biggest text</font>   Biggest text
<font size="1">smallest text</font>   smallest text
<font size="+1">one size up text</font>   one size up text
<font face="comic sans MS">Comic sans MS</font>   Comic sans MS
<font color="white">white text</font>   white text

There is also a <basefont> tag, which is used without a terminating tag. It sets the default font size for the text on the page, which is also the size from which relative font sizes (using the size="+n" or size="-n" attribute in a <font> tag) are calculated. For example:

<basefont size="4">

This tag should not be used. It is preferable to leave viewers to fix the basic size of text they prefer in their browser settings; your page design should be flexible enough to accommodate the resultant variations in text size. For this reason <basefont> is not in modern versions of standard HTML and some browsers ignore it (which provides an even better reason for not using it).

More text characteristics

There is a series of tags which (like bold and underline, which we have already discussed) change their enclosed text in a defined way. They do not need any discussion; the effect can be appreciated from the following examples:

<tt>teletype or monospaced text</tt> teletype or monospaced text
<i>italic text style</i>   italic text style
<strike>strike-through text style</strike>   strike-through text style
<big>text in a large font</big>   text in a large font
<small>text in a small font </small>   text in a small font
xxx<sub>text in subscript style</sub>   xxxtext in subscript style
xxx<sup>text in superscript style</sup>   xxxtext in superscript style
<em>basic emphasis</em>   basic emphasis
<strong>strong emphasis</strong>   strong emphasis

<address>...</address> and <cite>...</cite> are designed to display addresses and citations, respectively. For example:

<address>27 Echidna Drive<br> Brisbane<br>Australia</address>
27 Echidna Drive
Brisbane
Australia
<cite>Platypus, Bill (1902) "Egglaying"</cite>   Platypus, Bill (1902) "Egglaying"

You may feel that you can format addresses and citations more appropriately than relying on the default behaviour of whichever browsers are used to display your web pages.

The <br> and <p> tags revisited

The break and paragraph tags appear to be quite similar; the only difference in their behaviour might seem to be that the <p> inserts a blank line, and the <br> does not. They are so simple that they are among the first tags to be described in our documentation. But there is a subtle difference in how they behave. The <br> tag always forces a line break, so that if you place a sequence of them in your HTML then you will get a sequence of blank lines (just as you would expect).

However, if you enter an unbroken sequence of <p> tags, you do not get a sequence of blank lines; you get just one normal paragraph break. In other words, the browser treats a series of <p> tags as though there is only one, echoing the way that white space is handled. The same rule is applied wherever the browser acts as though a paragraph break exists (for example immediately before and after <blockquote>...</blockquote> and <form>...</form> tags - although not all browsers treat these "block-level" elements in an identical way). So...

aaa <p> <table>xxx</table> <p> bbb

is displayed exactly the same as...

aaa <table>xxx</table> bbb
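The same contrast between the two tags can be sketched with a pair of one-line fragments (the text is just a placeholder). This is what we would expect from most browsers, rather than a guarantee for every one:

```html
<!-- Three breaks: the first ends the "aaa" line, the other two
     each add a blank line before "bbb" -->
aaa<br><br><br>bbb

<!-- Three paragraph tags: treated as one, so "ddd" follows "ccc"
     after a single normal paragraph break -->
ccc<p><p><p>ddd
```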

The <br> tag normally applies just to the text which precedes and follows it, but it can be extended with the clear attribute to consider adjacent images. This attribute can have a value of "left", "right" or "all", for example:

<img src="smiley.gif" align="right" border="0"> The quick brown fox <br clear="all"> jumped over the lazy dog

The effect is for the text after the break ("jumped over the lazy dog") to start below whichever is the lower of the "The quick brown fox" text and the image, like this:

The quick brown fox
jumped over the lazy dog

Block definition tags

<span>...</span> and <div>...</div> are tags which, on their own, do nothing. The difference between them is that <span> does nothing in-line whereas <div> does nothing for a block (i.e. between new lines). You might wonder at the usefulness of such tags, but in fact they have two uses. The first is to define an arbitrary block of text so that it can be used with DOM (document object model) functions and CSS. Manipulating the DOM generally requires a good knowledge of JavaScript, so we will not consider it here. The second use depends on the inclusion of attributes with the tags. The most useful is the align attribute with the <div>...</div> tag which can take the same values ("left", "right" and "center") as we have seen in table cells and elsewhere; for example the following HTML has the same effect as the <center> tag:

<div align="center">Block of text</div>
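To sketch the difference between the two tags, here is a short fragment. The id value "total" is a made-up name used only for illustration; it does nothing by itself but gives a script or style sheet a way to find that piece of text.

```html
<!-- <div> starts on a new line; align moves the whole block to the right -->
<div align="right">Block of text pushed to the right margin</div>

<!-- <span> stays in-line, so the sentence is displayed as one unbroken line;
     the id (a hypothetical name) simply labels the number -->
The total is <span id="total">42</span> dollars.
```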

Inclusions

We have already seen that images are held in separate files which are included in web pages with the <img> tag. Most browsers, usually with the help of plug-ins, can run programs written in the Java language. These programs are also held in external files, and can be invoked with the <applet>...</applet> tag. Music or other sounds can also be included in web pages, although different browsers use different ways to achieve this. Even other HTML documents can be included with yet another specialist tag - the <iframe>...</iframe> tag specifies an in-line frame to hold a web page in a window which is inserted into the current page in a very similar way to images.
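As a minimal sketch of an in-line frame, the fragment below includes a second page in a 400 by 200 pixel window. The file name notes.html is an assumption made for the example. The text between the tags is only shown by browsers which do not support in-line frames.

```html
<iframe src="notes.html" width="400" height="200">
  Your browser does not display in-line frames.
  <a href="notes.html">Read the notes here.</a>
</iframe>
```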

It became obvious to those responsible for setting HTML standards that in the future web pages might be required to handle even more forms of multimedia (maybe some not even invented yet), and that a uniform way of handling all inclusions from external files would be a Very Good Idea. So they came up with the generic <object>...</object> tag. Universal support for this tag for all multimedia has been slow in coming, but will presumably eventually be a reality. The use of the <object> tag can be illustrated by showing how it can be used as an alternative for the <img> tag:

<object data="numbat.png" type="image/png"> This is a closeup of a numbat </object>

Notice that a combination of a data and a type attribute informs the browser where to find the external file and what type of file to expect, and thereby what to do with it. Quite a few "types" are already defined, all with the two-words-separated-by-a-slash format as shown in this example, and to handle a new multimedia format in the future we just need to give it a new unique type. These types, also called "MIME types" or "content types", are not specific to HTML and are also used in other contexts within it, giving this approach even more universality. The text between the tags is a description of the object which is displayed if the object itself cannot be shown - the equivalent of the alt value in an <img> tag.
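The content between the tags does not have to be plain text. Under the HTML 4 rules it can be another <object>, which the browser tries if it cannot display the outer one. Here is a sketch, assuming a second, hypothetical copy of the picture called numbat.gif:

```html
<!-- First choice: the PNG version -->
<object data="numbat.png" type="image/png">
  <!-- Second choice: a GIF version (file name assumed for this example) -->
  <object data="numbat.gif" type="image/gif">
    <!-- Last resort: a plain text description -->
    This is a closeup of a numbat
  </object>
</object>
```

Given the patchy support for <object>, this should be tested in the browsers you care about before you rely on it.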

See if your browser can successfully deal with an image defined in this way. Do you see a green smiley? If not, you will see the alternative text "This is a smiley" instead.

Controlling robots - the robots.txt file

The name 'robots' here refers to 'spiders' or 'crawlers' or other similar programs which automatically trawl the web by downloading pages and following the links they find on them. The most obvious examples are search engines looking for pages to add to their indexes, but crawlers can also be used to collect email addresses to be used by spammers. Sometimes you would prefer that some of your pages were not indexed; they might be under development or temporary or not aimed at a general audience. You can ask web trawling robots to ignore these files by placing an appropriate entry in a robots.txt file.

This is a simple text file which must be placed in the top-level (root) directory of your web site, normally the same directory as your index.html or equivalent file. If the files you did not want robots to see were all in a sub-directory called 'private' (it helps to collect them all into one or a limited number of directories) then the contents of the robots.txt file would look like this:

User-agent: *
Disallow: /private/

A user-agent of '*' means all robots and the second line is clear enough - it says that I do not allow you into the directory called 'private'. Of course, just like the related robots metatag (see The Head), you cannot force robots to obey this directive, but most legitimate search engines probably do (and spammers don't). If you want serious control and security then you need (on Unix servers) to investigate the .htaccess and .htpasswd files, but we will not deal with them here.
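For a single page, the related robots metatag makes the same kind of request from inside the page itself. Here is a minimal sketch, with a page title invented for the example:

```html
<head>
  <title>Work in progress</title>
  <!-- Ask robots not to index this page and not to follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

As with robots.txt, this is only a request which well-behaved robots choose to honour.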

Robots.txt files do not allow a great deal more sophistication than illustrated in the example above, but to get the complete syntax you can try the following sites:

Dynamic HTML

The name "Dynamic HTML" (or DHTML) is misleading since it does not refer to a modified HTML. Rather, it is a blanket term that covers HTML-aware technologies designed to give web pages more dynamic behaviour, improving interaction with the viewer. These enhancements are achieved by combining techniques and concepts such as JavaScript and the document object model (DOM).

This area presents a challenge. It provides the opportunity to create interesting and dynamic web pages but sometimes at the expense of universal accessibility. The effect of many DHTML techniques differs from machine to machine, depending on the browser and host combination used. Some techniques are yet to be accepted by the official standards authorities, and others are restricted to one browser or platform. If you decide to use dynamic HTML techniques on your pages, you need to make sure that your target audience can take advantage of the enhancements. As time progresses and standards evolve DHTML should become more mainstream. In general using dynamic HTML involves programming and so is considerably more difficult to implement than straightforward HTML.

Here are some concepts and technologies which contribute to DHTML:

The document object model describes a hierarchical data structure which contains every single element in the web page. After the page has been rendered it allows new elements to be added, and existing elements to be deleted or even swapped around within the structure. The effects are seen immediately on the page. This manipulation is carried out by routines which are part of an updated JavaScript. There are several levels of DOM. Level zero is the ad-hoc set of routines and properties which existed before the first W3C standard was proposed. DOM level 0 was implemented differently in different browsers, with only a limited subset of routines in common. To address this problem, in 1998 the W3C published the level 1 standard which has been adopted reasonably uniformly by most of the common browsers, although one major area of difference is in how events are processed. The latest standard (as of 2007) is level 3 but it will take a while before we can expect most browsers to support this version. For reference information on the DOM try Javascript Kit or W3C's Document Object Model (Core) Level 1.
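To give a flavour of what adding and deleting elements looks like, here is a small self-contained page. It is only a sketch: the id "animals" and the function names addAnimal and removeLastAnimal are our own inventions, and the page uses only basic DOM routines.

```html
<html>
<head>
<title>DOM sketch</title>
<script type="text/javascript">
// Append a new <li> element to the list whose id is "animals".
function addAnimal(name) {
  var list = document.getElementById("animals");
  var item = document.createElement("li");
  item.appendChild(document.createTextNode(name));
  list.appendChild(item);
}

// Delete the last <li> in the list, if there is one.
function removeLastAnimal() {
  var list = document.getElementById("animals");
  var items = list.getElementsByTagName("li");
  if (items.length > 0) {
    list.removeChild(items[items.length - 1]);
  }
}
</script>
</head>
<body>
<ul id="animals">
  <li>Numbat</li>
  <li>Echidna</li>
</ul>
<button type="button" onclick="addAnimal('Platypus')">Add an animal</button>
<button type="button" onclick="removeLastAnimal()">Remove the last animal</button>
</body>
</html>
```

Each click changes the list on the screen straight away, without the page being reloaded.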

JavaScript is a programming language whose code can be (in fact almost always is) embedded within the HTML of a web page. It has no particular relationship with Java, which is a separate language. JavaScript is interpreted by the browser, so it is run on the "client" computer rather than the server. Despite this it generates no serious security concerns since it is incapable of reading or writing to the local disk (with the exception of cookies, which is a very controlled process), or accessing any operating system parameters other than a limited set provided by the browser.

AJAX (Asynchronous JavaScript And XML) is a series of JavaScript routines which are designed to pass information between the web page and the web server. They use hypertext transfer protocol (HTTP) which is how the server normally communicates with the browser. Returned information can be used to update part of the page without requiring the whole page to be reloaded and re-rendered. Combined with the features provided by the DOM, this is a very powerful technique to add very flexible behaviour to web pages. AJAX has been available on most browsers for quite a few years now and so is probably "safe" to use.
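Here is a minimal sketch of the idea, assuming a plain text file called news.txt sits next to the page on the server. The file name, the id "news" and the function name loadNews are all made up for the example. Older versions of Internet Explorer needed a different way to create the request object, which is left out here to keep the sketch short.

```html
<html>
<head>
<title>AJAX sketch</title>
<script type="text/javascript">
// Fetch a small text file from the server and place its contents
// inside the element with id "news", without reloading the page.
function loadNews() {
  var request = new XMLHttpRequest();
  request.onreadystatechange = function () {
    // readyState 4 means the response is complete; status 200 means OK
    if (request.readyState == 4 && request.status == 200) {
      document.getElementById("news").innerHTML = request.responseText;
    }
  };
  request.open("GET", "news.txt", true);  // true = asynchronous
  request.send(null);
}
</script>
</head>
<body>
<div id="news">No news loaded yet.</div>
<button type="button" onclick="loadNews()">Load the latest news</button>
</body>
</html>
```

The page needs to be delivered by a web server for the request to work; opening the file directly from disk will usually fail.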

Java is a highly portable object-oriented language similar to C++. It was devised by Sun Microsystems, and allows programmers to produce compiled (i.e. efficient) code that will run on any computer with Java support. The major web browsers all support Java applets, allowing programs to be downloaded by and run under the control of the web browser. Unlike JavaScript, Java has the ability to read and write local files, which in the past led to security problems with Java applets on the web. Later versions of Java seem to have fixed these bugs, but browsers have the ability to disable Java, and many people worried about security take advantage of this feature. As a result they cannot view any page which relies on Java applets.

A JSP (JavaServer Page) is a web page that contains Java code which has to be executed by the server delivering the page, before the result is downloaded to the browser. In other words, it contains Java, but that code is used by the server, not the browser. This is potentially safer than allowing the code to be run by the browser.

ASP (Active Server Pages) is Microsoft's version of JSP, where VBScript code (i.e. Visual Basic rather than Java) is embedded in the web page. The code is interpreted by Microsoft's script interpreter on the server before the result is delivered to the browser. This makes the pages dependent on having a Microsoft-compatible server, but despite this ASP is quite widely used. More recent (ASP.NET) implementations allow the code to be separated from the HTML, which means it can be compiled and so will run much faster. This reflects the way that CGIs are set up on Unix servers.

ActiveX only works on PC versions of Internet Explorer - not Mac versions and not with other browsers. In our opinion this is enough reason not to use it.
