Kaye and Geoff's web page documentation 

Introduction

Web pages written using only HTML have limited scope for dynamic behaviour. They can include various types of animated pictures (and sounds) and they allow links to other pages, email and ftp, but that is about all. A much wider range of interaction is possible with the inclusion of JavaScript. However, JavaScript only runs within the browser environment, has no general way of reading data and has no ability to write to a file. It therefore cannot create or use information which needs to persist between browsing sessions (except for the trivial case of cookies) and which is shared between all people who access the web page.

This sort of interaction requires a computer program with the ability to read and write files and to interact in a general way with the operating system. The program and its files need to be stored on a computer which is always on and always connected to the internet. In addition, for security reasons the program needs to be accessible to web pages via a controlled interface which only allows the desired information to be passed in either direction. So the obvious place for such programs is the same computer on which your web server is running (or another computer to which it is networked).

Programs designed to work in this way use the Common Gateway Interface (CGI) and are usually called CGI programs or CGI scripts; on these pages we simply call them CGIs. A CGI is a script or program which runs under the direction of the web server, and typically adds dynamic behaviour to web pages by accessing databases, doing calculations from inputs, selecting files and so on. Users normally provide input data to a CGI using a form written in HTML. The browser and web server are responsible for passing these data to the CGI, which processes them and then passes information back via the server to the browser, telling it what page to display next (often the information is the actual HTML for a web page containing the results of the processing). Note that all CGIs (except those interfacing with the web page via AJAX) must return something - the browser sends the data to the web server which invokes the CGI, and the server expects something in return which it can pass back to the browser.

This document explains the way in which a web page passes data to a CGI and what the CGI might do with that data. There are actual examples of code, but they are generally not suitable (without some modification) to use in real web pages since they have been simplified by leaving out error checking and any sophisticated behaviours. In particular most of the programs do not include any protection against malicious use. Security should always be considered when writing CGIs. Any input field can be given any sort of value by someone trying to compromise your system, so input fields should be limited to those really required, and their values should be validated as tightly as possible. Particular care needs to be taken with values which are to be used in an executable environment, including calls to the operating system, file processing, printing, and so on.

CGIs have to be located in a defined area on the server (traditionally the cgi-bin directory on Unix systems); they cannot just be in your normal HTML page area. If you are an author on someone else's server (for example an ISP's) and you want to write CGIs, you should make sure that the owner allows them. It also helps to ask about access to telnet capabilities, access to the web server error log, and whether there are any restrictions (for example limiting access to operating system features) which the ISP applies. It is normally important to know what operating system the web server runs on, since this will limit the choice of languages which are available for you to use for your CGI, and also define the system interactions which your CGI can take advantage of. It should be clear that to create CGIs you not only need programming skills but you also need to have a reasonable understanding of the operating system they will run on. Here we are assuming that this is the case and are only attempting to give you the extra information you need to get you started with writing CGIs.

All our examples presume a Unix server and are written in Perl. Just about all versions of Unix, and the Macintosh (as part of the Unix underlying OS X), come with Perl and it is available for PCs as well. If you have a Mac then it can be very useful to use the local web server to test your CGIs since you do not need to endlessly copy files to a remote server and reset their permissions. Be aware, however, that there can be some differences between the Mac's Unix and versions commonly found on commercial web servers.

Interaction between a form and the CGI

What is sent from a form to the server?

The browser sends the form's contents to the server as a single string, with each field separated by an ampersand (&). Each field is of the form name=data. The name is the value of the name attribute in the HTML which defines the form. This can be made clear in the following example, where we have created a form as follows:

<form action="http://www.kgweb.org.au/cgi-bin/cgi_string.pl" method="post">
  Name: <input type="text" name="Name" value="Fred"><br>
  E-mail address: <input type="text" name="EmailAdd" value="Fred@domain1.com"><br>
  Telephone number: <input type="text" name="TelNo" value="9876 5432"><br>
  <input type="submit" value="Submit the form">
</form>

When the form is submitted, the browser passes the information to the web server, which invokes the Perl program cgi_string.pl, which reads the form input as a string from standard input. In this case the string would be (assuming the default values for the form were not changed):

Name=Fred&EmailAdd=Fred@domain1.com&TelNo=9876 5432

It is up to the CGI to read and unpack the information in the string, and handle the information as required.
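The unpacking step can be sketched in a few lines. This is only an illustration, written in Python rather than the Perl used by the scripts on this site; the function name split_form_string is hypothetical, and the sketch assumes the string has not been escaped in any way.

```python
def split_form_string(form_string):
    """Split 'name=data&name=data' into a list of (name, data) pairs."""
    pairs = []
    for field in form_string.split("&"):
        if not field:
            continue  # skip empty fields, e.g. from a trailing '&'
        # partition splits at the first '=' only, so data may itself contain '='
        name, _, data = field.partition("=")
        pairs.append((name, data))
    return pairs


# The unescaped string from the example form.
for name, data in split_form_string("Name=Fred&EmailAdd=Fred@domain1.com&TelNo=9876 5432"):
    print(f"{name}: {data}")
```

A list of pairs is used rather than a dictionary so that a form which repeats a field name does not silently lose values.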

In fact, the string may not be exactly as shown, since most of the "special characters" (such as &$() or space) are escaped - they are translated by the browser and so appear as different characters, or as hexadecimal numbers preceded by a percent sign. Have a look at the translation table, to see how special characters are translated. The easiest way to illustrate this is by example: complete the fields in the form below and it will (subject to a few security restrictions) send back a copy of the data received on the server. You can try the form a number of times, putting in "special" characters (for example -+&%()[]) to see what happens to them (our tests suggest that the only non-alphanumeric characters less than ASCII code 127 which are not escaped are asterisk, hyphen, period and underscore; the rules say that all other characters should be escaped).

 Name:
E-mail address:
Telephone number:

You can view the Perl script used to pass this information back to the client, and a Perl package which will substitute original characters for the escaped versions.
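As a rough illustration of the unescaping step, the sketch below uses Python's standard urllib.parse module (not the Perl package mentioned above). The function name unescape_field is hypothetical. Note that the set of characters which Python's quote_plus leaves unescaped may differ slightly from what a particular browser does, so the last line is only indicative.

```python
from urllib.parse import quote_plus, unquote_plus


def unescape_field(escaped):
    """Turn '+' back into a space and '%XX' back into the original character."""
    return unquote_plus(escaped)


print(unescape_field("9876+5432"))           # 9876 5432
print(unescape_field("Fred%40domain1.com"))  # Fred@domain1.com

# Escaping in the other direction, roughly as a browser would before sending.
print(quote_plus("-+&%()[]"))
```

Unescaping should be done after the string has been split on "&" and "=", otherwise an escaped ampersand in the data would be mistaken for a field separator.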

The preceding example has the method attribute in the form tag set to "post". If the method attribute is "get" then the information is passed to the CGI not via standard input, but as an environment variable called "QUERY_STRING". For example with this form:

<form action="http://www.kgweb.org.au/cgi-bin/cgi_string2.pl" method="get">
  My favourite colour is: <input type="text" size="10" name="colour" value="red">
  <input type="submit" value="Submit form">
</form>
the CGI would need to read the environment variable called QUERY_STRING to retrieve the passed information. You can try it out with the form below:

Colour:

You can view the Perl script which is invoked by this form.
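A CGI which has to cope with both methods can decide where to look by checking the standard REQUEST_METHOD environment variable. The following is a minimal sketch in Python (the script linked above is in Perl); the function names are hypothetical, and the environment and input stream are passed in as arguments so that the logic can be tried out without a web server. In a real CGI they would be os.environ and sys.stdin.

```python
import io
from urllib.parse import parse_qsl


def read_form_string(environ, stdin):
    """Return the raw form string for either a "post" or a "get" request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The server says how much data to expect; read no more than that.
        # (Escaped form data is plain ASCII, so characters equal bytes here.)
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")


def form_fields(environ, stdin):
    """Split and unescape the form string into a name -> value dictionary."""
    return dict(parse_qsl(read_form_string(environ, stdin), keep_blank_values=True))


post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "10"}
print(form_fields(post_env, io.StringIO("colour=red")))

get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "colour=red"}
print(form_fields(get_env, io.StringIO("")))
```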

The environment variable which carries the passed information is called "QUERY_STRING" because there is an alternative way of doing the same thing - by appending a query string to the URL in the action attribute of the form tag, for example:

<form action="http://www.kgweb.org.au/cgi-bin/cgi_string2.pl?red"> <input type="submit" value="Submit the form with a query string"> </form>

Note that the query string cannot be arbitrarily long; web servers typically apply a maximum of 1024 characters to this string. Even if a query string limit is not set, there will be a limit on the length of the entire URL. The amount of information which can be passed using a "post" request is not unlimited, but is usually much greater (128 KB to 2 GB) than that allowed with "get".

There is yet another way of passing information into an environment variable. If the CGI is invoked with an action such as:

action="http://www.kgweb.org.au/cgi-bin/cgi.pl/save.txt"
then everything within the action parameter after the CGI file name (i.e. in this case /save.txt) is placed in an environment variable called PATH_INFO. As the name suggests, it was originally envisaged that this mechanism would be used to provide the path to a file (as in our example above) but in practice any string can be passed.
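Because PATH_INFO can contain anything the caller chooses, a CGI which treats it as a file name should check it first. The sketch below (in Python, purely as an illustration; the function name and the particular checks are our assumptions, not a complete defence) accepts only a single plain file name.

```python
def path_info_name(environ):
    """Return PATH_INFO as a bare file name, or None if it looks unsafe."""
    name = environ.get("PATH_INFO", "").lstrip("/")
    # Reject empty values, anything with a directory part, and hidden files
    # or '..' (which would climb out of the intended directory).
    if not name or "/" in name or name.startswith("."):
        return None
    return name


print(path_info_name({"PATH_INFO": "/save.txt"}))       # save.txt
print(path_info_name({"PATH_INFO": "/../etc/passwd"}))  # None
```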

It can be a bit "kludgy" transferring large amounts of information via an environment variable, so generally using method="post" is the preferred approach for passing information to a CGI. However the example below illustrates one simple application where the query string can be very useful - when a CGI is invoked without using a form. Here we want a CGI to be invoked from a pair of menu items, where each item produces a variant of the CGI's output possibilities.

...
<a href="http://www.kg.org/cgi-bin/numbers.pl?even">Show even numbers below ten</a>
<a href="http://www.kg.org/cgi-bin/numbers.pl?odd">Show odd numbers below ten</a>
...
View the Perl script and see how it works or try it out:
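The logic of such a script might look something like the following sketch. It is written in Python as an illustration and is not a copy of numbers.pl; the function name, the wording of the output, and the choice to list only positive numbers are all assumptions.

```python
HEADER = "Content-type: text/html\n\n"  # MIME type line plus the blank line which ends the header


def numbers_response(query_string):
    """Build the whole CGI response for a query string of 'even' or 'odd'."""
    if query_string == "even":
        numbers = range(2, 10, 2)
    elif query_string == "odd":
        numbers = range(1, 10, 2)
    else:
        # Anything else is rejected rather than guessed at.
        return HEADER + "<p>Please ask for even or odd numbers.</p>\n"
    return HEADER + "<p>" + " ".join(str(n) for n in numbers) + "</p>\n"


print(numbers_response("even"))
```

In a real CGI the argument would be the value of the QUERY_STRING environment variable.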

Environment variables

When a browser 'converses' with a server, it must identify itself, and it may send parameters in the calling string. As a writer of CGIs you have access to the information which the server knows about the browser (for example the browser type and the IP address of its server or proxy). This information is passed in (Unix) environment variables which can be accessed by your CGI. Again, this is easy to illustrate by using a CGI to return the environment variables. You can use the form below (whose only active element is a submit button) to look at the environment variables:

The HTML for the form is straightforward:

<form action="http://www.kgweb.org.au/cgi-bin/env.pl" method="post"> <input type="submit" value="Show the ENV variables"> </form>

and you can view the Perl script which is used to pass the environment variable values back to the browser.
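For comparison, here is a minimal sketch of the same idea in Python (an illustration only, not the env.pl script). The function name env_page is hypothetical. The values are escaped before being placed in the page because environment variables can contain text supplied by the browser.

```python
import html


def env_page(environ):
    """Return a CGI response listing every environment variable."""
    lines = ["Content-type: text/html", "", "<pre>"]
    for name in sorted(environ):
        # Escape <, > and & so that values cannot inject HTML into the page.
        lines.append(html.escape(f"{name}={environ[name]}"))
    lines.append("</pre>")
    return "\n".join(lines) + "\n"


# In a real CGI the argument would be os.environ.
print(env_page({"REQUEST_METHOD": "POST", "HTTP_USER_AGENT": "ExampleBrowser/1.0"}))
```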

More complex CGI examples

The examples given above are fairly trivial, but CGIs can perform a wide range of complex tasks. Unlike static HTML and even JavaScript, CGIs can read and write files and have access to the full power of the server operating system. They also offer a level of code and data security not available in JavaScript, which is always viewable from the browser.

The following examples illustrate some of the power of CGIs. The first is a script to email the contents of a form.

Email processor

Of course it is possible to just invoke the mail system using a "mailto:" value for the action attribute within a form, for example:

<form action="mailto:bill@ispx.net.au" method="post">
  <h3>Suggestion box form</h3>
  <p> Please enter your suggestion here:<br>
  <textarea name="suggestion" rows="10" cols="60"></textarea>
  <p> Enter your email address: <input type="text" name="email" size="24" value="">
  <p> <input type="submit" value="Submit the form">
</form>

Browsers vary in how a "mailto" is handled. Some include the message as text in the mail, and some even manage to format it to some extent, but others just send it "raw" with all the escaped characters (as explained above) included. Others respond to the "mailto" by starting up a local copy of a mail program without sending anything immediately. If you want to know how your browser will act under these circumstances, the easiest way to find out is to try it and see, but remember that others using your web pages may have a different browser.

If you want the behaviour of your web page to be predictable under these circumstances, you can pass the contents of the form to a CGI which then processes the information and emails the result to the desired recipient. This allows full control of the process - for example the input fields can be checked to ensure that all required fields are filled in and the information can be reformatted to make it easy to read. More sophisticated processing can also be carried out, such as redirecting the email depending on the contents of the message, sending it to more than one recipient, saving the contents in a database, and so on.

The CGI will be more useful if it exhibits some "general" behaviour so that it can be used with many different forms. The names of the form elements can be used to indicate required fields and the CGI can reject input which does not have these fields filled in. In the following form, which uses the cgi_femail CGI, the user's email address is required (indicated by the "req_" prefix to its name). We have also disguised the address to which the email is sent (writing "AT" in place of "@") to discourage harvesters (well, unsophisticated ones, anyway) from adding us to their spam lists:

<form action="http://www.ispx.net/cgi-bin/cgi_femail.pl" method="post">
  <input type="hidden" name="subject_fm" value="Suggestion form">
  <input type="hidden" name="emailto_fm" value="cgipageATispx.net">
  <input type="hidden" name="returnURL_fm" value="http://www.ispx.net/freturn.html">
  <h3>Suggestion box form</h3>
  <p> Please enter your suggestion here:<br>
  <textarea name="suggestion" rows="8" cols="60"></textarea>
  <p> Enter your email address: <input type="text" name="req_email" size="24" value="">
  <p> Enter your phone number: <input type="text" name="phone" size="18" value="">
  <p> <input type="submit" value="Submit the form">
</form>
You can have a look at the Perl script which was written as a general forms email processor and try it out with the form below. It includes the email address to which the information is to be sent so that you can enter your own address and see what the email looks like. Note that when you receive the email, the sender will be <nobody> (or www-data) or some other impersonal name. This does not mean that it is spam - it is the name (on our ISP's Unix system) of the owner of the HTTP process which is controlling the mail program, and it is not possible for us to change it. Of course you cannot reply, but then you would not expect to be able to reply to someone who (anonymously unless they provide identifying details separately) fills in a form on a web page. A carefully chosen subject line will help to make the emails from the form easily recognisable.
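The "req_" convention is simple to check once the form has been unpacked into name and value pairs. The sketch below shows one way of doing it in Python; it is an illustration rather than an extract from cgi_femail, and the function name missing_required is hypothetical.

```python
def missing_required(fields, prefix="req_"):
    """Return the names of required fields which were left empty.

    fields is a dictionary of form field names and their unescaped values.
    A value containing only spaces counts as empty.
    """
    return sorted(name for name, value in fields.items()
                  if name.startswith(prefix) and not value.strip())


submitted = {"suggestion": "More examples please", "req_email": "  ", "phone": ""}
print(missing_required(submitted))  # ['req_email']
```

If the returned list is not empty, the CGI can send back an error page naming the fields instead of sending the email.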

CGI test form - emailing the form contents

Enter your favourite food:     Enter your lucky number:

Email this form to:    

There are any number of possible improvements which could be made to this simple system. For example the return and error pages are not very pretty; they could be improved, or even combined with the page containing the form (with some query strings and a bit of JavaScript). The form fields could be checked before submission with JavaScript to save on server-side processing and net traffic. In the same way that some fields are defined as 'required', we could use names beginning with 'num_' to insist that they contain a number (for example the "lucky number" field above), and other similar types of validation could be included.
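As an illustration of the 'num_' idea, here is a minimal sketch in Python. The function name, and the decision about what counts as a number (an optional sign, digits, and an optional decimal part), are assumptions made for this example.

```python
import re

# An optional sign, one or more digits, and an optional decimal part.
NUMBER = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def invalid_numbers(fields, prefix="num_"):
    """Return the names of 'num_' fields whose values are not numbers."""
    return sorted(name for name, value in fields.items()
                  if name.startswith(prefix) and not NUMBER.fullmatch(value.strip()))


print(invalid_numbers({"food": "pizza", "num_lucky": "seven"}))  # ['num_lucky']
print(invalid_numbers({"food": "pizza", "num_lucky": "7"}))      # []
```

Matching the whole value against a strict pattern is deliberately tighter than simply trying to convert it, in keeping with the advice to validate input as tightly as possible.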

Guestbook

The web is an excellent way to get feedback on the services and information you offer. One of the features commonly found on web sites is a guestbook, where visitors can register their comments. This is the sort of application where you probably do not want the information emailed to you immediately - it is enough to check out the guestbook from time to time, to see what comments have been made. The example outlined here is rather simple; a more complex (and realistic) version might, for example, check the content for offensive words or attempts to breach security, present a more appealing layout and allow you to archive out-of-date entries. You can try out the example but please do not try to use it to enter active links to your own site - that is not what it is for (and it will not work).

We assume that the guestbook will be made available via a webpage to anyone who wants to look at it, and anyone will be able to contribute comments, so two CGIs are required - one to add an entry, and one to display the existing entries. It is easiest to invoke each one from its own form, but both forms can be on the same web page, for example:

<h3>Guestbook</h3>
<form action="http://www.server.net.au/cgi-bin/cgi_read.pl" method="post">
  <input type="submit" value="Show me the guest book">
</form>
<p>
<form action="http://www.server.net.au/cgi-bin/cgi_write.pl" method="post">
  Name: <input type="text" name="name" size="40">
  <p> Country: <input type="text" name="country" size="40">
  <p> Organisation: <input type="text" name="org" size="40">
  <p> Please enter your comments: <br>
  <textarea name="comments" rows="5" cols="60"></textarea>
  <p> <input type="submit" value="Add your comments to the guestbook">
</form>

Here is the form as it looks on the web page. Note the extra feature: the option of specifying the number of entries to display. You can try it out (but note that entries containing HTML will be ignored).

Example guestbook
entries in the guest book (blank = all)

Name:

Country:

Organisation:

Please enter your comments:

As long as we are happy with a straightforward page layout, the CGI to read the guestbook is very simple, since it can take advantage of Perl's access to Unix system calls. The writing CGI is a bit more complex, but we can keep it reasonably simple by holding the HTML for the guestbook in three files: an unchanging header, a central section containing entries which we add to by appending new entries on the end, and an unchanging footer. In a 'serious' system you might also have one or more private CGIs to delete or archive entries.
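The three-file arrangement can be sketched as follows. This is a Python illustration, not our Perl CGIs: the function names, the one-table-row-per-entry layout and the choice to show the most recent entries when a limit is given are all assumptions, and a real script would also need to lock the entries file while writing to it.

```python
import html
import os
import tempfile
from pathlib import Path

FIELD_NAMES = ("name", "country", "org", "comments")  # names used in the form


def format_entry(fields):
    """Turn one submission into a single line holding one HTML table row."""
    cells = []
    for field_name in FIELD_NAMES:
        text = html.escape(fields.get(field_name, ""))  # neutralise any HTML
        text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "<br>")
        cells.append("<td>" + text + "</td>")
    return "<tr>" + "".join(cells) + "</tr>\n"


def add_entry(entries_file, fields):
    """The 'write' CGI: append the new entry to the end of the middle file."""
    with open(entries_file, "a", encoding="utf-8") as entries:
        entries.write(format_entry(fields))


def guestbook_page(header_file, entries_file, footer_file, limit=None):
    """The 'read' CGI: header + entries + footer (limit of None or 0 = all)."""
    entries = Path(entries_file).read_text(encoding="utf-8").splitlines(keepends=True)
    if limit:
        entries = entries[-limit:]  # keep only the most recent entries
    return (Path(header_file).read_text(encoding="utf-8")
            + "".join(entries)
            + Path(footer_file).read_text(encoding="utf-8"))


# Try it out in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    header = os.path.join(folder, "header.html")
    entries = os.path.join(folder, "entries.html")
    footer = os.path.join(folder, "footer.html")
    Path(header).write_text("<table>\n", encoding="utf-8")
    Path(footer).write_text("</table>\n", encoding="utf-8")
    Path(entries).write_text("", encoding="utf-8")
    add_entry(entries, {"name": "Fred", "country": "Australia", "comments": "Nice site"})
    print(guestbook_page(header, entries, footer))
```

Keeping each entry on a single line is what makes the "number of entries to display" option easy to implement.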

Site searching

Many web sites (including ours) allow users to search the site for a keyword or phrase, or a more complex arrangement of words. This facility is a very powerful method of providing information about the site and allowing rapid navigation to the areas of interest. There are a number of ways to implement site searching, but here we will look at the most flexible - writing your own CGI to do the task.

To illustrate the basic requirements, we start with a simple example - to search a single page and display lines containing a given word. The form needs to provide the name of the page to be searched, an input box to accept the word, and a button to submit the form, for example:

<h3>Search for a word</h3>
<form action="http://www.kg.org/cgi-bin/search1.pl" method="post">
  <input type="hidden" name="file" value="/home/kg/www/index.html">
  Enter word to look for: <input type="text" name="word" size="20"> &nbsp;&nbsp;&nbsp;
  <input type="submit" value="Do the search">
</form>

The CGI which the form invokes is written in Perl. You can try it here:

Enter word to look for:    
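The heart of such a script is very small. The sketch below is a Python illustration of the idea, not a translation of search1.pl; the function names are hypothetical and the case-insensitive matching is an assumption. It works on the text of the page, so reading the file named in the form is left out.

```python
import html


def matching_lines(page_text, word):
    """Return the lines of page_text which contain word (ignoring case)."""
    if not word:
        return []  # an empty search would otherwise match every line
    wanted = word.lower()
    return [line for line in page_text.splitlines() if wanted in line.lower()]


def results_page(page_text, word):
    """Build the CGI response showing each matching line as a paragraph."""
    body = "".join("<p>" + html.escape(line.strip()) + "</p>\n"
                   for line in matching_lines(page_text, word))
    if not body:
        body = "<p>No matching lines found.</p>\n"
    return "Content-type: text/html\n\n" + body


sample = "<h1>Welcome</h1>\n<p>Our CGI pages</p>\n<p>Contact us</p>\n"
print(results_page(sample, "cgi"))
```

The matching lines are escaped before being sent back, so the user sees the HTML source of each line rather than having it interpreted by the browser.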

This search is not in fact very useful - it only (rather poorly) duplicates a feature found in most browsers. So our second example is more complex - in fact it is very close to the script we use to implement searching on our site. It is more useful in that it returns a page of active links to pages containing the search word. The word can be a phrase (it can contain spaces) and the search can be limited to a subset of pages - this makes sense with a site like ours which is made up of a number of more-or-less unrelated sub-sites.

There is no need to provide a working example of this search - it is at the top of each of our major pages. The HTML for the form looks like this:

<form action="http://www.kg.org/cgi-bin/websearch.pl" method="post">
  <input type="hidden" name="pathn" value="0">
  <input type="submit" value="Search"> the HTML pages <br>
  for the word or phrase <br>
  <input type="text" name="sstring" size="24" value="">
</form>
Again, the actual searching is carried out by a Perl CGI which performs some additional tasks compared to the simple script used in the first search version. It must find all the web pages (i.e. HTML files) and when it gets a match must build a URL for the matched page. Our script also records details of each search that it is asked to perform (we have summarised the results on our web search analysis page).

Because our search feature only has a small space at the top of the page, it does not allow sophisticated search rules - whatever is entered is what is searched for (multiple words separated by spaces are treated as a phrase rather than separate key words). Also, it is not suitable as a general searching script - it expects our particular site structure and names, although it could be modified to work with a different structure. You can try it out by going to any of the major pages, for example the home page.
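The two extra tasks, finding all the HTML files and building a URL for each match, can be sketched like this. It is a Python illustration under stated assumptions, not our websearch.pl: the function name is hypothetical, the site is assumed to live under a single directory which maps directly onto a base URL, and the phrase is matched against the raw HTML (so it can also match text inside tags).

```python
import os


def search_site(root_dir, base_url, phrase):
    """Return the URLs of all HTML files under root_dir containing phrase."""
    if not phrase:
        return []
    wanted = phrase.lower()
    urls = []
    for folder, _subfolders, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith((".html", ".htm")):
                continue  # only web pages are searched
            path = os.path.join(folder, filename)
            with open(path, encoding="utf-8", errors="replace") as page:
                if wanted in page.read().lower():
                    # Turn the file's position under root_dir into a URL.
                    relative = os.path.relpath(path, root_dir).replace(os.sep, "/")
                    urls.append(base_url.rstrip("/") + "/" + relative)
    return sorted(urls)
```

A call might look like search_site("/home/kg/www", "http://www.kg.org", "guest book"), where the directory and URL are only examples; each returned URL can then be written out as an active link on the results page.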

Problems: creating, testing and debugging your code

The complexity of CGIs and the interactions between them and web pages means that getting it all working is much more challenging than writing web pages in HTML.

If you want to create CGI scripts, you may need to talk to the webmaster who looks after your server first. Some ISPs do not allow user-written CGIs, and even if they do, they may want to closely examine anything you do before it is allowed on their server. This is because CGIs run under the control of the web server, which usually has more privileges than normal users are allowed, so there will always be system security concerns with CGIs.

For example, one precaution typically applied is illustrated in our resub package which removes any backquote characters - under some conditions these can be used to invoke Unix commands from the information passed to the CGI. A related rule is never to use passed parameters within backquotes in Perl programs. The vertical bar and angle brackets are other characters in Unix which can be used to induce undesirable behaviour so if possible these characters should be screened out of your input. Where possible, pass codes to select parameters used inside the CGI rather than the parameters themselves; this simplifies and tightens up range-checking of the input.
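Both precautions can be illustrated briefly. The sketch below is in Python and is not our resub package; the function names are hypothetical and the table of codes and directories is invented for the example. Screening out a few characters is only one layer of protection and does not make a value safe to pass to the operating system.

```python
UNSAFE_CHARACTERS = "`|<>"  # backquote, vertical bar and angle brackets


def screen_input(value):
    """Remove the characters most easily misused on a Unix system."""
    return "".join(ch for ch in value if ch not in UNSAFE_CHARACTERS)


# The form passes a short code; the real values never leave the CGI.
# These codes and directories are hypothetical.
SEARCH_AREAS = {
    "0": "/home/kg/www",
    "1": "/home/kg/www/docs",
}


def area_for_code(code):
    """Return the directory for a code, or None if the code is not known."""
    return SEARCH_AREAS.get(code)


print(screen_input("hello `rm -rf /` | mail <someone>"))
print(area_for_code("1"))            # /home/kg/www/docs
print(area_for_code("/etc/passwd"))  # None
```

With the code table, range-checking reduces to a single lookup: anything which is not one of the known codes is rejected.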

You might also like to investigate Perl's 'taint' mode. There are many more security precautions which you need to consider when creating CGIs; for example have a look at Randal L. Schwartz's Unix Review Column 48.

Detailed instruction on programming in Perl or other languages is well beyond the scope of these pages, but here are some general hints on writing and debugging your CGI scripts:

  • Be aware of case sensitivity - Unix and Perl differentiate between upper and lower case in variables, commands, filenames, etc.

  • If you develop your CGIs on a PC or Mac and then upload them to a Unix system, remember that all three systems may use different line terminators.

  • If your script does not run as you expected, there could be a syntax error or a logical error in the code. Errors are normally reported in the web server error log, so access to this file is very useful in debugging CGIs. The Unix "tail" command will show you the end of the log when your CGI fails.

  • If you get "Malformed header" messages, it means that the web server did not get a correct "Content-type: ...." line from your CGI setting the MIME type for this page. Check your code to make sure you are sending the MIME type (correctly formatted) as the very first line. The line can look OK but have hidden non-printing characters - try deleting it and completely retyping it. Also check that you have followed it with a blank line - always required to terminate the HTTP header block.

  • If your CGI reads a file and sends its contents back as part of creating a web page, do not print the "Content-type: ...." line from the CGI and then use a system call to access the file. The output of the print is normally buffered, so it may be actioned asynchronously, which can result in the file contents being sent before the MIME type. It is a good idea to create files that you want to send back to the client with the MIME type embedded, to ensure that this problem cannot occur.

  • If your script is manipulating data, formatting it into HTML code on-the-fly, you can often create a static file for the "head" of the page and another such file for the "foot" of the page, which limits the typically rather messy writing of HTML to the middle section containing the data. An example of this is the guest-book, where the header and footer files are used to sandwich the dynamic table entries. This allows for simple updating of the data without having to re-write the parts of the file that don't change.

  • On Unix servers you need to set the appropriate file access permissions. The CGI itself must be executable. The permissions of any file which the CGI accesses must be set from the point of view of the web server, which will invoke the CGI. Depending on the location of the file, this may mean that if you read a file the world needs read access and if you write to it then the world needs write access. This has obvious security implications.
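The hints about the "Content-type" line can be summed up in a short sketch: send the MIME type first, follow it with a blank line, and make sure it has really been written before anything else is sent. The example is in Python for illustration only, and the function name send_html is hypothetical.

```python
import sys


def send_html(out, body):
    """Write a complete CGI response to the stream out."""
    # The MIME type must be the very first line, followed by a blank line.
    out.write("Content-type: text/html\n\n")
    # Flush so that the header cannot be overtaken by output written
    # to the same destination by some other route (e.g. a system call).
    out.flush()
    out.write(body)
    out.flush()


send_html(sys.stdout, "<p>Hello from a CGI</p>\n")
```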