form.py 1.3 [final]

Contents

  1. Introduction
    1. form vs cgi
    2. Example
  2. Using fdefs
    1. Datatypes
      1. form.STRING
      2. form.TEXT
      3. form.ENUM
      4. form.BOOL
      5. form.LIST
      6. form.MAP
      7. form.FILE
      8. form.INT
      9. form.FLOAT
    2. Embedding values in names
  3. Functions
    1. Initialisation
      1. version
      2. memorylimit
      3. filelimit
      4. listlimit
      5. europe
    2. Reading submissions
      1. readForm
      2. readUrlEncoded
      3. readUrlEncodedStream
      4. readFormData
      5. readFormDataStream
    3. Writing submitted values back
      1. writeForm
      2. writeFormStream
      3. writeUrlEncoded
      4. writeUrlEncodedStream
      5. writeFormData
      6. writeFormDataStream
    4. String coding
      1. encH
      2. encU
      3. encI
      4. encJ
      5. encHJ
      6. decU
      7. decI
    5. Utility functions
      1. checked
      2. selected
      3. randomSafeString
      4. makeSafe
      5. makeSafeIsh
  4. Exceptions
    1. cgiError
    2. fdefError
    3. httpError
  5. About
    1. History
    2. Licence

1. Introduction

The form module is an extended replacement for the standard Python cgi module, providing robust form-handling features in order to make writing secure form-handling CGIs in Python less work.

The idea is to define the kind of data you want returned for each field of the form. This definition is done using a mapping of form field names to datatypes (fdefs), which is passed to the main function, readForm. This call reads CGI input and interprets it, returning a mapping of field names to values.

form also fully supports file-upload fields (including multiple-file uploads), image-submit fields and embedding values in names. It protects against some denial-of-service problems common to CGI scripting, and provides miscellaneous utility functions useful to CGI programmers. It has been shown to cope with very large input sets.

1.1. form vs cgi

form and cgi have completely different interfaces and are not compatible. form works at a somewhat higher level than cgi. Its ease of use comes at the expense of disallowing direct access to the exact submitted data.

The main advantage is that the returned values from reading a form submission are guaranteed to conform to your specifications, regardless of how malformed the submission may have been. This reduces the error-checking necessary to produce error-free scripts. The abstraction of datatype from submission data also allows some elements in an HTML form to be changed without having to re-write the corresponding CGI.

cgi is part of the standard distribution, so it is guaranteed to be available without adding any modules, and it easily suffices for simple forms. form is more complicated than cgi, so it may be more likely to contain bugs, although none are currently known. form is also not suitable for applications where you don't know the names of the submitted fields in advance (e.g. generic form-to-mail scripts).

1.2. Example

A user sign-up form might be read like this:

import form

fdefs = {
    'email': (form.STRING, 128),
    'username': (form.STRING, 16),
    'password': (form.STRING, 16),
    'sex': (form.ENUM, ['m', 'f'], 'f'),
    'age': form.INT,
    'sendmespam': form.BOOL
}
fvals = form.readForm(fdefs)

# errorPage and allUsers belong to your own script: errorPage should print
# an error page and exit, and allUsers is a dictionary keyed by existing
# user names.
if fvals.username == '':
    errorPage('You forgot to enter a user name.')
if fvals.username in allUsers:
    errorPage('Sorry, someone has already taken that user name.')

# and so on

2. Using fdefs

Each item in an fdefs dictionary defines one form field. The key should be the same as the name property in the HTML form, which should not normally contain a period or colon (see 2.2). The value of the item dictates the datatype to be returned.

readForm returns a dictionary-like object with the names of the fields as keys. The type of each value depends on the type requested for that field in the fdefs. You can read the returned object like a dictionary (fvals['address']) or like an object (fvals.address); it makes no difference.
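A minimal Python 3 sketch of how an object readable both ways might work. The History section names the real class EitherMapping; the code below is only a guess at the idea, not form.py's implementation.

```python
class EitherMapping(dict):
    """Sketch: a dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        # The 1.2 history entry says entries can be removed with del.
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


fvals = EitherMapping({'address': '1 High Street', 'age': 42})
print(fvals['address'] == fvals.address)  # prints True
```

In this sketch a field whose name clashes with a dict method (say 'items') is only reachable as fvals['items'], which is one reason subscripting is the safer habit.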

In the case where a field is included more than once in a submission but a list-of-values submission (form.LIST) was not expected, the last field in the input takes precedence.

2.1. Datatypes

The following field types are available. Some of them take parameters, which you specify by putting the type in a tuple, followed by the parameters. If you are not passing parameters, you can use the type name on its own or in a singleton tuple; it doesn't matter which.

(form.STRING, length, exclude)

For input type="text" or type="password". Returns a string of at most length characters, with any characters that appear in the string exclude removed. You can omit exclude to allow all (non-control) characters. You can omit length or set it to 0 to allow a string of any length; it is mostly there so you can copy the value into a database without having to worry about it being too big to fit.
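The cleaning that form.STRING performs might look roughly like the following Python 3 sketch. The function name read_string is hypothetical, and so are two details: which characters count as control characters, and whether truncation happens before or after filtering.

```python
def read_string(raw, length=0, exclude=''):
    """Hypothetical sketch of form.STRING cleaning, not form.py's own code."""
    # Drop control characters (assumed here to be code points below 32, plus DEL).
    cleaned = ''.join(ch for ch in raw if ord(ch) >= 32 and ord(ch) != 127)
    # Drop every character that appears in the exclude string.
    cleaned = ''.join(ch for ch in cleaned if ch not in exclude)
    # A length of 0 means "any length"; otherwise truncate.
    if length:
        cleaned = cleaned[:length]
    return cleaned


print(read_string('bob\x00by<tables>', 8, '<>'))  # prints bobbytab
```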

(form.TEXT, length)

For textarea. As form.STRING, but single newlines are converted to a space, and double newlines are converted to a single Python '\n'. Other control characters are still removed.
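A rough sketch of the TEXT newline rules, assuming the browser sends CR LF line breaks and treating any run of two or more newlines as a paragraph break. read_text is a hypothetical name.

```python
import re


def read_text(raw, length=0):
    """Hypothetical sketch of form.TEXT handling, not form.py's own code."""
    # Browsers usually submit textarea line breaks as CR LF; normalise to LF.
    text = raw.replace('\r\n', '\n').replace('\r', '\n')
    # Assumption: a run of two or more newlines marks a paragraph break.
    paragraphs = re.split(r'\n{2,}', text)
    # Within a paragraph, single newlines become spaces.
    paragraphs = [p.replace('\n', ' ') for p in paragraphs]
    text = '\n'.join(paragraphs)
    # Remove remaining control characters, keeping the '\n' separators.
    text = ''.join(ch for ch in text
                   if ch == '\n' or (ord(ch) >= 32 and ord(ch) != 127))
    return text[:length] if length else text
```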

(form.ENUM, [value, value, ...], default)

For select and input type="radio". Return one of the list of string values passed if it matches the input, else return the default value, which can be of any type. If the default is not supplied, '' is used as the default.
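ENUM amounts to a whitelist check. A one-function sketch (read_enum is hypothetical), where None stands for a field that was not submitted:

```python
def read_enum(raw, values, default=''):
    """Hypothetical sketch of form.ENUM, not form.py's own code.

    raw is the submitted string, or None if the field was absent.
    """
    # Only an exact match against the allowed list is accepted; anything else,
    # including a missing field, falls back to the default (which may be any type).
    return raw if raw in values else default
```

Because the check is membership rather than truthiness, '' can still be a legitimate non-default value, as the 0.5 history entry requires.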

form.BOOL

For input type="checkbox" with no value property. The value returned is a boolean object which evaluates true if the input value for this field was 'on', else false.

form.LIST

For select multiple and for multiple fields with the same name (especially checkboxes). Returns a list of all the non-empty input strings given for this field.
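A sketch of LIST handling, including the listlimit cap described under initialise. read_list is hypothetical, and applying the cap after dropping empty strings is an assumption.

```python
def read_list(raw_values, listlimit=0):
    """Hypothetical sketch of form.LIST: keep every non-empty submitted string.

    raw_values holds all values submitted under the field name, in order.
    """
    result = [value for value in raw_values if value != '']
    # listlimit caps the number of entries; 0 means no limit.
    if listlimit:
        result = result[:listlimit]
    return result
```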

(form.MAP, (width, height))

For input type="image". Returns a tuple (x, y) giving the position of the click, clipped to within (0:width, 0:height) if the (width, height) tuple is supplied. Returns (0, 0) if the input field was supplied but without x and y co-ordinates, or (-1, -1) if the field was not in the input at all.
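A sketch of the MAP rules, assuming the browser submits the co-ordinates as name.x and name.y (which is how image-submit buttons report clicks) and that clipping keeps x below width and y below height. read_map and its arguments are hypothetical.

```python
def read_map(submitted, name, size=None):
    """Hypothetical sketch of form.MAP for an <input type="image"> called name.

    submitted maps raw field names to strings, e.g. {'map.x': '12', 'map.y': '40'}.
    """
    keys = (name, name + '.x', name + '.y')
    if not any(k in submitted for k in keys):
        return (-1, -1)  # the field was not in the submission at all
    try:
        x = int(submitted[name + '.x'])
        y = int(submitted[name + '.y'])
    except (KeyError, ValueError):
        return (0, 0)  # present, but without usable co-ordinates
    if size is not None:
        width, height = size
        # Assumption: clip to 0 <= x < width and 0 <= y < height.
        x = max(0, min(x, width - 1))
        y = max(0, min(y, height - 1))
    return (x, y)
```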

(form.FILE, directory)

For input type="file". Stores the files uploaded through the field in the given directory and returns a list of tuples (storedFile, suppliedFile, mimeType, length). storedFile is the full pathname of the stored file; suppliedFile may be '' if the browser did not specify a filename. The list is empty if no files were uploaded, and is unlikely to contain more than one entry since few browsers support multiple-file upload.
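A sketch of storing one upload and building its tuple. store_upload is hypothetical; in particular, naming stored files randomly rather than trusting the browser-supplied name is an assumption about good practice, not documented form.py behaviour. It applies filelimit (see initialise) by truncation.

```python
import os
import secrets


def store_upload(directory, supplied_name, mime_type, data, filelimit=0):
    """Hypothetical sketch of storing one form.FILE upload, not form.py's code."""
    if filelimit:
        data = data[:filelimit]  # filelimit truncates oversized uploads
    # Never build the stored path from the browser-supplied name: use a random one.
    stored = os.path.join(directory, 'upload_' + secrets.token_hex(8))
    with open(stored, 'wb') as f:
        f.write(data)
    return (stored, supplied_name, mime_type, len(data))
```

The field's value would then be a list of such tuples, usually with a single entry.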

(form.INT, default)

Parses the input as a decimal (possibly negative) integer. Returns the default value if no parsable number could be read; if default is omitted, zero is used. Returns sys.maxint if the number is larger than Python can represent as a plain integer. Note! Future versions of form may return a long integer for form.INT. I might restrict this to Python 1.6 and later, where str doesn't add an 'L' to the end of the number, to avoid problems.
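A sketch of the INT rules under Python 3. Python 3 integers have no upper bound, so sys.maxsize stands in here for Python 2's sys.maxint; clipping negative numbers too is an assumption, and read_int is a hypothetical name.

```python
import re
import sys

# Assumption: sys.maxsize plays the role of the sys.maxint ceiling.
INT_CEILING = sys.maxsize


def read_int(raw, default=0):
    """Hypothetical sketch of form.INT parsing, not form.py's own code."""
    match = re.fullmatch(r'\s*(-?[0-9]+)\s*', raw or '')
    if match is None:
        return default  # no parsable number
    value = int(match.group(1))
    # Clip rather than raise, following form's "no exceptions for bad input" idea.
    return max(-INT_CEILING - 1, min(value, INT_CEILING))
```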

(form.FLOAT, default)

Parses the input as a simple floating-point number, which may contain a decimal point but not 'E' notation. Returns the default (0.0 if not supplied) if the input is not a valid number or is missing.
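A sketch of FLOAT parsing that also honours the europe flag of initialise. read_float is hypothetical, and discarding thousands-separators wherever they appear is an assumption.

```python
import re


def read_float(raw, default=0.0, europe=False):
    """Hypothetical sketch of form.FLOAT, not form.py's own code."""
    # europe swaps the roles of '.' and ','; separators are simply discarded.
    sep, dec = ('.', ',') if europe else (',', '.')
    text = (raw or '').strip().replace(sep, '').replace(dec, '.')
    # Optional sign, digits, at most one decimal point; no 'E' notation.
    if not re.fullmatch(r'-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)', text):
        return default
    return float(text)
```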

2.2. Embedding values in names

In HTML, there are some kinds of form fields where you can't use the value attribute to pass information to the CGI script. These are input type="image", where the value is always a pair of co-ordinates, and submit, where the value is used as the text of the button.

So if you wanted to detect which of a set of identically-labelled buttons was pressed, you'd have to give them all a different name, and include a check for each one in your script. This would be especially tedious for an order form with a hundred "Buy It!" buttons, for example.

For this reason, form allows you to make a group of controls where, when one of them is included in a submission, the value is taken from the control's name rather than its value attribute. The actual value submitted is ignored.

To use the feature, put both the name and the desired value together in the HTML name of the field, separated by a colon. (The colon is a valid character in a name attribute, albeit a seldom-used one.)

<input type="submit" name="b:left" value="Click me!">
<input type="submit" name="b:middle" value="Click me!">
<input type="submit" name="b:right" value="Click me!">

In this example, a call to form.readForm({'b': form.STRING}) would return 'left', 'middle' or 'right', depending on which button was used to submit the form. This is not limited to STRING: values of all types except FILE may be embedded in names.

(You can still use names containing colons if you do not want the value-embedding feature: form only tries to split a name at a colon if it can't find the whole name as a key in your fdefs. The same goes for periods, which browsers use in the names they submit for image-submit fields, such as name.x and name.y.)
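The name-splitting rule can be sketched as follows. split_embedded is hypothetical; splitting at the first colon, and decoding encI-style ':xx' escapes in the embedded part, are assumptions about how form.py does it.

```python
import re


def split_embedded(raw_name, fdefs):
    """Hypothetical sketch of name:value separation, not form.py's own code.

    Returns (field, embedded_value); embedded_value is None when the submitted
    value should be used as normal, and field is None for unknown names.
    """
    if raw_name in fdefs:
        return raw_name, None  # an exact match wins, so plain colons still work
    field, sep, embedded = raw_name.partition(':')
    if sep and field in fdefs:
        # Undo encI-style ':xx' hex escapes in the embedded part.
        value = re.sub(r':([0-9A-Fa-f]{2})',
                       lambda m: chr(int(m.group(1), 16)), embedded)
        return field, value
    return None, None  # not a field this script asked for
```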

To embed characters which aren't normally allowed in HTML name attributes, see the encI function. form will automatically decode this for you when reading name-embedding values.

3. Functions

3.1. Initialisation

initialise(version, memorylimit, filelimit, listlimit, europe)

Calling this function is not compulsory, but it allows you to set some of form's internal variables easily.

form.py includes features to protect against certain kinds of denial-of-service attacks in POST requests. They are turned off by default, but passing non-zero values in the "limit" parameters enables them.

The arguments you can set are:

version
The lowest version of form your script is happy running with. If you request a newer version than the module, an exception will be raised.
memorylimit
Guards against a request containing enormous parts that fill available memory and make the web server thread swap heavily. Nothing larger than this many bytes will be stored in memory: longer values are skipped entirely rather than truncated, and if any headers grow larger than this, no values will be parsable at all.
filelimit
Guards against a request including an enormous file upload, filling available disc space. Files will be truncated at this number of bytes.
listlimit
Guards against a request including the same input field over and over again, filling memory, or the same file upload field repeatedly, filling disc space. Fields of type LIST or FILE can then contain no more than this number of entries.
europe
If true, form.INT and form.FLOAT will read numbers using European-style punctuation (where "." is a thousands-separator and "," is the decimal point). If false (the default), it's the other way around.
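How memorylimit and listlimit might be enforced while collecting submitted pairs, as a sketch. collect_values is hypothetical; zero means a guard is switched off, matching the defaults described above.

```python
def collect_values(pairs, memorylimit=0, listlimit=0):
    """Hypothetical sketch of the memorylimit/listlimit guards, not form.py's code.

    pairs is a sequence of (name, value) strings in submission order.
    """
    collected = {}
    for name, value in pairs:
        # memorylimit: oversized values are skipped outright, not truncated.
        if memorylimit and len(value) > memorylimit:
            continue
        values = collected.setdefault(name, [])
        # listlimit: stop accepting repeats of a field once the cap is reached.
        if listlimit and len(values) >= listlimit:
            continue
        values.append(value)
    return collected
```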

3.2. Reading submissions

All read functions take submitted form data and parse it, returning a dictionary-like object containing the values that have been posted to the form, standardised according to the fdefs argument passed to the function. The returned object may be read like a dictionary or like an object.

Typically, a script calls readForm at the start of its code. Scripts do not normally need to call the other read functions directly.

readForm(fdefs)
This function is normally used to read form data. It works out which of the other read functions is appropriate, and calls that.
readUrlEncoded(fdefs, query)
Reads a query string (without leading "?") passed directly to the function. form understands ';' separators as well as '&'.
readUrlEncodedStream(fdefs, stream, length)
Same as readUrlEncoded, but takes its input from a stream object (must support read()) instead of a string.
readFormData(fdefs, data, parameters)

Decodes fields encoded in a multipart/form-data formatted string. parameters is a dictionary of MIME headers, lower-cased keys, containing at least a 'boundary' key.

Currently this function is no more efficient than readFormDataStream, since it is not commonly needed.

readFormDataStream(fdefs, stream, length, parameters)
As readFormData, but input is taken from a stream object instead of a string. The length is the number of bytes that should be read from the stream.
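A sketch of splitting a query string on both separators, using the standard library's urllib.parse.unquote_plus to decode each part. parse_query is hypothetical; it stops at raw (name, value) pairs, before any fdefs conversion.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query):
    """Hypothetical sketch: split a query string on '&' or ';' into pairs."""
    pairs = []
    for part in re.split(r'[&;]', query):
        if not part:
            continue  # tolerate empty segments such as 'a=1&&b=2'
        name, _, value = part.partition('=')
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs
```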

3.3. Writing submitted values back

The write functions take form values from a dictionary (or dictionary-like object returned by the read functions), and convert them into encoded text sent to a string or a stream.

File-upload fields are only supported by writeFormData and writeFormDataStream, since it makes little sense to put a file upload in a query string or a hidden form field. File upload values need not have a valid length in the tuple, as the length is read directly from the specified file.

Currently, the string-returning functions are no more efficient than the stream-writing versions.

writeForm(fvals)
Returns a string containing HTML input type="hidden" controls for each field in the fvals dictionary. This is useful for writing a follow-up-form that retains all the information posted into a previous form.
writeFormStream(fvals, stream)
As writeForm, but send output to a stream object (or anything supporting write) instead of returning a string.
writeUrlEncoded(fvals)
Return a &-separated list of URL-encoded key=value pairs representing the values. The query-string separator '?' is not included in the returned string. If you're including the query string in, for example, an <a href="...">, remember to HTML-encode the whole URL, or those & characters could confuse a browser.
writeUrlEncodedStream(fvals, stream)
As writeUrlEncoded, except that the output is sent to the nominated stream.
writeFormData(fvals)
Return a MIME multipart/form-data message from the given values. form will work out a suitable boundary value for you.
writeFormDataStream(fvals, stream)
As writeFormData, except that the output is sent to the nominated stream.
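Sketches of writeUrlEncoded and writeForm built on the standard library (urllib.parse.quote_plus and html.escape). The function names are hypothetical, and they handle only strings, numbers and lists; how form.py writes BOOL or MAP values is not covered.

```python
import html
from urllib.parse import quote_plus


def write_url_encoded(fvals):
    """Hypothetical sketch of writeUrlEncoded: '&'-separated pairs, no '?'."""
    parts = []
    for key, value in fvals.items():
        # Lists (form.LIST values) become one pair per entry.
        for item in (value if isinstance(value, list) else [value]):
            parts.append(quote_plus(str(key)) + '=' + quote_plus(str(item)))
    return '&'.join(parts)


def write_form(fvals):
    """Hypothetical sketch of writeForm: one hidden input per value."""
    lines = []
    for key, value in fvals.items():
        for item in (value if isinstance(value, list) else [value]):
            lines.append('<input type="hidden" name="%s" value="%s">'
                         % (html.escape(str(key)), html.escape(str(item))))
    return '\n'.join(lines)
```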

3.4. String coding

These convenience functions are available for coding text for representation in HTML, URLs and JavaScript strings. If you have user input anywhere in your scripts, you'll need to do this a lot, or you're likely to make a site susceptible to security problems. (See this CERT advisory for an example of this.)

encH(text)

Encode text as HTML and return as string. ", &, <, > and control characters are replaced with HTML entities. This assumes you use the double-quote rather than single-quote for attribute strings, which is advisable. Obviously quotes do not need to be escaped outside of attribute values, but it does no harm.
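A sketch of the encH rules as described, leaving top-bit-set characters alone as the 1.3 history entry says. Writing control characters as numeric references is an assumption, and enc_h is hypothetical. For the four named characters, Python 3's html.escape does much the same job (it also escapes the single quote).

```python
def enc_h(text):
    """Hypothetical sketch of encH: escape ", &, <, > and control characters."""
    named = {'"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;'}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 32 or ord(ch) == 127:
            out.append('&#%d;' % ord(ch))  # numeric reference for control chars
        else:
            out.append(ch)  # everything else, including accented letters, passes through
    return ''.join(out)
```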

encU(text)

Encode text as a URL part (replacing spaces with '+' and many symbols with %-encoded entities), and return as a string.

Note: you should not pass entire URLs through encU, only separate parts, for example a directory name in a path, or a key or value string in a query. Once encoded, you can combine these parts using '/', '?' and so on. When writing HTML, remember to HTML-encode the complete URL if it contains characters such as '&'.
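The encode-each-part-then-join rule, sketched with urllib.parse.quote_plus standing in for encU (its exact set of escaped characters may differ from form.py's). enc_u is a hypothetical name.

```python
import html
from urllib.parse import quote_plus


def enc_u(part):
    # Stand-in for encU (an assumption, not form.py's code): encode one URL part.
    # quote_plus turns spaces into '+' and %-encodes '/', '&', '=', '+' and similar.
    return quote_plus(part, safe='')


# Encode each key and value separately, then join them with '=' and '&'.
params = [('q', 'fish & chips'), ('sort', 'price/asc')]
query = '&'.join(enc_u(k) + '=' + enc_u(v) for k, v in params)
url = '/search?' + query
# For an <a href="...">, HTML-encode the complete URL so '&' becomes '&amp;'.
link = '<a href="%s">results</a>' % html.escape(url)
```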

encI(text)

Encode text so it can be included in HTML id or name attributes. This is especially useful when you need to include arbitrary strings in name-embedded values.

This encoding is not a web standard, it's specific to form. Technically it simply replaces all disallowed characters with ':xx' where xx is the hex encoding of the character.
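A sketch of an encI/decI pair. Leaving only ASCII letters, digits and '_' unescaped is a conservative assumption, and because Python 3 strings are Unicode this version escapes UTF-8 bytes, whereas form.py 1.3 works on byte strings. enc_i and dec_i are hypothetical names.

```python
import re
import string

# Assumption: only these characters pass through; ':' is always escaped
# because it introduces an escape sequence.
ALLOWED = set(string.ascii_letters + string.digits + '_')


def enc_i(text):
    """Hypothetical sketch of encI: escape other bytes as ':xx' (lower-case hex)."""
    out = []
    for byte in text.encode('utf-8'):
        ch = chr(byte)
        out.append(ch if ch in ALLOWED else ':%02x' % byte)
    return ''.join(out)


def dec_i(name):
    """Reverse enc_i: turn each ':xx' back into its byte, then decode UTF-8."""
    data = re.sub(rb':([0-9A-Fa-f]{2})',
                  lambda m: bytes([int(m.group(1).decode('ascii'), 16)]),
                  name.encode('utf-8'))
    return data.decode('utf-8')
```

With this sketch, a button named 'b:' + enc_i('Buy it: 3 pens'), that is b:Buy:20it:3a:203:20pens, would carry the embedded value 'Buy it: 3 pens'.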

encJ(text)

Encode text suitable for inclusion in a JavaScript string. This escapes single and double quotation marks, and the ETAGO (</) marker, making the result safe to include in a string in a script block.
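A sketch of encJ. Besides the quotes and ETAGO the text mentions, it also escapes backslashes and line breaks, which any JavaScript string literal needs; that part is an addition, and enc_j is hypothetical.

```python
def enc_j(text):
    """Hypothetical sketch of encJ for text inside a JavaScript string literal."""
    # Backslash first, so the escapes added below are not themselves doubled.
    # (Escaping backslashes and line breaks goes beyond what the text lists.)
    text = text.replace('\\', '\\\\')
    text = text.replace('"', '\\"').replace("'", "\\'")
    text = text.replace('\n', '\\n').replace('\r', '\\r')
    # Break up '</' so the HTML parser cannot see an end tag inside a script block.
    return text.replace('</', '<\\/')
```

encHJ then corresponds to passing this result through HTML encoding, for JavaScript placed in event-handler attributes.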

encHJ(text)

Shorthand for encH(encJ(text)), useful for writing JavaScript inside an HTML attribute, especially event handlers.

decU(urlPart)

Decodes text encoded as part of a URL, converting %-encoded entities back into plain text.

decI(urlPart)

Decodes text escaped with encI.

3.5. Utility functions

These functions are of general use to CGI scripts and are provided together as a convenience, as well as being used internally by form.

checked(condition)

Simply returns the string ' checked' if the condition is true or '' if false. This often saves writing an if statement whilst outputting a form.

Example

print('<input type="checkbox" name="sendmespam"' + form.checked(fvals.sendmespam) + '>')

selected(condition)

Like checked, but on true outputs ' selected', for select fields.

Example

print('<option value="m"' + form.selected(fvals.sex == 'm') + '>')

randomSafeString(length)
Returns a pseudo-randomly-generated string of a given length, built only from letters, numbers and underscores.
makeSafe(text)
Filters everything but ASCII letters, numbers and underscores from a string, and adds an underscore if the string is empty. The resulting string should be safe to use as a filename.
makeSafeIsh(text)
As makeSafe, but also allows single periods and slashes, though not adjacent combinations of them or strings that start with one. Also allows characters in the range 0xC0-0xFF, used for accented letters in ISO Latin encodings.
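Sketches of randomSafeString and makeSafe using the standard library. The names are hypothetical, and treating "empty" as empty after filtering is an assumption. The random module is pseudo-random, as the text says, so for session keys or other secrets the secrets module is the better choice.

```python
import random
import re
import string

SAFE_CHARS = string.ascii_letters + string.digits + '_'


def random_safe_string(length):
    """Hypothetical sketch of randomSafeString: letters, digits and '_' only."""
    return ''.join(random.choice(SAFE_CHARS) for _ in range(length))


def make_safe(text):
    """Hypothetical sketch of makeSafe: keep ASCII letters, digits and '_' only."""
    cleaned = re.sub(r'[^A-Za-z0-9_]', '', text)
    return cleaned or '_'  # never return an empty filename
```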

4. Exceptions

The input-reading functions may throw the following exceptions:

cgiError

Some aspect of the CGI environment is broken, for example environment variables not being correctly set by the script's caller.

cgiErrors are the fault of the web server, and should not happen in working web sites.

fdefError

An fdefs dictionary was passed to readForm which included unknown fdef values or unexpected parameters. Alternatively, a set of fields passed to writeForm or writeUrlEncoded (or their stream versions) included a file-upload field. Note that readForm may also raise a TypeError if some of the parameters in the fdefs were of the wrong type.

fdefErrors are your script's fault, and should not happen in working web sites.

httpError

The HTTP request, or the MIME message in an HTTP POST request, is malformed in some way.

httpErrors are the user-agent's fault, so could happen in a working web site, but only if either:

  1. the user's web browser is seriously buggy, or
  2. someone is deliberately sending your script odd input to confuse it.

Finally, initialise may throw a NotImplementedError if it is called with a version number higher than the version of form being used.

5. About

form was written by Andrew Clover and is available under the GNU General Public Licence. There is no warranty. However it has been in use on several production systems without apparent trouble.

Bugs, queries, comments to: and@doxdesk.com.

5.1. History

0.1 [dev] (6 January 2000)
First apparently working version.
0.2 [dev] (27 January 2000)
form.NUMBER becomes form.FLOAT; form.INT added. form.BOOLEAN changed to a class of its own, to distinguish it from form.INT. To support European number formatting conventions, added built-in functions to replace int() and float(), controlled by form.sepChars and form.decChars.
0.3 [dogfood] (2 February 2000)
Fixed bug in assigning default values (mutables confusion). form.MAP non-submission value changed to (-1, -1).
0.4 [dogfood] (6 March 2000)
form.INT now clips when the number goes above maxint instead of throwing an exception. Not sure whether this is good behaviour but it follows the idea of not throwing exceptions due to bad user input. form.BOOL class replaces old BOOLEAN kludge. form.writeUrlEncoded[Stream] no longer prepends a '?'.
0.5 [dogfood] (31 March 2000)
Removed embarrassingly bad stream parsing bugs. Fixed ENUM so that '' can be a non-default value.
0.6 [dogfood] (11 June 2000)
Added name-encoding system. Cleaned up image map detection. Added initialise call to avoid having to access limits and other module variables manually, and to allow me to make more interface changes like those in version 0.4 without breaking backwards compatibility.
0.7 [beta] (15 June 2000)
Fixed bug in multipart parsing affecting multiple file uploads. All major features have now been tested, so I'm taking form.py to beta.
0.8 [beta] (13 September 2000)
Added optional default to INT and FLOAT types. Added EitherMapping object replacing plain dictionaries to allow slightly cleaner-looking access to form values. Hopefully this will not cause any incompatibilities.
0.9 [beta] (9 November 2000)
Safeish strings may now not begin with '/' or '.'. Code comments wrapped to 80 columns. Documentation finally brought up-to-date.
1.0 [final] (7 December 2000)
Added trivial encJ and encHJ functions. Cleaned up EitherMapping so it's safe to use in other scripts.
1.1 [beta] (21 December 2000)
Added encI and decI functions, made decI happen automatically on name:value separation. Removed pointless encHU call from documentation. Disallowed the remaining top-bit-set characters from makeSafe strings, in case somehow they lead to the Unicode-parsing security breaches that turned up in IIS. They're still allowed in makeSafeIsh though.
1.2 [final] (29 January 2001)
EitherMapping now allows entries to be removed using del.
1.3 [final] (11 April 2002)
encH, encU and encI made more strict about what things they escape, to expand their usefulness a bit. '+'-encoding in encU fixed (can't believe I let that slip through after deliberately remembering to get it right). encH no longer attempts to HTML-encode top-bit-set characters, so they are left in whatever character set the document is declared as rather than becoming references to the Unicode characters they might not be. This is as a prelude to proper Unicode support coming in the next release.

5.2. Licence

Copyright © 2000 Andrew Clover. Released under the GNU General Public License.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.