The form module is an extended replacement for the standard Python cgi module, providing robust form-handling features in order to make writing secure form-handling CGIs in Python less work.
The idea is to define the kind of data you want returned for each field of the form. This definition is done using a mapping of form field names to datatypes (fdefs), which is passed to the main function, readForm. This call reads CGI input and interprets it, returning a mapping of field names to values.
form also fully supports [multiple] file-upload fields, image-submit fields and embedding values in names, protects against some denial-of-service problems common to CGI scripting, and provides miscellaneous utility functions useful to CGI programmers. It has been proven to cope with very large input sets.
form and cgi have completely different interfaces and are not compatible. form works at a somewhat higher level than cgi. Its ease of use comes at the expense of disallowing direct access to the exact submitted data.
The main advantage is that the returned values from reading a form submission are guaranteed to conform to your specifications, regardless of how malformed the submission may have been. This reduces the error-checking necessary to produce error-free scripts. The abstraction of datatype from submission data also allows some elements in an HTML form to be changed without having to re-write the corresponding CGI.
cgi is part of the standard distribution and so is guaranteed to be available without having to add any modules. It easily suffices for writing simple forms. form is more complicated than cgi, so it may be more likely to have bugs in it, although none are currently known. form is also not suitable for applications where you don't know the names of the submitted fields in advance (e.g. generic form-to-mail scripts).
A user sign-up form might be read like this:
import form

fdefs= {
  'email':      (form.STRING, 128),
  'username':   (form.STRING, 16),
  'password':   (form.STRING, 16),
  'sex':        (form.ENUM, ['m', 'f'], 'f'),
  'age':        form.INT,
  'sendmespam': form.BOOL
}

fvals= form.readForm(fdefs)
if fvals.username=='':
  errorPage('You forgot to enter a user name.')
if allUsers.has_key(fvals.username):
  errorPage('Sorry, someone has already had that user name')
# and so on
Each item in an fdefs dictionary defines one form field. The key should be the same as the name property in the HTML form, which should not normally contain a period or colon (see 2.2).
The value of the item dictates the datatype to be returned.
readForm returns a dictionary-like object with the names of the fields as keys. The type of each value depends on which type was requested for that field in the fdefs. You can read the returned object like a dictionary (fvals['address']) or like an object (fvals.address); it makes no difference.
In the case where a field is included more than once in a submission but a list-of-values submission (form.LIST) was not expected, the last field in the input takes precedence.
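The two behaviours just described (reading by key or by attribute, and the last duplicate winning) can be pictured with a small stand-in. This is only a sketch written for current Python, not form's own code; FieldValues and last_wins are hypothetical names invented for the illustration.

```python
class FieldValues(dict):
    """Hypothetical stand-in for the object readForm returns."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # methods such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


def last_wins(pairs):
    """Collapse (name, value) pairs so a repeated field keeps its last value."""
    values = FieldValues()
    for name, value in pairs:
        values[name] = value
    return values


fvals = last_wins([("address", "1 High St"), ("address", "2 Low Rd")])
```

Here fvals['address'] and fvals.address both give the second submitted address.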
The following field types are available. Some of them take parameters, which you can specify by putting the type in a tuple, with the parameters following. If you are not passing parameters, you can use the type name on its own or in a singleton tuple; it doesn't matter which.
For input type=text or password. Return a string of maximum length length characters, with all characters in the string exclude removed. You can omit the exclude string to allow all (non-control) characters. You can omit length or set it to 0 to allow any length string; it's mostly there so you can copy the value into a database without having to worry about it being too big to fit.
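As a rough picture of that filtering, the sketch below applies the same rules to a plain string. It is an illustration written for current Python rather than form's implementation: clean_string is a hypothetical name, and the order of operations (filter first, then truncate) is an assumption.

```python
def clean_string(value, length=0, exclude=""):
    """Illustrative filter in the spirit of form.STRING (not form's own code)."""
    kept = []
    for ch in value:
        # Control characters (below 0x20, plus DEL) are always dropped.
        if ord(ch) < 32 or ord(ch) == 127:
            continue
        # Characters listed in exclude are dropped too.
        if ch in exclude:
            continue
        kept.append(ch)
    result = "".join(kept)
    # Assumption: truncation happens after filtering; 0 means no limit.
    if length:
        result = result[:length]
    return result
```

For example, clean_string("bob<script>", 5, "<>") keeps only the first five permitted characters.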
For textarea. As form.STRING, but single newlines are converted to space, and double newlines are converted to a Python '\n'. Other control characters are still removed.
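The newline rule can be sketched as follows. This is a guess at the behaviour rather than form's code: clean_text is a hypothetical name, line endings are assumed to arrive as CRLF and are normalised first, and a run of more than two newlines is assumed to count as a single paragraph break. Removal of other control characters is left out.

```python
import re


def clean_text(value):
    """Illustrative newline folding for textarea input (not form's own code)."""
    # Assumption: browsers send CRLF line ends, so normalise them first.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Each run of two or more newlines is treated as one paragraph break.
    paragraphs = re.split(r"\n{2,}", value)
    # Within a paragraph, a single newline becomes a space.
    return "\n".join(p.replace("\n", " ") for p in paragraphs)
```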
For select and input type="radio". Return one of the list of string values passed if it matches the input, else return the default value, which can be of any type. If the default is not supplied, '' is used as the default.
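In other words, the field behaves like a whitelist lookup. A minimal sketch, with enum_value as a hypothetical name and not form's own code:

```python
def enum_value(submitted, allowed, default=""):
    """Return submitted if it is one of the allowed strings, else default."""
    if submitted in allowed:
        return submitted
    return default
```

With the fdef (form.ENUM, ['m', 'f'], 'f') from the sign-up example, any submission other than 'm' or 'f' would come back as 'f'.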
For input type="checkbox" with no value property. The value returned is a boolean object which evaluates true if the input value for this field was 'on', else false.
For select multiple and multiple fields with the same name (especially checkboxes). Return a list of the non-empty input strings given for this field.
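The effect on a URL-encoded submission can be sketched with the standard library of current Python. This is an illustration only: list_values is a hypothetical name, and urllib.parse.parse_qsl stands in for form's own parser.

```python
from urllib.parse import parse_qsl


def list_values(query, name):
    """Collect every non-empty value submitted for one field name."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return [value for key, value in pairs if key == name and value]
```

For the query string "t=a&t=&t=b&u=c", list_values(query, "t") skips the empty second value.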
For input type="image". Return a tuple (x, y) of the position of the click, clipped to within (0:width, 0:height) if the (width, height) tuple is supplied. Returns (0, 0) if the input field was supplied but without x and y co-ords, or (-1, -1) if the field was not in the input at all.
For input type="file". Fills the given directory with files uploaded through the field, and returns a list of tuples (storedFile, suppliedFile, mimeType, length). The suppliedFile filename may be '' if no filename was specified. storedFile is the full pathname of the stored file. The list is empty if no files were uploaded, and is unlikely to be longer than one entry since few browsers support multiple-file upload.
Parse the input as a decimal (possibly negative) integer. Returns the default value if no parsable number could be read. If default is omitted, zero is used as the default. Returns sys.maxint if the number is higher than Python can represent as an integer.
Note! Future versions of form may return a long integer for form.INT. I might restrict this to Python 1.6 and later, where str doesn't add an 'L' to the end of the number, to avoid problems.
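The integer rules can be sketched like this. It is an approximation for current Python, not form's code: parse_int is a hypothetical name, MAX_INT stands in for sys.maxint on a 32-bit build, and rejecting any input with trailing non-digits is an assumption about what counts as "parsable".

```python
import re

MAX_INT = 2**31 - 1  # assumption: stands in for sys.maxint on a 32-bit build


def parse_int(text, default=0):
    """Parse a decimal integer, falling back to default and clamping at MAX_INT."""
    match = re.match(r"\s*([+-]?\d+)\s*$", text or "")
    if match is None:
        return default
    return min(int(match.group(1)), MAX_INT)
```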
Parse the input as a simple floating point number, which may contain a decimal point, but not 'E' notation. Returns 0.0, or, if supplied, the default if the input is not a valid number or not supplied.
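A sketch of "simple" floating point parsing, assuming that anything other than an optional sign, digits and at most one decimal point is rejected. parse_float is a hypothetical name and this is not form's own code.

```python
import re

# Optional sign, then digits with an optional fraction, or a bare fraction.
_SIMPLE_FLOAT = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)$")


def parse_float(text, default=0.0):
    """Parse a plain decimal number; 'E' notation is deliberately rejected."""
    text = (text or "").strip()
    if _SIMPLE_FLOAT.match(text) is None:
        return default
    return float(text)
```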
In HTML, there are some kinds of form fields where you can't use the value attribute to pass information to the CGI script. These are input type="image", where the value is always a pair of co-ordinates, and submit, where the value is used as the text for the button.
So if you wanted to detect which of a set of identically-labelled buttons was pressed, you'd have to give them all a different name, and include a check for each one in your script. This would be especially tedious for an order form with a hundred "Buy It!" buttons, for example.
For this reason, form allows you to make a group of controls where the value submitted for each is taken from the name of the control instead of the value, when such a control is included in a submission. The actual value submitted is ignored.
To use the feature, put both the name and the desired value together in the HTML name of the field, separated by a colon (which is a valid character for name, albeit a seldom-used one).
<input type="submit" name="b:left" value="Click me!">
<input type="submit" name="b:middle" value="Click me!">
<input type="submit" name="b:right" value="Click me!">
In this example, a call to form.readForm({'b': form.STRING}) would return either 'left', 'middle' or 'right', depending on which button was used to submit the form. This is not limited to STRING: values of all types except FILE may be embedded in names.
(You can still use names with colons if you do not wish to use the value-embedding feature. form only tries to separate a name with a colon in it if it can't find the whole name as a key in your fdefs. The same goes for periods, which are special characters used by HTML in image maps.)
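The lookup order described in that note might be sketched as below. This is an assumption-laden illustration, not form's code: resolve_name is a hypothetical name, the name is assumed to be split at the first colon, and the handling of periods for image maps is left out.

```python
def resolve_name(name, fdefs):
    """Return (field, embedded_value); embedded_value is None if nothing was embedded."""
    # A whole-name match always wins, so names containing colons still work.
    if name in fdefs:
        return name, None
    # Otherwise try to treat the name as "field:value".
    if ":" in name:
        field, embedded = name.split(":", 1)
        if field in fdefs:
            return field, embedded
    # Unknown field: nothing to return.
    return None, None
```

With fdefs containing only 'b', the submitted name 'b:left' resolves to field 'b' with the embedded value 'left'.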
To embed characters which aren't normally allowed in HTML name attributes, see the encI function. form will automatically decode this for you when reading name-embedding values.
Calling this function is not compulsory, but it allows you to set some of form's internal variables easily.
form.py includes features to protect against certain kinds of denial-of-service attacks in POST requests. They are turned off by default, but passing non-zero values in the "limit" parameters enables them.
The arguments you can set are:
If true, form.INT and form.FLOAT will read numbers using European-style punctuation (where "." is a thousands-separator and "," is the decimal point). If false (the default), it's the other way around.
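The difference between the two punctuation styles can be sketched as a normalisation step applied before parsing. normalise_number and its european flag are hypothetical names chosen for this illustration; they are not taken from form.

```python
def normalise_number(text, european=False):
    """Strip thousands separators and return text with '.' as the decimal point."""
    if european:
        # "." groups thousands and "," marks the decimals.
        return text.replace(".", "").replace(",", ".")
    # Default style: "," groups thousands and "." marks the decimals.
    return text.replace(",", "")
```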
All read functions take submitted form data and parse it, returning a dictionary-like object containing the values that have been posted to the form, standardised according to the fdefs argument passed to the function. The returned object may be read like a dictionary or like an object.
Typically, a script calls readForm at the start of its code. Scripts do not normally need to call the other read functions directly.
Works out which of the read functions is appropriate, and calls that.
Like readUrlEncoded, but takes its input from a stream object (which must support read()) instead of a string.
Decodes fields encoded in a multipart/form-data formatted string. parameters is a dictionary of MIME headers, lower-cased keys, containing at least a 'boundary' key.
Currently this function is no more efficient than readFormDataStream, since it is not commonly needed.
Like readFormData, but input is taken from a stream object instead of a string. The length is the number of bytes that should be read from the stream.
The write functions take form values from a dictionary (or dictionary-like object returned by the read functions), and convert them into encoded text sent to a string or a stream.
File upload fields only work for writeFormData and writeFormDataStream, since it does not make much sense to try to upload a file to a query string or hidden form. File upload values need not have a valid length value in the tuple, as the length is read directly from the file specified.
Currently, the string-returning functions are no more efficient than the stream-writing versions.
Returns a string of input type="hidden" controls, one for each field in the fvals dictionary. This is useful for writing a follow-up form that retains all the information posted into a previous form.
Like writeForm, but sends output to a stream object (or anything supporting write) instead of returning a string.
If the result is placed in an <a href="...">, remember to HTML-encode the whole URL, or those & characters could confuse a browser.
Like writeUrlEncoded, except that the output is sent to the nominated stream.
These convenience functions are available for coding text for representation in HTML, URLs and JavaScript strings. If you have user input anywhere in your scripts, you'll need to do this a lot, or you're likely to make a site susceptible to security problems. (See this CERT advisory for an example of this.)
Encode text as HTML and return as string. ", &, <, > and control characters are replaced with HTML entities. This assumes you use the double-quote rather than single-quote for attribute strings, which is advisable. Obviously quotes do not need to be escaped outside of attribute values, but it does no harm.
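The replacement described above can be sketched as follows. This is not encH itself: enc_html is a hypothetical name, and turning control characters into numeric character references is an assumption about which entities are used.

```python
def enc_html(text):
    """Illustrative HTML encoder for text and double-quoted attribute values."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == '"':
            out.append("&quot;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) < 32:
            # Assumption: control characters become numeric references.
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

Because each character is handled once, there is no risk of double-encoding the ampersands that the other replacements introduce.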
Encode text as a URL part (replacing spaces with '+' and many symbols with %-encoded entities), and return as a string.
Note: you should not pass entire URLs through encU, only separate parts, for example a directory name in a path, or a key or value string in a query. Once encoded you can combine these parts using '/', '?' and so on.
When writing HTML, remember to encode the complete URL if it has characters like '&' in it.
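The two-stage encoding can be sketched with the standard library of current Python. urllib.parse.quote_plus and html.escape are used here as stand-ins for encU and encH; they behave similarly but are not claimed to be identical, and the path and query values are made up for the example.

```python
from html import escape
from urllib.parse import quote_plus

# Encode each part separately; a "/" inside a part must not survive as a separator.
directory = quote_plus("a/b")
key = quote_plus("q")
value = quote_plus("fish & chips")

# Only now combine the encoded parts with the real URL punctuation.
url = "/" + directory + "/search?" + key + "=" + value + "&page=2"

# When the URL goes into HTML, encode the complete URL once more.
link = '<a href="' + escape(url, quote=True) + '">results</a>'
```

The '&' inside the value is %-encoded by the first stage, while the '&' that separates the two query fields is only HTML-encoded by the second.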
Encode text so it can be included in HTML id or name attributes. This is especially useful when you need to include arbitrary strings in name-embedded values.
This encoding is not a web standard; it's specific to form. Technically it simply replaces all disallowed characters with ':xx' where xx is the hex encoding of the character.
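A sketch of the ':xx' scheme and its inverse, written for current Python. enc_id and dec_id are hypothetical names, and the set of characters allowed through unencoded (ASCII letters, digits, '-' and '_') is an assumption; form's actual set may differ.

```python
import re
import string

# Assumption: only these characters pass through unencoded.
_ALLOWED = set(string.ascii_letters + string.digits + "-_")


def enc_id(text):
    """Replace each disallowed character with ':xx' (two hex digits)."""
    out = []
    for ch in text:
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # Assumes 8-bit characters, so two hex digits are always enough.
            out.append(":%02x" % ord(ch))
    return "".join(out)


def dec_id(text):
    """Reverse enc_id by turning every ':xx' back into its character."""
    return re.sub(r":([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)
```

Because the colon itself is always encoded, every colon in the output starts an escape, which keeps decoding unambiguous.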
Encode text suitable for inclusion in a JavaScript string. This escapes single and double quotation marks, and the ETAGO (</) marker, making the result safe to include in a string in a script block.
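The escaping described above might look like this. It is a sketch rather than encJ itself: enc_js is a hypothetical name, escaping backslashes first is an assumption, and newlines and other control characters are not handled.

```python
def enc_js(text):
    """Illustrative escaping for text placed inside a JavaScript string literal."""
    text = text.replace("\\", "\\\\")  # assumption: backslashes are escaped first
    text = text.replace('"', '\\"')
    text = text.replace("'", "\\'")
    # Break up "</" so a "</script>" in the data cannot close the script block.
    text = text.replace("</", "<\\/")
    return text
```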
Shorthand for encH(encJ(text)), useful for writing JavaScript inside an HTML attribute, especially event handlers.
Decodes text encoded as part of a URL, converting the %-encoded entities back into plain text.
Decodes text escaped with encI.
These functions are of general use to CGI functions and are provided together as a convenience, as well as being used internally by form.
Simply returns the string ' checked' if the condition is true or '' if false.
This often saves writing an if statement whilst outputting a form.
print '<input type="checkbox" name="spam"'+form.checked(f.sendmespam)+'>'
Like checked, but on true outputs ' selected', for select fields.
print '<option value="m"'+form.selected(f.sex=='m')+'>'
Like makeSafe, but allows single periods and slashes, though not combinations of them together or strings starting with them. Also allows the range of characters between C0-FF, used for accented letters in ISO-Latin encodings.
The input-reading functions may throw the following exceptions:
Some aspect of the CGI environment is broken, for example environment variables not being correctly set by the script's caller.
cgiErrors are the fault of the web server, and should not happen in working web sites.
An fdefs dictionary was passed to readForm which included unknown fdef values or unexpected parameters. Alternatively, you passed a set of fields to writeForm or writeUrlEncoded (or the stream versions) which included a file-upload field. Note that readForm may also raise a TypeError if some of the parameters in the fdefs were of the wrong type.
fdefErrors are your script's fault, and should not happen in working web sites.
The HTTP request or the MIME message in a HTTP POST request is malformed in some way.
httpErrors are the user-agent's fault, so could happen in a working web site, but only if either:
Finally, initialise may throw a NotImplementedError if it is called with a version number higher than the version of form being used.
form was written by Andrew Clover and is available under the GNU General Public Licence. This version is beta-test software and has not been exhaustively tested. However it has been in use on several production systems without apparent trouble.
Bugs, queries, comments to: andrew@oaktree.co.uk.
Copyright © 2000 Andrew Clover. Released under the GNU General Public License.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.