PXTL

the Python XML Templating Language -
a proposal and specification in progress

Introduction

PXTL is an XML-based language for interspersing document markup with Python scripting. It is especially suitable for producing dynamic web pages, but is designed to be used with any XML-based output format.

This is the second draft of the PXTL specification. It may contain errors as well as things likely to be changed before implementation is completed. Please contact me at and@doxdesk.com with any comments on the spec or feature requests.

Rationale

Python is a great language for developing web applications, but it lacks a clean strategy for outputting dynamic pages. Interpolating a Python code hierarchy with the output markup hierarchy tends to produce difficult-to-read code, and Python’s reliance on indenting makes it impossible to format the code to fit within the markup hierarchy.

The work done by the JSR-052 group on templating in Java demonstrates the benefits of a taglib-style approach, using a single code and markup hierarchy to express the design more clearly. It is the aim of PXTL to bring something similar to Python, whilst improving on the design of JSTL where it is inflexible or confusing (often due to the design of JSP/taglibs or Java in general) and working around the difficulties inherent in using a whitespace-significant language for templating.

Unlike JSTL (and the many other templating systems available for Python), PXTL is a pure XML-based templating system. Source files must be well-formed XML; output is in XML, unless the optional legacy-HTML or plain-text recoding feature is enabled.

Nesting errors in code using JSTL taglibs can be hard to track down, as the source document is not well-formed and cannot be validated. (Additionally, the stack traces that result help little.) It is hoped that sticking to well-formed XML will ease debugging. In many cases, it should even be possible to prove programmatically that a PXTL template will always produce output that will validate to a particular DTD or schema, without actually having to run the code in the template.

Where JSTL/JSP is forced to use a separate expression language (EL) to avoid Java’s inconvenience for writing expressions, PXTL can use normal Python. The EL habit of suppressing exceptions is eschewed as it hinders debugging. PXTL’s scoping rules are also much simpler than those of JSP, so as to aid rapid development.

PXTL files are intended only for use as an output template. They are not a web application framework. Using them encourages clean separation of action and view, but does not require it; architecture is left to the author.

Principles

Python statements and expressions are introduced into a document by use of an XML Processing Instruction. There are various PIs for different kinds of output encoding. For example, to include and XML-encode text the PI <?_...?> can be used:

<p> Hello, <?_ user.getName() ?>, you fool. </p>

XML only allows PIs in document content. However, it is often useful to include Python code inside element attributes. For this reason, PXTL checks all attributes for ‘pseudo-PIs’ (which are surrounded with braces instead of angle brackets) and interprets them the same way as the standard PIs:

<a href="/users/{?_ user.getNumber() ?}.html">User info</a>

Flow control is done using tags from the PXTL namespace, for example:

<px:for item="image" in="images">
  <img alt="(Photo)" src="/img/{?_ image ?}.png" />
</px:for>

There are also some attributes from the PXTL namespace that can be added to any element to alter it.

<h1 px:tagname="'h'+str(headingLevel)">

Because Python in attributes is encoded as XML, any occurrence of the less-than symbol, the ampersand or quotes must be escaped. For example:

<px:if test="yesOrNo() == &quot;Y&quot;">
  Yes!
</px:if>

The same applies to these three characters in pseudo-PIs, since they are also inside attribute values.

However, this annoying situation can often be avoided. A comparison including the less-than operator can be switched around (b>a instead of a<b); and single-quotes can be used instead of double-quotes in both XML and Python:

<px:if test="yesOrNo() == 'Y'">
  Yes!
</px:if>
<px:if test='yesOrNo() == "Y"'>
  Yes!
</px:if>

Ampersands are not common in Python code; on the rare occasion you need the & (bitwise and) operator, it can be accessed using the and_ function from the standard operator module.
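As a small illustration of that last point, the following plain-Python sketch shows operator.and_ standing in for the & operator. The names FLAG_ADMIN and has_flag are invented for this example and are not part of PXTL.

```python
from operator import and_

FLAG_ADMIN = 0x04  # hypothetical bit flag, for illustration only


def has_flag(flags, flag):
    # and_(a, b) is the function form of a & b, so no ampersand
    # has to be escaped when the expression sits in an attribute value.
    return and_(flags, flag) != 0
```

In a template, an expression of this shape could then be written directly in a test attribute, assuming and_ was imported in an earlier px_code PI.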

Conventions

Typically, and in all examples in this document, the PXTL namespace will be bound to the prefix px:

<html xmlns:px="http://www.doxdesk.com/pxtl">

PXTL files typically have the file extension ‘.px’. In implementations of PXTL that save a compiled bytecode version of PXTL files for speed, these compiled versions may have the file extension ‘.pxc’.

Where MIME types are used, the media type for PXTL files shall be ‘text/x-pxtl+xml’. There is a ‘charset’ parameter for this type as usual for XML-based documents. Compiled files are in Python bytecode format, for which the media type ‘application/x-python-compiled’ is conventionally used.

In this document, PXTL directives are coloured blue, and actual Python code within them is red. (An XML-aware text editor that can check for well-formedness and syntax-color tags from different namespaces would obviously be an advantage when writing PXTL templates.)

Extended conditionals

PXTL borrows the well-understood if...elif...else construct from Python, and uses it in conditional elements and attributes.

For convenience and brevity in templates, PXTL also introduces two new conditionals, anif and orif. They may be used between if and else clauses as an alternative to elif clauses.

The operation of PXTL conditionals is specified as follows. Evaluating true or false is determined by the usual Python truth value testing rules.

In an if...elif...else construct:

In an if...anif...else construct:

In an if...orif...else construct:

Here are some usage examples (written as if anif and orif were real Python statements).

anif can be used with an else clause to catch failure at any of a number of steps in a process. This example sets a variable to a generic name if either of the two conditions evaluates false:

if obj!=None:
  desc= obj.describe()
anif desc!='':
  title= 'Item "'+desc+'"'
else:
  title= '('+getGeneric()+' item)'

orif can be used to let one condition fall-through to the next clause. This example prints the ‘info’ string if user was None, without bothering to call getEditMode:

if user==None:
  user= anonymousUserDetails
orif not state.getEditMode():
  print user.name+' info'
else:
  print 'editing '+user.name

Currently one might write structures like these using a temporary variable or an exception. However, exception-catching is not available in PXTL (because exceptions do not make sense in a templating language).
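For comparison, here is a sketch of how the anif example above might be written in ordinary Python with a temporary variable. The function name make_title and the get_generic parameter are assumptions made so that the fragment is self-contained.

```python
def make_title(obj, get_generic):
    # 'ok' is the temporary variable that anif would make unnecessary.
    ok = obj is not None
    if ok:
        desc = obj.describe()
        # Second step: only reached when the first condition held.
        ok = desc != ''
    if ok:
        return 'Item "' + desc + '"'
    # Reached when either of the two conditions evaluated false.
    return '(' + get_generic() + ' item)'
```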

File format

A PXTL file is always a well-formed (and namespace-well-formed) XML document. Its output when run is also an XML document, unless an optional feature is enabled to convert the output to text or legacy (non-XML) HTML.

The active content of a PXTL template is contained within Processing Instructions and pseudo-processing instructions whose target names always (except for a few shortcuts) begin with px_, and elements and attributes from the namespace ‘http://www.doxdesk.com/pxtl’. All other elements, attributes and PIs (other than the <?xml ?> preamble, if used) are taken as being static content and copied to the output document without change.

Processing Instructions

PXTL offers a range of XML Processing Instructions to execute Python code and output some of its results. Which one to use depends on how you want the output to be processed.

A PI whose target name begins with px_ but is undefined in the current PXTL specification will cause an exception to be thrown.

XML requires a whitespace character after the PI target name. No space is required before the closing ?>, though including it may improve readability.

The PIs defined in PXTL are:

px_code

This target executes a block of Python code. No output is generated. The code can either be a single line, or a newline followed by a block of code.

<?px_code
  if debugging:
    debugLog('reading position')
  (x, y)= foo.getPosition()
?>

Each code block PI introduces its own indenting level. The indenting must be consistent within the block. It is not possible to leave a code block open across PIs, so you cannot use Python flow control statements over document content. (See ‘PXTL Elements’ for how this is done.)

Because this PI is quite common, a shortcut is allowed. px_ without any keyword following it is equivalent to px_code, and is usually used instead:

<?px_ (x, y)= foo.getPosition() ?>

px_text

This target evaluates the Python expression following the target, tries to convert it to a string if it is not already, and writes it to the document, automatically escaping any characters that are special using numeric entity references.

Since this is by far the most common PI, it can also be referred to using a shortcut target name of just a single underscore:

<p> Hello, <?px_text username ?>. </p>
<p> Hello again, <?_ username ?>! </p>

Extended Unicode characters are converted to match the encoding of the document. When the encoding is unspecified or not recognised by Python, all extended characters are converted to numeric entity references.
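The escaping described for px_text could be sketched in Python roughly as follows. This is an illustration only: the function name text_escape, the exact set of characters treated as special, and the fallback to ASCII are assumptions rather than part of this specification.

```python
import codecs

# Assumed set of special characters, mapped to numeric references.
SPECIAL = {'&': '&#38;', '<': '&#60;', '>': '&#62;', '"': '&#34;', "'": '&#39;'}


def text_escape(value, encoding=None):
    # Convert to a string if the value is not one already.
    text = value if isinstance(value, str) else str(value)
    escaped = ''.join(SPECIAL.get(ch, ch) for ch in text)
    # Unspecified or unrecognised encoding: fall back to ASCII, so that
    # every extended character becomes a numeric reference.
    if encoding is None:
        encoding = 'ascii'
    else:
        try:
            codecs.lookup(encoding)
        except LookupError:
            encoding = 'ascii'
    # Characters the encoding cannot represent become &#N; references.
    return escaped.encode(encoding, 'xmlcharrefreplace').decode(encoding)
```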

px_mark

This target evaluates its expression, converts to string and outputs it without any kind of escaping. The string may contain markup, which will be passed straight to the output document.

<?px_
  c= comment.replace('\n', '<br />')
?>

<p>Comment: <?px_mark c ?></p>

One should use this PI with care, as any incorrect markup output this way can cause the final document to be invalid. User input must be thoroughly sanitised before being allowed into a document through the markup PI, otherwise cross-site-scripting security problems are possible.

px_upar

This target evaluates its expression, converts to string where necessary, and encodes each character to be safe for inclusion in a URL query string parameter.

See http://example.com/show.cgi?name=<?px_upar thing.id ?>
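A rough Python equivalent of this encoding, given only as a sketch, is percent-encoding with no characters treated as safe. The function name upar_encode is hypothetical, and a real implementation might choose a different set of safe characters.

```python
from urllib.parse import quote


def upar_encode(value):
    # Convert to a string where necessary, then percent-encode every
    # character that is not an unreserved URL character.
    text = value if isinstance(value, str) else str(value)
    return quote(text, safe='')
```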

px_jstr

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a string in an ECMAScript-compliant scripting language (eg. JavaScript). This string may also be embedded in an HTML <script> block or an event handler.

<script type="text/javascript">
  window.alert('Hello, <?px_jstr username ?>');
</script>
<button onclick="alert('Hello, {?px_jstr username ?}');">
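One conservative way to produce such a string is to replace everything except ASCII letters, digits and spaces with \uXXXX escapes, which keeps quotes, backslashes and markup characters out of the output. The sketch below assumes that strategy; the name jstr_escape is hypothetical and the specification does not mandate this particular encoding.

```python
def jstr_escape(value):
    out = []
    for ch in str(value):
        code = ord(ch)
        if (ch.isalnum() and code < 128) or ch == ' ':
            out.append(ch)
        elif code < 0x10000:
            out.append('\\u%04x' % code)
        else:
            # Characters outside the BMP are written as a surrogate pair.
            code -= 0x10000
            out.append('\\u%04x\\u%04x' % (0xD800 + (code >> 10),
                                           0xDC00 + (code & 0x3FF)))
    return ''.join(out)
```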

px_cstr

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a string in a CSS rule. This string may also be embedded in an HTML <style> block or inline style attribute.

<style type="text/css">
  a.external:after {
    content: '(<?px_cstr extSiteLabel ?>)';
  }
</style>

px_note

This target ignores its contents and outputs nothing. Use it for adding comments to the template that should not be visible in the output. A shortcut target __ is available; normal usage would be:

<?__ FIXME name should not be shown for anonymous user ?>
Name: <?_ user.name ?>

Pseudo-PIs

As well as being inserted into document content as normal, PXTL PIs can be introduced into attribute values. Such a ‘pseudo-PI’ begins with the string {?, followed by a target name (as used by normal PIs), one or more whitespace characters, the pseudo-PI content, and a closing ?}.

In the rare case where a literal open-brace-question-mark sequence needs to be included in an attribute, one can write {?}, which as a special case will be replaced with a literal {?.
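To make the syntax concrete, here is a sketch of how an attribute value might be split into literal text and pseudo-PIs. The regular expression and the function name split_attribute are assumptions for illustration. The sketch also ignores the possibility of a ?} sequence occurring inside the Python code itself.

```python
import re

# Either the {?} escape, or {? target whitespace content ?}.
PSEUDO_PI = re.compile(r'\{\?\}|\{\?(\w+)\s+(.*?)\?\}', re.DOTALL)


def split_attribute(value):
    parts = []
    pos = 0
    for m in PSEUDO_PI.finditer(value):
        if m.start() > pos:
            parts.append(('literal', value[pos:m.start()]))
        if m.group(0) == '{?}':
            # Special case: {?} stands for a literal {? sequence.
            parts.append(('literal', '{?'))
        else:
            parts.append((m.group(1), m.group(2).strip()))
        pos = m.end()
    if pos < len(value):
        parts.append(('literal', value[pos:]))
    return parts
```

Each non-literal pair holds the target name and the Python source to be evaluated.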

PXTL pseudo-PIs may not be included in attributes from the PXTL namespace, nor in attributes of elements from the PXTL namespace. Additionally, they may not be used in xmlns namespace declaration attributes.

All the PXTL PIs may also be used as pseudo-PIs. There is also one additional PI target px_if which may only be used in pseudo-PIs. An exception will be generated if any target name not defined by PXTL is used in a pseudo-PI.

px_if

This target can only be used in pseudo-PIs inserted into attribute values. Its contents are evaluated for truth. If true, the attribute is included as normal and the PI produces no output. If false, the attribute in which the pseudo-PI is embedded is not included at all in the output document.

<option value="M" selected="selected{?px_if isMale?}">

Elements

Flow control structures are written using elements from the PXTL namespace. All attributes of PXTL elements contain Python code directly (there is no need to use a PI to introduce active content).

if, elif, anif, orif

Conditionally include the contents of the element in the output document. These elements have one attribute, test, containing a Python expression to be evaluated for truth.

If the clause is successful (as defined in the section ‘Extended Conditionals’), the element’s child nodes are included in the output document at the point the element occupied.

If the clause is unsuccessful, it is discarded along with all the content contained in it.

Example:

<a href="message.html">Next message</a>
<px:if test="user.isAdmin">
  <a href="remove.py?id={?px_upar messageId ?}">
    Remove message
  </a>
</px:if>

else

This element may follow an if, elif, anif or orif conditional element, in which case its success is determined by the rules described in the section ‘Extended Conditionals’.

It may also follow a for or while iteration element, in which case it is successful if this loop was executed zero times. (Note: this is different behaviour to Python’s while...else construct, which has to do with the break statement; break makes no sense in a templating language and is not included in PXTL.)

If the clause is successful, its content is included in the output document, otherwise it is discarded along with all its child nodes.

Example:

<px:if test="user!=None">
  <h2> User '<?_ user.name ?>' </h2>
</px:if>
<px:anif test="user.email!=''">
  <p> Email: <?_ user.email ?> </p>
</px:anif>
<px:else>
  <p> (no e-mail address) </p>
</px:else>
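The loop form of else described earlier in this section (successful when the loop ran zero times) differs from Python's own for...else. In plain Python it corresponds roughly to counting iterations, as in this sketch. The function name render_list and the markup it emits are invented for the example.

```python
def render_list(items):
    out = []
    iterations = 0
    for item in items:
        out.append('<li>' + str(item) + '</li>')
        iterations += 1
    if iterations == 0:
        # PXTL-style else: taken only when the loop body never ran.
        out.append('<li>(nothing to show)</li>')
    return ''.join(out)
```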

for

Includes its contents repeatedly in the output document, once for each item in a list (or other iterable object).

If an item attribute is given, it contains an l-value that will have the item written to it during each iteration. An index attribute, if given, contains an l-value to which to write the integer index of each iteration. (If either of these l-values is a new variable, it will be left in scope after the loop has completed, as normal for Python for loops.)

<px:for item="product" in="getAllProducts()" index="ix">
  <tr class="{?_ ['whiterow', 'shadedrow'][ix % 2] ?}">
    <td><?_ product.name ?></td>
    <td><?_ product.price ?></td>
  </tr>
</px:for>

A range attribute may be used instead of an in attribute. The value(s) in this attribute are the same as those for a range or xrange object.
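In plain Python, a for element carrying both item and index attributes behaves much like a loop over enumerate. The sketch below mirrors the row-striping idea from the example above. The function name row_classes is an assumption made for illustration.

```python
def row_classes(products):
    classes = []
    # ix plays the role of the index attribute, product that of item.
    for ix, product in enumerate(products):
        # Alternate between the two class names on even and odd rows.
        classes.append(['whiterow', 'shadedrow'][ix % 2])
    return classes
```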

while

Includes its contents repeatedly in the output document, until the condition specified in the test attribute no longer evaluates true.

breadcrumb trail:
<px:while test="page!=None">
  <a href="{?_ page.url ?}">
    <?_ page.name ?>
  </a>,
  <?px_ page= page.parent ?>
</px:while>

There is also an optional attribute min which states the minimum number of times the loop will be executed (0 by default). The test expression is not evaluated for the first min times around the loop, so by setting this attribute to 1, you can get a post-test loop, similar to the do...while construct from C-like languages.

(The usual Python idiom for a post-test loop is while 1:...if...break. However break is not included in PXTL.)
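The effect of the min attribute can be sketched in Python as a short-circuit in the loop condition. The function run_while and its parameters are hypothetical, and stand in for whatever an implementation does internally.

```python
def run_while(test, body, min_count=0):
    executed = 0
    # 'or' short-circuits, so test() is not called at all during the
    # first min_count passes around the loop.
    while executed < min_count or test():
        body()
        executed += 1
    return executed
```

With min_count set to 1, the body runs once before the test is first consulted, which gives the post-test behaviour described above.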

def

Defines a template that can be called as a function from Python code. The fn attribute contains an l-value to which to assign the function. The args attribute contains a tuple of l-values to which to assign the function’s arguments, as in a Python function declaration.

<px:def fn="writeComments" args="parentComment, level">
  <px:for item="comment" in="parentComment.children">
    <h3> <?_ comment.title ?> </h3>
    <p> <?_ comment.body ?> </p>
    <div class="replies">
      <?px_ writeComments(comment, level+1) ?>
    </div>
  </px:for>
</px:def>

<h2> Mysite user forum </h2>
<?px_ writeComments(forumRoot, 0) ?>

Nesting a template definition inside another template definition is allowed, subject to the same scoping rules as nested function definitions in whichever version of Python you are using.

Template functions behave like any Python function and can be stored in variables, passed to other functions which can call them back, and so on. There is however no output value from a template function (because the output is the content itself), so you should not attempt to return (or yield) in a template function.

import

Include the output of the template whose filename is given in the src attribute. This filename is relative to the current document and uses the filepath conventions of the host platform.

The optional as attribute is an l-value to which to assign a module object representing the PXTL template. You can then access any functions or variables defined in that file’s global scope.

An imported template might be used as a simple ‘include’, outputting a single common page part, or might define template functions to be called by the including page. Or it can do both.

<px:import src="'pagesTop.px'" as="top" />
<?px_ top.writeHead(style= userSkin, title= 'My page title') ?>
<body>
  <?px_ top.writeNav(currentPage) ?>
  <p> (Page content) </p>
  <px:import src="os.path.join('global', 'disclaimer.px')" />
</body>

Note that, like all PXTL attributes, the content of src is a Python expression. If you just want a static string, you must include Python quotes as in the first import in the example above.

The import element can also be used to include plain text or markup from a file, without interpreting it as a PXTL template. This behaviour depends on the MIME media type of the file pointed to by the src attribute.

If the media type of the file is ‘text/plain’, it will be included as text, encoded with the same rules as the px_text Processing Instruction. If the media type is ‘text/html’, ‘text/xhtml+xml’, ‘text/xml’ or ‘application/xml’, the file will be included as unaltered markup, as with the px_mark PI.

All other media types are treated as PXTL templates (‘text/x-pxtl+xml’).

Because the filesystems of many platforms do not store MIME type information (leaving one to rely on file extensions), there is a type attribute which can be used to override the reported media type and include as text, markup or PXTL regardless.
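The media-type rules above amount to a small dispatch, sketched here in Python. The function name include_mode, the returned labels, and the assumption that the type attribute carries a media type string are all illustrative rather than specified.

```python
TEXT_TYPES = {'text/plain'}
MARKUP_TYPES = {'text/html', 'text/xhtml+xml', 'text/xml', 'application/xml'}


def include_mode(media_type, type_override=None):
    # An explicit type attribute takes precedence over the reported type.
    effective = type_override or media_type
    if effective in TEXT_TYPES:
        return 'text'      # encoded as with px_text
    if effective in MARKUP_TYPES:
        return 'markup'    # passed through as with px_mark
    return 'pxtl'          # everything else is run as a PXTL template
```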

This element is not used to import standard modules; to do that just use a normal Python import statement in a px_code PI.

none

This is the PXTL null-element. A none element will never be sent to the output document, although its children will be. Purposes for none include:

Attributes

Some attributes from the PXTL namespace can be included on any element to change that element in some way. All PXTL attributes’ values are Python expressions.

The PXTL attributes may not be included in any PXTL element other than the none element.

if, elif, anif, orif

The values of these conditional attributes are conditions to be evaluated for truth. Their success is determined by the rules described in the section ‘Extended Conditionals’.

An element may carry at most one of the conditional attributes (including else, described below).

If an element carries a successful conditional attribute it is included in the output document as-is. An unsuccessful conditional attribute causes the element to be removed. However any child content is included at the point the element used to occupy.

In multi-part constructs, the conditional attributes are nested children, not sequential siblings. That is, an else clause’s ‘preceding clause’ is its nearest ancestor having a conditional attribute. This need not be the direct parent.

<a href="index.html" px:if="currentPage!='home'">
  <img src="/img/nav/home.gif" alt="home" />
  <img src="/img/rollover/arrow.gif" alt="->" px:anif="True" />
</a>

In this example, the ‘home’ image will always be part of the output document. If the current page is not ‘home’, the image will be in a link, together with the ‘arrow’ image.

else

A conditional attribute. Must be nested within an element with one of the above conditional attributes; its success is determined by the usual rules, and if unsuccessful the element will be removed, its child content reparented.

The contents of the else attribute are irrelevant because no condition expression is needed in an else clause. Typically a blank string is used.

<ul px:if="isCountable">
  <ol px:else="">
    <li> item </li>
  </ol>
</ul>

tagname

Specifies a replacement tagname for the element. This attribute’s value is an expression which should evaluate to a string. An XML namespace prefix may be included in the tagname.

<h2 px:tagname="'h'+str(headLevel)">Info</h2>
<p> <?_ bodyText ?> </p>

The replacement tagname must not evaluate to a name in the PXTL namespace.

attr

Adds or changes arbitrary attributes of an element. The value should evaluate to a dictionary whose keys are attribute name strings and whose values are either strings or None. If an attribute with a specified name is already present in the element it will be overwritten by the value from the dictionary. If that value is None, the attribute will be removed.

Attribute names may be namespace-prefixed, but namespace declarations (xmlns) and attributes from the PXTL namespace are disallowed.

<?px_
  anchorAttr= 'id'
  if browser.isOldAndNasty:
    anchorAttr= 'name'
?>

<a px:attr="{anchorAttr: 'foo'}">About foo</a>
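The overwrite-or-remove behaviour of the attr dictionary can be sketched in Python as follows. The function name apply_attr is hypothetical, and the checks for disallowed names (xmlns declarations and PXTL attributes) are left out for brevity.

```python
def apply_attr(existing, changes):
    # Work on a copy so the element's original attributes are untouched.
    result = dict(existing)
    for name, value in changes.items():
        if value is None:
            # None removes the attribute, if it is present.
            result.pop(name, None)
        else:
            # Any other value adds the attribute or overwrites it.
            result[name] = value
    return result
```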

doctype

The doctype attribute is used on the root element of a PXTL template. It is used to set the output mode of the entire template. Additionally, if the current file is the root template (ie. not included by another template through a <px:import> element), a <!DOCTYPE> declaration may be added to the start of the output document.

The content of the doctype attribute is a tuple of strings (outputMode, publicIdentifier, systemIdentifier). The second and third members of the tuple are for the <!DOCTYPE> declaration: either may be set to None or omitted; if neither is supplied no <!DOCTYPE> will be output.

A PXTL implementation module offers some built-in constants you can use as shorthand for the full doctype tuple when you are using common web document types. They are:

If no doctype attribute is specified, PXTL looks at the tag name of the root element. If it is ‘html’ and is in the namespace ‘http://www.w3.org/1999/xhtml’ or not in any namespace, the settings of XHTML1T are used. Otherwise, the XML settings are used.
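The fallback rule for a missing doctype attribute can be written as a short Python sketch. The function returns the names of the built-in constants as strings, since their tuple values are not reproduced here, and the function name default_doctype_name is an assumption.

```python
XHTML_NS = 'http://www.w3.org/1999/xhtml'


def default_doctype_name(root_tag, root_namespace=None):
    # An 'html' root in the XHTML namespace, or in no namespace at all,
    # selects the XHTML1T settings.
    if root_tag == 'html' and root_namespace in (XHTML_NS, None):
        return 'XHTML1T'
    # Anything else falls back to the XML settings.
    return 'XML'
```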

Output mode

The output mode of a template is set by a PXTL doctype attribute on its root element. This affects how the output from the template is formatted.

Any templates imported into the document (using the import element) have their own output mode. If this does not match the output mode of the importing template invalid output may occur.

xml

When the output mode is ‘xml’, any empty elements can be output in the shorthand form <tag/> instead of <tag></tag>. CDATA sections and entity references are left unchanged.

xhtml

This output mode produces XML markup likely to be understood by legacy HTML parsers, as described in Appendix C of the XHTML 1.0 specification.

In this mode, the empty element shorthand form is only used when the HTML element with the same tagname is defined to be empty. When this form is used (eg. in <img>), whitespace is placed before the final />.

Any xml:lang attributes are copied to HTML lang attributes. &apos; entity references (standard in XML but not HTML) are changed to numeric entity references. Newlines in attribute values are changed to spaces.

Suitably escaped CDATA sections are used for elements defined to contain CDATA in HTML (script and style); CDATA sections not contained in such an element are decoded and appropriately escaped.

html

In this mode, output is transformed to SGML-based HTML instead of XML.

Empty elements are written without an end-tag when they have the same tagname as an HTML tag defined to be empty. Attributes that can be minimised in HTML always are.

xml:lang attributes are changed to HTML lang attributes. &apos; is changed to a numeric entity reference. Newlines in attribute values are changed to spaces.

The contents of CDATA-elements are hidden using comments. CDATA sections not contained in such an element are decoded and appropriately escaped.

text

In ‘text’ mode, no tags will be emitted at all, and all entity references and CDATA sections will be normalised to text.

(PXTL is not ideal for templating flat files like this, but the option is there for PXTL applications that also need to do simple non-XML templating work.)

Implementation

This document does not specify a particular implementation. However, any PXTL implementation must provide a way of invoking a template file, and passing values in to its global scope from another Python script.

An implementation must also put an object in each template’s global scope named pxtl. This may or may not be the implementation module itself; that may depend on thread-safety requirements.

The pxtl object must have constant properties corresponding to the shortcut values for the doctype attribute.

This object must also have a method write, which can be used to output a string to the output document directly (rather than using print, whose output might end up elsewhere on some implementations).

An optional second parameter to this method allows output to be encoded in the same way as with the output PIs. This parameter should have a value taken from a constant on the pxtl object, TEXT, UPAR, JSTR, CSTR or MARK. If this parameter is omitted it defaults to TEXT.
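As an illustration of the interface described above, here is a minimal stand-in for the pxtl object that supports only the TEXT and MARK encodings. The class name PxtlStandIn, the constant values, the parameter name and the list used to collect output are all assumptions; a real implementation would differ.

```python
class PxtlStandIn:
    # Placeholder constant values; the specification only names them.
    TEXT, UPAR, JSTR, CSTR, MARK = 'TEXT', 'UPAR', 'JSTR', 'CSTR', 'MARK'

    def __init__(self):
        self.output = []

    def write(self, value, encoding=None):
        # The second parameter is optional and defaults to TEXT.
        if encoding is None:
            encoding = self.TEXT
        text = value if isinstance(value, str) else str(value)
        if encoding == self.MARK:
            # Markup is passed through without any escaping.
            self.output.append(text)
        elif encoding == self.TEXT:
            # Special characters become numeric references.
            self.output.append(''.join(
                '&#%d;' % ord(ch) if ch in '&<>"\'' else ch for ch in text))
        else:
            raise NotImplementedError(encoding)
```

Inside a px_code PI, a call of the form pxtl.write(value, pxtl.MARK) would then behave like the px_mark PI.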

Execution

A PXTL implementation acts as if it is walking across the DOM tree from first to last sibling in a depth-first fashion.

PXTL PIs are executed/evaluated and replaced with any output they generate.

Elements are interpreted in the following order, or one indistinguishable from it:

  1. If there is a PXTL doctype attribute with a non-None public or system identifier, and the element is the root of the template, and the template is the root of the output document, a <!DOCTYPE> declaration is placed immediately before the element.
  2. If there is a PXTL tagname attribute, its contents are interpreted and used to overwrite the element’s tagname.
  3. If the tagname is a PXTL none, the element is to be removed from the document and its children reparented.
  4. If there is a conditional attribute, it is checked for success, possibly evaluating the attribute’s expression in the process. If unsuccessful, the element is to be removed from the document and its children reparented.
  5. If there is a PXTL attr attribute, each value in its evaluated dictionary is written to the attribute with the corresponding name. Attributes with value None are removed from the element.
  6. All PXTL attributes are removed from the element.
  7. For each attribute, in any order, if there is a px_if pseudo-PI in the attribute value and its expression evaluates false, the attribute is removed from the element. Otherwise the text of the pseudo-PI is removed from the attribute value.
  8. For each attribute, in any order, pseudo-PIs are evaluated and replaced with their value.
  9. If the element is from the PXTL namespace:

‘Reparenting’ means moving an element’s child nodes from inside the element to a position as an adjacent sibling to the element. First any xmlns namespace declarations from the element must be copied to its child elements (where not overridden by other namespace declarations on each child).
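The node-moving part of reparenting can be sketched with the standard xml.dom.minidom module. This is an illustration under simplifying assumptions: the function name reparent is hypothetical, the element is assumed to have a parent, and the copying of xmlns declarations to child elements is omitted.

```python
from xml.dom import minidom


def reparent(element):
    parent = element.parentNode
    # Move each child, in order, to just before the element itself.
    # insertBefore detaches the node from its old parent first.
    while element.firstChild is not None:
        parent.insertBefore(element.firstChild, element)
    # The now-empty element is dropped from the document.
    parent.removeChild(element)
```

For instance, reparenting the b element of <a><b><c/>text<d/></b></a> leaves <a><c/>text<d/></a>.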

Changelog

The first, unfinished, draft of PXTL was released for limited comment in July 2002.

This is the second draft of the specification, released January 2003, for general public discussion. It is largely complete, although there is currently no resolution of what (if any) whitespace handling should be done.

Substantive changes to the spec since the first draft are:

The next draft is expected in Q2 2003, with changes derived from public comment. This will be a Release Candidate for PXTL 1.0. A non-optimised reference implementation of the pxtl module should be made available with this draft.

PXTL 1.0 is expected to be released by the end of 2003, and should come with a production implementation and an additional, less technical tutorial.