PXTL

Python XML Templating Language
Specification 1.0

Introduction

PXTL templates are namespace-well-formed XML documents that can be transformed to produce output documents (of any type, but usually XML or HTML), controlled by Python code embedded in the template.

Code is embedded in four ways:

The values of attributes from the PXTL namespace and attributes of elements from the PXTL namespace, along with the data part of PIs and pseudo-PIs, are Python code.

Conventions

Typically, and in all examples in this document, the PXTL namespace will be bound to the prefix px:

<html xmlns:px="http://www.doxdesk.com/pxtl">

PXTL files normally have the file extension ‘.px’. In implementations of PXTL that save a compiled bytecode version of PXTL files for speed, these compiled versions may have the file extension ‘.pxc’.

Where MIME types are used, the media type for PXTL files shall be ‘text/x.pxtl+xml’. There is a ‘charset’ parameter for this type as with other text types. Compiled files are in Python bytecode format, for which the media type ‘application/x-python-code’ is conventionally used.

In this document, PXTL directives are coloured blue, and actual Python code within them is red.

Extended conditionals

PXTL borrows the well-understood if...elif...else construct from Python, and uses it in conditional elements and attributes.

For convenience and brevity in templates, PXTL also introduces two new conditionals, anif and orif. They may be used in the same places elif can, though for the sake of code clarity it is usually best not to mix them together too much.

The operation of PXTL conditionals is specified as follows. The truth of a condition value is determined by the usual Python boolean-value rules.

Conditionals use lazy evaluation. (For example, an anif clause following an unsuccessful if will never have its condition expression calculated, as the result would not make a difference to the clause’s success.)
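Lazy evaluation can be sketched in Python by passing the condition as a callable, so that it is only calculated when it can affect the result. This is an illustrative sketch rather than part of any PXTL implementation: the name anif_clause is hypothetical, and the rule that an anif clause succeeds only when the preceding clause succeeded and its own condition is true is inferred from the example above.

```python
def anif_clause(previous_success, condition):
    """Hypothetical model of an anif clause.

    `condition` is a zero-argument callable standing in for the clause's
    condition expression, so that it can be left unevaluated.
    """
    if not previous_success:
        # The result could not change the outcome, so the condition
        # expression is never calculated.
        return False
    # The usual Python boolean-value rules decide the truth of the result.
    return bool(condition())


evaluated = []


def expensive_condition():
    evaluated.append(True)  # record that the expression was calculated
    return "non-empty string"  # truthy under Python's rules


after_failed_if = anif_clause(False, expensive_condition)
calls_after_failed_if = len(evaluated)
after_successful_if = anif_clause(True, expensive_condition)
```

After the first call the evaluated list is still empty, which is the behaviour the parenthetical example describes.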

File format

Processing Instructions

PXTL offers a range of XML Processing Instructions to execute Python code and output some of its results. Which one to use depends on how you want the output to be processed.

A PI whose target name begins with px_ but is undefined in the current PXTL specification will cause an exception to be thrown.

px_code

This target executes a block of Python code. No output is generated. The code can either be a single line, or a colon and newline followed by a block of code.

<?px_code (x, y)= foo.getPosition() ?>

Each code block PI introduces its own indenting level. The indenting must be consistent within the block. All indented blocks are ended at the end of the PI.

Because this PI is quite common, a shortcut is allowed. px without any keyword following it is equivalent to px_code, and is usually used instead.

px_text

This target evaluates the Python expression following the target, tries to convert it to a string if it is not already, and writes it to the document, automatically escaping any characters that are special using character references.
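The conversion and escaping steps might look like the following Python sketch. The function name px_text_encode is hypothetical, and the set of characters treated as special (&, < and >) is an assumption; the specification text here does not list them.

```python
# Assumed set of special characters, mapped to numeric character references.
SPECIAL = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}


def px_text_encode(value):
    """Convert a value to a string and escape markup-significant characters."""
    text = value if isinstance(value, str) else str(value)
    return "".join(SPECIAL.get(ch, ch) for ch in text)
```

Under these assumptions, px_text_encode("a<b & c") gives "a&#60;b &#38; c", and a non-string value such as 42 is first converted to "42".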

Since this is by far the most common PI, it can also be referred to using a shortcut target name of just a single underscore.

px_mark

This target evaluates its expression, converts to string and outputs it without any kind of escaping. The string may contain markup, which will be passed straight to the output document.

px_upar

This target evaluates its expression, converts to UTF-8 string where necessary, and encodes each character to be safe for inclusion in a URL query string parameter. (%20 is used in preference to a plus symbol for representing the space character.)
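A comparable encoding is available in the Python 3 standard library. The sketch below is not the reference implementation; the name upar_encode is hypothetical. It uses urllib.parse.quote with an empty safe set, which encodes text as UTF-8 and represents a space as %20 rather than a plus symbol.

```python
from urllib.parse import quote


def upar_encode(value):
    """Encode a value for use as a URL query string parameter."""
    text = value if isinstance(value, str) else str(value)
    # safe="" means reserved characters such as "/", "&" and "=" are
    # percent-encoded too; quote() encodes as UTF-8 by default.
    return quote(text, safe="")
```

For instance, upar_encode("x&y=1") yields "x%26y%3D1", so the value cannot be mistaken for extra parameters.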

px_jstr

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a string in an ECMAScript-compliant scripting language (eg. JavaScript). This string may also be embedded in an HTML <script> block or an event handler attribute.

px_cstr

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a string in a CSS rule. This string may also be embedded in an HTML <style> block or inline style attribute.

px_name

This target evaluates its expression as a UTF-8 encoded string and replaces any non-alphabetical characters with '_XX' hexadecimal encoded sequences, suitable for inclusion in an XML Name token.

This form of encoding is not any kind of Internet standard, but may be of use to applications that wish to include possibly invalid characters in a name to be used in an ID attribute.
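The px_name encoding can be sketched as below. The name name_encode is hypothetical, and two details are assumptions not fixed by the text above: "alphabetical" is taken to mean the ASCII letters only, and the hexadecimal digits are written in upper case.

```python
def name_encode(value):
    """Replace every non-alphabetical byte of the UTF-8 form with _XX."""
    text = value if isinstance(value, str) else str(value)
    out = []
    for byte in text.encode("utf-8"):
        if byte < 128 and chr(byte).isalpha():
            out.append(chr(byte))  # ASCII letter: kept as-is
        else:
            out.append("_%02X" % byte)  # anything else: _XX hex sequence
    return "".join(out)
```

With these assumptions a space becomes _20, and a non-ASCII character such as é becomes one _XX sequence per UTF-8 byte.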

px_note

This target ignores its contents and outputs nothing. Use it for adding comments to the template that should not be visible in the output. A shortcut target __ is available.

Pseudo-PIs

PIs cannot be used to insert content into attribute values because they may not contain < characters. Instead, pseudo-PIs are used, with the same syntax as a PI but using curly brackets instead of angle brackets. The contents of pseudo-PIs are part of an attribute’s value so must be HTML-encoded if the characters &, < or the delimiting quote characters are used.

In the rare case where a literal open-brace-question-mark sequence needs to be included in an attribute, one can write {?}, which as a special case will be replaced with a literal {?.

PXTL pseudo-PIs may not be included in attributes from the PXTL namespace, nor in attributes of elements from the PXTL namespace. Additionally, they may not be used in xmlns namespace declaration attributes.

The target name of a pseudo-PI must be a PXTL target. The targets px_text (and _), px_upar, px_jstr, px_cstr, px_name and px_note may be used in pseudo-PIs with the same output results as normal PIs.

px_mark and px_code may not be used in pseudo-PIs. There is one additional target that may be used only by pseudo-PIs:

px_if

The contents are evaluated for truth. If true, the attribute which contains this pseudo-PI is included as normal and the pseudo-PI produces no output. If false, the attribute in which the pseudo-PI is embedded is not included at all in the output document.

Elements

if, elif, anif, orif

Conditionally include the contents of the element in the output document. The if element may occur anywhere and begins a new set of if-clauses. An elif, anif or orif element must be the next element sibling of a conditional clause, which becomes the ‘previous clause’.

These elements have one attribute, test, containing the condition expression. If the clause is successful (as defined in the section ‘Extended Conditionals’), the element’s child nodes are included in the output document at the point the element occupied. If the clause is unsuccessful, it is discarded along with all the content contained in it.

else

This element must be the next element sibling of a conditional element, which becomes the ‘previous clause’. Its success is determined as defined in ‘Extended Conditionals’.

If the clause is successful, its content is included in the output document, otherwise it is discarded along with all its child nodes.

The else element has no attributes.

for

Includes its contents repeatedly in the output document, once for each item in a list (or list-style object).

The in attribute is an expression that evaluates to the list to iterate over. A range attribute may be given instead, containing parameters as passed to the range or xrange builtins. The resulting range will be iterated over.

If an item attribute is given, it contains an l-value that will have the item written to it during each iteration. If an index attribute is given, it contains an l-value that will have the integer index of each iteration written to it.

This element also operates as a conditional element, so may be followed by further conditional elements such as else. A for element is considered successful if its body executes at least once.
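The behaviour of the for element can be modelled roughly as follows. This is a sketch under stated assumptions: run_for and its parameters are hypothetical names, the range parameters are modelled as a tuple rather than as attribute text, and the index is assumed to start at zero.

```python
def run_for(body, in_=None, range_=None):
    """Call body(index, item) once per item; return the clause's success."""
    # A range attribute stands in for the in attribute when given.
    items = in_ if range_ is None else range(*range_)
    count = 0
    for index, item in enumerate(items):
        body(index, item)  # item and index as written to the l-values
        count += 1
    # Successful if the body executed at least once, so a following
    # else clause would apply only when count is zero.
    return count > 0


seen = []
success = run_for(lambda index, item: seen.append((index, item)), range_=(2, 5))
empty_success = run_for(lambda index, item: None, in_=[])
```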

while

Includes its contents repeatedly in the output document, until the condition specified in the test attribute no longer evaluates true.

An index attribute, if given, contains an l-value to which to write the integer index of each iteration.

A min attribute, if given, states the minimum number of times the loop will be executed (0 by default). The test expression is not evaluated for the first min times around the loop.

This element also operates as a conditional element. A while element is considered successful if its body executes at least once.
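The interaction between the test and min attributes can be modelled with a short Python sketch. The names run_while, test, body and minimum are hypothetical, and the index is assumed to start at zero.

```python
def run_while(test, body, minimum=0):
    """Call body(index) while test() is true; return the clause's success."""
    index = 0
    # Short-circuit `or`: test() is not evaluated for the first
    # `minimum` times around the loop.
    while index < minimum or test():
        body(index)
        index += 1
    # Successful if the body executed at least once.
    return index > 0


test_calls = []
iterations = []


def always_false():
    test_calls.append(True)  # record each evaluation of the test
    return False


forced = run_while(always_false, iterations.append, minimum=2)
skipped = run_while(lambda: False, lambda index: None)
```

Here the body runs twice even though the test is always false, and the test is evaluated only once, when the third iteration is considered.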

def

Defines a subtemplate function that can be called from elsewhere in this or another template. The fn attribute contains an l-value to which to assign an object representing the subtemplate.

The def element and its children will be removed from the document.

The optional args attribute contains an argument list, as written in a function declaration. If omitted, the function takes no arguments.

Nesting a subtemplate definition inside another is allowed, and is subject to the same scoping rules as nested function definitions in whichever version of Python you are using.

call

Executes a subtemplate function represented by the object given in the fn attribute. The call element is replaced by the transformed children of the def element that defined the subtemplate.

The optional args attribute contains argument values to pass in to the subtemplate; if omitted, no arguments are passed.

import

Include external content referenced by the src attribute, a Python expression evaluating to a string URI. The import element is replaced with the external content.

If a relative URI is used, the base is defined by the XML Base specification. (If no xml:base attributes are in use, this is the current template’s URI.)

If the media type of the referenced content is ‘text/plain’, it will be included as plain text (encoded with the same rules as the px_text processing instruction). If the media type is one of the HTML media types, or one of the vanilla XML media types (including the external-entity types), the file will be included, unaltered, as literal markup.

All other types are treated as PXTL templates (‘text/x.pxtl+xml’) and transformed before inclusion. In this case, the optional globals attribute contains a dictionary to use as a global scope for the imported template, and the optional as attribute contains an l-value to which to assign an object representing the PXTL template. Globals defined in the template after execution can be accessed as properties of this object.

The media type of the referenced content can be specified by the optional type attribute. This evaluates to a string giving the media type to treat the referenced content as.
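The media type dispatch described above might be sketched as follows. The function name import_mode and the returned labels are hypothetical, and the exact membership of the "HTML" and "vanilla XML" sets is an assumption, since the specification text does not enumerate them.

```python
PLAIN_TEXT = {"text/plain"}

# Assumed membership; the specification names the categories, not the types.
LITERAL_MARKUP = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def import_mode(media_type):
    """Classify how imported content of the given media type is included."""
    # Ignore parameters such as charset, and compare case-insensitively.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in PLAIN_TEXT:
        return "text"  # included as plain text, encoded as for px_text
    if base in LITERAL_MARKUP:
        return "markup"  # included unaltered as literal markup
    return "template"  # all other types are transformed as PXTL templates
```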

none

This is the PXTL null-element. A none element will never be sent to the output document; it is always replaced by its children. This can be useful as a root element in imported templates that produce no output, or more than one output element.

Attributes

PXTL attributes may not be included in any PXTL element other than the none element.

if, elif, anif, orif

The values of these conditional attributes are their clause's condition expressions. The success of a conditional attribute is determined by the rules described in the section ‘Extended Conditionals’.

An element may carry at most one of the conditional attributes (including else, described below). If an element carries a successful conditional attribute it is included in the output document. An unsuccessful conditional attribute causes the element to be replaced by its children.

In multi-part constructs, the clauses are nested children, not sequential siblings. That is, an else attribute’s ‘previous clause’ is the conditional attribute of its nearest ancestor that has one. (This need not be the direct parent.)

else

A conditional attribute. It must be nested within an element with one of the above conditional attributes. Its success is determined by the usual rules. If it is unsuccessful, the element is removed and its child content is reparented.

The contents of the else attribute are irrelevant because no condition expression is needed in an else clause. A blank string may be used.

tagname

Specifies a replacement tagname for the element. The attribute value is a string expression.

An XML namespace prefix may be included in the tagname, but this must not map to the PXTL namespace.

attr

Changes arbitrary attributes of an element. Attribute evaluates to a mapping whose keys are string attribute names. Attribute names used may not map to the PXTL or XML-Namespace namespaces.

Values are converted to strings if necessary and set as the value of the attribute named by the key, replacing any existing attribute with the same name. If the value is None, the named attribute is instead removed from the element if it exists.
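The effect of the attr mapping on an element's attributes can be sketched with plain dictionaries. The name apply_attr is hypothetical, and modelling an element's attributes as a dict is a simplification for illustration.

```python
def apply_attr(attributes, changes):
    """Return a copy of `attributes` with the attr mapping applied."""
    result = dict(attributes)
    for name, value in changes.items():
        if value is None:
            # None removes the named attribute if it exists.
            result.pop(name, None)
        else:
            # Other values are converted to strings and replace any
            # existing attribute with the same name.
            result[name] = value if isinstance(value, str) else str(value)
    return result


updated = apply_attr(
    {"class": "a", "id": "x"},
    {"class": None, "width": 10, "id": "y", "title": None},
)
```

In this sketch updated ends up as {"id": "y", "width": "10"}: class is removed, id is replaced, width is added as a string, and removing the absent title attribute is a no-op.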

doctype

The doctype attribute is used on the root element of the primary PXTL template being transformed. On imported templates it has no effect.

Its value is a tuple expression of string values (mimetype, method, publicId, systemId). If the tuple contains fewer than four items, it is padded with None values; if the attribute is not used at all it is equivalent to using four Nones.

mimetype is used to advise a PXTL implementation what media type the final transformed document should be served as. If None, no information is given and an implementation may guess if it needs a media type. Typically this guess might be based on the output method.

method advises a PXTL implementation on how the final transformed document should be serialised - see ‘Output methods’. If None, the implementation may guess an output method if it needs to serialise the document. Typically this might default to 'xml' unless HTML output is detected in which case 'xhtml' might be used instead.

publicId and systemId are used to add a <!DOCTYPE> declaration to the output document. Either or both may be string values to be used for PUBLIC and SYSTEM identifiers; if both are None, the output document will have no doctype.
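The padding rule for the doctype tuple can be sketched as follows. The name normalise_doctype is hypothetical, and rejecting tuples longer than four items is an assumption, as the text above does not say how they are handled.

```python
def normalise_doctype(value=None):
    """Pad a doctype tuple to (mimetype, method, publicId, systemId)."""
    # An unused attribute is equivalent to four Nones.
    items = () if value is None else tuple(value)
    if len(items) > 4:
        # Assumption: the specification does not define this case.
        raise ValueError("doctype tuple has more than four items")
    return items + (None,) * (4 - len(items))
```

For example, normalise_doctype(("text/html", "html")) gives ("text/html", "html", None, None), so the output document would carry no <!DOCTYPE> declaration.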

future

The future attribute may be used on the root element of any template (including imported templates). It serves the same purpose as the Python construct from __future__ import, allowing language features to be specified for all embedded code in that template.

The value of the future attribute should be a comma-separated list of future-features, as would be used in such an import statement.

space

The space attribute specifies how whitespace in element content (‘ignorable’ whitespace) is handled. If it evaluates to None, the element inherits the whitespace processing of its parent. Otherwise, a true value preserves all whitespace (the default for the root element), and a false value removes whitespace in element content.

Whitespace removal is a pre-processing step carried out on an element before its children are transformed.

Output methods

The output method of a template is set by a PXTL doctype attribute on its root element. This affects how the output from the template is serialised. Possible values are:

xml

When the output mode is ‘xml’, any empty elements may be output in the shorthand form <tag/> instead of <tag></tag>. CDATA sections and entity references are left in the document unchanged.

xhtml

This output mode produces XML markup likely to be understood by legacy HTML parsers, as described in Appendix C of the XHTML 1.0 specification.

In this mode, the empty element shorthand form is only used when the HTML element with the same tagname is defined to be empty, and in this case whitespace is ensured before the closing slash, eg. <img />.

Any xml:lang attributes are copied to HTML lang attributes.

CDATA sections inside elements whose contents are implicitly CDATA in HTML (that is, script and style) are reformed to be compatible with both XML and SGML parsers, and to hide from legacy HTML 2 parsers.

html

In this mode, output is transformed to SGML-based HTML instead of XML.

Empty elements are written without end-tags when they are defined as empty in HTML. Attributes that can be minimised in HTML always are.

xml:lang attributes are changed to HTML lang attributes. Other xml: attributes, and namespace declarations, are removed.

CDATA sections inside CDATA elements are changed to comments, to hide from legacy HTML 2 parsers.

text

In ‘text’ mode, no tags, comments or PIs will be emitted at all. Text nodes, entity references and CDATA sections will all be output as plain text only.

Environment

Python code in templates is subject to the same syntax rules as normal for the version of Python in use, with the following additional restrictions:

pxtl object

An implementation must put an object in each template’s global scope named pxtl, with the following members.

Doctype shortcuts

These constants are provided as quick ways of specifying values for the doctype attribute.

Implementations may provide further shortcuts for common XML document types.

write

The pxtl.write method may be called only inside a px_code processing instruction.

The first parameter text contains content to be output to the transformed document in the place of the processing instruction. The optional second parameter coding is a string indicating what sort of coding to apply to the content - either 'text', 'upar', 'jstr', 'cstr', 'name' or 'mark'. These correspond to the coding performed by the similarly-named processing instructions. If omitted, it defaults to 'text'.

Execution

A PXTL implementation acts as if it is walking across the DOM tree from first to last sibling in a depth-first fashion.

PXTL PIs are executed/evaluated and replaced with any output they generate.

Elements are processed in the following order, or one indistinguishable from it:

  1. PXTL doctype and future attribute processing.
  2. Processing of PXTL space attribute, and child whitespace removal if necessary.
  3. PXTL tagname attribute processing.
  4. Removal if PXTL none element.
  5. Checking for a conditional PXTL attribute and removal if unsuccessful.
  6. PXTL attr attribute processing.
  7. For each attribute, in any order, evaluation of any px_if pseudo-PIs in the value and removal of the attribute if unsuccessful.
  8. For each attribute, in any order, replacement of pseudo-PIs with their output value.
  9. If the element is from the PXTL namespace: