lxml.objectify

Authors: Stefan Behnel, Holger Joukl

lxml supports an alternative API similar to the Amara bindery or gnosis.xml.objectify through a custom Element implementation. The main idea is to hide the usage of XML behind normal Python objects, sometimes referred to as data-binding. It allows you to use XML as if you were dealing with a normal Python object hierarchy.

Children of an XML element are accessed through object attributes. If there are multiple children with the same name, slicing and indexing can be used. Python data types are extracted from XML content automatically and made available to the normal Python operators.

To set up and use objectify, you need both the lxml.etree module and lxml.objectify:

>>> from lxml import etree
>>> from lxml import objectify

The objectify API is very different from the ElementTree API. If it is used, it should not be mixed with other element implementations (such as trees parsed with lxml.etree), to avoid non-obvious behaviour.

The benchmark page has some hints on performance optimisation of code using lxml.objectify.

To make the doctests in this document look a little nicer, we also use this:

>>> import lxml.usedoctest

Imported from within a doctest, this relieves us from caring about the exact formatting of XML output.

The lxml.objectify API

In lxml.objectify, element trees provide an API that models the behaviour of normal Python object trees as closely as possible.

Creating objectify trees

As with lxml.etree, you can either create an objectify tree by parsing an XML document or by building one from scratch. To parse a document, just use the parse() or fromstring() functions of the module:

>>> from StringIO import StringIO
>>> fileobject = StringIO('<test/>')

>>> tree = objectify.parse(fileobject)
>>> print isinstance(tree.getroot(), objectify.ObjectifiedElement)
True

>>> root = objectify.fromstring('<test/>')
>>> print isinstance(root, objectify.ObjectifiedElement)
True

To build a new tree in memory, objectify replicates the standard factory function Element() from lxml.etree:

>>> obj_el = objectify.Element("new")
>>> print isinstance(obj_el, objectify.ObjectifiedElement)
True

After creating such an Element, you can use the usual API of lxml.etree to add SubElements to the tree:

>>> child = etree.SubElement(obj_el, "newchild", attr="value")

New subelements will automatically inherit the objectify behaviour from their tree. However, all independent elements that you create through the Element() factory of lxml.etree (instead of objectify) will not support the objectify API by themselves:

>>> subel = etree.SubElement(obj_el, "sub")
>>> print isinstance(subel, objectify.ObjectifiedElement)
True

>>> independent_el = etree.Element("new")
>>> print isinstance(independent_el, objectify.ObjectifiedElement)
False
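The doctests above are written in Python 2 syntax. As a minimal sketch of the same steps under Python 3 (assuming a recent lxml release, with io.BytesIO standing in for StringIO), the three ways of obtaining an objectify tree look roughly like this:

```python
from io import BytesIO

from lxml import etree, objectify

# Parse from a file-like object; parse() returns an ElementTree wrapper.
tree = objectify.parse(BytesIO(b"<test/>"))
parsed_root = tree.getroot()

# Parse directly from a string; fromstring() returns the root element.
string_root = objectify.fromstring("<test/>")

# Build a tree from scratch with the objectify factory, then extend it
# through the usual lxml.etree API.
obj_el = objectify.Element("new")
child = etree.SubElement(obj_el, "newchild", attr="value")

# An element created by lxml.etree on its own is a plain etree element.
independent_el = etree.Element("new")

for name, el in [("parsed_root", parsed_root), ("string_root", string_root),
                 ("child", child), ("independent_el", independent_el)]:
    print(name, isinstance(el, objectify.ObjectifiedElement))
```

Only the last element, which was never attached to an objectify tree, is expected to report False.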

Element access through object attributes

The main idea behind the objectify API is to hide XML element access behind the usual object attribute access pattern. Asking an element for an attribute will return the sequence of children with corresponding tag names:

>>> root = objectify.Element("root")
>>> b = etree.SubElement(root, "b")
>>> print root.b[0].tag
b
>>> root.index(root.b[0])
0
>>> b = etree.SubElement(root, "b")
>>> print root.b[0].tag
b
>>> print root.b[1].tag
b
>>> root.index(root.b[1])
1

For convenience, you can omit the index '0' to access the first child:

>>> print root.b.tag
b
>>> root.index(root.b)
0
>>> del root.b

Iteration and slicing also obey the requested tag:

>>> x1 = etree.SubElement(root, "x")
>>> x2 = etree.SubElement(root, "x")
>>> x3 = etree.SubElement(root, "x")

>>> [ el.tag for el in root.x ]
['x', 'x', 'x']

>>> [ el.tag for el in root.x[1:3] ]
['x', 'x']

>>> [ el.tag for el in root.x[-1:] ]
['x']

>>> del root.x[1:2]
>>> [ el.tag for el in root.x ]
['x', 'x']

If you want to iterate over all children or need to provide a specific namespace for the tag, use the iterchildren() method. Like the other methods for iteration, it supports an optional tag keyword argument:

>>> [ el.tag for el in root.iterchildren() ]
['b', 'x', 'x']

>>> [ el.tag for el in root.iterchildren(tag='b') ]
['b']

>>> [ el.tag for el in root.b ]
['b']

XML attributes are accessed as in the normal ElementTree API:

>>> c = etree.SubElement(root, "c", myattr="someval")
>>> print root.c.get("myattr")
someval

>>> root.c.set("c", "oh-oh")
>>> print root.c.get("c")
oh-oh

In addition to the normal ElementTree API for appending elements to trees, subtrees can also be added by assigning them to object attributes. In this case, the subtree is automatically deep copied and the tag name of its root is updated to match the attribute name:

>>> el = objectify.Element("yet_another_child")
>>> root.new_child = el
>>> print root.new_child.tag
new_child
>>> print el.tag
yet_another_child

>>> root.y = [ objectify.Element("y"), objectify.Element("y") ]
>>> [ el.tag for el in root.y ]
['y', 'y']

The latter is a short form for operations on the full slice:

>>> root.y[:] = [ objectify.Element("y") ]
>>> [ el.tag for el in root.y ]
['y']

You can also replace children that way:

>>> child1 = etree.SubElement(root, "child")
>>> child2 = etree.SubElement(root, "child")
>>> child3 = etree.SubElement(root, "child")

>>> el = objectify.Element("new_child")
>>> subel = etree.SubElement(el, "sub")

>>> root.child = el
>>> print root.child.sub.tag
sub

>>> root.child[2] = el
>>> print root.child[2].sub.tag
sub

Note that special care must be taken when changing the tag name of an element:

>>> print root.b.tag
b
>>> root.b.tag = "notB"
>>> root.b
Traceback (most recent call last):
  ...
AttributeError: no such child: b
>>> print root.notB.tag
notB
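The access patterns in this section can be condensed into one short Python 3 sketch. The tag name point is made up for the illustration, and the values are read through the pyval attribute that is described further down:

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><point>1</point><point>2</point><point>3</point></root>")

# Attribute access yields the first <point>; len(), iteration and slicing
# operate on all <point> siblings.
count = len(root.point)
values = [p.pyval for p in root.point]
tail = [p.pyval for p in root.point[1:3]]

# index() reports the position of a child among all children of root.
second_position = root.index(root.point[1])

# Deleting a slice removes the matching siblings from the tree.
del root.point[1:2]
after_delete = [p.pyval for p in root.point]

print(count, values, tail, second_position, after_delete)
```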

Tree generation with the E-factory

To simplify the generation of trees even further, you can use the E-factory:

>>> E = objectify.E
>>> root = E.root(
...   E.a(5L),
...   E.b(6.1),
...   E.c(True),
...   E.d("how", tell="me")
... )

>>> print etree.tostring(root, pretty_print=True)
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
  <a py:pytype="long">5</a>
  <b py:pytype="float">6.1</b>
  <c py:pytype="bool">true</c>
  <d py:pytype="str" tell="me">how</d>
</root>

This allows you to write up a specific language in tags:

>>> ROOT = objectify.E.root
>>> TITLE = objectify.E.title
>>> HOWMANY = getattr(objectify.E, "how-many")

>>> root = ROOT(
...   TITLE("The title"),
...   HOWMANY(5)
... )

>>> print etree.tostring(root, pretty_print=True)
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
  <title py:pytype="str">The title</title>
  <how-many py:pytype="int">5</how-many>
</root>

objectify.E is an instance of objectify.ElementMaker. By default, it creates pytype annotated Elements without a namespace. You can switch off the pytype annotation by passing False to the annotate keyword argument of the constructor. You can also pass a default namespace and an nsmap:

>>> myE = objectify.ElementMaker(annotate=False,
...           namespace="http://my/ns", nsmap={None : "http://my/ns"})

>>> root = myE.root( myE.someint(2) )

>>> print etree.tostring(root, pretty_print=True)
<root xmlns="http://my/ns">
  <someint>2</someint>
</root>
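A Python 3 sketch of the same E-factory usage follows. It assumes a recent lxml release, where the long type no longer exists and 5 is a plain int, and it reads the values back through the pyval attribute described later in this document:

```python
from lxml import etree, objectify

E = objectify.E
root = E.root(
    E.a(5),
    E.b(6.1),
    E.c(True),
    E.d("how", tell="me"),
)

# The values come back as the Python types that went in.
print(root.a.pyval, root.b.pyval, root.c.pyval, root.d.pyval)
print(root.d.get("tell"))

# A maker without pytype annotations, bound to a default namespace.
my_e = objectify.ElementMaker(
    annotate=False, namespace="http://my/ns", nsmap={None: "http://my/ns"})
ns_root = my_e.root(my_e.someint(2))
print(etree.tostring(ns_root, pretty_print=True).decode())
```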

Namespace handling

Namespaces are handled mostly behind the scenes. If you access a child of an Element without specifying a namespace, the lookup will use the namespace of the parent:

>>> root = objectify.Element("{ns}root")
>>> b = etree.SubElement(root, "{ns}b")
>>> c = etree.SubElement(root, "{other}c")

>>> print root.b.tag
{ns}b
>>> print root.c
Traceback (most recent call last):
    ...
AttributeError: no such child: {ns}c

You can access elements with different namespaces via getattr():

>>> print getattr(root, "{other}c").tag
{other}c

For convenience, there is also a quick way through item access:

>>> print root["{other}c"].tag
{other}c

The same approach must be used to access children with tag names that are not valid Python identifiers:

>>> el = etree.SubElement(root, "{ns}tag-name")
>>> print root["tag-name"].tag
{ns}tag-name

>>> new_el = objectify.Element("{ns}new-element")
>>> el = etree.SubElement(new_el, "{ns}child")
>>> el = etree.SubElement(new_el, "{ns}child")
>>> el = etree.SubElement(new_el, "{ns}child")

>>> root["tag-name"] = [ new_el, new_el ]
>>> print len(root["tag-name"])
2
>>> print root["tag-name"].tag
{ns}tag-name

>>> print len(root["tag-name"].child)
3
>>> print root["tag-name"].child.tag
{ns}child
>>> print root["tag-name"][1].child.tag
{ns}child

Item access is also required for names that have a special meaning in lxml.objectify:

>>> root = objectify.XML("<root><text>TEXT</text></root>")

>>> print root.text.text
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'text'

>>> print root["text"].text
TEXT
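The lookup rules of this section can be summarised in a small Python 3 sketch. The namespace URIs are placeholders chosen for the example, not names that lxml knows about:

```python
from lxml import objectify

# Placeholder namespace URIs for this sketch.
NS = "http://example.com/ns"
OTHER = "http://example.com/other"

root = objectify.fromstring(
    '<root xmlns="%s" xmlns:o="%s">'
    "<b>1</b><o:c>2</o:c><tag-name>3</tag-name><text>TEXT</text>"
    "</root>" % (NS, OTHER))

same_ns_tag = root.b.tag                 # lookup inherits the namespace of root
has_plain_c = hasattr(root, "c")         # <c> lives in the other namespace
other_ns = root["{%s}c" % OTHER].pyval   # item access with explicit namespace
dashed = root["tag-name"].pyval          # not a valid Python identifier
own_text = root.text                     # the element's own text, not a child
text_child = root["text"].text           # the <text> child element

print(same_ns_tag, has_plain_c, other_ns, dashed, own_text, text_child)
```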

Asserting a Schema

When dealing with XML documents from different sources, you will often require them to follow a common schema. In lxml.objectify, this directly translates to enforcing a specific object tree, i.e. expected object attributes are guaranteed to be there and to have the expected type. This can easily be achieved through XML Schema validation at parse time. See also the documentation on validation.

First of all, we need a parser that knows our schema, so let's say we parse the schema from a file-like object (or file or filename):

>>> from StringIO import StringIO
>>> f = StringIO('''\
...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...     <xsd:element name="a" type="AType"/>
...     <xsd:complexType name="AType">
...       <xsd:sequence>
...         <xsd:element name="b" type="xsd:string" />
...       </xsd:sequence>
...     </xsd:complexType>
...   </xsd:schema>
... ''')
>>> schema = etree.XMLSchema(file=f)

When creating the validating parser, we must make sure it returns objectify trees. This is best done with the makeparser() function:

>>> parser = objectify.makeparser(schema = schema)

Now we can use it to parse a valid document:

>>> xml = "<a><b>test</b></a>"
>>> a = objectify.fromstring(xml, parser)

>>> print a.b
test

Or an invalid document:

>>> xml = "<a><b>test</b><c/></a>"
>>> a = objectify.fromstring(xml, parser)
Traceback (most recent call last):
  ...
XMLSyntaxError: Element 'c': This element is not expected.

Note that the same works for parse-time DTD validation, except that DTDs do not support any data types by design.
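In application code, the parse-time validation shown above is usually wrapped in exception handling. A minimal Python 3 sketch, reusing the schema from this section and assuming a recent lxml release:

```python
from io import BytesIO

from lxml import etree, objectify

SCHEMA = b"""\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="AType"/>
  <xsd:complexType name="AType">
    <xsd:sequence>
      <xsd:element name="b" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

schema = etree.XMLSchema(file=BytesIO(SCHEMA))
parser = objectify.makeparser(schema=schema)

# A document that follows the schema parses into an objectify tree.
valid = objectify.fromstring("<a><b>test</b></a>", parser)
print(valid.b.text)

# A document that violates the schema is rejected while parsing.
try:
    objectify.fromstring("<a><b>test</b><c/></a>", parser)
    rejected = False
except etree.XMLSyntaxError as exc:
    rejected = True
    print("rejected:", exc)
```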

ObjectPath

For both convenience and speed, objectify supports its own path language, represented by the ObjectPath class:

>>> root = objectify.Element("{ns}root")
>>> b1 = etree.SubElement(root, "{ns}b")
>>> c  = etree.SubElement(b1,   "{ns}c")
>>> b2 = etree.SubElement(root, "{ns}b")
>>> d  = etree.SubElement(root, "{other}d")

>>> path = objectify.ObjectPath("root.b.c")
>>> print path
root.b.c
>>> path.hasattr(root)
True
>>> print path.find(root).tag
{ns}c

>>> find = objectify.ObjectPath("root.b.c")
>>> print find(root).tag
{ns}c

>>> find = objectify.ObjectPath("root.{other}d")
>>> print find(root).tag
{other}d

>>> find = objectify.ObjectPath("root.{not}there")
>>> print find(root).tag
Traceback (most recent call last):
  ...
AttributeError: no such child: {not}there

>>> find = objectify.ObjectPath("{not}there")
>>> print find(root).tag
Traceback (most recent call last):
  ...
ValueError: root element does not match: need {not}there, got {ns}root

>>> find = objectify.ObjectPath("root.b[1]")
>>> print find(root).tag
{ns}b

>>> find = objectify.ObjectPath("root.{ns}b[1]")
>>> print find(root).tag
{ns}b

Apart from strings, ObjectPath also accepts lists of path segments:

>>> find = objectify.ObjectPath(['root', 'b', 'c'])
>>> print find(root).tag
{ns}c

>>> find = objectify.ObjectPath(['root', '{ns}b[1]'])
>>> print find(root).tag
{ns}b

You can also use relative paths starting with a '.' to ignore the actual root element and only inherit its namespace:

>>> find = objectify.ObjectPath(".b[1]")
>>> print find(root).tag
{ns}b

>>> find = objectify.ObjectPath(['', 'b[1]'])
>>> print find(root).tag
{ns}b

>>> find = objectify.ObjectPath(".unknown[1]")
>>> print find(root).tag
Traceback (most recent call last):
  ...
AttributeError: no such child: {ns}unknown

>>> find = objectify.ObjectPath(".{other}unknown[1]")
>>> print find(root).tag
Traceback (most recent call last):
  ...
AttributeError: no such child: {other}unknown

For convenience, a single dot represents the empty ObjectPath (identity):

>>> find = objectify.ObjectPath(".")
>>> print find(root).tag
{ns}root

ObjectPath objects can be used to manipulate trees:

>>> root = objectify.Element("{ns}root")

>>> path = objectify.ObjectPath(".some.child.{other}unknown")
>>> path.hasattr(root)
False
>>> path.find(root)
Traceback (most recent call last):
  ...
AttributeError: no such child: {ns}some

>>> path.setattr(root, "my value") # creates children as necessary
>>> path.hasattr(root)
True
>>> print path.find(root).text
my value
>>> print root.some.child["{other}unknown"].text
my value

>>> print len( path.find(root) )
1
>>> path.addattr(root, "my new value")
>>> print len( path.find(root) )
2
>>> [ el.text for el in path.find(root) ]
['my value', 'my new value']

As with attribute assignment, setattr() accepts lists:

>>> path.setattr(root, ["v1", "v2", "v3"])
>>> [ el.text for el in path.find(root) ]
['v1', 'v2', 'v3']

Note, however, that indexing is only supported in this context if the children exist. Indexing non-existent children will not extend or create a list of such children but will raise an exception:

>>> path = objectify.ObjectPath(".{non}existing[1]")
>>> path.setattr(root, "my value")
Traceback (most recent call last):
  ...
TypeError: creating indexed path attributes is not supported

It is worth noting that ObjectPath does not depend on the objectify module or the ObjectifiedElement implementation. It can also be used in combination with Elements from the normal lxml.etree API.
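As a compact Python 3 illustration of the tree manipulation described above, the following sketch stores a setting at a nested path. The element names config, server and port are invented for the example:

```python
from lxml import objectify

root = objectify.Element("root")

# A relative path: the leading dot skips the check of the root tag.
path = objectify.ObjectPath(".config.server.port")

existed_before = path.hasattr(root)   # nothing there yet
path.setattr(root, 8080)              # creates config/server/port as needed
existed_after = path.hasattr(root)

port = path.find(root).pyval
same_via_attributes = root.config.server.port.pyval

path.addattr(root, 8081)              # appends a second <port> sibling
all_ports = [el.pyval for el in path.find(root)]

print(existed_before, existed_after, port, all_ports)
```

Keeping the path in an ObjectPath object means it is parsed once and can then be applied to any number of trees.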

Python data types

The objectify module knows about Python data types and tries its best to let element content behave like them. For example, they support the normal math operators:

>>> root = objectify.fromstring(
...             "<root><a>5</a><b>11</b><c>true</c><d>hoi</d></root>")
>>> root.a + root.b
16
>>> root.a += root.b
>>> print root.a
16

>>> root.a = 2
>>> print root.a + 2
4
>>> print 1 + root.a
3

>>> print root.c
True
>>> root.c = False
>>> if not root.c:
...     print "false!"
false!

>>> print root.d + " test !"
hoi test !
>>> root.d = "%s - %s"
>>> print root.d % (1234, 12345)
1234 - 12345

However, data elements continue to provide the objectify API. This means that sequence operations such as len(), slicing and indexing (e.g. of strings) cannot behave as they do for the plain Python types. Like all other tree elements, data elements show the normal slicing behaviour of objectify elements:

>>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
>>> print root.a + ' me' # behaves like a string, right?
test me
>>> len(root.a) # but there's only one 'a' element!
1
>>> [ a.tag for a in root.a ]
['a']
>>> print root.a[0].tag
a

>>> print root.a
test
>>> [ str(a) for a in root.a[:1] ]
['test']

If you need to run sequence operations on data types, you must ask the API for the real Python value. The string value is always available through the normal ElementTree .text attribute. Additionally, all data classes provide a .pyval attribute that returns the value as a plain Python type:

>>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
>>> root.a.text
'test'
>>> root.a.pyval
'test'

>>> root.b.text
'5'
>>> root.b.pyval
5

Note, however, that both attributes are read-only in objectify. If you want to change values, just assign them directly to the attribute:

>>> root.a.text  = "25"
Traceback (most recent call last):
  ...
TypeError: attribute 'text' of 'StringElement' objects is not writable

>>> root.a.pyval = 25
Traceback (most recent call last):
  ...
TypeError: attribute 'pyval' of 'StringElement' objects is not writable

>>> root.a = 25
>>> print root.a
25
>>> print root.a.pyval
25

In other words, objectify data elements behave like immutable Python types. You can replace them, but not modify them.
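The difference between the element and its value is easy to trip over, so here is a short Python 3 sketch that contrasts the two, using only the .text and .pyval attributes introduced above:

```python
from lxml import objectify

root = objectify.fromstring("<root><a>test</a><b>5</b></root>")

element_count = len(root.a)          # counts <a> elements
string_length = len(root.a.pyval)    # length of the string value
prefix = root.a.text[:2]             # slicing the plain string
incremented = root.b.pyval + 1       # arithmetic on the plain int

# Values are replaced by assigning to the attribute of the parent.
root.a = 25
new_value = root.a.pyval

print(element_count, string_length, prefix, incremented, new_value)
```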

Recursive tree dump

To see the data types that are currently used, you can call the module level dump() function that returns a recursive string representation for elements:

>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'

You can freely switch between different types for the same child:

>>> root = objectify.fromstring("<root><a>5</a></root>")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 5 [IntElement]

>>> root.a = 'nice string!'
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 'nice string!' [StringElement]
      * py:pytype = 'str'

>>> root.a = True
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = True [BoolElement]
      * py:pytype = 'bool'

>>> root.a = [1, 2, 3]
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * py:pytype = 'int'
    a = 2 [IntElement]
      * py:pytype = 'int'
    a = 3 [IntElement]
      * py:pytype = 'int'

>>> root.a = (1, 2, 3)
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * py:pytype = 'int'
    a = 2 [IntElement]
      * py:pytype = 'int'
    a = 3 [IntElement]
      * py:pytype = 'int'

Recursive string representation of elements

Normally, elements use the standard string representation for str() that is provided by lxml.etree. You can enable a pretty-print representation for objectify elements like this:

>>> objectify.enableRecursiveStr()

>>> root = objectify.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...   <a attr1="foo" attr2="bar">1</a>
...   <a>1.2</a>
...   <b>1</b>
...   <b>true</b>
...   <c>what?</c>
...   <d xsi:nil="true"/>
... </root>
... """)

>>> print str(root)
root = None [ObjectifiedElement]
    a = 1 [IntElement]
      * attr1 = 'foo'
      * attr2 = 'bar'
    a = 1.2 [FloatElement]
    b = 1 [IntElement]
    b = True [BoolElement]
    c = 'what?' [StringElement]
    d = None [NoneElement]
      * xsi:nil = 'true'

This behaviour can be switched off in the same way:

>>> objectify.enableRecursiveStr(False)

How data types are matched

Objectify uses two different types of Elements. Structural Elements (or tree Elements) represent the object tree structure. Data Elements represent the data containers at the leaves. You can explicitly create tree Elements with the objectify.Element() factory and data Elements with the objectify.DataElement() factory.

When Element objects are created, lxml.objectify must determine which implementation class to use for them. This is relatively easy for tree Elements and less so for data Elements. The algorithm is as follows:

  1. If an element has children, use the default tree class.
  2. If an element is defined as xsi:nil, use the NoneElement class.
  3. If a "Python type hint" attribute is given, use this to determine the element class, see below.
  4. If an XML Schema xsi:type hint is given, use this to determine the element class, see below.
  5. Try to determine the element class from the text content type by trial and error.
  6. If the element is a root node then use the default tree class.
  7. Otherwise, use the default class for empty data classes.
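The steps above can be observed with one element per rule. This is a Python 3 sketch, not an excerpt from lxml itself: the child tag names are invented, and it relies on the default pytype namespace shown in the next section:

```python
from lxml import objectify

root = objectify.fromstring("""\
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:py="http://codespeak.net/lxml/objectify/pytype">
  <branch><leaf>1</leaf></branch>
  <nil xsi:nil="true"/>
  <hinted py:pytype="str">5</hinted>
  <typed xsi:type="xsd:double">5</typed>
  <guessed>5</guessed>
  <empty/>
</root>
""")

# Map each child tag to the name of the class chosen for it.
classes = {child.tag: type(child).__name__ for child in root.iterchildren()}
root_class = type(root).__name__

for tag, class_name in classes.items():
    print(tag, "->", class_name)
```

The text content 5 appears three times, yet it ends up in three different classes because a type hint takes precedence over guessing from the content.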

You can change the default classes for tree Elements and empty data Elements at setup time. The ObjectifyElementClassLookup() call accepts two keyword arguments, tree_class and empty_data_class, that determine the Element classes used in these cases. By default, tree_class is a class called ObjectifiedElement and empty_data_class is a StringElement.

Type annotations

The "type hint" mechanism uses an XML attribute defined as lxml.objectify.PYTYPE_ATTRIBUTE. It may contain any of the following string values: int, long, float, str, unicode, NoneType:

>>> print objectify.PYTYPE_ATTRIBUTE
{http://codespeak.net/lxml/objectify/pytype}pytype
>>> ns, name = objectify.PYTYPE_ATTRIBUTE[1:].split('}')

>>> root = objectify.fromstring("""\
... <root xmlns:py='%s'>
...   <a py:pytype='str'>5</a>
...   <b py:pytype='int'>5</b>
...   <c py:pytype='NoneType' />
... </root>
... """ % ns)

>>> print root.a + 10
510
>>> print root.b + 10
15
>>> print root.c
None

Note that you can change the name and namespace used for this attribute through the set_pytype_attribute_tag(tag) module function, in case your application ever needs to. There is also a utility function annotate() that recursively generates this attribute for the elements of a tree:

>>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
    b = 5 [IntElement]

>>> objectify.annotate(root)

>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
      * py:pytype = 'str'
    b = 5 [IntElement]
      * py:pytype = 'int'

XML Schema datatype annotation

A second way of specifying data type information uses XML Schema types as element annotations. Objectify knows those that can be mapped to normal Python types:

>>> root = objectify.fromstring('''\
...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
...          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...      <d xsi:type="xsd:double">5</d>
...      <l xsi:type="xsd:long"  >5</l>
...      <s xsi:type="xsd:string">5</s>
...    </root>
...    ''')
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
      * xsi:type = 'xsd:double'
    l = 5L [LongElement]
      * xsi:type = 'xsd:long'
    s = '5' [StringElement]
      * xsi:type = 'xsd:string'

Again, there is a utility function xsiannotate() that recursively generates the "xsi:type" attribute for the elements of a tree:

>>> root = objectify.fromstring('''\
...    <root><a>test</a><b>5</b><c>true</c></root>
...    ''')
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
    b = 5 [IntElement]
    c = True [BoolElement]

>>> objectify.xsiannotate(root)

>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    a = 'test' [StringElement]
      * xsi:type = 'xsd:string'
    b = 5 [IntElement]
      * xsi:type = 'xsd:int'
    c = True [BoolElement]
      * xsi:type = 'xsd:boolean'

Note, however, that xsiannotate() will always use the first XML Schema datatype that is defined for any given Python type. See also Defining additional data classes.

The utility function deannotate() can be used to get rid of 'py:pytype' and/or 'xsi:type' information:

>>> root = objectify.fromstring('''\
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...   <d xsi:type="xsd:double">5</d>
...   <l xsi:type="xsd:long"  >5</l>
...   <s xsi:type="xsd:string">5</s>
... </root>''')
>>> objectify.annotate(root)
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    d = 5.0 [FloatElement]
      * xsi:type = 'xsd:double'
      * py:pytype = 'float'
    l = 5L [LongElement]
      * xsi:type = 'xsd:long'
      * py:pytype = 'long'
    s = '5' [StringElement]
      * xsi:type = 'xsd:string'
      * py:pytype = 'str'
>>> objectify.deannotate(root)
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    d = 5 [IntElement]
    l = 5 [IntElement]
    s = 5 [IntElement]
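The three annotation helpers can also be checked programmatically instead of through dump(). A Python 3 sketch follows; it deliberately does not rely on the exact XML Schema type name chosen for integers, since that may differ between lxml versions:

```python
from lxml import objectify

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

root = objectify.fromstring("<root><a>test</a><b>5</b></root>")

# annotate() writes a py:pytype attribute on each data element.
objectify.annotate(root)
py_hints = (root.a.get(objectify.PYTYPE_ATTRIBUTE),
            root.b.get(objectify.PYTYPE_ATTRIBUTE))

# xsiannotate() writes an xsi:type attribute on each data element.
objectify.xsiannotate(root)
xsi_hints = (root.a.get(XSI_TYPE), root.b.get(XSI_TYPE))

# deannotate() strips both kinds of hints by default.
objectify.deannotate(root)
remaining = [name for el in root.iterchildren() for name in el.attrib]

print(py_hints, xsi_hints, remaining)
```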

The DataElement factory

For convenience, the DataElement() factory creates an Element with a Python value in one step. You can pass the required Python type name or the XSI type name:

>>> root = objectify.Element("root")
>>> root.x = objectify.DataElement(5, _pytype="long")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    x = 5L [LongElement]
      * py:pytype = 'long'

>>> root.x = objectify.DataElement(5, _pytype="str", myattr="someval")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    x = '5' [StringElement]
      * py:pytype = 'str'
      * myattr = 'someval'

>>> root.x = objectify.DataElement(5, _xsi="integer")
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    x = 5L [LongElement]
      * py:pytype = 'long'
      * xsi:type = 'xsd:integer'
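In Python 3 there is no separate long type, so the following sketch restricts itself to the string variants of the calls above, which should behave the same on recent lxml releases:

```python
from lxml import objectify

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Force the string type for a value that would otherwise be guessed as int.
as_str = objectify.DataElement(5, _pytype="str", myattr="someval")

# Request an XML Schema type instead of a Python type name.
as_xsd_string = objectify.DataElement(5, _xsi="string")

# Assigning the element copies it into the tree and renames it to <x>.
root = objectify.Element("root")
root.x = as_str

print(type(as_str).__name__, repr(as_str.text), as_str.get("myattr"))
print(as_xsd_string.get(XSI_TYPE))
print(root.x.tag, repr(root.x.pyval))
```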

XML Schema types reside in the XML Schema namespace, so DataElement() tries to correctly prefix the xsi:type attribute value for you:

>>> root = objectify.Element("root")
>>> root.s = objectify.DataElement(5, _xsi="string")

>>> objectify.deannotate(root, xsi=False)
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s xsi:type="xsd:string">5</s>
</root>

DataElement() uses a default nsmap to set these prefixes:

>>> el = objectify.DataElement('5', _xsi='string')
>>> for prefix, namespace in el.nsmap.items():
...     print prefix, '-', namespace
py - http://codespeak.net/lxml/objectify/pytype
xsd - http://www.w3.org/2001/XMLSchema
xsi - http://www.w3.org/2001/XMLSchema-instance

>>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
xsd:string

While you can set custom namespace prefixes, it is necessary to provide valid namespace information if you choose to do so:

>>> el = objectify.DataElement('5', _xsi='foo:string',
...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
>>> for prefix, namespace in el.nsmap.items():
...     print prefix, '-', namespace
py - http://codespeak.net/lxml/objectify/pytype
foo - http://www.w3.org/2001/XMLSchema
xsi - http://www.w3.org/2001/XMLSchema-instance

>>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
foo:string

Note how lxml chose a default prefix for the XML Schema Instance namespace. We can override it as in the following example:

>>> el = objectify.DataElement('5', _xsi='foo:string',
...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
...                 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
>>> for prefix, namespace in el.nsmap.items():
...     print prefix, '-', namespace
py - http://codespeak.net/lxml/objectify/pytype
foo - http://www.w3.org/2001/XMLSchema
myxsi - http://www.w3.org/2001/XMLSchema-instance

>>> print el.get("{http://www.w3.org/2001/XMLSchema-instance}type")
foo:string

Care must be taken if different namespace prefixes have been used for the same namespace. Namespace information gets merged to avoid duplicate definitions when adding a new sub-element to a tree, but this mechanism does not adapt the prefixes of attribute values:

>>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>

>>> s = objectify.DataElement("17", _xsi="string")
>>> print etree.tostring(s, pretty_print=True)
<value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>

>>> root.s = s
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:schema="http://www.w3.org/2001/XMLSchema">
  <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
</root>

It is your responsibility to fix the prefixes of attribute values if you choose to deviate from the standard prefixes. A convenient way to do this for xsi:type attributes is to use the xsiannotate() utility:

>>> objectify.xsiannotate(root)
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:schema="http://www.w3.org/2001/XMLSchema">
  <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
</root>

Of course, using different prefixes for one and the same namespace is discouraged when building up an objectify tree.

Defining additional data classes

You can plug additional data classes into objectify that will be used in exactly the same way as the predefined types. Data classes can either inherit from ObjectifiedDataElement directly or from one of the specialised classes like NumberElement or BoolElement. The numeric types require an initial call to the NumberElement method self._setValueParser(function) to set their type conversion function (string -> numeric Python type). This call should be placed into the element _init() method.

The registration of data classes uses the PyType class:

>>> class ChristmasDate(objectify.ObjectifiedDataElement):
...     def call_santa(self):
...         print "Ho ho ho!"

>>> def checkChristmasDate(date_string):
...     if not date_string.startswith('24.12.'):
...         raise ValueError # or TypeError

>>> xmas_type = objectify.PyType('date', checkChristmasDate, ChristmasDate)

The PyType constructor takes a string type name, an (optional) callable type check and the custom data class. If a type check is provided it must accept a string as argument and raise ValueError or TypeError if it cannot handle the string value.

PyTypes are used if an element carries a py:pytype attribute denoting its data type or, in the absence of such an attribute, if the given type check callable does not raise a ValueError/TypeError exception when applied to the element text.

If you want, you can also register this type under an XML Schema type name:

>>> xmas_type.xmlSchemaTypes = ("date",)

XML Schema types will be considered if the element has an xsi:type attribute that specifies its data type. The line above binds the XSD type date to the newly defined Python type. Note that this must be done before the next step, which is to register the type. Then you can use it:

>>> xmas_type.register()

>>> root = objectify.fromstring(
...             "<root><a>24.12.2000</a><b>12.24.2000</b></root>")
>>> root.a.call_santa()
Ho ho ho!
>>> root.b.call_santa()
Traceback (most recent call last):
  ...
AttributeError: no such child: call_santa

If you need to specify dependencies between the type check functions, you can pass a sequence of type names through the before and after keyword arguments of the register() method. The PyType will then try to register itself before or after the respective types, as long as they are currently registered. Note that this only impacts the currently registered types at the time of registration. Types that are registered later on will not care about the dependencies of already registered types.

If you provide XML Schema type information, this will override the type check function defined above:

>>> root = objectify.fromstring('''\
...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...      <a xsi:type="date">12.24.2000</a>
...    </root>
...    ''')
>>> print root.a
12.24.2000
>>> root.a.call_santa()
Ho ho ho!

To unregister a type, call its unregister() method:

>>> root.a.call_santa()
Ho ho ho!
>>> xmas_type.unregister()
>>> root.a.call_santa()
Traceback (most recent call last):
  ...
AttributeError: no such child: call_santa

Be aware, though, that this does not immediately apply to elements to which there already is a Python reference. Their Python class will only be changed after all references are gone and the Python object is garbage collected.
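The _setValueParser() call for numeric classes is described at the start of this section but not demonstrated. The following Python 3 sketch shows one way it might be used. The names HexElement, parse_hex, check_hex and the type name hexint are hypothetical and exist only for this illustration:

```python
from lxml import objectify


def parse_hex(text):
    # Conversion function: string -> numeric Python value.
    return int(text, 16)


def check_hex(text):
    # Type check: raise ValueError unless the text looks like "0x...".
    if not text.startswith("0x"):
        raise ValueError("not a hexadecimal literal")
    parse_hex(text)


class HexElement(objectify.NumberElement):
    def _init(self):
        # Runs when the Python proxy for an XML node is created.
        self._setValueParser(parse_hex)


hex_type = objectify.PyType("hexint", check_hex, HexElement)
hex_type.register()
try:
    root = objectify.fromstring("<root><a>0x1F</a><b>31</b></root>")
    a_class = type(root.a).__name__
    b_class = type(root.b).__name__
    a_value = root.a.pyval
    total = root.a + root.b
finally:
    # Registration is global, so clean up after the demonstration.
    hex_type.unregister()

print(a_class, b_class, a_value, total)
```

Because the class derives from NumberElement, the element takes part in arithmetic like the predefined numeric types once the value parser is set.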

Advanced element class lookup

In some cases, the normal data class setup is not enough. Being based on lxml.etree, however, lxml.objectify supports very fine-grained control over the Element classes used in a tree. All you have to do is configure a different class lookup mechanism (or write one yourself).

The first step for the setup is to create a new parser that builds objectify documents. The objectify API is meant for data-centric XML (as opposed to document XML with mixed content). Therefore, we configure the parser to let it remove whitespace-only text from the parsed document if it is not enclosed by an XML element. Note that this alters the document infoset, so if you consider the removed spaces as data in your specific use case, you should go with a normal parser and just set the element class lookup. Most applications, however, will work fine with the following setup:

>>> parser = objectify.makeparser(remove_blank_text=True)

Internally, this does the following:

>>> parser = etree.XMLParser(remove_blank_text=True)

>>> lookup = objectify.ObjectifyElementClassLookup()
>>> parser.set_element_class_lookup(lookup)

If you want to change the lookup scheme, say, to get additional support for namespace specific classes, you can register the objectify lookup as a fallback of the namespace lookup. In this case, however, you have to take care that the namespace classes inherit from objectify.ObjectifiedElement, not only from the normal lxml.etree.ElementBase, so that they support the objectify API. The above setup code then becomes:

>>> lookup = etree.ElementNamespaceClassLookup(
...                   objectify.ObjectifyElementClassLookup() )
>>> parser.set_element_class_lookup(lookup)

See the documentation on class lookup schemes for more information.
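To make the manual setup concrete, here is a Python 3 sketch that plugs a custom tree class into the objectify lookup through the tree_class keyword argument described under "How data types are matched". The class name Tree and its child_tags() method are hypothetical:

```python
from lxml import etree, objectify


class Tree(objectify.ObjectifiedElement):
    """Hypothetical tree class that adds one helper method."""

    def child_tags(self):
        return [child.tag for child in self.iterchildren()]


parser = etree.XMLParser(remove_blank_text=True)
lookup = objectify.ObjectifyElementClassLookup(tree_class=Tree)
parser.set_element_class_lookup(lookup)

root = objectify.fromstring("<root><a>5</a><b><c>x</c></b></root>", parser)

# Structural elements use the custom class, data elements are unaffected.
print(type(root).__name__, type(root.b).__name__, type(root.a).__name__)
print(root.child_tags())
```

Since Tree inherits from objectify.ObjectifiedElement, attribute access to children keeps working alongside the added method.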

What is different from lxml.etree?

Such a different Element API obviously has some side effects on the normal behaviour of the rest of the API.