lxml allows you to implement namespaces, in a rather literal sense. You can write your own classes for Elements and have lxml use them for a specific tag name in a specific namespace.
Custom Elements must inherit from the etree.ElementBase class, which provides the Element interface for subclasses:
>>> from lxml import etree
>>> class HonkElement(etree.ElementBase):
...     def honking(self):
...         return self.get('honking') == 'true'
...     honking = property(honking)
This defines a new Element class HonkElement with a property honking.
Note that you cannot (or rather must not) instantiate this class yourself. lxml.etree will do that for you through its normal ElementTree API. To let lxml know about it, you must register it with a namespace.
You can build a new namespace (or retrieve an existing one) by calling the Namespace class:
>>> namespace = etree.Namespace('http://hui.de/honk')
and then register the new element type with that namespace, say, under the tag name honk:
>>> namespace['honk'] = HonkElement
After this, you create and use your XML elements through the normal API of lxml:
>>> xml = '<honk xmlns="http://hui.de/honk" honking="true"/>'
>>> honk_element = etree.XML(xml)
>>> print honk_element.honking
True
The same works when creating elements by hand:
>>> honk_element = etree.Element('{http://hui.de/honk}honk',
...                              honking='true')
>>> print honk_element.honking
True
Essentially, this allows you to give elements a custom API based on their namespace and tag name.
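Newer lxml releases offer a per-parser way to register element classes. The following is a minimal sketch of the same HonkElement registration, assuming a release that provides etree.ElementNamespaceClassLookup and XMLParser.set_element_class_lookup(). The variable names parsed and built are chosen only for illustration.

```python
from lxml import etree


class HonkElement(etree.ElementBase):
    @property
    def honking(self):
        return self.get('honking') == 'true'


# Assumption: this lxml release provides ElementNamespaceClassLookup.
lookup = etree.ElementNamespaceClassLookup()
namespace = lookup.get_namespace('http://hui.de/honk')
namespace['honk'] = HonkElement

# The lookup only affects documents that are handled by this parser.
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

# Parsing and building by hand both go through the parser.
parsed = etree.XML('<honk xmlns="http://hui.de/honk" honking="true"/>', parser)
built = parser.makeelement('{http://hui.de/honk}honk', honking='false')

print(parsed.honking, built.honking)
```

In this variant the registration is tied to one parser object rather than being global, so documents parsed with other parsers are not affected.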
A somewhat related topic is extension functions, which use a similar mechanism to register custom functions for XPath and XSLT.
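To illustrate the parallel, here is a small sketch of registering an XPath extension function through etree.FunctionNamespace. The function name shout, the namespace URI http://hui.de/honk-functions and the prefix f are made up for this example.

```python
from lxml import etree


def shout(context, text):
    # The first argument is the XPath evaluation context, the rest are
    # the arguments passed to the function in the XPath expression.
    return text.upper() + '!'


# Register the function under a (hypothetical) namespace URI,
# in the same dictionary style used for element classes.
functions = etree.FunctionNamespace('http://hui.de/honk-functions')
functions['shout'] = shout

root = etree.XML('<honk sound="honk"/>')
result = root.xpath('f:shout(string(@sound))',
                    namespaces={'f': 'http://hui.de/honk-functions'})
print(result)
```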
There is one thing to remember: Element classes must not have a constructor, nor may they keep any internal state (except for the data stored in the underlying XML tree). Element instances are created and garbage collected as needed, so there is no way to predict when and how often a constructor would be called. Even worse, when the __init__ method is called, the object may not even be initialized yet to represent the XML tag, so there is not much use in providing an __init__ method in subclasses.
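One way to live with this restriction is to keep all state in the XML tree itself. The sketch below stores a counter in an attribute instead of on the Python object. The CounterElement class, the counter tag and the count attribute are hypothetical, and the registration assumes an lxml release that provides etree.ElementNamespaceClassLookup.

```python
from lxml import etree


class CounterElement(etree.ElementBase):
    @property
    def count(self):
        # State lives in the XML attribute, not in an instance variable.
        return int(self.get('count', '0'))

    def increment(self):
        self.set('count', str(self.count + 1))


# Assumption: per-parser registration as offered by newer lxml releases.
lookup = etree.ElementNamespaceClassLookup()
lookup.get_namespace('http://hui.de/honk')['counter'] = CounterElement
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

root = etree.XML('<root xmlns="http://hui.de/honk"><counter/></root>', parser)

# Each root[0] access may hand out a fresh Python object, but the
# value survives because it is stored in the tree.
root[0].increment()
root[0].increment()
print(root[0].count)
```

Because the value is written to the tree, it does not matter whether lxml reuses the Python object or creates a new one between the calls.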
However, there is one possible way to do things on element initialization, if you really need to. ElementBase classes have an _init() method that can be overridden. It can be used to modify the XML tree, e.g. to construct special children or verify and update attributes.
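As a cautious sketch, an _init() override that fills in a default attribute could look like this. Since element instances are created as needed, the method is written so that running it again on the same XML element changes nothing. The default value 'false' is an assumption made for the example, and the registration assumes an lxml release that provides etree.ElementNamespaceClassLookup.

```python
from lxml import etree

NS = 'http://hui.de/honk'


class HonkElement(etree.ElementBase):
    def _init(self):
        # Only touch the tree if the attribute is missing, so that
        # repeated calls for the same XML element are harmless.
        if self.get('honking') is None:
            self.set('honking', 'false')

    @property
    def honking(self):
        return self.get('honking') == 'true'


# Assumption: per-parser registration as offered by newer lxml releases.
lookup = etree.ElementNamespaceClassLookup()
lookup.get_namespace(NS)['honk'] = HonkElement
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

silent = etree.XML('<honk xmlns="%s"/>' % NS, parser)
loud = etree.XML('<honk xmlns="%s" honking="true"/>' % NS, parser)

print(silent.get('honking'), loud.get('honking'))
```

An existing attribute value is left alone, so only elements without the attribute receive the default.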
The semantics of _init() are as follows:
In the Namespace example above, we associated the HonkElement class only with the 'honk' element. If an XML tree contains different elements in the same namespace, they do not pick up the same implementation:
>>> xml = '<honk xmlns="http://hui.de/honk" honking="true"><bla/></honk>'
>>> honk_element = etree.XML(xml)
>>> print honk_element.honking
True
>>> print honk_element[0].honking
Traceback (most recent call last):
...
AttributeError: 'etree._Element' object has no attribute 'honking'
You can therefore provide one implementation per element name in each namespace and have lxml select the right one on the fly. If you want one element implementation per namespace (ignoring the element name) or prefer having a common class for most elements except a few, you can specify a default implementation for an entire namespace by registering that class with the empty element name (None).
You may consider following an object-oriented approach here. If you build a class hierarchy of element classes, you can also implement a base class for a namespace that is used if no specific element class is provided. Again, you can just pass None as an element name:
>>> class HonkNSElement(etree.ElementBase):
...     def honk(self):
...         return "HONK"
>>> namespace[None] = HonkNSElement

>>> class HonkElement(HonkNSElement):
...     def honking(self):
...         return self.get('honking') == 'true'
...     honking = property(honking)
>>> namespace['honk'] = HonkElement
Now you can rely on lxml to always return objects of type HonkNSElement or its subclasses for elements of this namespace:
>>> xml = '<honk xmlns="http://hui.de/honk" honking="true"><bla/></honk>'
>>> honk_element = etree.XML(xml)
>>> print type(honk_element), type(honk_element[0])
<class 'HonkElement'> <class 'HonkNSElement'>

>>> print honk_element.honking
True
>>> print honk_element.honk()
HONK
>>> print honk_element[0].honk()
HONK
>>> print honk_element[0].honking
Traceback (most recent call last):
...
AttributeError: 'HonkNSElement' object has no attribute 'honking'
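The selection rule at work here can be modeled in a few lines of plain Python. This is only a sketch of the behaviour described in this section, not lxml's actual implementation. The registry dictionary, the select_class() helper and the DefaultElement stand-in are hypothetical names.

```python
class DefaultElement:
    """Stand-in for the library's plain element class."""


class HonkNSElement(DefaultElement):
    """Namespace-wide default, registered under None."""


class HonkElement(HonkNSElement):
    """Specific class for the 'honk' tag."""


# Hypothetical registry: namespace URI -> {tag name or None: class}
registry = {
    'http://hui.de/honk': {None: HonkNSElement, 'honk': HonkElement},
}


def select_class(namespace, tag):
    """Return the most specific class registered for (namespace, tag)."""
    classes = registry.get(namespace, {})
    if tag in classes:
        # 1. A class registered for exactly this tag name wins.
        return classes[tag]
    # 2. Otherwise use the namespace default registered under None,
    # 3. and fall back to the plain element class if there is none.
    return classes.get(None, DefaultElement)


print(select_class('http://hui.de/honk', 'honk').__name__)
print(select_class('http://hui.de/honk', 'bla').__name__)
print(select_class('http://example.org/other', 'honk').__name__)
```

The three calls cover the three cases: a tag with its own class, a tag that only matches the namespace default, and an element from a namespace without any registration.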