
Module etree

The lxml.etree module implements the extended ElementTree API for XML.

Version: 3.8.0

Classes
AncestorsIterator(self, node, tag=None) Iterates over the ancestors of an element (from parent to parent).
AttributeBasedElementClassLookup(self, attribute_name, class_mapping, fallback=None) Checks an attribute of an Element and looks up the value in a class dictionary.
C14NError Error during C14N serialisation.
CommentBase All custom Comment classes must inherit from this one.
CustomElementClassLookup(self, fallback=None) Element class lookup based on a subclass method.
DTD(self, file=None, external_id=None) A DTD validator.
DTDError Base class for DTD errors.
DTDParseError Error while parsing a DTD.
DTDValidateError Error while validating an XML document with a DTD.
DocInfo Document information provided by parser and DTD.
DocumentInvalid Validation error.
ETCompatXMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, schema=None, huge_tree=False, remove_blank_text=False, resolve_entities=True, remove_comments=True, remove_pis=True, strip_cdata=True, target=None, compact=True)
ETXPath(self, path, extensions=None, regexp=True, smart_strings=True) Special XPath class that supports the ElementTree {uri} notation for namespaces.
ElementBase(*children, attrib=None, nsmap=None, **_extra)
ElementChildIterator(self, node, tag=None, reversed=False) Iterates over the children of an element.
ElementClassLookup(self) Superclass of Element class lookups.
ElementDefaultClassLookup(self, element=None, comment=None, pi=None, entity=None) Element class lookup scheme that always returns the default Element class.
ElementDepthFirstIterator(self, node, tag=None, inclusive=True) Iterates over an element and its sub-elements in document order (depth first pre-order).
ElementNamespaceClassLookup(self, fallback=None)
ElementTextIterator(self, element, tag=None, with_tail=True) Iterates over the text content of a subtree.
EntityBase All custom Entity classes must inherit from this one.
ErrorDomains Libxml2 error domains
ErrorLevels Libxml2 error levels
ErrorTypes Libxml2 error types
FallbackElementClassLookup(self, fallback=None)
HTMLParser(self, encoding=None, remove_blank_text=False, remove_comments=False, remove_pis=False, strip_cdata=True, no_network=True, target=None, schema: XMLSchema =None, recover=True, compact=True, collect_ids=True)
HTMLPullParser(self, events=None, *, tag=None, base_url=None, **kwargs)
LxmlError Main exception base class for lxml. All other exceptions inherit from this one.
LxmlRegistryError Base class of lxml registry errors.
LxmlSyntaxError Base class for all syntax errors.
NamespaceRegistryError Error registering a namespace extension.
PIBase All custom Processing Instruction classes must inherit from this one.
ParseError Syntax error while parsing an XML document.
ParserBasedElementClassLookup(self, fallback=None) Element class lookup based on the XML parser.
ParserError Internal lxml parser error.
PyErrorLog(self, logger_name=None, logger=None) A global error log that connects to the Python stdlib logging package.
PythonElementClassLookup(self, fallback=None) Element class lookup based on a subclass method.
QName(text_or_uri_or_element, tag=None)
RelaxNG(self, etree=None, file=None) Turn a document into a Relax NG validator.
RelaxNGError Base class for RelaxNG errors.
RelaxNGErrorTypes Libxml2 RelaxNG error types
RelaxNGParseError Error while parsing an XML document as RelaxNG.
RelaxNGValidateError Error while validating an XML document with a RelaxNG schema.
Resolver This is the base class of all resolvers.
Schematron(self, etree=None, file=None) A Schematron validator.
SchematronError Base class of all Schematron errors.
SchematronParseError Error while parsing an XML document as Schematron schema.
SchematronValidateError Error while validating an XML document with a Schematron schema.
SerialisationError A libxml2 error that occurred during serialisation.
SiblingsIterator(self, node, tag=None, preceding=False) Iterates over the siblings of an element.
TreeBuilder(self, element_factory=None, parser=None) Parser target that builds a tree.
XInclude(self) XInclude processor.
XIncludeError Error during XInclude processing.
XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, schema: XMLSchema =None, remove_blank_text=False, resolve_entities=True, remove_comments=False, remove_pis=False, strip_cdata=True, collect_ids=True, target=None, compact=True)
XMLPullParser(self, events=None, *, tag=None, **kwargs)
XMLSchema(self, etree=None, file=None) Turn a document into an XML Schema validator.
XMLSchemaError Base class of all XML Schema errors.
XMLSchemaParseError Error while parsing an XML document as XML Schema.
XMLSchemaValidateError Error while validating an XML document with an XML Schema.
XMLSyntaxError Syntax error while parsing an XML document.
XMLTreeBuilder Alias of ETCompatXMLParser.
XPath(self, path, namespaces=None, extensions=None, regexp=True, smart_strings=True) A compiled XPath expression that can be called on Elements and ElementTrees.
XPathDocumentEvaluator(self, etree, namespaces=None, extensions=None, regexp=True, smart_strings=True) Create an XPath evaluator for an ElementTree.
XPathElementEvaluator(self, element, namespaces=None, extensions=None, regexp=True, smart_strings=True) Create an XPath evaluator for an element.
XPathError Base class of all XPath errors.
XPathEvalError Error during XPath evaluation.
XPathFunctionError Internal error looking up an XPath extension function.
XPathResultError Error handling an XPath result.
XSLT(self, xslt_input, extensions=None, regexp=True, access_control=None)
XSLTAccessControl(self, read_file=True, write_file=True, create_dir=True, read_network=True, write_network=True)
XSLTApplyError Error running an XSL transformation.
XSLTError Base class of all XSLT errors.
XSLTExtension Base class of an XSLT extension element.
XSLTExtensionError Error registering an XSLT extension.
XSLTParseError Error parsing a stylesheet document.
XSLTSaveError Error serialising an XSLT result.
_Attrib A dict-like proxy for the Element.attrib property.
_Document Internal base class to reference a libxml document.
_Element Element class.
Dead but public. :)
Dead but public. :)
IDDict(self, etree) A dictionary-like proxy class that maps ID attributes to elements.
_ListErrorLog Immutable base version of a list based error log.
_LogEntry A log message entry from an error log.
_Validator Base class for XML validators.
htmlfile(self, output_file, encoding=None, compression=None, close=False, buffered=True)
iterparse(self, source, events=("end",), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, recover=None, huge_tree=False, schema=None)
iterwalk(self, element_or_tree, events=("end",), tag=None)
xmlfile(self, output_file, encoding=None, compression=None, close=False, buffered=True)
Functions
Comment(text=None)
Comment element factory. This factory function creates a special element that will be serialized as an XML comment.
Element(_tag, attrib=None, nsmap=None, **_extra)
Element factory. This function returns an object implementing the Element interface.
ElementTree(element=None, file=None, parser=None)
ElementTree wrapper class.
Entity(name)
Entity factory. This factory function creates a special element that will be serialized as an XML entity reference or character reference. Note, however, that entities will not be automatically declared in the document. A document that uses entity references requires a DTD to define the entities.
Extension(module, function_mapping=None, ns=None)
Build a dictionary of extension functions from the functions defined in a module or the methods of an object.
FunctionNamespace(ns_uri)
Retrieve the function namespace object associated with the given URI.
HTML(text, parser=None, base_url=None)
Parses an HTML document from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed "HTML literals" in Python code.
PI(target, text=None)
ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.
ProcessingInstruction(target, text=None)
ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.
SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra)
Subelement factory. This function creates an element instance, and appends it to an existing element.
XML(text, parser=None, base_url=None)
Parses an XML document or fragment from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed "XML literals" in Python code.
XMLDTDID(text, parser=None, base_url=None)
Parse the text and return a tuple (root node, ID dictionary). The root node is the same as returned by the XML() function. The dictionary contains string-element pairs. The dictionary keys are the values of ID attributes as defined by the DTD. The elements referenced by the ID are stored as dictionary values.
XMLID(text, parser=None, base_url=None)
Parse the text and return a tuple (root node, ID dictionary). The root node is the same as returned by the XML() function. The dictionary contains string-element pairs. The dictionary keys are the values of 'id' attributes. The elements referenced by the ID are stored as dictionary values.
XPathEvaluator(etree_or_element, namespaces=None, extensions=None, regexp=True, smart_strings=True)
Creates an XPath evaluator for an ElementTree or an Element.
adopt_external_document(capsule, parser=None)
Unpack a libxml2 document pointer from a PyCapsule and wrap it in an lxml ElementTree object.
cleanup_namespaces(tree_or_element, top_nsmap=None, keep_ns_prefixes=None)
Remove all namespace declarations from a subtree that are not used by any of the elements or attributes in that tree.
clear_error_log()
Clear the global error log. Note that this log is already bound to a fixed size.
dump(elem, pretty_print=True, with_tail=True)
Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
fromstring(text, parser=None, base_url=None)
Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).
fromstringlist(strings, parser=None)
Parses an XML document from a sequence of strings. Returns the root node (or the result returned by a parser target).
iselement(element)
Checks if an object appears to be a valid element object.
parse(source, parser=None, base_url=None)
Return an ElementTree object loaded with source elements. If no parser is provided as second argument, the default parser is used.
parseid(source, parser=None)
Parses the source into a tuple containing an ElementTree object and an ID dictionary. If no parser is provided as second argument, the default parser is used.
register_namespace(prefix, uri)
Registers a namespace prefix that newly created Elements in that namespace will use. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed.
set_default_parser(parser=None)
Set a default parser for the current thread. This parser is used globally whenever no parser is supplied to the various parse functions of the lxml API. If this function is called without a parser (or if it is None), the default parser is reset to the original configuration.
set_element_class_lookup(lookup=None)
Set the global default element class lookup method.
strip_attributes(tree_or_element, *attribute_names)
Delete all attributes with the provided attribute names from an Element (or ElementTree) and its descendants.
strip_elements(tree_or_element, *tag_names, with_tail=True)
Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument to False.
strip_tags(tree_or_element, *tag_names)
Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their attributes, but not their text/tail content or descendants. Instead, it will merge the text content and children of the element into its parent.
tostring(element_or_tree, encoding=None, method="xml", xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype=None, exclusive=False, with_comments=True, inclusive_ns_prefixes=None)
Serialize an element to an encoded string representation of its XML tree.
tostringlist(element_or_tree, *args, **kwargs)
Serialize an element to an encoded string representation of its XML tree, stored in a list of partial strings.
tounicode(element_or_tree, method="xml", pretty_print=False, with_tail=True, doctype=None)
Serialize an element to the Python unicode representation of its XML tree.
use_global_python_log(log)
Replace the global error log by an etree.PyErrorLog that uses the standard Python logging package.
Variables
  DEBUG = 1
  LIBXML_VERSION = (2, 9, 3)
  LIBXSLT_VERSION = (1, 1, 28)
  LXML_VERSION = (3, 8, 0, 0)
  __package__ = None
  __pyx_capi__ = {'adoptExternalDocument': <capsule object "stru...
  __test__ = {u'XML (line 3182)': u'XML(text, parser=None, base_...
  memory_debugger = <lxml.etree._MemDebug object>
Function Details

Element(_tag, attrib=None, nsmap=None, **_extra)


Element factory. This function returns an object implementing the Element interface.

Also look at the _Element.makeelement() and _BaseParser.makeelement() methods, which provide a faster way to create an Element within a specific document or parser context.
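
A minimal sketch of these factories in use, assuming lxml is importable as `lxml.etree`. The tag and attribute names are made up for illustration.

```python
from lxml import etree

# Create a root element; extra keyword arguments become attributes.
root = etree.Element("root", id="r1")

# SubElement creates a child and appends it to the parent in one step.
child = etree.SubElement(root, "child")
child.text = "hello"

# makeelement() creates an element in the context of root's document.
# It is not part of the tree until it is appended explicitly.
extra = root.makeelement("extra", {"kind": "demo"})
root.append(extra)

serialized = etree.tostring(root)
print(serialized)
```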

Extension(module, function_mapping=None, ns=None)


Build a dictionary of extension functions from the functions defined in a module or the methods of an object.

As second argument, you can pass an additional mapping of attribute names to XPath function names, or a list of function names that should be taken.

The ns keyword argument accepts a namespace URI for the XPath functions.



FunctionNamespace(ns_uri)


Retrieve the function namespace object associated with the given URI.

Creates a new one if it does not yet exist. A function namespace can only be used to register extension functions.
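
As a rough illustration, an extension function might be registered and called like this. The namespace URI, the `f` prefix and the `hello` function are hypothetical names chosen for this sketch.

```python
from lxml import etree

def hello(context, name):
    # Extension functions receive an evaluation context as first argument,
    # followed by the XPath arguments.
    return "Hello %s" % name

# Hypothetical namespace URI, used only in this example.
NS_URI = "http://example.com/demo-functions"

ns = etree.FunctionNamespace(NS_URI)
ns["hello"] = hello  # register the function under the XPath name "hello"

root = etree.XML("<root/>")
# Bind the prefix "f" to the namespace for this one evaluation.
greeting = root.xpath("f:hello('world')", namespaces={"f": NS_URI})
print(greeting)
```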

HTML(text, parser=None, base_url=None)


Parses an HTML document from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed "HTML literals" in Python code.

To override the parser with a different HTMLParser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, which supports relative paths when looking up external entities (DTD, XInclude, ...).
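
A short sketch of both call styles, assuming the default HTML parser's error recovery. The markup is invented for the example.

```python
from lxml import etree

# Unclosed tags are tolerated; the parser wraps the fragment in <html><body>.
root = etree.HTML("<p>Hello<br>world")
paragraph = root.find(".//p")
print(root.tag, paragraph.text)

# Pass a differently configured HTMLParser through the parser keyword.
parser = etree.HTMLParser(remove_comments=True)
cleaned = etree.HTML("<p>visible<!-- hidden --></p>", parser=parser)
comments = cleaned.xpath("//comment()")
print(comments)
```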

XML(text, parser=None, base_url=None)


Parses an XML document or fragment from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed "XML literals" in Python code, as in:

>>> root = XML("<root><test/></root>")
>>> print(root.tag)
root

To override the parser with a different XMLParser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, which supports relative paths when looking up external entities (DTD, XInclude, ...).
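
One way to override the parser, sketched with an XMLParser that drops whitespace-only text nodes. The document content is illustrative.

```python
from lxml import etree

# A parser configured to discard ignorable whitespace between tags.
parser = etree.XMLParser(remove_blank_text=True)

root = etree.XML("<root>  <a/>  <b/>  </root>", parser=parser)
compact = etree.tostring(root)
print(compact)
```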

XMLDTDID(text, parser=None, base_url=None)


Parse the text and return a tuple (root node, ID dictionary). The root node is the same as returned by the XML() function. The dictionary contains string-element pairs. The dictionary keys are the values of ID attributes as defined by the DTD. The elements referenced by the ID are stored as dictionary values.

Note that you must not modify the XML tree if you use the ID dictionary. The results are undefined.
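
A minimal sketch, assuming a document whose internal DTD subset declares an attribute of type ID. The element and attribute names (`catalog`, `item`, `ref`) are made up.

```python
from lxml import etree

xml_text = """<!DOCTYPE catalog [
  <!ELEMENT catalog (item*)>
  <!ELEMENT item EMPTY>
  <!ATTLIST item ref ID #REQUIRED>
]>
<catalog><item ref="a1"/><item ref="b2"/></catalog>"""

# Returns the root element and a dictionary-like mapping of ID values
# to the elements that carry them.
root, ids = etree.XMLDTDID(xml_text)

item = ids["b2"]
print(item.tag, item.get("ref"))
```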

XPathEvaluator(etree_or_element, namespaces=None, extensions=None, regexp=True, smart_strings=True)


Creates an XPath evaluator for an ElementTree or an Element.

The resulting object can be called with an XPath expression as argument and XPath variables provided as keyword arguments.

Additional namespace declarations can be passed with the 'namespaces' keyword argument. EXSLT regular expression support can be disabled with the 'regexp' boolean keyword (defaults to True). Smart strings will be returned for string results unless you pass smart_strings=False.
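
The calling convention can be sketched as follows. The document, the `p` prefix and the `limit` variable are assumptions made for this example.

```python
from lxml import etree

root = etree.XML(
    '<shop xmlns:p="urn:example:prices">'
    '<item name="pen"><p:price>2</p:price></item>'
    '<item name="ink"><p:price>7</p:price></item>'
    '</shop>'
)

# Bind the prefix "p" once; it is then available in every expression.
evaluate = etree.XPathEvaluator(root, namespaces={"p": "urn:example:prices"})

# $limit is an XPath variable, supplied as a keyword argument.
names = evaluate("//item[p:price > $limit]/@name", limit=5)
print(names)
```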

adopt_external_document(capsule, parser=None)


Unpack a libxml2 document pointer from a PyCapsule and wrap it in an lxml ElementTree object.

This allows external libraries to build XML/HTML trees using libxml2 and then pass them efficiently into lxml for further processing. Requires Python 2.7 or later.

If a parser is provided, it will be used for configuring the lxml document. No parsing will be done.

The capsule must have the name "libxml2:xmlDoc" and its pointer value must reference a correct libxml2 document of type xmlDoc*. The creator of the capsule must take care to correctly clean up the document using an appropriate capsule destructor. By default, the libxml2 document will be copied to let lxml safely own the memory of the internal tree that it uses.

If the capsule context is non-NULL, it must point to a C string that can be compared using strcmp(). If the context string equals "destructor:xmlFreeDoc", the libxml2 document will not be copied but the capsule invalidated instead by clearing its destructor and name. That way, lxml takes ownership of the libxml2 document in memory without creating a copy first, and the capsule destructor will not be called. The document will then eventually be cleaned up by lxml using the libxml2 API function xmlFreeDoc() once it is no longer used.

If no copy is made, later modifications of the tree outside of lxml should not be attempted after transferring the ownership.

cleanup_namespaces(tree_or_element, top_nsmap=None, keep_ns_prefixes=None)


Remove all namespace declarations from a subtree that are not used by any of the elements or attributes in that tree.

If a 'top_nsmap' is provided, it must be a mapping from prefixes to namespace URIs. These namespaces will be declared on the top element of the subtree before running the cleanup, which allows moving namespace declarations to the top of the tree.

If a 'keep_ns_prefixes' is provided, it must be a list of prefixes. These prefixes will not be removed as part of the cleanup.
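
A small sketch of the cleanup with and without keep_ns_prefixes. The namespace URIs are placeholders.

```python
from lxml import etree

xml = '<root xmlns:a="urn:a" xmlns:b="urn:b"><a:child/></root>'

# Prefix "b" is declared but never used, so it is removed.
root = etree.XML(xml)
etree.cleanup_namespaces(root)
print(root.nsmap)

# The same cleanup, but explicitly keeping the unused prefix "b".
kept = etree.XML(xml)
etree.cleanup_namespaces(kept, keep_ns_prefixes=["b"])
print(kept.nsmap)
```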



clear_error_log()


Clear the global error log. Note that this log is already bound to a fixed size.

Note: since lxml 2.2, the global error log is local to a thread and this function will only clear the global error log of the current thread.

fromstring(text, parser=None, base_url=None)


Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, which supports relative paths when looking up external entities (DTD, XInclude, ...).

fromstringlist(strings, parser=None)


Parses an XML document from a sequence of strings. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.
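
For instance, a document that arrives in pieces could be parsed like this (the chunk contents are illustrative):

```python
from lxml import etree

# The fragments are fed to the parser in order; together they must
# form one well-formed document.
chunks = ["<root>", "<a>1</a>", "<b>2</b>", "</root>"]

root = etree.fromstringlist(chunks)
tags = [child.tag for child in root]
print(tags)
```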

parse(source, parser=None, base_url=None)


Return an ElementTree object loaded with source elements. If no parser is provided as second argument, the default parser is used.

The source can be any of the following:

  • a file name/path
  • a file object
  • a file-like object
  • a URL using the HTTP or FTP protocol

To parse from a string, use the fromstring() function instead.

Note that it is generally faster to parse from a file path or URL than from an open file object or file-like object. Transparent decompression from gzip compressed sources is supported (unless explicitly disabled in libxml2).

The base_url keyword allows setting a URL for the document when parsing from a file-like object. This is needed when looking up external entities (DTD, XInclude, ...) with relative paths.
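
A minimal sketch of parsing from a file-like object. The in-memory buffer stands in for a real file, and the base URL is a hypothetical value.

```python
import io

from lxml import etree

# Any object with a read() method works as a source.
source = io.BytesIO(b"<root><a/><b/></root>")

# base_url tells the parser where the document notionally lives, so that
# relative references could be resolved against it.
tree = etree.parse(source, base_url="http://example.com/docs/sample.xml")

root = tree.getroot()
print(root.tag, len(root))
```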

parseid(source, parser=None)


Parses the source into a tuple containing an ElementTree object and an ID dictionary. If no parser is provided as second argument, the default parser is used.

Note that you must not modify the XML tree if you use the ID dictionary. The results are undefined.



set_default_parser(parser=None)


Set a default parser for the current thread. This parser is used globally whenever no parser is supplied to the various parse functions of the lxml API. If this function is called without a parser (or if it is None), the default parser is reset to the original configuration.

Note that the pre-installed default parser is not thread-safe. Avoid the default parser in multi-threaded environments. You can create a separate parser for each thread explicitly or use a parser pool.
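
A sketch of installing and then resetting a default parser for the current thread. The comment-stripping configuration is just an example choice.

```python
from lxml import etree

# Install a parser that drops comments as this thread's default.
custom = etree.XMLParser(remove_comments=True)
etree.set_default_parser(custom)
try:
    # No parser argument: the thread's default parser is used.
    root = etree.fromstring("<root><!-- dropped --><a/></root>")
finally:
    # Calling without arguments restores the original configuration.
    etree.set_default_parser()

after_reset = etree.fromstring("<root><!-- kept --><a/></root>")
print(len(root), len(after_reset))
```

Passing a parser explicitly to each parse call, as in `etree.fromstring(text, parser=custom)`, avoids relying on the shared default at all.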

strip_attributes(tree_or_element, *attribute_names)


Delete all attributes with the provided attribute names from an Element (or ElementTree) and its descendants.

Attribute names can contain wildcards as in _Element.iter.

Example usage:
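
A minimal sketch of such a call, assuming a small document with one plain attribute name and one namespace wildcard. All names and the namespace URI are invented.

```python
from lxml import etree

root = etree.XML(
    '<root xmlns:n="urn:example:ns" id="1" n:x="2">'
    '<child id="3" n:y="4" keep="5"/>'
    '</root>'
)

etree.strip_attributes(
    root,
    "id",                    # non-namespaced attribute
    "{urn:example:ns}*",     # any attribute from a namespace
)

root_attrs = dict(root.attrib)
child_attrs = dict(root[0].attrib)
print(root_attrs, child_attrs)
```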


strip_elements(tree_or_element, *tag_names, with_tail=True)


Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument to False.

Tag names can contain wildcards as in _Element.iter.

Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants. If you want to include the root element, check its tag name directly before even calling this function.

Example usage:

    strip_elements(some_element,
        'simpletagname',             # non-namespaced tag
        '{http://some/ns}tagname',   # namespaced tag
        '{http://some/other/ns}*',   # any tag from a namespace
        lxml.etree.Comment           # comments
        )

strip_tags(tree_or_element, *tag_names)


Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their attributes, but not their text/tail content or descendants. Instead, it will merge the text content and children of the element into its parent.

Tag names can contain wildcards as in _Element.iter.

Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants.

Example usage:

    strip_tags(some_element,
        'simpletagname',             # non-namespaced tag
        '{http://some/ns}tagname',   # namespaced tag
        '{http://some/other/ns}*',   # any tag from a namespace
        Comment                      # comments (including their text!)
        )
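
The difference between strip_tags and strip_elements is easiest to see side by side. The following sketch applies each to a copy of the same made-up fragment.

```python
from copy import deepcopy

from lxml import etree

original = etree.XML("<p>Hello <b>bold</b> world</p>")

# strip_tags removes the <b> tag but merges its text into the parent.
unwrapped = deepcopy(original)
etree.strip_tags(unwrapped, "b")

# strip_elements removes <b> with its content and, by default, its tail.
removed = deepcopy(original)
etree.strip_elements(removed, "b")

# with_tail=False keeps the tail text (" world") in the parent.
removed_keep_tail = deepcopy(original)
etree.strip_elements(removed_keep_tail, "b", with_tail=False)

for tree in (unwrapped, removed, removed_keep_tail):
    print(etree.tostring(tree))
```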

tostring(element_or_tree, encoding=None, method="xml", xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype=None, exclusive=False, with_comments=True, inclusive_ns_prefixes=None)


Serialize an element to an encoded string representation of its XML tree.

Defaults to ASCII encoding without XML declaration. This behaviour can be configured with the keyword arguments 'encoding' (string) and 'xml_declaration' (bool). Note that changing the encoding to a non UTF-8 compatible encoding will enable a declaration by default.

You can also serialise to a Unicode string without declaration by passing the unicode function as encoding (or str in Py3), or the name 'unicode'. This changes the return value from a byte string to an unencoded unicode string.

The keyword argument 'pretty_print' (bool) enables formatted XML.

The keyword argument 'method' selects the output method: 'xml', 'html', plain 'text' (text content without tags) or 'c14n'. Default is 'xml'.

The exclusive and with_comments arguments are only used with C14N output, where they request exclusive and uncommented C14N serialisation respectively.

Passing a boolean value to the standalone option will output an XML declaration with the corresponding standalone flag.

The doctype option allows passing in a plain string that will be serialised before the XML tree. Note that passing in non well-formed content here will make the XML output non well-formed. Also, an existing doctype in the document tree will not be removed when serialising an ElementTree instance.

You can prevent the tail text of the element from being serialised by passing the boolean with_tail option. This has no impact on the tail text of children, which will always be serialised.

tostringlist(element_or_tree, *args, **kwargs)


Serialize an element to an encoded string representation of its XML tree, stored in a list of partial strings.

This is purely for ElementTree 1.3 compatibility. The result is a single string wrapped in a list.

tounicode(element_or_tree, method="xml", pretty_print=False, with_tail=True, doctype=None)


Serialize an element to the Python unicode representation of its XML tree.

Note that the result does not carry an XML encoding declaration and is therefore not necessarily suited for serialization to byte streams without further treatment.

The boolean keyword argument 'pretty_print' enables formatted XML.

The keyword argument 'method' selects the output method: 'xml', 'html' or plain 'text'.

You can prevent the tail text of the element from being serialised by passing the boolean with_tail option. This has no impact on the tail text of children, which will always be serialised.

Deprecated: use tostring(el, encoding='unicode') instead.



use_global_python_log(log)


Replace the global error log by an etree.PyErrorLog that uses the standard Python logging package.

Note that this disables access to the global error log from exceptions. Parsers, XSLT etc. will continue to provide their normal local error log.

Note: prior to lxml 2.2, this changed the error log globally. Since lxml 2.2, the global error log is local to a thread and this function will only set the global error log of the current thread.

Variables Details


__pyx_capi__

{'adoptExternalDocument': <capsule object "struct LxmlElementTree *(xmlDoc *, PyObject *, int)" at 0x7f1f2fb4f540>,
 'appendChild': <capsule object "void (struct LxmlElement *, struct LxmlElement *)" at 0x7f1f2fb4fd20>,
 'appendChildToElement': <capsule object "int (struct LxmlElement *, struct LxmlElement *)" at 0x7f1f2fb4fd50>,
 'attributeValue': <capsule object "PyObject *(xmlNode *, xmlAttr *)" at 0x7f1f2fb4f9f0>,


__test__

{u'XML (line 3182)': u'''XML(text, parser=None, base_url=None)

    Parses an XML document or fragment from a string constant.
    Returns the root node (or the result returned by a parser target).
    This function can be used to embed "XML literals" in Python code,
    like in

       >>> root = XML("<root><test/></root>")