The public C-API of lxml.etree

As of version 1.1, lxml.etree provides a public C-API. This allows external C extensions to efficiently access public functions and classes of lxml, without going through the Python API.

The API is described in the file etreepublic.pxd, which is directly c-importable by extension modules implemented in Cython.

Contents

Passing generated trees through Python

This is the most simple way to integrate with lxml. It does not require any C-level integration but uses a Python function to wrap an externally generated libxml2 document in lxml.

The external module that creates the libxml2 tree must pack the document pointer into a PyCapsule object. This can then be passed into lxml with the function lxml.etree.adopt_external_document(). It also takes an optional lxml parser instance to associate with the document, in order to configure the Element class lookup, relative URL lookups, etc.

See the API reference for further details.

The same functionality is available as part of the public C-API in form of the C function adoptExternalDocument().

Writing external modules in Cython

This is the easiest way of extending lxml at the C level. A Cython module should start like this:

# My Cython extension

# directive pointing compiler to lxml header files;
# use ``aliases={"LXML_PACKAGE_DIR": lxml.__path__}``
# argument to cythonize in setup.py to dynamically
# determine dir at compile time
# distutils: include_dirs = LXML_PACKAGE_DIR

# import the public functions and classes of lxml.etree
cimport lxml.includes.etreepublic as cetree

# import the lxml.etree module in Python
cdef object etree
from lxml import etree

# initialize the access to the C-API of lxml.etree
cetree.import_lxml__etree()

From this line on, you can access all public functions of lxml.etree from the cetree namespace like this:

# build a tag name from namespace and element name
py_tag = cetree.namespacedNameFromNsName("http://some/url", "myelement")

Public lxml classes are easily subclassed. For example, to implement and set a new default element class, you can write Cython code like the following:

from lxml.includes.etreepublic cimport ElementBase
cdef class NewElementClass(ElementBase):
     def set_value(self, myval):
         self.set("my_attribute", myval)

etree.set_element_class_lookup(
     etree.ElementDefaultClassLookup(element=NewElementClass))

Writing external modules in C

If you really feel like it, you can also interface with lxml.etree straight from C code. All you have to do is include the header file for the public API, import the lxml.etree module and then call the import function:

/* My C extension */

/* common includes */
#include "Python.h"
#include "stdio.h"
#include "string.h"
#include "stdarg.h"
#include "libxml/xmlversion.h"
#include "libxml/encoding.h"
#include "libxml/hash.h"
#include "libxml/tree.h"
#include "libxml/xmlIO.h"
#include "libxml/xmlsave.h"
#include "libxml/globals.h"
#include "libxml/xmlstring.h"

/* lxml.etree specific includes */
#include "lxml-version.h"
#include "etree_defs.h"
#include "etree.h"

/* setup code */
import_lxml__etree()

Note that including etree.h does not automatically include the header files it requires. Note also that the above list of common includes may not be sufficient.