To build lxml from source, you need libxml2 and libxslt properly installed, including the header files. These are usually shipped in separate -dev or -devel packages such as libxml2-dev, which you need to install. The build process also requires setuptools. The lxml source distribution comes with a script called ez_setup.py that can be used to install it.
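As a quick pre-flight check for the prerequisites above, the following is a minimal sketch that looks for the configuration scripts on the PATH. It assumes a Python 3 interpreter and assumes that the development packages install `xml2-config` and `xslt-config` somewhere on the PATH, which is common on Unix-like systems but not guaranteed. The helper name `find_config_scripts` is made up for this illustration.

```python
import shutil


def find_config_scripts(names=("xml2-config", "xslt-config")):
    """Map each script name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_config_scripts().items():
        # A missing script often means the -dev/-devel package is not installed.
        print(f"{name}: {path or 'not found on PATH'}")
```

A missing script does not prove that the headers are absent, only that the build may not find them without extra options.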
The lxml.etree and lxml.objectify modules are written in Cython. Since we distribute the Cython-generated .c files with lxml releases, however, you do not need Cython to build lxml from the normal release sources. We even encourage you to not install Cython for a normal release build, as the generated C code can vary quite heavily between Cython versions, which may or may not generate correct code for lxml. The pre-generated release sources were tested and therefore are known to work.
So, if you want a reliable build of lxml, we suggest that you a) use a source release of lxml and b) disable or uninstall Cython for the build.
You need a working Cython installation only if you are interested in building lxml from a Subversion checkout (e.g. to test a bug fix that has not been released yet) or if you want to be an lxml developer. You can use EasyInstall to install it:
lxml currently requires Cython 0.10.3; later release versions should work as well.
The lxml package is developed in a Subversion repository. You can retrieve the current developer version by calling:
svn co https://github.com/lxml/lxml/tree/master/trunk lxml
This will create a directory lxml and download the source into it. You can also browse the Subversion repository through the web, use your favourite SVN client to access it, or browse the Subversion history.
Usually, building lxml is done through setuptools. Do a Subversion checkout (or download the source tar-ball and unpack it) and then type:
python setup.py build
python setup.py bdist_egg
If you want to test lxml from the source directory, it is better to build it in-place like this:
python setup.py build_ext -i
or, in Unix-like environments:
If you get errors about missing header files (e.g. libxml/xmlversion.h) then you need to make sure the development packages of both libxml2 and libxslt are properly installed. Try passing the following option to setup.py to make sure the right config is found:
python setup.py build --with-xslt-config=/path/to/xslt-config
If this doesn't help, you may have to add the location of the header files to the include path like:
python setup.py build_ext -i -I /usr/include/libxml2
where the file is in /usr/include/libxml2/libxml/xmlversion.h
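If it is unclear which directory to pass with -I, the include path can usually be read from the compiler flags that `xml2-config --cflags` reports. The following is a minimal sketch, assuming Python 3 and assuming that `xml2-config` is on the PATH; the function names are hypothetical helpers, not part of lxml or its build scripts.

```python
import shlex
import subprocess


def include_dirs_from_cflags(cflags):
    """Extract the directories named by -I flags in a compiler flag string."""
    dirs = []
    tokens = iter(shlex.split(cflags))
    for token in tokens:
        if token == "-I":
            # Form "-I /path": the directory is the next token.
            path = next(tokens, None)
            if path:
                dirs.append(path)
        elif token.startswith("-I"):
            # Form "-I/path": the directory follows the flag directly.
            dirs.append(token[2:])
    return dirs


def query_include_dirs(script="xml2-config"):
    """Run '<script> --cflags' and return the include directories it reports."""
    result = subprocess.run(
        [script, "--cflags"], capture_output=True, text=True, check=True
    )
    return include_dirs_from_cflags(result.stdout)


if __name__ == "__main__":
    try:
        print(query_include_dirs())
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run xml2-config:", exc)
```

Each directory printed is a candidate for the -I option shown above; the header libxml/xmlversion.h should exist below one of them.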
To use lxml.etree in-place, you can place lxml's src directory on your Python module search path (PYTHONPATH) and then import lxml.etree to play with it:
# cd lxml
# PYTHONPATH=src python
Python 2.5.1
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>>
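When both an installed lxml and an in-place build are present, it is easy to import the wrong one. The following is a minimal sketch, assuming Python 3, that reports where a package would be loaded from without importing its compiled modules; `package_location` is a hypothetical helper name.

```python
import importlib.util


def package_location(name="lxml"):
    """Return the directory (or file) a top-level package would load from.

    Returns None if the package cannot be found on the module search path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    locations = spec.submodule_search_locations
    # Packages report their directory; plain modules only have an origin.
    return list(locations)[0] if locations else spec.origin


if __name__ == "__main__":
    print("lxml would be imported from:", package_location("lxml"))
```

Run with PYTHONPATH=src from the top-level directory, the reported location should point into the src directory if the in-place build is the one being picked up.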
To recompile after changes, note that you may have to run make clean or delete the file src/lxml/etree.c. Distutils does not automatically pick up changes that affect files other than the main file src/lxml/etree.pyx.
The source distribution (tgz) and the Subversion repository contain a test suite for lxml. You can run it from the top-level directory:
Note that the test script only tests the in-place build (see distutils building above), as it searches the src directory. You can use the following one-step command to trigger an in-place build and test it:
This also runs the ElementTree and cElementTree compatibility tests. To call them separately, make sure you have lxml on your PYTHONPATH first, then run:
If the tests give failures, errors, or worse, segmentation faults, we'd really like to know. Please contact us on the mailing list, and please specify the version of lxml, libxml2, libxslt and Python you were using, as well as your operating system type (Linux, Windows, Mac OS X, ...).
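The version details requested above can be collected in one go. The following is a minimal sketch, assuming Python 3; it reads the version tuples that lxml.etree exposes (`LXML_VERSION`, `LIBXML_VERSION`, `LIBXSLT_VERSION`) and degrades gracefully if lxml cannot be imported. The function name `collect_versions` is made up for this illustration.

```python
import platform
import sys


def collect_versions():
    """Gather the version details that are useful in a bug report."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    try:
        from lxml import etree
    except ImportError:
        info["lxml"] = "not importable"
        return info
    # lxml.etree exposes these versions as tuples of integers.
    info["lxml"] = ".".join(map(str, etree.LXML_VERSION))
    info["libxml2"] = ".".join(map(str, etree.LIBXML_VERSION))
    info["libxslt"] = ".".join(map(str, etree.LIBXSLT_VERSION))
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

The output can be pasted into a mailing list post as is.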
This is the procedure to make an lxml egg for your platform:
The last 'upload' step only works if you have access to the lxml cheeseshop entry. If not, you can just make an egg with bdist_egg and mail it to the lxml maintainer.
Apple regularly ships new system releases with horribly outdated system libraries. This is specifically the case for libxml2 and libxslt, where the system provided versions are too old to build lxml.
While the Unix environment in Mac-OS X makes it relatively easy to install Unix/Linux style package management tools and new software, it seems to be hard to make the system use only a newer version of a library that Mac-OS X already ships in an older version. Alternative distributions (like macports) install their libraries in addition to the system libraries, but the compiler and the runtime loader on Mac-OS still see the system libraries before the new libraries. This can lead to undebuggable crashes where the newer library seems to be loaded but the older system library is used.
Apple discourages static building against libraries, which would help to work around this problem. Apple does not ship static library binaries with its system and several package management systems follow this decision. Therefore, building static binaries would require building the dependencies first. You can do this with the buildout recipe for lxml.
To make sure the newer libxml2 and libxslt versions (e.g. those provided by fink or macports) are used at build time, you must take care that the script xslt-config from the newly installed version is found when running the build setup. The system libraries also provide this script, so the new one must come first in the PATH. The best way to make sure the right version is used is by passing the path to the script as an option to setup.py:
python setup.py build --with-xslt-config=/path/to/xslt-config \
                      --with-xml2-config=/path/to/xml2-config
Instead of build, you can use any target, like bdist_egg if you want to use setuptools to build an installable egg.
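For scripted builds, the command line above can be assembled programmatically so that the config scripts are always passed explicitly instead of relying on PATH order. The following is a minimal sketch, assuming Python 3; `build_command` is a hypothetical helper, and the /opt/local/bin paths in the usage part are only an example of a typical MacPorts prefix, not something lxml requires.

```python
import sys


def build_command(xslt_config, xml2_config=None, target="build"):
    """Assemble the setup.py command line with explicit config script paths."""
    cmd = [
        sys.executable,
        "setup.py",
        target,
        f"--with-xslt-config={xslt_config}",
    ]
    if xml2_config:
        cmd.append(f"--with-xml2-config={xml2_config}")
    return cmd


if __name__ == "__main__":
    # Example paths only; adjust to wherever the newer libraries live.
    cmd = build_command(
        "/opt/local/bin/xslt-config",
        "/opt/local/bin/xml2-config",
        target="bdist_egg",
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run from within the lxml source directory.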
Since release 2.0.6, lxml automatically passes the option -flat_namespace to the C compiler. This was reported to make sure that the libraries that lxml was built against are also used at runtime. Without this option, users needed to add all directories where the newer libraries are installed (i.e. libxml2, libxslt and libexslt) to the DYLD_LIBRARY_PATH environment variable when using lxml (i.e. at runtime). This should no longer be necessary with the new build setup.
Andreas Pakulat proposed the following approach.
In case dpkg-buildpackage tells you that some dependencies are missing, you can either install them manually or run apt-get build-dep lxml.
That will give you .deb packages in the parent directory which can be installed using dpkg -i.