LP#1742885: lxml no longer expands external entities (XXE) by default to prevent the security risk of loading arbitrary files and URLs. If this feature is needed, it can be enabled in a backwards compatible way by using a parser with the option resolve_entities=True. The new default is resolve_entities='internal'.
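A minimal sketch of the three parser configurations for the entry above. The sample document and the variable names are made up for illustration; only the `resolve_entities` option comes from the entry itself. The document defines a single internal entity, so nothing is loaded from disk or the network.

```python
from lxml import etree

# Hypothetical sample document with one internal entity and no external references.
XML = b"""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greeting "hello">
]>
<note>&greeting;</note>"""

# Default parser: internal entities are expanded.
default_root = etree.fromstring(XML)

# Explicit opt-in to full entity resolution (the backwards compatible setting).
legacy_parser = etree.XMLParser(resolve_entities=True)
legacy_root = etree.fromstring(XML, legacy_parser)

# No expansion at all: the reference stays in the tree as an entity node.
strict_parser = etree.XMLParser(resolve_entities=False)
strict_root = etree.fromstring(XML, strict_parser)

print(default_root.text)
print(legacy_root.text)
print(etree.tostring(strict_root))
```

Only enable `resolve_entities=True` for input you trust, since that is the setting that allows external files and URLs to be loaded.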
With libxml2 2.10.4 and later (as provided by the lxml 5.0 binary wheels), parsing HTML tags with "prefixes" no longer builds a namespace dictionary in nsmap but treats the prefix:name string as the actual tag name. With older libxml2 versions, starting with 2.9.11, the prefix was removed. Before that, the prefix was parsed as an XML prefix.
lxml 5.0 does not try to hide this difference but now changes the ElementPath implementation to let element.find("part1:part2") search for the tag part1:part2 in documents parsed as HTML, instead of looking only for part2.
LP#2024343: The validation of the schema file itself is now optional in the ISO-Schematron implementation. This was done because some lxml distributions discard the RNG validation schema file due to licensing issues. The validation can now always be disabled with Schematron(..., validate_schema=False). It is enabled by default if available and disabled otherwise. The module constant lxml.isoschematron.schematron_schema_valid_supported can be used to detect whether schema file validation is available.
Some redundant and long-deprecated methods were removed: parser.setElementClassLookup(), xslt_transform.apply(), xpath.evaluate().
Some incorrect declarations were removed from python.pxd. In general, this file should not be used by external Cython code. Use the C-API declarations provided by Cython itself instead.
Binary wheels use the library versions libxml2 2.12.3 and libxslt 1.1.39.
Built with Cython 3.0.7, updated to follow recent changes in Cython 3.1-dev.
GH#251: HTML comments were handled incorrectly by the soupparser. Patch by mozbugbox.
LP#1654544: The html5parser no longer passes the useChardet option if the input is a Unicode string, unless explicitly requested. When parsing files, the default is to enable it when a URL or file path is passed (because the file is then opened in binary mode), and to disable it when reading from a file(-like) object.
Note: This is a backwards incompatible change of the default configuration. If your code parses byte strings/streams and depends on character detection, please pass the option guess_charset=True explicitly, which already worked in older lxml versions.
LP#1703810: etree.fromstring() failed to parse UTF-32 data with BOM.
LP#1526522: Some RelaxNG errors were not reported in the error log.
LP#1567526: Empty and plain text input raised a TypeError in soupparser.
LP#1710429: Uninitialised variable usage in HTML diff.
LP#1415643: The closing tags context manager in xmlfile() could continue to output end tags even after writing failed with an exception.
LP#1465357: xmlfile.write() now accepts and ignores None as input argument.
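As an illustration of the entry above, a small sketch of incremental writing in which a `None` value reaches `write()`. The element names and the `optional_text` variable are hypothetical.

```python
import io
from lxml import etree

buf = io.BytesIO()
optional_text = None  # hypothetical value that may or may not be present

with etree.xmlfile(buf, encoding="utf-8") as xf:
    with xf.element("root"):
        xf.write(optional_text)           # None is accepted and ignored
        xf.write("text")                  # strings are written as text content
        xf.write(etree.Element("child"))  # elements are serialised in place

result = buf.getvalue()
print(result)
```

This avoids guarding every call with an `if value is not None` check when the content comes from optional fields.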
Compilation under Py3.7-pre failed due to a modified function signature.
Most long-time deprecated functions and methods were removed:
etree.clearErrorLog(), use etree.clear_error_log()
etree.useGlobalPythonLog(), use etree.use_global_python_log()
etree.ElementClassLookup.setFallback(), use etree.ElementClassLookup.set_fallback()
etree.getDefaultParser(), use etree.get_default_parser()
etree.setDefaultParser(), use etree.set_default_parser()
etree.setElementClassLookup(), use etree.set_element_class_lookup()
Note that parser.setElementClassLookup() has not been removed yet, although parser.set_element_class_lookup() should be used instead.
xpath_evaluator.registerNamespace(), use xpath_evaluator.register_namespace()
xpath_evaluator.registerNamespaces(), use xpath_evaluator.register_namespaces()
objectify.setPytypeAttributeTag(), use objectify.set_pytype_attribute_tag()
objectify.setDefaultParser(), use objectify.set_default_parser()
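A brief sketch exercising several of the replacement names from the list above. The sample document, the namespace URI and the variable names are made up for illustration. Global settings are reset afterwards so the snippet leaves no state behind.

```python
from lxml import etree, objectify

# Module-level helpers under their underscore names.
etree.clear_error_log()

parser = etree.XMLParser(remove_blank_text=True)
etree.set_default_parser(parser)
current_default = etree.get_default_parser()
etree.set_default_parser()  # no argument: back to the built-in default

# Class lookup configuration, globally chained and per parser.
lookup = etree.ElementNamespaceClassLookup()
lookup.set_fallback(etree.ElementDefaultClassLookup())
parser.set_element_class_lookup(lookup)

# XPath evaluator namespace registration.
root = etree.fromstring("<a xmlns:x='urn:example'><x:b/></a>")
evaluator = etree.XPathEvaluator(root)
evaluator.register_namespace("x", "urn:example")
matches = evaluator("//x:b")

# objectify equivalents; called without arguments they restore the defaults.
objectify.set_pytype_attribute_tag()
objectify.set_default_parser()

print(len(matches), matches[0].tag)
```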
Initial public release.