lxml.cssselect module

CSS Selectors based on XPath.

This module supports selecting XML/HTML tags based on CSS selectors. See the CSSSelector class for details.

This is a thin wrapper around cssselect 0.7 or later.

class lxml.cssselect.CSSSelector(css, namespaces=None, translator='xml')

Bases: XPath

A CSS selector.

Usage:

>>> from lxml import etree, cssselect
>>> select = cssselect.CSSSelector("a tag > child")

>>> root = etree.XML("<a><b><c/><tag><child>TEXT</child></tag></b></a>")
>>> [ el.tag for el in select(root) ]
['child']

To use CSS namespaces, pass a prefix-to-namespace mapping as the namespaces keyword argument:

>>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>>> select_ns = cssselect.CSSSelector('root > rdf|Description',
...                                   namespaces={'rdf': rdfns})

>>> rdf = etree.XML((
...     '<root xmlns:rdf="%s">'
...       '<rdf:Description>blah</rdf:Description>'
...     '</root>') % rdfns)
>>> [(el.tag, el.text) for el in select_ns(rdf)]
[('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
error_log
path

The literal XPath expression.

class lxml.cssselect.LxmlHTMLTranslator(xhtml: bool = False)

Bases: LxmlTranslator, HTMLTranslator

lxml extensions + HTML support.

xpathexpr_cls

alias of XPathExpr

css_to_xpath(css: str, prefix: str = 'descendant-or-self::') str

Translate a group of selectors to XPath.

Pseudo-elements are not supported here since XPath only knows about “real” elements.

Parameters:
  • css – A group of selectors as a string.

  • prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.

Raises:

SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.

Returns:

The equivalent XPath 1.0 expression as a string.

pseudo_never_matches(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

selector_to_xpath(selector: Selector, prefix: str = 'descendant-or-self::', translate_pseudo_elements: bool = False) str

Translate a parsed selector to XPath.

Parameters:
  • selector – A parsed Selector object.

  • prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.

  • translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.

Raises:

ExpressionError on unknown/unsupported selectors.

Returns:

The equivalent XPath 1.0 expression as a string.

xpath(parsed_selector: Element | Hash | Class | Function | Pseudo | Attrib | Negation | Relation | Matching | SpecificityAdjustment | CombinedSelector) XPathExpr

Translate any parsed selector object.

xpath_active_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_attrib(selector: Attrib) XPathExpr

Translate an attribute selector.

xpath_attrib_dashmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_different(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_equals(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_exists(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_includes(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_prefixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_substringmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_suffixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_checked_pseudo(xpath: XPathExpr) XPathExpr

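HTMLTranslator overrides this method: :checked matches selected option elements and checked checkbox or radio inputs. (The inherited docstring, "Common implementation for pseudo-classes that never match", describes the generic translator only.)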

xpath_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is an immediate child of left

xpath_class(class_selector: Class) XPathExpr

Translate a class selector.

xpath_combinedselector(combined: CombinedSelector) XPathExpr

Translate a combined selector.

xpath_contains_function(xpath, function)
xpath_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a child, grand-child or further descendant of left

xpath_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling immediately after left

xpath_disabled_pseudo(xpath: XPathExpr) XPathExpr

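HTMLTranslator overrides this method: :disabled matches form controls that carry a disabled attribute or are nested in a disabled fieldset or optgroup. (The inherited docstring, "Common implementation for pseudo-classes that never match", describes the generic translator only.)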

xpath_element(selector: Element) XPathExpr

Translate a type or universal selector.

xpath_empty_pseudo(xpath: XPathExpr) XPathExpr
xpath_enabled_pseudo(xpath: XPathExpr) XPathExpr

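HTMLTranslator overrides this method: :enabled matches links that have an href attribute and form controls that are not disabled. (The inherited docstring, "Common implementation for pseudo-classes that never match", describes the generic translator only.)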

xpath_first_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_first_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_focus_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_function(function: Function) XPathExpr

Translate a functional pseudo-class.

xpath_hash(id_selector: Hash) XPathExpr

Translate an ID selector.

xpath_hover_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling after left, immediately or not

xpath_lang_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_last_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_last_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_link_pseudo(xpath: XPathExpr) XPathExpr

HTMLTranslator overrides this method: :link matches a, link and area elements that have an href attribute.

static xpath_literal(s: str) str
xpath_matching(matching: Matching) XPathExpr
xpath_negation(negation: Negation) XPathExpr
xpath_nth_child_function(xpath: XPathExpr, function: Function, last: bool = False, add_name_test: bool = True) XPathExpr
xpath_nth_last_child_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_nth_last_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_nth_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_only_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_only_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_pseudo(pseudo: Pseudo) XPathExpr

Translate a pseudo-class.

xpath_pseudo_element(xpath: XPathExpr, pseudo_element: FunctionalPseudoElement | str) XPathExpr

Translate a pseudo-element.

Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.

xpath_relation(relation: Relation) XPathExpr
xpath_relation_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is an immediate child of left; select left

xpath_relation_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a child, grand-child or further descendant of left; select left

xpath_relation_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling immediately after left; select left

xpath_relation_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling after left, immediately or not; select left

xpath_root_pseudo(xpath: XPathExpr) XPathExpr
xpath_scope_pseudo(xpath: XPathExpr) XPathExpr
xpath_specificityadjustment(matching: SpecificityAdjustment) XPathExpr
xpath_target_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_visited_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
id_attribute = 'id'

The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors

lang_attribute = 'lang'

The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo

lower_case_attribute_names = False
lower_case_attribute_values = False
lower_case_element_names = False

The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens

When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.

In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.

class lxml.cssselect.LxmlTranslator

Bases: GenericTranslator

A custom CSS selector to XPath translator with lxml-specific extensions.

xpathexpr_cls

alias of XPathExpr

css_to_xpath(css: str, prefix: str = 'descendant-or-self::') str

Translate a group of selectors to XPath.

Pseudo-elements are not supported here since XPath only knows about “real” elements.

Parameters:
  • css – A group of selectors as a string.

  • prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.

Raises:

SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.

Returns:

The equivalent XPath 1.0 expression as a string.

pseudo_never_matches(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

selector_to_xpath(selector: Selector, prefix: str = 'descendant-or-self::', translate_pseudo_elements: bool = False) str

Translate a parsed selector to XPath.

Parameters:
  • selector – A parsed Selector object.

  • prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.

  • translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.

Raises:

ExpressionError on unknown/unsupported selectors.

Returns:

The equivalent XPath 1.0 expression as a string.

xpath(parsed_selector: Element | Hash | Class | Function | Pseudo | Attrib | Negation | Relation | Matching | SpecificityAdjustment | CombinedSelector) XPathExpr

Translate any parsed selector object.

xpath_active_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_attrib(selector: Attrib) XPathExpr

Translate an attribute selector.

xpath_attrib_dashmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_different(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_equals(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_exists(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_includes(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_prefixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_substringmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_attrib_suffixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
xpath_checked_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is an immediate child of left

xpath_class(class_selector: Class) XPathExpr

Translate a class selector.

xpath_combinedselector(combined: CombinedSelector) XPathExpr

Translate a combined selector.

xpath_contains_function(xpath, function)
xpath_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a child, grand-child or further descendant of left

xpath_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling immediately after left

xpath_disabled_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_element(selector: Element) XPathExpr

Translate a type or universal selector.

xpath_empty_pseudo(xpath: XPathExpr) XPathExpr
xpath_enabled_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_first_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_first_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_focus_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_function(function: Function) XPathExpr

Translate a functional pseudo-class.

xpath_hash(id_selector: Hash) XPathExpr

Translate an ID selector.

xpath_hover_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling after left, immediately or not

xpath_lang_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_last_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_last_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_link_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

static xpath_literal(s: str) str
xpath_matching(matching: Matching) XPathExpr
xpath_negation(negation: Negation) XPathExpr
xpath_nth_child_function(xpath: XPathExpr, function: Function, last: bool = False, add_name_test: bool = True) XPathExpr
xpath_nth_last_child_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_nth_last_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_nth_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
xpath_only_child_pseudo(xpath: XPathExpr) XPathExpr
xpath_only_of_type_pseudo(xpath: XPathExpr) XPathExpr
xpath_pseudo(pseudo: Pseudo) XPathExpr

Translate a pseudo-class.

xpath_pseudo_element(xpath: XPathExpr, pseudo_element: FunctionalPseudoElement | str) XPathExpr

Translate a pseudo-element.

Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.

xpath_relation(relation: Relation) XPathExpr
xpath_relation_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is an immediate child of left; select left

xpath_relation_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a child, grand-child or further descendant of left; select left

xpath_relation_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling immediately after left; select left

xpath_relation_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr

right is a sibling after left, immediately or not; select left

xpath_root_pseudo(xpath: XPathExpr) XPathExpr
xpath_scope_pseudo(xpath: XPathExpr) XPathExpr
xpath_specificityadjustment(matching: SpecificityAdjustment) XPathExpr
xpath_target_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

xpath_visited_pseudo(xpath: XPathExpr) XPathExpr

Common implementation for pseudo-classes that never match.

attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
id_attribute = 'id'

The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors

lang_attribute = 'xml:lang'

The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo

lower_case_attribute_names = False
lower_case_attribute_values = False
lower_case_element_names = False

The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens

When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.

In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.

lxml.cssselect._make_lower_case(context, s)