lxml.cssselect module
CSS Selectors based on XPath.
This module supports selecting XML/HTML tags based on CSS selectors. See the CSSSelector class for details.
This is a thin wrapper around cssselect 0.7 or later.
- class lxml.cssselect.CSSSelector(css, namespaces=None, translator='xml')
Bases: XPath
A CSS selector.
Usage:
>>> from lxml import etree, cssselect
>>> select = cssselect.CSSSelector("a tag > child")
>>> root = etree.XML("<a><b><c/><tag><child>TEXT</child></tag></b></a>")
>>> [ el.tag for el in select(root) ]
['child']
To use CSS namespaces, you need to pass a prefix-to-namespace mapping as the namespaces keyword argument:
>>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
>>> select_ns = cssselect.CSSSelector('root > rdf|Description',
...                                   namespaces={'rdf': rdfns})
>>> rdf = etree.XML((
...     '<root xmlns:rdf="%s">'
...     '<rdf:Description>blah</rdf:Description>'
...     '</root>') % rdfns)
>>> [(el.tag, el.text) for el in select_ns(rdf)]
[('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
- error_log
- path
The literal XPath expression.
- class lxml.cssselect.LxmlHTMLTranslator(xhtml: bool = False)
Bases: LxmlTranslator, HTMLTranslator
lxml extensions + HTML support.
- xpathexpr_cls
alias of XPathExpr
- css_to_xpath(css: str, prefix: str = 'descendant-or-self::') str
Translate a group of selectors to XPath.
Pseudo-elements are not supported here since XPath only knows about “real” elements.
- Parameters:
css – A group of selectors as a string.
prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.
- Raises:
SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.
- Returns:
The equivalent XPath 1.0 expression as a string.
- pseudo_never_matches(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- selector_to_xpath(selector: Selector, prefix: str = 'descendant-or-self::', translate_pseudo_elements: bool = False) str
Translate a parsed selector to XPath.
- Parameters:
selector – A parsed Selector object.
prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.
translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.
- Raises:
ExpressionError on unknown/unsupported selectors.
- Returns:
The equivalent XPath 1.0 expression as a string.
- xpath(parsed_selector: Element | Hash | Class | Function | Pseudo | Attrib | Negation | Relation | Matching | SpecificityAdjustment | CombinedSelector) XPathExpr
Translate any parsed selector object.
- xpath_active_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_attrib(selector: Attrib) XPathExpr
Translate an attribute selector.
- xpath_attrib_dashmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_different(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_equals(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_exists(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_includes(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_prefixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_substringmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_suffixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_checked_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is an immediate child of left
- xpath_class(class_selector: Class) XPathExpr
Translate a class selector.
- xpath_combinedselector(combined: CombinedSelector) XPathExpr
Translate a combined selector.
- xpath_contains_function(xpath, function)
- xpath_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a child, grand-child or further descendant of left
- xpath_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling immediately after left
- xpath_disabled_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_element(selector: Element) XPathExpr
Translate a type or universal selector.
- xpath_empty_pseudo(xpath: XPathExpr) XPathExpr
- xpath_enabled_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_first_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_first_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_focus_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_function(function: Function) XPathExpr
Translate a functional pseudo-class.
- xpath_hash(id_selector: Hash) XPathExpr
Translate an ID selector.
- xpath_hover_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling after left, immediately or not
- xpath_lang_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_last_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_last_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_link_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- static xpath_literal(s: str) str
- xpath_matching(matching: Matching) XPathExpr
- xpath_negation(negation: Negation) XPathExpr
- xpath_nth_child_function(xpath: XPathExpr, function: Function, last: bool = False, add_name_test: bool = True) XPathExpr
- xpath_nth_last_child_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_nth_last_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_nth_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_only_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_only_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_pseudo(pseudo: Pseudo) XPathExpr
Translate a pseudo-class.
- xpath_pseudo_element(xpath: XPathExpr, pseudo_element: FunctionalPseudoElement | str) XPathExpr
Translate a pseudo-element.
Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.
- xpath_relation(relation: Relation) XPathExpr
- xpath_relation_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is an immediate child of left; select left
- xpath_relation_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a child, grand-child or further descendant of left; select left
- xpath_relation_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling immediately after left; select left
- xpath_relation_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling after left, immediately or not; select left
- xpath_root_pseudo(xpath: XPathExpr) XPathExpr
- xpath_scope_pseudo(xpath: XPathExpr) XPathExpr
- xpath_specificityadjustment(matching: SpecificityAdjustment) XPathExpr
- xpath_target_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_visited_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
- combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
- id_attribute = 'id'
The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors
- lang_attribute = 'lang'
The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo
- lower_case_attribute_names = False
- lower_case_attribute_values = False
- lower_case_element_names = False
The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens
When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.
In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.
- class lxml.cssselect.LxmlTranslator
Bases: GenericTranslator
A custom CSS selector to XPath translator with lxml-specific extensions.
- xpathexpr_cls
alias of XPathExpr
- css_to_xpath(css: str, prefix: str = 'descendant-or-self::') str
Translate a group of selectors to XPath.
Pseudo-elements are not supported here since XPath only knows about “real” elements.
- Parameters:
css – A group of selectors as a string.
prefix – This string is prepended to the XPath expression for each selector. The default makes selectors scoped to the context node’s subtree.
- Raises:
SelectorSyntaxError on invalid selectors, ExpressionError on unknown/unsupported selectors, including pseudo-elements.
- Returns:
The equivalent XPath 1.0 expression as a string.
- pseudo_never_matches(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- selector_to_xpath(selector: Selector, prefix: str = 'descendant-or-self::', translate_pseudo_elements: bool = False) str
Translate a parsed selector to XPath.
- Parameters:
selector – A parsed Selector object.
prefix – This string is prepended to the resulting XPath expression. The default makes selectors scoped to the context node’s subtree.
translate_pseudo_elements – Unless this is set to True (as css_to_xpath() does), the pseudo_element attribute of the selector is ignored. It is the caller’s responsibility to reject selectors with pseudo-elements, or to account for them somehow.
- Raises:
ExpressionError on unknown/unsupported selectors.
- Returns:
The equivalent XPath 1.0 expression as a string.
- xpath(parsed_selector: Element | Hash | Class | Function | Pseudo | Attrib | Negation | Relation | Matching | SpecificityAdjustment | CombinedSelector) XPathExpr
Translate any parsed selector object.
- xpath_active_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_attrib(selector: Attrib) XPathExpr
Translate an attribute selector.
- xpath_attrib_dashmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_different(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_equals(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_exists(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_includes(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_prefixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_substringmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_attrib_suffixmatch(xpath: XPathExpr, name: str, value: str | None) XPathExpr
- xpath_checked_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is an immediate child of left
- xpath_class(class_selector: Class) XPathExpr
Translate a class selector.
- xpath_combinedselector(combined: CombinedSelector) XPathExpr
Translate a combined selector.
- xpath_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a child, grand-child or further descendant of left
- xpath_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling immediately after left
- xpath_disabled_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_element(selector: Element) XPathExpr
Translate a type or universal selector.
- xpath_empty_pseudo(xpath: XPathExpr) XPathExpr
- xpath_enabled_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_first_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_first_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_focus_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_function(function: Function) XPathExpr
Translate a functional pseudo-class.
- xpath_hash(id_selector: Hash) XPathExpr
Translate an ID selector.
- xpath_hover_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling after left, immediately or not
- xpath_lang_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_last_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_last_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_link_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- static xpath_literal(s: str) str
- xpath_matching(matching: Matching) XPathExpr
- xpath_negation(negation: Negation) XPathExpr
- xpath_nth_child_function(xpath: XPathExpr, function: Function, last: bool = False, add_name_test: bool = True) XPathExpr
- xpath_nth_last_child_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_nth_last_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_nth_of_type_function(xpath: XPathExpr, function: Function) XPathExpr
- xpath_only_child_pseudo(xpath: XPathExpr) XPathExpr
- xpath_only_of_type_pseudo(xpath: XPathExpr) XPathExpr
- xpath_pseudo(pseudo: Pseudo) XPathExpr
Translate a pseudo-class.
- xpath_pseudo_element(xpath: XPathExpr, pseudo_element: FunctionalPseudoElement | str) XPathExpr
Translate a pseudo-element.
Defaults to not supporting pseudo-elements at all, but can be overridden by sub-classes.
- xpath_relation(relation: Relation) XPathExpr
- xpath_relation_child_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is an immediate child of left; select left
- xpath_relation_descendant_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a child, grand-child or further descendant of left; select left
- xpath_relation_direct_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling immediately after left; select left
- xpath_relation_indirect_adjacent_combinator(left: XPathExpr, right: XPathExpr) XPathExpr
right is a sibling after left, immediately or not; select left
- xpath_root_pseudo(xpath: XPathExpr) XPathExpr
- xpath_scope_pseudo(xpath: XPathExpr) XPathExpr
- xpath_specificityadjustment(matching: SpecificityAdjustment) XPathExpr
- xpath_target_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- xpath_visited_pseudo(xpath: XPathExpr) XPathExpr
Common implementation for pseudo-classes that never match.
- attribute_operator_mapping = {'!=': 'different', '$=': 'suffixmatch', '*=': 'substringmatch', '=': 'equals', '^=': 'prefixmatch', 'exists': 'exists', '|=': 'dashmatch', '~=': 'includes'}
- combinator_mapping = {' ': 'descendant', '+': 'direct_adjacent', '>': 'child', '~': 'indirect_adjacent'}
- id_attribute = 'id'
The attribute used for ID selectors depends on the document language: http://www.w3.org/TR/selectors/#id-selectors
- lang_attribute = 'xml:lang'
The attribute used for :lang() depends on the document language: http://www.w3.org/TR/selectors/#lang-pseudo
- lower_case_attribute_names = False
- lower_case_attribute_values = False
- lower_case_element_names = False
The case sensitivity of document language element names, attribute names, and attribute values in selectors depends on the document language. http://www.w3.org/TR/selectors/#casesens
When a document language defines one of these as case-insensitive, cssselect assumes that the document parser makes the parsed values lower-case. Making the selector lower-case too makes the comparison case-insensitive.
In HTML, element names and attribute names (but not attribute values) are case-insensitive. All of lxml.html, html5lib, BeautifulSoup4 and HTMLParser make them lower-case in their parse result, so the assumption holds.