lxml.html.soupparser module

External interface to the BeautifulSoup HTML parser.

class lxml.html.soupparser._PseudoTag(contents): Bases: object

lxml.html.soupparser._convert_tree(beautiful_soup_tree, makeelement)

lxml.html.soupparser._init_node_converters(makeelement)

lxml.html.soupparser._parse(source, beautifulsoup, makeelement, **bsargs)

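lxml.html.soupparser._parse_doctype_declaration(string, pos=0, endpos=9223372036854775807): Matches zero or more characters at the beginning of the string. The signature and description are those of the match() method of a compiled regular expression from Python's re module.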

lxml.html.soupparser.convert_tree(beautiful_soup_tree, makeelement=None)

Convert a BeautifulSoup tree to a list of Element trees.

Returns a list instead of a single root Element to support HTML-like soup with more than one root element.

You can pass a different Element factory through the makeelement keyword.
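A minimal sketch of the conversion described above, assuming the bs4 package is installed and using Python's built-in html.parser backend. The markup string is made up for illustration.

```python
from bs4 import BeautifulSoup
from lxml.html.soupparser import convert_tree

# Illustrative markup: two top-level elements and no <html> wrapper.
soup = BeautifulSoup("<p>one</p><p>two</p>", "html.parser")

# convert_tree() returns a list with one lxml Element per top-level node.
roots = convert_tree(soup)
for element in roots:
    print(element.tag, element.text)
```

Because the result is a list, callers that expect a single root have to pick one element or wrap the list themselves.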

lxml.html.soupparser.fromstring(data, beautifulsoup=None, makeelement=None, **bsargs)

Parse a string of HTML data into an Element tree using the BeautifulSoup parser.

Returns the root <html> Element of the tree.

You can pass a different BeautifulSoup parser through the beautifulsoup keyword and a different Element factory function through the makeelement keyword. By default, the standard BeautifulSoup class and the default factory of lxml.html are used.
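As a rough illustration of fromstring() with its default arguments, the sketch below parses a small piece of broken markup. It assumes BeautifulSoup is installed; the tag_soup string is invented for the example.

```python
from lxml.html.soupparser import fromstring

# Illustrative tag soup: no <html> wrapper, unclosed <p> and <b>.
tag_soup = "<title>Demo</title><p>Hello <b>world"

# fromstring() returns a single root Element, even for soup like this.
root = fromstring(tag_soup)
bold = root.find(".//b")

print(root.tag)
print(bold.text)
```

The returned root can then be searched with the usual lxml methods such as find() and iter().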

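lxml.html.soupparser.handle_entities(repl, string, count=0): Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. The signature and description are those of the sub() method of a compiled regular expression from Python's re module.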

lxml.html.soupparser.parse(file, beautifulsoup=None, makeelement=None, **bsargs)

Parse a file into an ElementTree using the BeautifulSoup parser.

You can pass a different BeautifulSoup parser through the beautifulsoup keyword and a different Element factory function through the makeelement keyword. By default, the standard BeautifulSoup class and the default factory of lxml.html are used.
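A short sketch of parse(), again assuming BeautifulSoup is installed. An in-memory io.StringIO object stands in for a real file here, on the assumption that any object with a read() method is accepted; the document content is made up.

```python
import io

from lxml.html.soupparser import parse

# Illustrative document held in memory instead of on disk.
document = io.StringIO(
    "<html><head><title>Demo</title></head>"
    "<body><p>Hello</p></body></html>"
)

# parse() returns an ElementTree; getroot() gives the <html> Element.
tree = parse(document)
root = tree.getroot()
title = tree.findtext(".//title")

print(root.tag, title)
```

Unlike fromstring(), which returns the root Element directly, parse() wraps the result in an ElementTree object.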

lxml.html.soupparser.unescape(string)