Package lxml :: Package html :: Module html5parser :: Class HTMLParser

Class HTMLParser


                     object --+    
                              |    
html5lib.html5parser.HTMLParser --+
                                  |
                                 HTMLParser

An html5lib HTML parser that uses lxml as its tree builder.
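A minimal usage sketch, assuming both lxml and html5lib are installed. namespaceHTMLElements is a keyword argument of the html5lib base class, passed through **kwargs. Setting it to False keeps tag names free of the XHTML namespace prefix. The markup is illustrative only.

```python
from lxml.html import html5parser

# namespaceHTMLElements=False is forwarded to html5lib through **kwargs, so tags
# come out as "p" rather than "{http://www.w3.org/1999/xhtml}p".
parser = html5parser.HTMLParser(namespaceHTMLElements=False)

# parse() is inherited from html5lib; with lxml's treebuilder it yields an
# lxml ElementTree. Missing <html>, <head> and <body> tags are filled in
# following the HTML5 parsing algorithm.
tree = parser.parse("<title>Demo</title><p>Hello <b>world</b>")
root = tree.getroot()

print(root.tag)                      # html
print(root.findtext("head/title"))   # Demo
print(root.find("body/p/b").text)    # world
```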
Instance Methods
 
__init__(self, strict=False, **kwargs)
strict - raise an exception when a parse error is encountered

Inherited from html5lib.html5parser.HTMLParser: adjustForeignAttributes, adjustMathMLAttributes, adjustSVGAttributes, isHTMLIntegrationPoint, isMathMLTextIntegrationPoint, mainLoop, normalizeToken, normalizedTokens, parse, parseError, parseFragment, parseRCDataRawtext, reparseTokenNormal, reset, resetInsertionMode

Inherited from html5lib.html5parser.HTMLParser (private): _parse

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__
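A sketch of the inherited parseFragment method, assuming lxml and html5lib are installed. With lxml's treebuilder the result appears to be a list that may begin with a plain text string followed by elements (lxml.html.html5parser.fragments_fromstring consumes it that way); treat this shape as an assumption rather than a documented guarantee.

```python
from lxml.html import html5parser

parser = html5parser.HTMLParser(namespaceHTMLElements=False)

# parseFragment() parses the markup as the contents of a container element
# ("div" by default). Leading text, if any, comes first as a plain string,
# followed by the top-level elements; trailing text sits in an element's tail.
items = parser.parseFragment("lead <b>bold</b> tail")

text, bold = items
print(repr(text))       # 'lead '
print(bold.tag)         # b
print(repr(bold.tail))  # ' tail'
```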

Properties

Inherited from object: __class__

Method Details

__init__(self, strict=False, **kwargs)
(Constructor)


strict - if True, raise an exception when a parse error is encountered; by default, errors are recorded and parsing continues

tree - a treebuilder class that controls the type of tree that will be returned. Built-in treebuilders can be accessed through html5lib.treebuilders.getTreeBuilder(treeType).

tokenizer - a class that provides a stream of tokens to the treebuilder. It may be replaced with, e.g., a sanitizer that converts some tags to text.
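A sketch of the strict flag, assuming lxml and html5lib are installed. ParseError and the errors attribute come from the html5lib base class; parse_or_none is a hypothetical helper used only for illustration. A missing doctype is one of the conditions html5lib reports as a parse error.

```python
from html5lib.html5parser import ParseError
from lxml.html import html5parser


def parse_or_none(markup, strict):
    """Hypothetical helper: parse markup, returning None if strict mode rejects it."""
    parser = html5parser.HTMLParser(strict=strict, namespaceHTMLElements=False)
    try:
        return parser.parse(markup)
    except ParseError:
        # With strict=True, html5lib raises at the first parse error.
        return None


sloppy = "<p>no doctype"              # missing doctype: a parse error
clean = "<!DOCTYPE html><p>ok</p>"    # no parse errors expected

print(parse_or_none(sloppy, strict=False) is not None)  # True: tree still built
print(parse_or_none(sloppy, strict=True) is None)       # True: ParseError raised
print(parse_or_none(clean, strict=True) is not None)    # True

# In lenient mode the problems are collected on the parser instead.
lenient = html5parser.HTMLParser(namespaceHTMLElements=False)
lenient.parse(sloppy)
print(len(lenient.errors) >= 1)  # True
```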

Overrides: object.__init__
(inherited documentation)
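The tree parameter is documented on the html5lib base class, while this subclass uses lxml as its tree. The sketch below therefore calls html5lib.HTMLParser directly to show what tree controls, assuming html5lib is installed; parse_with is a hypothetical helper.

```python
import html5lib
from html5lib import treebuilders


def parse_with(tree_type, markup):
    """Hypothetical helper: parse markup using a named built-in treebuilder."""
    builder = treebuilders.getTreeBuilder(tree_type)
    parser = html5lib.HTMLParser(tree=builder, namespaceHTMLElements=False)
    return parser.parse(markup)


# "etree" builds xml.etree.ElementTree elements; parse() returns <html>.
root = parse_with("etree", "<p>Hello</p>")
print(root.tag)                  # html
print(root.find("body/p").text)  # Hello

# "dom" builds an xml.dom.minidom tree; parse() returns the Document node.
doc = parse_with("dom", "<p>Hello</p>")
print(doc.documentElement.tagName)  # html
```

Swapping the treeType string changes only the kind of object returned; the parsing algorithm is the same.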