Package lxml :: Module etree :: Class HTMLParser

Class HTMLParser

 object --+        
          |        
_BaseParser --+    
              |    
    _FeedParser --+
                  |
                 HTMLParser

HTMLParser(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)

The HTML parser.

This parser reads HTML into a normal XML tree. By default, it can read broken (not well-formed) HTML, to the extent that libxml2 can recover from the errors. Pass recover=False to switch this off.
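
As a rough illustration of the recovery behaviour, the sketch below parses a fragment with unclosed tags. The sample markup and variable names are made up for the example, and the exact tree depends on the libxml2 version in use.

```python
from lxml import etree

# Hypothetical input: neither <p> is closed, <b> is left open, </html> is missing.
broken = b"<html><body><p>first<p>second<b>bold</body>"

# recover=True is the default, so the parser repairs what it can.
parser = etree.HTMLParser()
root = etree.fromstring(broken, parser)

paragraphs = root.findall(".//p")
print(root.tag, len(paragraphs))
print(etree.tostring(root, encoding="unicode"))

# With recover=False, input that libxml2 reports as an error may raise
# XMLSyntaxError instead of being repaired; whether this particular
# fragment does so depends on the libxml2 version.
strict = etree.HTMLParser(recover=False)
try:
    etree.fromstring(broken, strict)
except etree.XMLSyntaxError as exc:
    print("strict parse failed:", exc)
```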

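Available boolean keyword arguments:

- recover            - try hard to parse through broken HTML (default: True)
- no_network         - prevent network access for related files (default: True)
- remove_blank_text  - discard empty text nodes that are ignorable (default: False)
- compact            - save memory for short text content (default: True)
- remove_comments    - discard comments (default: False)
- remove_pis         - discard processing instructions (default: False)

Other keyword arguments:

- target    - a parser target object that will receive the parse events
- encoding  - override the document encoding
- schema    - an XMLSchema to validate against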
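
A minimal sketch of two of the constructor's keyword arguments, remove_comments and encoding, using a made-up byte string encoded as ISO-8859-1. The variable names are illustrative only.

```python
from lxml import etree

# Hypothetical document: contains a comment and a Latin-1 encoded "é" (0xE9).
html = b"<html><body><!-- draft note --><p>caf\xe9</p></body></html>"

# Same input, parsed once with comments kept and once with comments dropped.
keep = etree.HTMLParser(encoding="iso-8859-1")
drop = etree.HTMLParser(encoding="iso-8859-1", remove_comments=True)

default_root = etree.fromstring(html, keep)
clean_root = etree.fromstring(html, drop)

comments_default = default_root.xpath("//comment()")
comments_clean = clean_root.xpath("//comment()")
text = clean_root.findtext(".//p")

print(len(comments_default), len(comments_clean), text)
```

Passing encoding explicitly is mainly useful when the document carries no reliable charset declaration of its own.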

Note that you should avoid sharing parsers between threads for performance reasons.
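
One way to follow this advice is to give each thread its own parser instance. The sketch below does that with threading.local; the helper names get_parser and parse_title are hypothetical, not part of lxml.

```python
import threading

from lxml import etree

_local = threading.local()


def get_parser():
    """Return an HTMLParser owned by the calling thread, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def parse_title(html, results, index):
    """Parse one document and store its <title> text at results[index]."""
    root = etree.fromstring(html, get_parser())
    results[index] = root.findtext(".//title")


names = ["a", "b", "c"]
docs = [
    "<html><head><title>%s</title></head><body></body></html>" % name
    for name in names
]
results = [None] * len(docs)

threads = [
    threading.Thread(target=parse_title, args=(doc, results, i))
    for i, doc in enumerate(docs)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)
```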

Instance Methods
 
__init__(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature

__new__(T, S, ...)
Returns a new object with type S, a subtype of T

Inherited from _FeedParser: close, feed
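
The inherited feed and close methods allow incremental parsing. A short sketch, assuming the document arrives in arbitrary byte chunks (the chunk boundaries here are invented for the example):

```python
from lxml import etree

parser = etree.HTMLParser()

# Chunks may split the markup anywhere, even in the middle of text content.
chunks = [b"<html><body><p>Hel", b"lo</p>", b"</body></html>"]
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root element of the tree.
root = parser.close()
greeting = root.findtext(".//p")
print(greeting)
```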

Inherited from _BaseParser: copy, makeelement, setElementClassLookup, set_element_class_lookup

Inherited from object: __delattr__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from _FeedParser: feed_error_log

Inherited from _BaseParser: error_log, resolvers, version
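
The error_log property can be inspected after a parse to see what the parser had to recover from. The sketch below uses a made-up fragment with a stray end tag; which entries appear, if any, depends on the libxml2 version.

```python
from lxml import etree

parser = etree.HTMLParser()  # recover=True, so parsing continues past errors

# Hypothetical input with an unexpected </div> end tag.
root = etree.fromstring(b"<html><body><p>text</div></body></html>", parser)

# Each log entry records where the problem was found and what it was.
for entry in parser.error_log:
    print(entry.line, entry.column, entry.level_name, entry.message)

entry_count = len(parser.error_log)
```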

Inherited from object: __class__

Method Details

__init__(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
(Constructor)

 
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
Overrides: object.__init__
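
As an illustration of the target argument, the sketch below passes a parser target object that collects link addresses instead of building a tree. The LinkCollector class is hypothetical; its method names follow lxml's parser target interface (start, end, data, comment, close).

```python
from lxml import etree


class LinkCollector:
    """Hypothetical parser target that records the href of every <a> element."""

    def __init__(self):
        self.links = []

    def start(self, tag, attrib):
        # Called for every opening tag with its attributes.
        if tag == "a" and "href" in attrib:
            self.links.append(attrib["href"])

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def comment(self, text):
        pass

    def close(self):
        # The return value of close() becomes the result of the parse call.
        return self.links


parser = etree.HTMLParser(target=LinkCollector())
html = b'<html><body><a href="/a">A</a><a name="x">no href</a><a href="/b">B</a></body></html>'
links = etree.fromstring(html, parser)
print(links)
```

With a target set, no element tree is built, which can help when only a small part of a large document is of interest.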

__new__(T, S, ...)

 
Returns: a new object with type S, a subtype of T
Overrides: object.__new__