Package lxml :: Module etree :: Class HTMLParser

Class HTMLParser

 object --+        
          |        
_BaseParser --+    
              |    
    _FeedParser --+
                  |
                 HTMLParser
Known Subclasses:

The HTML parser. This parser reads HTML into a normal XML tree. By default, it can read broken (not well-formed) HTML, to the extent that libxml2 can recover from the errors. Pass recover=False to switch this off.
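As a minimal sketch of the default recovering behaviour, the snippet below parses a deliberately broken fragment. The input string is made up for illustration, and the exact serialized output may vary with the installed libxml2 version.

```python
from lxml import etree

# Deliberately broken markup: <p> and <b> are never closed,
# and there is no <html> or <body> wrapper.
broken_html = "<p>Hello <b>world"

parser = etree.HTMLParser()  # recover=True is the default
root = etree.fromstring(broken_html, parser)

# The parser supplies the missing structure, so the root is <html>.
print(root.tag)
print(etree.tostring(root, encoding="unicode"))
```

The result is an ordinary element tree, so the usual `find`, `findtext` and XPath calls work on it.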

Available boolean keyword arguments:
* recover - try hard to parse through broken HTML (default: True)
* no_network - prevent network access for related files (default: True)
* remove_blank_text - discard empty text nodes
* remove_comments - discard comments
* remove_pis - discard processing instructions
* compact - save memory for short text content (default: True)

Other keyword arguments:
* encoding - override the document encoding
* target - a parser target object that will receive the parse events
* schema - an XMLSchema to validate against
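The keyword arguments above can be combined freely. The following sketch shows two illustrative configurations: one that overrides the encoding and drops comments, and one that routes parse events to a target object. The `TagCollector` class and the sample documents are hypothetical and exist only for this example.

```python
from lxml import etree

# Latin-1 encoded bytes without a charset declaration, plus a comment.
data = "<html><body><!-- draft --><p>caf\u00e9</p></body></html>".encode("iso-8859-1")

# Override the encoding and discard comments while parsing.
parser = etree.HTMLParser(encoding="iso-8859-1", remove_comments=True)
root = etree.fromstring(data, parser)
body = root.find("body")
paragraph_text = body.findtext("p")
child_tags = [child.tag for child in body]
print(paragraph_text, child_tags)


class TagCollector:
    """Hypothetical parser target that records start tags in document order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def comment(self, text):
        pass

    def close(self):
        # Hand back the collected tags and reset for the next document.
        tags, self.tags = self.tags, []
        return tags


# With a target, parsing returns whatever the target's close() returns
# instead of building a tree.
target_parser = etree.HTMLParser(target=TagCollector())
tags = etree.fromstring("<p>Hi <b>there</b></p>", target_parser)
print(tags)
```

A target is useful when only a summary of the document is needed and building the full tree would be wasted work.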

Note that you should avoid sharing parsers between threads for performance reasons.
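One way to follow this advice is to give each thread its own parser. The sketch below uses `threading.local` for that; the helper names `get_parser` and `parse_title` are assumptions made for this example and are not part of lxml.

```python
import threading

from lxml import etree

_local = threading.local()


def get_parser():
    """Return an HTMLParser private to the calling thread (hypothetical helper)."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def parse_title(html, results, index):
    # Each thread parses with its own parser instance.
    root = etree.fromstring(html, get_parser())
    results[index] = root.findtext(".//title")


pages = ["<html><head><title>Page %d</title></head></html>" % i for i in range(4)]
results = [None] * len(pages)
threads = [
    threading.Thread(target=parse_title, args=(page, results, i))
    for i, page in enumerate(pages)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(results)
```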

Instance Methods
__init__(...)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
__new__(T, S, ...)
Returns a new object with type S, a subtype of T

Inherited from _FeedParser: close, feed
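The inherited `feed` and `close` methods allow incremental parsing, for example when data arrives in chunks. A minimal sketch, with chunk boundaries chosen arbitrarily for illustration:

```python
from lxml import etree

parser = etree.HTMLParser()

# Feed the document piece by piece; chunks may split tags or text anywhere.
for chunk in ("<html><body><p>Hel", "lo</p>", "</body></html>"):
    parser.feed(chunk)

# close() finishes parsing and returns the root element.
root = parser.close()
text = root.findtext(".//p")
print(text)
```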

Inherited from _BaseParser: copy, makeelement, setElementClassLookup, set_element_class_lookup
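As a rough illustration of two of the inherited helpers, `copy` creates a separate parser with the same configuration, and `makeelement` creates a new element associated with the parser. The tag and attribute values below are arbitrary.

```python
from lxml import etree

parser = etree.HTMLParser(remove_comments=True)

# A second parser with the same settings, e.g. for use in another thread.
clone = parser.copy()

# Create a standalone element through the parser.
div = parser.makeelement("div", {"class": "note"})
div.text = "hi"
print(etree.tostring(div, encoding="unicode"))
```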

Inherited from object: __delattr__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from _FeedParser: feed_error_log

Inherited from _BaseParser: error_log, resolvers, version
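The `error_log` property can be inspected after parsing to see what the parser complained about. The sketch below only iterates over the log; which entries appear, if any, depends on the input and on the libxml2 version, so no particular output is assumed. The sample markup is made up.

```python
from lxml import etree

parser = etree.HTMLParser()
root = etree.fromstring("<p>Unclosed <b>bold</p><unknowntag>text", parser)

# Collect (line, message) pairs from the parser's error log.
# The log may be empty, depending on the libxml2 version in use.
messages = [(entry.line, entry.message) for entry in parser.error_log]
for line, message in messages:
    print(line, message)
```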

Inherited from object: __class__

Method Details

__init__(...)
(Constructor)

 

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__

__new__(T, S, ...)

 
Returns: a new object with type S, a subtype of T
Overrides: object.__new__