Class HTMLParser
object --+
         |
   _BaseParser --+
                 |
        _FeedParser --+
                      |
                     HTMLParser
HTMLParser(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
The HTML parser.
This parser reads HTML into a normal XML tree. By default, it can
also read broken (non-well-formed) HTML, to the extent that libxml2
can recover from the errors. Set the 'recover' option to False to
switch this off.
Available boolean keyword arguments:
- recover - try hard to parse through broken HTML (default: True)
- no_network - prevent network access for related files (default: True)
- remove_blank_text - discard empty text nodes (default: False)
- remove_comments - discard comments (default: False)
- remove_pis - discard processing instructions (default: False)
- compact - save memory for short text content (default: True)
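As an illustration of the boolean options above, the following sketch parses the same broken markup twice, once with the defaults and once with remove_comments=True. The sample markup and variable names are made up for the example; etree.HTML is used here as a convenient way to run a string through a given parser.

```python
from lxml import etree

# Hypothetical sample input: the <p> and <br> tags are never closed.
broken_html = "<html><body><!-- note --><p>Hello<br>world</body></html>"

# Default settings (recover=True): the parser repairs the markup.
default_parser = etree.HTMLParser()
root = etree.HTML(broken_html, default_parser)
print(etree.tostring(root, method="html").decode())

# Same input, but comments are dropped while parsing.
clean_parser = etree.HTMLParser(remove_comments=True)
clean_root = etree.HTML(broken_html, clean_parser)

comments_default = list(root.iter(etree.Comment))
comments_clean = list(clean_root.iter(etree.Comment))
print(len(comments_default), len(comments_clean))
```

The options only take effect at construction time, so a separate parser instance is created for each configuration.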
Other keyword arguments:
- encoding - override the document encoding
- target - a parser target object that will receive the parse events
- schema - an XMLSchema to validate against
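A minimal sketch of the 'encoding' and 'target' arguments, under the assumption that the target object follows the usual lxml parser target interface (start, end, data, close). The TagCollector class and the sample inputs are hypothetical and exist only for this example.

```python
from lxml import etree

# encoding: force the input bytes to be decoded as ISO-8859-1.
latin1_parser = etree.HTMLParser(encoding="iso-8859-1")
latin1_root = etree.HTML(b"<p>caf\xe9</p>", latin1_parser)
paragraph_text = latin1_root.findtext(".//p")
print(paragraph_text)


class TagCollector:
    """Hypothetical parser target that records every start tag it sees."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        tags, self.tags = self.tags, []
        return tags


# target: no tree is built; the parser reports events to the target instead.
target_parser = etree.HTMLParser(target=TagCollector())
seen_tags = etree.HTML("<p>one</p><p>two</p>", target_parser)
print(seen_tags)
```

With a target in place, the return value of the parse call is whatever the target's close() method returns rather than an element tree.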
Note that you should avoid sharing parsers between threads for performance
reasons.
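One possible way to follow this advice is to keep one parser per thread in thread-local storage. The sketch below assumes that pattern; the helper names get_parser and count_links are invented for the example and are not part of lxml.

```python
import threading

from lxml import etree

_local = threading.local()


def get_parser():
    """Return this thread's own HTMLParser, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def count_links(html):
    """Parse html with the thread-local parser and count <a> elements."""
    root = etree.HTML(html, get_parser())
    return len(root.findall(".//a"))


results = []


def worker(html):
    results.append(count_links(html))


threads = [
    threading.Thread(target=worker, args=("<a href='#'>x</a>" * n,))
    for n in (1, 2, 3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```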
__init__(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
__new__(T, S, ...)
a new object with type S, a subtype of T
Methods inherited from _FeedParser: close, feed
Methods inherited from _BaseParser: copy, makeelement, setElementClassLookup, set_element_class_lookup
Methods inherited from object: __delattr__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__
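The feed and close methods inherited from _FeedParser allow a document to be parsed incrementally. A short sketch, with made-up chunks standing in for data that arrives piece by piece (for example from a network stream):

```python
from lxml import etree

parser = etree.HTMLParser()

# Hypothetical chunks; the split points fall in the middle of text content.
chunks = ["<html><body><h1>Ti", "tle</h1><p>Bo", "dy text</p></body></html>"]
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root element.
root = parser.close()
print(root.findtext(".//h1"), "|", root.findtext(".//p"))
```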
Properties inherited from _FeedParser: feed_error_log
Properties inherited from _BaseParser: error_log, resolvers, version
Properties inherited from object: __class__
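The error_log and version properties can be inspected after a parse run. The sketch below uses invented broken markup; which messages end up in the log (if any) depends on the libxml2 version in use, so no particular output is assumed.

```python
from lxml import etree

parser = etree.HTMLParser()  # recover=True, so parsing still succeeds
root = etree.HTML("<p>unclosed <b>bold</p></i>", parser)

# Each log entry carries, among other fields, a line number and a message.
messages = [(entry.line, entry.message) for entry in parser.error_log]
for line, message in messages:
    print(line, message)

# version describes the underlying parser library.
parser_version = parser.version
print(parser_version)
```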
__init__(self, recover=True, no_network=True, remove_blank_text=False, compact=True, remove_comments=False, remove_pis=False, target=None, encoding=None, schema=None)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Overrides: object.__init__

__new__(T, S, ...)
- Returns: a new object with type S, a subtype of T
- Overrides: object.__new__