Package lxml :: Module etree :: Class HTMLParser

Type HTMLParser

 object --+    
          |    
_BaseParser --+
              |
             HTMLParser


The HTML parser. It reads HTML into a normal XML tree. By default, it can read broken (not well-formed) HTML, to the extent that libxml2 is able to recover from the errors. Pass recover=False to switch this off.

Available boolean keyword arguments:

* recover - try hard to parse through broken HTML (default: True)
* no_network - prevent network access (default: True)
* remove_blank_text - discard empty text nodes
* remove_comments - discard comments
* remove_pis - discard processing instructions
* compact - save memory for short text content (default: True)
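
As an illustration of these keyword arguments, the following sketch parses a deliberately broken snippet with recovery left on and comments discarded. The sample markup and the variable names are made up for the example, and the exact tree produced by recovery can vary with the libxml2 version in use.

```python
from lxml import etree

# Hypothetical input: unclosed <p> and <b> tags plus a comment.
broken_html = (
    "<html><body>"
    "<!-- scratch note -->"
    "<p>First paragraph"
    "<p>Second <b>bold"
    "</body></html>"
)

# recover=True is the default; it is spelled out here for clarity.
parser = etree.HTMLParser(recover=True, remove_comments=True)
root = etree.fromstring(broken_html, parser)

# Collect what survived parsing.
paragraphs = root.findall(".//p")
comments = root.xpath("//comment()")

print(root.tag)
print(len(paragraphs))
print(len(comments))
```

The result is an ordinary element tree, so the usual find, findall and xpath calls work on it.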

Note that you should avoid sharing parsers between threads for performance reasons.

Method Summary
  __init__(...)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
  __new__(T, S, ...)
T.__new__(S, ...) -> a new object with type S, a subtype of T
    Inherited from _BaseParser
  copy(...)
Create a new parser with the same configuration.
  makeelement(...)
Creates a new element associated with this parser.
  set_element_class_lookup(...)
Set a lookup scheme for element classes generated from this parser.
  setElementClassLookup(...)
Deprecated, use ``parser.set_element_class_lookup(lookup)`` instead.
    Inherited from object
  __delattr__(...)
x.__delattr__('name') <==> del x.name
  __getattribute__(...)
x.__getattribute__('name') <==> x.name
  __hash__(x)
x.__hash__() <==> hash(x)
  __reduce__(...)
helper for pickle
  __reduce_ex__(...)
helper for pickle
  __repr__(x)
x.__repr__() <==> repr(x)
  __setattr__(...)
x.__setattr__('name', value) <==> x.name = value
  __str__(x)
x.__str__() <==> str(x)
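
The methods inherited from _BaseParser can be combined roughly as in the sketch below. It assumes a hypothetical CountingElement class and made-up markup. It also uses copy() to give each thread its own parser, which is one possible way to follow the advice about not sharing parsers between threads, not a pattern prescribed by this page.

```python
import threading

from lxml import etree


class CountingElement(etree.ElementBase):
    """Hypothetical element class that adds one convenience property."""

    @property
    def word_count(self):
        return len((self.text or "").split())


base_parser = etree.HTMLParser(remove_blank_text=True)
# Every element created by this parser will use CountingElement.
base_parser.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=CountingElement)
)

root = etree.fromstring("<p>three short words</p>", base_parser)
paragraph = root.find(".//p")

# makeelement() creates a standalone element associated with the parser.
note = base_parser.makeelement("div", {"class": "note"})

# copy() returns a new parser with the same configuration; the copies
# are made up front so that each thread works with its own parser.
documents = {
    "a": "<html><head><title>Alpha</title></head></html>",
    "b": "<html><head><title>Beta</title></head></html>",
}
results = {}


def worker(name, markup, parser):
    tree = etree.fromstring(markup, parser)
    results[name] = tree.findtext(".//title")


threads = [
    threading.Thread(target=worker, args=(name, markup, base_parser.copy()))
    for name, markup in documents.items()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(paragraph.word_count, note.tag, sorted(results.items()))
```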

Class Variable Summary
PyCObject __pyx_vtable__ = <PyCObject object at 0x401cb9c8>
    Inherited from _BaseParser
getset_descriptor error_log = <attribute 'error_log' of 'lxml.etree._BaseP...
member_descriptor resolvers = <member 'resolvers' of 'lxml.etree._BasePars...
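
The error_log attribute can be inspected after a parse to see what the parser had to recover from. The sketch below uses made-up markup; which entries appear, and how many, depends on the libxml2 version, so the code only walks the log and makes no claim about its contents.

```python
from lxml import etree

parser = etree.HTMLParser()  # recover is True by default

# Hypothetical input with a stray end tag and an unknown element.
root = etree.fromstring(
    "<p>text</b><madeuptag>more</madeuptag></p>", parser
)

# Each log entry carries a position, a severity name and a message.
messages = [
    (entry.line, entry.column, entry.level_name, entry.message)
    for entry in parser.error_log
]

for line, column, level, message in messages:
    print(f"{line}:{column} {level}: {message}")
```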

Method Details

__init__(...)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature
Overrides:
lxml.etree._BaseParser.__init__

__new__(T, S, ...)

T.__new__(S, ...) -> a new object with type S, a subtype of T
Returns:
a new object with type S, a subtype of T
Overrides:
lxml.etree._BaseParser.__new__

Class Variable Details

__pyx_vtable__

Type:
PyCObject
Value:
<PyCObject object at 0x401cb9c8>

Generated by Epydoc 2.1 on Sat Aug 18 12:44:27 2007 http://epydoc.sf.net