Package html

The lxml.html tool set for HTML handling.
Submodules

Functions
 
document_fromstring(html, **kw)
 
fragments_fromstring(html, no_leading_text=False, **kw)
Parses several HTML elements, returning a list of elements.
 
fragment_fromstring(html, create_parent=False, **kw)
Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.
 
fromstring(html, **kw)
Parse the html, returning a single element/document.
 
submit_form(form, extra_values=None, open_http=None)
Helper function to submit a form.
 
tostring(doc, pretty_print=False, include_meta_content_type=False, encoding=None)
Returns an HTML string representation of the given document.
 
open_in_browser(doc)
Open the HTML document in a web browser (saving it to a temporary file to open it).
 
Element(*args, **kw)
Variables
  find_rel_links = _MethodFunc('find_rel_links', copy=False)
  find_class = _MethodFunc('find_class', copy=False)
  make_links_absolute = _MethodFunc('make_links_absolute', copy=...
  resolve_base_href = _MethodFunc('resolve_base_href', copy=True)
  iterlinks = _MethodFunc('iterlinks', copy=False)
  rewrite_links = _MethodFunc('rewrite_links', copy=True)
Function Details

fragments_fromstring(html, no_leading_text=False, **kw)

Parses several HTML elements, returning a list of elements.

The first item in the list may be a string (though leading whitespace is removed). If no_leading_text is true, then it will be an error if there is leading text, and it will always be a list of only elements.
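A minimal sketch of the behaviour described above, assuming a reasonably recent lxml release under Python 3. The input strings and the variable names (parts, leading, tags, strict_error) are made up for illustration; etree.ParserError is the exception current lxml versions raise for the leading-text error.

```python
from lxml import etree
from lxml.html import fragments_fromstring

# Mixed input: leading text followed by two elements.
parts = fragments_fromstring("Hello <b>bold</b> and <i>italic</i>")
leading = parts[0]                      # the leading text, returned as a string
tags = [el.tag for el in parts[1:]]     # the remaining items are elements

# With no_leading_text=True the same kind of input is rejected.
try:
    fragments_fromstring("Hello <b>bold</b>", no_leading_text=True)
    strict_error = None
except etree.ParserError as exc:
    strict_error = exc
```

Note that the text between the two elements (" and ") is not a separate list item; it is stored as the tail of the preceding element.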

fragment_fromstring(html, create_parent=False, **kw)

Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.

If create_parent is true (or is a tag name) then a parent node will be created to encapsulate the HTML in a single element.
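The following sketch illustrates both modes, assuming a recent lxml release. The sample markup and variable names are hypothetical; the exception type (etree.ParserError) is what current lxml versions raise when more than one element is found.

```python
from lxml import etree
from lxml.html import fragment_fromstring

# Exactly one element: it is returned directly.
single = fragment_fromstring("<p>one paragraph</p>")

# Several elements: ask for a wrapping parent, here named by tag.
wrapped = fragment_fromstring("<p>one</p><p>two</p>", create_parent="div")
child_tags = [child.tag for child in wrapped]

# Several elements without create_parent is an error.
try:
    fragment_fromstring("<p>one</p><p>two</p>")
    multi_error = None
except etree.ParserError as exc:
    multi_error = exc
```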

fromstring(html, **kw)

Parse the html, returning a single element/document.

This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document.
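As a rough illustration of "minimal" parsing, the sketch below feeds fromstring a full document and a bare fragment. The markup is invented for the example, and the behaviour shown is what a recent lxml release is expected to do.

```python
from lxml.html import fromstring

# A complete document comes back as the <html> root element.
doc = fromstring(
    "<html><head><title>Demo</title></head>"
    "<body><p>content</p></body></html>"
)

# A fragment holding a single element comes back as that element,
# without a surrounding <html>/<body> wrapper.
frag = fromstring("<p>just a paragraph</p>")
```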

submit_form(form, extra_values=None, open_http=None)

Helper function to submit a form. Returns a file-like object, as from urllib.urlopen(). This object also has a .geturl() method, which returns the final URL after any redirects.

You can use this like:

>>> form = doc.forms[0]
>>> form.inputs['foo'].value = 'bar' # etc
>>> response = form.submit()
>>> doc = parse(response)
>>> doc.make_links_absolute(response.geturl())

To change the HTTP requester, pass a function as the open_http keyword argument; it opens the URL for you. The function must have the following signature:

open_http(method, URL, values)

The method is one of 'GET' or 'POST', the URL is the target URL as a string, and the values are a sequence of (name, value) tuples with the form data.
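A minimal sketch of a custom opener, assuming a recent lxml release. The function recording_open_http, the calls list, and the example.com form are all hypothetical; the opener only records its arguments so the example runs without network access, whereas a real one would perform the request and return a file-like response.

```python
from lxml.html import fromstring, submit_form

calls = []  # collects what submit_form passes to the opener


def recording_open_http(method, url, values):
    """Hypothetical opener: record the request instead of sending it."""
    calls.append((method, url, list(values)))
    return None  # a real opener would return a file-like response object


page = fromstring(
    '<html><body>'
    '<form action="http://example.com/search" method="post">'
    '<input type="text" name="q" value="lxml">'
    '</form>'
    '</body></html>'
)
form = page.forms[0]

# extra_values are appended to the values collected from the form fields.
submit_form(form, extra_values={"page": "1"}, open_http=recording_open_http)
```

After the call, calls holds one (method, URL, values) entry, which makes it easy to inspect exactly what would have been sent.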

tostring(doc, pretty_print=False, include_meta_content_type=False, encoding=None)

Returns an HTML string representation of the given document.

Note: if include_meta_content_type is true, this will create a meta http-equiv="Content-Type" tag in the head. Regardless of the value of include_meta_content_type, any existing meta http-equiv="Content-Type" tag will be removed.


Variables Details

make_links_absolute

Value:
_MethodFunc('make_links_absolute', copy=True)
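The module-level make_links_absolute can be called as a plain function on either a string or an element. The sketch below assumes a recent lxml release and passes the base URL positionally; the markup, the example.com URL, and the variable names are invented for illustration. With copy=True, an element argument is expected to be copied first, so the original stays unchanged.

```python
from lxml.html import fromstring, make_links_absolute

# String in, string out: the markup is parsed, rewritten and serialized again.
result = make_links_absolute('<a href="/docs">docs</a>', "http://example.com/")

# Element in, element out: with copy=True the input element is not modified.
el = fromstring('<a href="/docs">docs</a>')
new_el = make_links_absolute(el, "http://example.com/")
original_href = el.get("href")
rewritten_href = new_el.get("href")
```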