Package lxml.html

Submodules

Classes
_MethodFunc An object that represents a method on an element as a function; the function takes either an element or an HTML string.
CheckboxGroup Represents a group of checkboxes (``<input type=checkbox>``) that have the same name.
CheckboxValues Represents the values of the checked checkboxes in a group of checkboxes with the same name.
FieldsDict  
FormElement Represents a ``<form>`` element.
HtmlComment  
HtmlElement  
HtmlElementClassLookup A lookup scheme for HTML Element classes.
HtmlEntity  
HtmlMixin  
HtmlProcessingInstruction  
InputElement Represents an ``<input>`` element.
InputGetter An accessor that represents all the input fields in a form.
InputMixin Mix-in for all input elements (input, select, and textarea).
LabelElement Represents a ``<label>`` element.
MultipleSelectOptions Represents all the selected options in a ``<select multiple>`` element.
RadioGroup This object represents several ``<input type=radio>`` elements that have the same name.
SelectElement Represents a ``<select>`` element.
TextareaElement Represents a ``<textarea>`` element.

Function Summary
  document_fromstring(html, **kw)
  Element(*args, **kw)
  open_in_browser(doc)
Open the HTML document in a web browser (saving it to a temporary file to open it).
  tostring(doc, pretty, include_meta_content_type)
Return an HTML string representation of the given document.
  _contains_block_level_tag(el)
  _element_name(el)
  fragment_fromstring(html, create_parent, **kw)
Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.
  fragments_fromstring(html, no_leading_text, **kw)
Parses several HTML elements, returning a list of elements.
  fromstring(html, **kw)
Parse the HTML, returning a single element/document.
  open_http_urllib(method, url, values)
  parse(filename, **kw)
Parse a filename, URL, or file-like object into an HTML document.
  submit_form(form, extra_values, open_http)
Helper function to submit a form.

Function Details

open_in_browser(doc)

Open the HTML document in a web browser (saving it to a temporary file to open it).

tostring(doc, pretty=False, include_meta_content_type=False)

Return an HTML string representation of the given document.

Note: this will create a ``<meta http-equiv="Content-Type">`` tag in the head and may replace any that are present.
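As a minimal sketch of serializing a parsed element, the following passes only the positional ``doc`` argument. The keyword names shown in the signature above may differ in other lxml releases, so they are deliberately not used here, and the sample markup is an assumption made up for illustration. Some releases return bytes rather than a string, which the sketch accounts for.

```python
from lxml import html

# Parse a small, made-up fragment and serialize it back to HTML.
el = html.fragment_fromstring("<p>Hello <b>world</b></p>")
out = html.tostring(el)

# Normalize to text, since the return type depends on the lxml release.
text = out.decode("ascii") if isinstance(out, bytes) else out
print(text)
```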

fragment_fromstring(html, create_parent=False, **kw)

Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.

If create_parent is true (or is a tag name) then a parent node will be created to encapsulate the HTML in a single element.
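The behaviour described above can be sketched as follows. The sample markup and the ``section`` tag name are assumptions chosen for illustration, and the broad ``except Exception`` is used only because the exact exception class is not stated here.

```python
from lxml import html

# A single element parses directly.
el = html.fragment_fromstring("<p>Hello <b>world</b></p>")
print(el.tag)

# Two sibling elements are rejected when no parent is requested.
try:
    html.fragment_fromstring("<p>one</p><p>two</p>")
    failed = False
except Exception as exc:
    failed = True
    print("error:", exc)

# create_parent=True wraps the siblings in a parent element.
wrapped = html.fragment_fromstring("<p>one</p><p>two</p>", create_parent=True)
print(wrapped.tag, len(wrapped))

# A tag name selects the wrapper element explicitly.
named = html.fragment_fromstring("<p>one</p><p>two</p>", create_parent="section")
print(named.tag, len(named))
```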

fragments_fromstring(html, no_leading_text=False, **kw)

Parses several HTML elements, returning a list of elements.

The first item in the list may be a string (though leading whitespace is removed). If no_leading_text is true, then leading text is an error, and the result is always a list containing only elements.
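A short sketch of the mixed return value, assuming Python 3 (where the leading text arrives as ``str``) and made-up sample markup:

```python
from lxml import html

# Leading text followed by two elements.
parts = html.fragments_fromstring("intro text <p>one</p><p>two</p>")

# Separate the optional leading string from the elements.
leading = [p for p in parts if isinstance(p, str)]
elements = [p for p in parts if not isinstance(p, str)]
print(leading, [el.tag for el in elements])

# With no_leading_text=True the same leading text is rejected.
try:
    html.fragments_fromstring("intro text <p>one</p>", no_leading_text=True)
    rejected = False
except Exception as exc:
    rejected = True
    print("rejected:", exc)

# Without leading text, the result contains elements only.
only_elements = html.fragments_fromstring("<p>one</p><p>two</p>", no_leading_text=True)
print([el.tag for el in only_elements])
```

Callers that iterate over the result without ``no_leading_text=True`` should be prepared for the first item to be a string.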

fromstring(html, **kw)

Parse the HTML, returning a single element/document.

This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document.
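To illustrate the fragment-or-document behaviour, this sketch feeds the same function a full document and a lone element. Both inputs are made up for illustration.

```python
from lxml import html

# A complete document comes back as its root element.
doc = html.fromstring(
    "<html><head><title>T</title></head><body><p>x</p></body></html>"
)

# A single element comes back as that element, without added wrappers.
frag = html.fromstring("<p>x</p>")

print(doc.tag, frag.tag)
```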

parse(filename, **kw)

Parse a filename, URL, or file-like object into an HTML document.

You may pass the keyword argument ``base_url='http://...'`` to set the base URL.
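A minimal sketch of ``base_url`` with a file-like object, where the parser has no filename or URL to fall back on. The markup and the ``http://example.com/dir/`` URL are assumptions for illustration. The sketch relies on ``parse`` returning a tree object whose root is obtained with ``getroot()``.

```python
import io
from lxml import html

# An in-memory document stands in for a file; it has no location of its own.
source = io.BytesIO(b"<html><body><a href='page.html'>next</a></body></html>")

# base_url records where the document is considered to come from.
tree = html.parse(source, base_url="http://example.com/dir/")
root = tree.getroot()

# Resolve relative links against the recorded base URL.
root.make_links_absolute(root.base_url)
link = root.find(".//a").get("href")
print(link)
```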

submit_form(form, extra_values=None, open_http=None)

Helper function to submit a form. Returns a file-like object, as from ``urllib.urlopen()``. This object also has a ``.geturl()`` method, which returns the final URL after any redirects.

You can use this as follows:
   >>> form = doc.forms[0]
   >>> form.inputs['foo'].value = 'bar' # etc
   >>> response = submit_form(form)
   >>> doc = parse(response).getroot()
   >>> doc.make_links_absolute(response.geturl())
To change the HTTP requester, pass a function as the ``open_http`` keyword argument that opens the URL for you. The function must have the following signature:
   open_http(method, URL, values)
The method is one of 'GET' or 'POST', the URL is the target URL as a string, and the values are a sequence of ``(name, value)`` tuples with the form data.
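A custom requester matching that signature might look like the sketch below. It is not the library's own ``open_http_urllib``; it assumes Python 3's ``urllib.request``, and the ``build_request`` helper is a hypothetical name introduced so that request construction can be inspected without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(method, url, values):
    """Turn the (method, url, values) triple into a urllib Request."""
    data = urlencode(values)  # values: sequence of (name, value) tuples
    if method == "GET":
        if data:
            # Append the form data to the query string.
            separator = "&" if "?" in url else "?"
            url = url + separator + data
        return Request(url)
    # POST: send the form data in the request body.
    return Request(url, data=data.encode("ascii"), method="POST")


def open_http(method, url, values):
    """Requester with the signature expected by submit_form."""
    return urlopen(build_request(method, url, values))


# Inspect what would be sent, without opening a connection.
get_req = build_request("GET", "http://example.com/search", [("q", "lxml")])
post_req = build_request("POST", "http://example.com/login", [("user", "bob")])
print(get_req.full_url, post_req.get_method(), post_req.data)
```

It would then be passed as ``submit_form(form, open_http=open_http)``. Keeping the request construction in a separate helper makes it easy to add headers or a timeout later without changing the signature that ``submit_form`` expects.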

Generated by Epydoc 2.1 on Sat Aug 18 12:44:28 2007 http://epydoc.sf.net