Package lxml :: Package html

Package html

Classes
  HtmlMixin
  _MethodFunc
An object that represents a method on an element as a function; the function takes either an element or an HTML string.
  HtmlComment
  HtmlElement
  HtmlProcessingInstruction
  HtmlEntity
  HtmlElementClassLookup
A lookup scheme for HTML Element classes.
  FormElement
Represents a ``<form>`` element.
  FieldsDict
  InputGetter
An accessor that represents all the input fields in a form.
  InputMixin
Mix-in for all input elements (input, select, and textarea).
  TextareaElement
Represents a ``<textarea>`` element.
  SelectElement
Represents a ``<select>`` element.
  MultipleSelectOptions
Represents all the selected options in a ``<select multiple>`` element.
  RadioGroup
This object represents several ``<input type=radio>`` elements that have the same name.
  CheckboxGroup
Represents a group of checkboxes (``<input type=checkbox>``) that have the same name.
  CheckboxValues
Represents the values of the checked checkboxes in a group of checkboxes with the same name.
  InputElement
Represents an ``<input>`` element.
  LabelElement
Represents a ``<label>`` element.
  HTMLParser
Functions

document_fromstring(html, **kw)

fragments_fromstring(html, no_leading_text=False, **kw)
Parses several HTML elements, returning a list of elements.

fragment_fromstring(html, create_parent=False, **kw)
Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.

fromstring(html, **kw)
Parse the html, returning a single element/document.

parse(filename, parser=None, **kw)
Parse a filename, URL, or file-like object into an HTML document.

_contains_block_level_tag(el)

_element_name(el)

submit_form(form, extra_values=None, open_http=None)
Helper function to submit a form.

open_http_urllib(method, url, values)

__replace_meta_content_type(...)
sub(repl, string[, count = 0]) --> newstring. Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.

tostring(doc, pretty_print=False, include_meta_content_type=False, encoding=None)
Returns the HTML string representation of the given document.

open_in_browser(doc)
Open the HTML document in a web browser (saving it to a temporary file to open it).

Element(*args, **kw)

Variables
  _rel_links_xpath = descendant-or-self::a[@rel]
  _class_xpath = descendant-or-self::*[@class and contains(conca...
  _id_xpath = descendant-or-self::*[@id=$id]
  _collect_string_content = string()
  _css_url_re = re.compile(r'(?i)url\((.*?)\)')
  _css_import_re = re.compile(r'@import "(.*?)"')
  _label_xpath = //label[@for=$id]
  _archive_re = re.compile(r'[^ ]+')
  find_rel_links = _MethodFunc('find_rel_links', copy= False)
  find_class = _MethodFunc('find_class', copy= False)
  make_links_absolute = _MethodFunc('make_links_absolute', copy=...
  resolve_base_href = _MethodFunc('resolve_base_href', copy= True)
  iterlinks = _MethodFunc('iterlinks', copy= False)
  rewrite_links = _MethodFunc('rewrite_links', copy= True)
  html_parser = HTMLParser()
Function Details

fragments_fromstring(html, no_leading_text=False, **kw)

Parses several HTML elements, returning a list of elements.

The first item in the list may be a string holding any leading text (leading whitespace is removed). If ``no_leading_text`` is true, leading text is an error, and the result is always a list containing only elements.
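
As an illustration of the leading-text behaviour described above, here is a minimal sketch. The sample markup and variable names are made up for the example, and it assumes a reasonably recent lxml in which parse errors are raised as ``lxml.etree.ParserError``.

```python
from lxml import etree, html

# Leading text comes back as a plain string ahead of the parsed elements.
parts = html.fragments_fromstring("hello <b>world</b><i>!</i>")
leading_text = parts[0]
tags = [el.tag for el in parts[1:]]

# With no_leading_text=True the same kind of input is rejected.
try:
    html.fragments_fromstring("hello <b>world</b>", no_leading_text=True)
    rejected = False
except etree.ParserError:
    rejected = True

# Input that starts with an element yields a list of elements only.
only_elements = html.fragments_fromstring(
    "<p>one</p><p>two</p>", no_leading_text=True
)
```

Callers that cannot rule out leading text may find it simplest to check ``isinstance(parts[0], str)`` before iterating over the result.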

fragment_fromstring(html, create_parent=False, **kw)

Parses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element.

If ``create_parent`` is true (or is a tag name), a parent node is created to wrap the HTML in a single element.
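
The following sketch shows the three cases described above: a single element, a fragment wrapped in a generated parent, and a fragment that is rejected. The markup is illustrative only, and the choice of ``"span"`` as parent tag is an assumption made for the example.

```python
from lxml import etree, html

# A single element parses directly.
single = html.fragment_fromstring("<p>Hello <b>world</b></p>")

# Two sibling elements are an error unless a parent is requested.
try:
    html.fragment_fromstring("<p>a</p><p>b</p>")
    multiple_rejected = False
except etree.ParserError:
    multiple_rejected = True

# create_parent=True wraps text and elements in a newly created parent.
wrapped = html.fragment_fromstring("some text <b>bold</b>", create_parent=True)

# Passing a tag name selects the parent element explicitly.
named = html.fragment_fromstring("<b>one</b><i>two</i>", create_parent="span")
```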

fromstring(html, **kw)

Parse the html, returning a single element/document.

This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document.
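
A short sketch of what "minimally parse" can look like in practice, using made-up input. The tag chosen for the wrapper in the last case depends on the content, so the example only inspects its children.

```python
from lxml import html

# A complete document yields the <html> root element.
doc = html.fromstring(
    "<html><head><title>T</title></head><body><p>x</p></body></html>"
)

# A single-element fragment yields that element itself.
frag = html.fromstring("<p>just a paragraph</p>")

# Several sibling elements come back inside one wrapper element.
multi = html.fromstring("<p>a</p><p>b</p>")
child_tags = [el.tag for el in multi]
```

When the caller already knows whether the input is a document or a fragment, ``document_fromstring`` or ``fragment_fromstring`` makes that intent explicit.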

parse(filename, parser=None, **kw)

Parse a filename, URL, or file-like object into an HTML document.

You may pass the keyword argument ``base_url='http://...'`` to set the base URL.
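
A minimal sketch of the ``base_url`` keyword, parsing from an in-memory file-like object so that no network or file access is needed. The URL and markup are placeholders. Note that ``parse`` returns a tree object, so the example calls ``getroot()`` to reach the root element.

```python
import io

from lxml import html

source = io.BytesIO(
    b'<html><body><a href="other.html">next</a></body></html>'
)

# base_url records where the document nominally came from.
tree = html.parse(source, base_url="http://example.com/docs/index.html")
root = tree.getroot()

# With no argument, make_links_absolute() falls back to the document's base URL.
root.make_links_absolute()
href = root.xpath("//a/@href")[0]
```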

submit_form(form, extra_values=None, open_http=None)

Helper function to submit a form. Returns a file-like object, as from ``urllib.urlopen()``. This object also has a ``.geturl()`` method, which returns the final URL after any redirects.

You can use this like:

   >>> form = doc.forms[0]
   >>> form.inputs['foo'].value = 'bar' # etc
   >>> response = submit_form(form)
   >>> doc = parse(response).getroot()  # parse() returns a tree; take its root element
   >>> doc.make_links_absolute(response.geturl())

To change the HTTP requester, pass a function as the ``open_http`` keyword argument that opens the URL for you. The function must have the following signature:

   open_http(method, URL, values)

The method is either 'GET' or 'POST', the URL is the target URL as a string, and the values are a sequence of ``(name, value)`` tuples with the form data.
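
To make the ``open_http`` hook concrete, here is a sketch that swaps in a function which only records what it is asked to send, so nothing touches the network. The form markup, the URL, and the name ``recording_open_http`` are all hypothetical. A real replacement would perform the request and return a file-like response.

```python
from lxml import html

page = html.fromstring(
    '<html><body>'
    '<form action="http://example.com/search" method="post">'
    '<input type="text" name="q" value="old">'
    '</form></body></html>'
)
form = page.forms[0]
form.inputs["q"].value = "lxml"

calls = []

def recording_open_http(method, url, values):
    # Hypothetical stand-in for an HTTP client: just remember the arguments.
    calls.append((method, url, list(values)))
    return None

html.submit_form(
    form, extra_values={"page": "1"}, open_http=recording_open_http
)
```

A hook like this is also handy in tests, where the recorded call can be compared against the expected method, URL, and form data.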

tostring(doc, pretty_print=False, include_meta_content_type=False, encoding=None)

Returns the HTML string representation of the given document.

Note: if ``include_meta_content_type`` is true, this creates a ``<meta http-equiv="Content-Type">`` tag in the head. Regardless of the value of ``include_meta_content_type``, any existing ``<meta http-equiv="Content-Type">`` tag is removed.


Variables Details

_class_xpath

Value:
descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), concat(' ', $class_name, ' '))]
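
The expression pads both the normalized ``class`` attribute and the requested name with spaces, so only whole class tokens match. The sketch below compiles the same expression with ``lxml.etree.XPath`` and compares it with the public ``find_class`` method. The markup is invented for the example.

```python
from lxml import etree, html

class_xpath = etree.XPath(
    "descendant-or-self::*[@class and contains("
    "concat(' ', normalize-space(@class), ' '), "
    "concat(' ', $class_name, ' '))]"
)

doc = html.fromstring(
    '<div>'
    '<p class="note big">a</p>'
    '<p class="notebook">b</p>'
    '<p class=" note ">c</p>'
    '</div>'
)

# $class_name is supplied as a keyword argument when the expression is called.
matched = [el.text for el in class_xpath(doc, class_name="note")]

# The public find_class() method should select the same elements.
via_method = [el.text for el in doc.find_class("note")]
```

Here ``class="notebook"`` is skipped even though it contains the substring ``note``, which is the reason for the space padding.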

make_links_absolute

Value:
_MethodFunc('make_links_absolute', copy= True)