lxml.html.HtmlMixin

Type HtmlMixin

object --+
         |
        HtmlMixin

Known Subclasses:: HtmlComment, HtmlElement, HtmlEntity, HtmlProcessingInstruction

Method Summary
	`cssselect(self, expr)` Run the CSS expression on this element and its children, returning a list of the results.
	`drop_tag(self)` Remove the tag, but not its children or text.
	`drop_tree(self)` Removes this element from the tree, including its children and text.
	`find_class(self, class_name)` Find any elements with the given class name.
	`find_rel_links(self, rel)` Find any links like ``<a rel="{rel}">...</a>``; returns a list of elements.
	`get_element_by_id(self, id, *default)` Get the first element in a document with the given id.
	`iterlinks(self)` Yield (element, attribute, link, pos), where attribute may be None (indicating the link is in the text).
	`label__del(self)`
	`label__get(self)` Get or set any <label> element associated with this element.
	`label__set(self, label)`
	`make_links_absolute(self, base_url, resolve_base_href)` Make all links in the document absolute, given the ``base_url`` for the document (the full URL where the document came from), or if no ``base_url`` is given, then the ``.base_url`` of the document.
	`resolve_base_href(self)` Find any ``<base href>`` tag in the document, and apply its values to all links found in the document.
	`rewrite_links(self, link_repl_func, resolve_base_href, base_href)` Rewrite all the links in the document.
	`text_content(self)` Return the text content of the tag (and the text in any children).
Inherited from object
	`__init__(...)` x.__init__(...) initializes x; see x.__class__.__doc__ for signature
	`__delattr__(...)` x.__delattr__('name') <==> del x.name
	`__getattribute__(...)` x.__getattribute__('name') <==> x.name
	`__hash__(x)` x.__hash__() <==> hash(x)
	`__new__(T, S, ...)` T.__new__(S, ...) -> a new object with type S, a subtype of T
	`__reduce__(...)` helper for pickle
	`__reduce_ex__(...)` helper for pickle
	`__repr__(x)` x.__repr__() <==> repr(x)
	`__setattr__(...)` x.__setattr__('name', value) <==> x.name = value
	`__str__(x)` x.__str__() <==> str(x)

Property Summary
	`base_url`: Returns the base URL, given when the page was parsed.
	`body`: Return the <body> element.
	`forms`: Return a list of all the forms in the document.
	`head`: Returns the <head> element.
	`label`: Get or set any <label> element associated with this element.

Method Details

cssselect(self, expr)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelector(expr)(self). Note that pre-compiling the expression once and reusing the compiled selector can provide a substantial speedup.

drop_tag(self)

Remove the tag, but not its children or text. The children and text are merged into the parent.

Example:

   >>> from lxml.html import fragment_fromstring, tostring
   >>> h = fragment_fromstring('<div>Hello <b>World!</b></div>')
   >>> h.find('.//b').drop_tag()
   >>> print(tostring(h, encoding='unicode'))
   <div>Hello World!</div>

drop_tree(self)

Removes this element from the tree, including its children and text. The tail text is joined to the previous element or parent.

find_class(self, class_name)

Find any elements with the given class name.
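
A short sketch of how this might be used, with made-up markup. It assumes the match is on whole class names within a space-separated ``class`` attribute, so ``notebook`` is not expected to match ``note``.

```python
from lxml.html import fragment_fromstring

# Illustrative fragment with three candidate elements.
root = fragment_fromstring(
    '<div>'
    '<p class="note big">a</p>'
    '<p class="notebook">b</p>'
    '<span class="note">c</span>'
    '</div>'
)

# Collect the text of every element that carries the class "note".
texts = [el.text for el in root.find_class('note')]
print(texts)
```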

find_rel_links(self, rel)

Find any links like ``<a rel="{rel}">...</a>``; returns a list of elements.
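
As a rough illustration (the markup below is invented), only ``<a>`` elements whose ``rel`` attribute equals the requested value are returned:

```python
from lxml.html import fragment_fromstring

# Illustrative fragment: two links with rel, one without.
root = fragment_fromstring(
    '<div>'
    '<a rel="nofollow" href="/x">x</a>'
    '<a rel="tag" href="/t">t</a>'
    '<a href="/p">p</a>'
    '</div>'
)

tag_links = root.find_rel_links('tag')
hrefs = [a.get('href') for a in tag_links]
print(hrefs)
```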

get_element_by_id(self, id, *default)

Get the first element in a document with the given id. If none is found, return the default argument if provided or raise KeyError otherwise.

Note that there can be more than one element with the same id, and this isn't uncommon in HTML documents found in the wild. Browsers return only the first match, and this function does the same.
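
The three outcomes described above (first match, default value, KeyError) can be sketched as follows. The markup and the ids ``dup`` and ``missing`` are made up for the example.

```python
from lxml.html import fragment_fromstring

# Illustrative fragment with a duplicated id.
root = fragment_fromstring(
    '<div><p id="dup">first</p><p id="dup">second</p></div>')

# With duplicate ids, the first element in document order is returned.
first = root.get_element_by_id('dup')

# A default is returned instead of raising when the id is absent.
fallback = root.get_element_by_id('missing', None)

# Without a default, a missing id raises KeyError.
try:
    root.get_element_by_id('missing')
    raised = False
except KeyError:
    raised = True
```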

iterlinks(self)

Yield (element, attribute, link, pos), where attribute may be None (indicating the link is in the text). ``pos`` is the position where the link occurs; often 0, but sometimes something else in the case of links in stylesheets or style tags.

Note: <base href> is *not* taken into account in any way. The link you get is exactly the link in the document.
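
A minimal sketch of consuming the yielded tuples, using an invented fragment. Both links here live in ordinary attributes, so ``pos`` is 0 in each case.

```python
from lxml.html import fragment_fromstring

# Illustrative fragment with one href link and one src link.
root = fragment_fromstring(
    '<div><a href="/a">A</a><img src="pic.png"></div>')

# Each item is (element, attribute, link, pos); keep the tag for readability.
links = [(el.tag, attr, link, pos) for el, attr, link, pos in root.iterlinks()]
print(links)
```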

label__get(self)

Get or set any <label> element associated with this element.

make_links_absolute(self, base_url=None, resolve_base_href=True)

Make all links in the document absolute, given the ``base_url`` for the document (the full URL where the document came from), or if no ``base_url`` is given, then the ``.base_url`` of the document.

If ``resolve_base_href`` is true, then any ``<base href>`` tags in the document are used *and* removed from the document. If it is false then any such tag is ignored.
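
To make the effect concrete, here is a small sketch. The base URL ``http://example.com/docs/index.html`` and the links are placeholders chosen for the example; relative, root-relative and fragment-only links are each resolved against the base URL.

```python
from lxml.html import fragment_fromstring

# Illustrative fragment with three kinds of non-absolute links.
div = fragment_fromstring(
    '<div>'
    '<a href="page.html">p</a>'
    '<a href="/top">t</a>'
    '<a href="#frag">f</a>'
    '</div>'
)

# Resolve every link against the (hypothetical) URL the page came from.
div.make_links_absolute('http://example.com/docs/index.html')

hrefs = [a.get('href') for a in div.iter('a')]
print(hrefs)
```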

resolve_base_href(self)

Find any ``<base href>`` tag in the document, and apply its values to all links found in the document. Also remove the tag once it has been applied.
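
A sketch of the behaviour described above, on an invented document: the ``<base href>`` value is applied to the link, and the ``<base>`` tag itself is gone afterwards.

```python
from lxml.html import document_fromstring

# Illustrative document; the base URL is a placeholder.
doc = document_fromstring(
    '<html><head><base href="http://example.com/dir/"></head>'
    '<body><a href="a.html">a</a></body></html>'
)

doc.resolve_base_href()

href = doc.xpath('//a')[0].get('href')
remaining_base_tags = doc.xpath('//base')
print(href, remaining_base_tags)
```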

rewrite_links(self, link_repl_func, resolve_base_href=True, base_href=None)

Rewrite all the links in the document. For each link ``link_repl_func(link)`` will be called, and the return value will replace the old link.

Note that links may not be absolute (unless you first called ``make_links_absolute()``), and may be internal (e.g., ``'#anchor'``). They can also be values like ``'mailto:email'`` or ``'javascript:expr'``.

If you give ``base_href`` then all links passed to ``link_repl_func()`` will take that into account.

If the ``link_repl_func`` returns None, the attribute or tag text will be removed completely.
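
The following sketch exercises the cases above with a hypothetical replacement function ``repl``: one link is rewritten, a ``mailto:`` link is passed through unchanged, and one link is removed by returning None. The paths are made up for the example.

```python
from lxml.html import fragment_fromstring

# Illustrative fragment with three links.
div = fragment_fromstring(
    '<div>'
    '<a href="/old/a">a</a>'
    '<a href="mailto:x@example.com">m</a>'
    '<a href="/drop">d</a>'
    '</div>'
)

def repl(link):
    """Hypothetical rewrite rule used only for this example."""
    if link.startswith('mailto:'):
        return link          # leave non-HTTP links alone
    if link == '/drop':
        return None          # removes the href attribute entirely
    return link.replace('/old/', '/new/')

div.rewrite_links(repl)

hrefs = [a.get('href') for a in div.iter('a')]
print(hrefs)
```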

text_content(self)

Return the text content of the tag (and the text in any children).
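
For instance, on an invented fragment, the text of the element and of its descendants is concatenated in document order, with the markup dropped:

```python
from lxml.html import fragment_fromstring

# Illustrative fragment: text is split across the <div> and a child <b>.
div = fragment_fromstring('<div>Hello <b>World</b>!</div>')

text = div.text_content()
print(text)
```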

Property Details

base_url

Returns the base URL, given when the page was parsed.

Use with ``urlparse.urljoin(el.base_url, href)`` to get absolute URLs (on Python 3, ``urljoin`` lives in ``urllib.parse``).

Get Method:: base_url(...)
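
A sketch of the urljoin usage on Python 3, assuming the document was parsed with a ``base_url`` keyword argument. The URL and markup are placeholders for the example.

```python
from urllib.parse import urljoin

from lxml.html import document_fromstring

# Parse with an explicit (hypothetical) base URL.
doc = document_fromstring(
    '<html><body><a href="a.html">a</a></body></html>',
    base_url='http://example.com/dir/',
)

link = doc.xpath('//a')[0]

# base_url is available on any element of the parsed document.
absolute = urljoin(link.base_url, link.get('href'))
print(absolute)
```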

body

Return the <body> element. Can be called from a child element to get the document's body.

Get Method:: body(...)

forms

Return a list of all the forms in the document.

Get Method:: forms(...)
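
A brief sketch on an invented document with two forms; the ``action`` values are placeholders.

```python
from lxml.html import document_fromstring

# Illustrative document containing two forms.
doc = document_fromstring(
    '<html><body>'
    '<form action="/a"><input name="q"></form>'
    '<form action="/b"><input name="r"></form>'
    '</body></html>'
)

actions = [form.get('action') for form in doc.forms]
print(actions)
```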

head

Returns the <head> element. Can be called from a child element to get the document's head.

Get Method:: head(...)
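
Both ``head`` and ``body`` can be reached from any element in the document, not only from the root. A minimal sketch with made-up markup:

```python
from lxml.html import document_fromstring

# Illustrative document.
doc = document_fromstring(
    '<html><head><title>T</title></head><body><p>hi</p></body></html>')

# Start from a nested element rather than from the root.
p = doc.xpath('//p')[0]

head_tag = p.head.tag
body_tag = p.body.tag
title = p.head.findtext('title')
print(head_tag, body_tag, title)
```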

label

Get or set any <label> element associated with this element.

Get Method:: label__get(self)
Set Method:: label__set(self, label)
Delete Method:: label__del(self)
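
A sketch of reading the property, assuming the association is made through the label's ``for`` attribute matching the element's ``id``. The markup and ids are invented; an element with no matching label is expected to give None.

```python
from lxml.html import document_fromstring

# Illustrative form: one labelled field, one unlabelled field.
doc = document_fromstring(
    '<html><body><form>'
    '<label for="name">Name</label>'
    '<input id="name" type="text">'
    '<input id="other" type="text">'
    '</form></body></html>'
)

field = doc.get_element_by_id('name')
label = field.label

unlabelled = doc.get_element_by_id('other').label
print(label.text, unlabelled)
```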

Generated by Epydoc 2.1 on Sat Aug 18 12:44:28 2007

http://epydoc.sf.net