Module lxml.html.diff


Functions
 
html_annotate(doclist, markup=default_markup)
Annotate the newest document in doclist, marking each word with the version that introduced it.
 
htmldiff(old_html, new_html)
Do a diff of the old and new documents.
Function Details

html_annotate(doclist, markup=default_markup)

doclist is a list of (html, version) pairs ordered from oldest to newest, for example:

>>> version1 = 'Hello World'
>>> version2 = 'Goodbye World'
>>> html_annotate([(version1, 'version 1'),
...                (version2, 'version 2')])
u'<span title="version 2">Goodbye</span> <span title="version 1">World</span>'
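The behaviour in the example above can be approximated in plain Python. This is a minimal sketch, not lxml's implementation: it swaps in the standard library's difflib.SequenceMatcher, treats each version as plain whitespace-separated text rather than HTML, and the names annotate_words and title_markup are hypothetical. It walks the versions from oldest to newest, lets unchanged words keep their earlier label, and credits inserted or replaced words to the newer version.

```python
import difflib
from html import escape


def title_markup(text, version):
    # Hypothetical stand-in shaped like the documented default:
    # a <span> whose title attribute is the version label.
    return '<span title="%s">%s</span>' % (escape(str(version), quote=True), text)


def annotate_words(doclist, markup=title_markup):
    """Attribute each word of the newest version to the version that introduced it.

    Hypothetical helper: doclist holds (text, version) pairs, oldest first.
    No HTML parsing is done; words are whitespace-separated.
    """
    if not doclist:
        return ""
    first_text, first_version = doclist[0]
    annotated = [(word, first_version) for word in first_text.split()]
    for text, version in doclist[1:]:
        old_words = [word for word, _ in annotated]
        new_words = text.split()
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep the version that first introduced them.
                updated.extend(annotated[i1:i2])
            else:
                # 'insert'/'replace': the new words are credited to this version.
                # 'delete' has j1 == j2, so removed words simply drop out.
                updated.extend((word, version) for word in new_words[j1:j2])
        annotated = updated
    # Merge runs of adjacent words that share a version into one span.
    runs = []
    for word, version in annotated:
        if runs and runs[-1][1] == version:
            runs[-1][0].append(word)
        else:
            runs.append(([word], version))
    # Inputs are plain text, so escape the words before calling markup().
    return " ".join(markup(escape(" ".join(words), quote=False), version)
                    for words, version in runs)
```

For the two versions in the example, annotate_words returns the same string as the doctest shows (without the Python 2 u prefix). A word deleted in a later version disappears from the result even if it survived several earlier versions.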

The documents must be HTML fragments (str/UTF-8 or unicode), not complete documents.

The markup argument is a function used to mark up the spans of words. It is called like markup('Hello', 'version 2') and returns HTML. The first argument is plain text and never includes any markup. The default wraps the text in a span whose title is the version:

>>> default_markup('Some Text', 'by Joe')
u'<span title="by Joe">Some Text</span>'
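Any callable with the same (text, version) signature can replace the default. The function below is a hypothetical example (its name, the rev class and the data-version attribute are all illustrative assumptions) that tags each run of words with a data attribute instead of a title. It escapes the version label, since that can be arbitrary, and inserts text unchanged on the assumption, based on the description above, that it is already safe to embed.

```python
from html import escape


def data_version_markup(text, version):
    """Hypothetical custom markup, called as markup(text, version)."""
    # The version label is arbitrary, so escape it for use inside an attribute.
    label = escape(str(version), quote=True)
    # Assumption: text never contains markup and is already safe to embed,
    # so it is inserted unchanged.
    return '<span class="rev" data-version="%s">%s</span>' % (label, text)
```

It would be passed as html_annotate(doclist, markup=data_version_markup). Called directly, data_version_markup('Some Text', 'by Joe') returns '<span class="rev" data-version="by Joe">Some Text</span>'.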

htmldiff(old_html, new_html)


Do a diff of the old and new documents. The documents are HTML fragments (str/UTF-8 or unicode), not complete documents (i.e., they have no <html> tag).

Returns HTML with <ins> and <del> tags added around the appropriate text.
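To make the <ins>/<del> output concrete, here is a minimal word-level sketch under simplifying assumptions: the input is plain text with no markup to preserve, it uses the standard library's difflib rather than lxml's own matcher, and word_diff is a hypothetical name. For a replaced run of words this sketch emits the <del> before the <ins>; lxml's own ordering and whitespace handling may differ.

```python
import difflib
from html import escape


def word_diff(old_text, new_text):
    """Hypothetical helper: diff two plain-text strings word by word."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words pass through, escaped because the input is text.
            out.append(escape(" ".join(new_words[j1:j2]), quote=False))
            continue
        if tag in ("delete", "replace"):
            # Words only in the old text are wrapped in <del>.
            out.append("<del>%s</del>" % escape(" ".join(old_words[i1:i2]), quote=False))
        if tag in ("insert", "replace"):
            # Words only in the new text are wrapped in <ins>.
            out.append("<ins>%s</ins>" % escape(" ".join(new_words[j1:j2]), quote=False))
    return " ".join(out)
```

For example, word_diff("a b c", "a c d") returns 'a <del>b</del> c <ins>d</ins>'.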

Markup is generally ignored: the markup from new_html is preserved, along with possibly some markup from old_html (losing some of the old markup is considered acceptable). Only the words in the HTML are diffed. The exceptions are <img> tags, which are treated like words, and the href attribute of <a> tags, which is noted inside the tag itself when it changes.
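The rule that only words are diffed, with <img> treated as a word, is essentially a tokenization choice. The sketch below illustrates that idea with the standard library's html.parser; it is not lxml's tokenizer, it does not handle the <a href> rule, and DiffTokenizer and tokenize are hypothetical names. Sorting an image's attributes by name is this sketch's own choice, so that attribute order alone does not register as a change.

```python
from html.parser import HTMLParser


class DiffTokenizer(HTMLParser):
    """Hypothetical helper: collect each text word, plus each <img> tag as one token."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../>, via the default
        # handle_startendtag implementation.
        if tag == "img":
            # Normalize attribute order so the same image yields the same token.
            parts = ['%s="%s"' % (name, value or "")
                     for name, value in sorted(attrs, key=lambda a: a[0])]
            self.tokens.append("<img %s>" % " ".join(parts))
        # Any other tag (p, b, a, ...) is markup and contributes no token.

    def handle_data(self, data):
        # Whitespace-separated words are the units that get diffed.
        self.tokens.extend(data.split())


def tokenize(fragment):
    parser = DiffTokenizer()
    parser.feed(fragment)
    parser.close()
    return parser.tokens
```

With this tokenization, tokenize('<p>Hi</p>') and tokenize('<b>Hi</b>') are equal, so a formatting-only change produces no diff, while replacing one image with another changes a token just as replacing a word would.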