Projects for Crowd Funding

This is a list of project proposals that are currently open for crowd funding. If you have other ideas that are not listed here yet, please speak up on the mailing list. To get more information about the projects below, either ask on the mailing list or contact Stefan Behnel.

Each proposal below is preceded by a project name that you can use as a project reference when sending money via PayPal. See the home page for how to make payments in general. By assigning a valid project name to your payment, instead of making a normal donation, you dedicate it explicitly to that proposal. Note that invalid project names will be ignored, in which case the payment will be considered a normal donation to the lxml project. You can also name more than one project, separated by commas, so that your payment can be reassigned if your favourite project turns out to have already been completed.

Once a substantial part of the money that is required for a project implementation has been collected, work on this project will be started as soon as possible. The status of a running project will be communicated through the lxml project home page and the lxml mailing list.

If the project specific payments, summed up in the order of their receipt, exceed the final price of implementing that proposal, the overpayments will normally be kept as a donation for the general development of lxml. If, instead, you really want to restrict your payment to the named project and not open it for general development of the lxml library, you can request a refund of overpayments by adding the word "refund" to your payment message. Note that we will not refund any fees or charges that PayPal takes for either your original payment or our refunding payment.

Really simple projects

nsassign
  • status: open
  • what: add 'nsmap' parameter to cleanup_namespaces() to make it use a specific prefix-namespace mapping, i.e. reassign specific prefixes to namespace URIs
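
To give a rough idea of what "reassigning prefixes to namespace URIs" means, here is a minimal sketch. It does not use the proposed 'nsmap' parameter, which does not exist yet. Instead, it assumes the standard library's xml.etree.ElementTree as a stand-in, whose register_namespace() sets a preferred prefix globally for the whole process, whereas the proposal is about a mapping passed per call.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# The input document uses the arbitrary prefix "a" for the Atom namespace.
source = '<a:feed xmlns:a="%s"><a:title>x</a:title></a:feed>' % ATOM_NS
root = ET.fromstring(source)

# Ask the serialiser to use the prefix "atom" for this namespace URI.
# Note: this is process-wide state, not a per-call mapping.
ET.register_namespace("atom", ATOM_NS)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The serialised output uses the "atom" prefix, regardless of the prefix found in the input.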

Projects taking just a few days

userdtd
  • status: open
  • what: parse-time validation against a user provided DTD
  • currently only works for XML Schema
  • how: register
zlibmem
  • status: open
  • what: zlib-based parsing/serialising of compressed in-memory data
  • how: requires a libxml2 I/O OutputBuffer with appropriate I/O functions that call into the zlib compression routines
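
As an illustration of the intended behaviour, the following sketch round-trips a tree through zlib-compressed bytes in memory. It is only an approximation under stated assumptions: it uses the standard library's xml.etree.ElementTree and compresses in a separate Python step, rather than hooking zlib into a libxml2 output buffer as the proposal describes. The helper names serialise_compressed() and parse_compressed() are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET


def serialise_compressed(root):
    """Serialise an element to zlib-compressed bytes (hypothetical helper)."""
    xml_bytes = ET.tostring(root, encoding="utf-8")
    return zlib.compress(xml_bytes)


def parse_compressed(data):
    """Parse zlib-compressed XML bytes without a temporary file (hypothetical helper)."""
    return ET.fromstring(zlib.decompress(data))


root = ET.Element("root")
ET.SubElement(root, "item").text = "hello"

blob = serialise_compressed(root)    # compressed bytes, held in memory
restored = parse_compressed(blob)    # back to an element tree
```

A native implementation would avoid the intermediate uncompressed byte string that this sketch has to build.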
xmlgen
htmlenc
  • status: open
  • what: provide an HTMLParser wrapper that handles broken encodings in broken HTML better, e.g. using BeautifulSoup's "unicode dammit" charset analyser
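
The kind of fallback logic such a wrapper could apply can be sketched as follows. This is a deliberately simple heuristic written for illustration, not the algorithm used by BeautifulSoup's "unicode dammit" and not an existing lxml API. The function name guess_decode() and the fallback encodings are assumptions.

```python
import codecs
import re

# Looks for a charset declaration in a <meta> tag near the start of the page.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def guess_decode(data, fallbacks=("utf-8", "cp1252")):
    """Return (text, encoding) for raw HTML bytes (hypothetical helper)."""
    candidates = []
    if data.startswith(codecs.BOM_UTF8):
        candidates.append("utf-8-sig")
    match = _META_CHARSET.search(data[:2048])
    if match:
        # The declared encoding is only a hint; broken pages often lie.
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)

    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next candidate

    # Last resort: latin-1 maps every byte to a character, so it cannot fail.
    return data.decode("latin-1"), "latin-1"


# A page that declares UTF-8 but actually contains a cp1252 byte.
text, used = guess_decode(b'<meta charset="utf-8"><p>caf\xe9</p>')
```

The decoded text could then be passed to the HTML parser as a Unicode string, so that the parser no longer needs to trust the declared encoding.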

Projects taking more than a week

lzma
  • status: open
  • what: lzma-based parsing/serialising of compressed in-memory and file data
  • how: requires a libxml2 I/O OutputBuffer with appropriate I/O functions that call into the lzma compression routines
  • advantage over zlib: probably faster and better compression
  • maybe embed the lzma C sources in the distro http://www.7-zip.org/sdk.html
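
For comparison with the zlib proposal, here is a minimal sketch of both the in-memory and the file case. It assumes Python's standard lzma module and xml.etree.ElementTree as stand-ins, with compression done in Python rather than inside libxml2's I/O layer as proposed. The file name is arbitrary.

```python
import lzma
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("root")
for i in range(3):
    ET.SubElement(root, "item").text = str(i)

xml_bytes = ET.tostring(root, encoding="utf-8")

# In-memory round trip through lzma-compressed bytes.
blob = lzma.compress(xml_bytes)
from_memory = ET.fromstring(lzma.decompress(blob))

# File round trip: lzma.open() returns a file object that compresses
# on write and decompresses transparently on read.
path = os.path.join(tempfile.mkdtemp(), "data.xml.xz")
with lzma.open(path, "wb") as f:
    f.write(xml_bytes)
with lzma.open(path, "rb") as f:
    from_file = ET.parse(f).getroot()
```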

Projects in pre-design phase

Here are some project proposals that cannot currently be funded directly due to a lack of design consideration, so no realistic estimate of the amount of work they require can be made. If you are interested in any of these projects, please contact either the lxml mailing list or Stefan Behnel.

Note: If you dedicate a payment to any of the following projects in their current state, the dedication will be ignored and the payment will be considered a general donation to the lxml project.

rnc
  • status: open
  • what: somehow integrate RelaxNG compact notation (rnc versus rng)
  • currently not supported by libxml2 (patch exists)
  • how: either integrate the libxml2 RNC patch or rewrite the mapping in Python
iterparse-rewrite
  • status: open
  • what: reimplement iterparse() using the libxml2 xmlReader API
  • maybe: let iterparse() accept a parser as argument instead of being one
  • Advantage: the implementation can be made safer than the current SAX implementation, as the parser would no longer interact with the tree that is user modifiable at the Python-level.
  • Disadvantage: the tree has to be built manually. In the current SAX based implementation, libxml2 does it for us.
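
The safety concern mentioned above comes from the way iterparse() is typically used: user code modifies the partially built tree while the parser is still working on it. The following sketch shows that usage pattern. It assumes the standard library's xml.etree.ElementTree.iterparse(), which offers a similar interface, and says nothing about how lxml implements it internally.

```python
import io
import xml.etree.ElementTree as ET

xml_data = b"<root><item>1</item><item>2</item><item>3</item></root>"

values = []
for event, elem in ET.iterparse(io.BytesIO(xml_data), events=("end",)):
    if elem.tag == "item":
        values.append(int(elem.text))
        # User code mutates the tree here, while parsing is still in progress.
        elem.clear()

print(values)
```

Clearing processed elements like this keeps memory usage low, which is why a reimplementation would still have to support such modifications safely.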