libxml2 has a number of advantages:
- Compliance with the spec
- Active development and community participation
- Speed. This is really a Python wrapper around a C implementation.
- Ubiquity. The libxml2 library is pervasive and thus well tested.
Downsides include:
- Compliance with the spec. It's strict, so things like default namespace handling are easier in other libraries.
- Use of native code. This can be a pain depending on how your application is distributed / deployed. RPMs are available that ease some of this pain.
- Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.
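To make the default-namespace point concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree rather than libxml2. The document, the namespace URI http://example.com/ns, and the prefix "ns" are all made up for illustration; whether this counts as easier is a judgment call.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares a default namespace.
XML = """<doc xmlns="http://example.com/ns">
  <foo><bar title="first"/></foo>
  <foo><bar title="second"/></foo>
</doc>"""

root = ET.fromstring(XML)

# Elements in a default namespace do not match a bare tag name.
unqualified = root.findall("foo/bar")

# Map an arbitrary prefix (here "ns", an assumption) to the namespace URI
# and use that prefix in the path.
ns = {"ns": "http://example.com/ns"}
titles = [bar.get("title") for bar in root.findall("ns:foo/ns:bar", ns)]

print(len(unqualified))
print(titles)
```

The prefix in the mapping does not have to match anything in the document; only the URI matters.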
If you are doing simple path selection, stick with ElementTree (which has been included in the standard library since Python 2.5). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.
Sample of libxml2 XPath Use
import sys
import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print("xpath query: wrong node set size")
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print("xpath query: wrong node set value")
    sys.exit(1)
# The bindings do not free these for you; release them explicitly.
doc.freeDoc()
ctxt.xpathFreeContext()
Sample of ElementTree XPath Use
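from xml.etree.ElementTree import ElementTree

mydoc = ElementTree(file='tst.xml')
# './foo/bar' selects bar children of foo children of the root element.
for e in mydoc.findall('./foo/bar'):
    # get() returns the attribute value as a string (or None if absent).
    print(e.get('title'))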
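For trying the ElementTree approach without a file on disk, a self-contained sketch might look like the following. The inline XML is a hypothetical stand-in for tst.xml (the real file's contents are not shown here), and bar_titles is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for tst.xml; the real file's contents are not shown.
SAMPLE = """<doc>
  <foo>
    <bar title="alpha"/>
    <bar title="beta"/>
    <bar/>
  </foo>
</doc>"""


def bar_titles(xml_text):
    """Return the title attribute of every foo/bar element that has one."""
    root = ET.fromstring(xml_text)
    # The path is relative to the root element (<doc>).
    return [e.get("title") for e in root.findall("./foo/bar")
            if e.get("title") is not None]


print(bar_titles(SAMPLE))
```

Filtering out None skips the bar element that has no title attribute.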
Answer from Ryan Cox on Stack Overflow
The lxml package supports XPath. It seems to work pretty well, although I've had some trouble with the self:: axis. There's also Amara, but I haven't used it personally.
http://docs.python.org/library/xml.etree.elementtree.html
etree supports XPath queries, though only a limited subset of the syntax; lxml supports full XPath.
etree is included in the standard library, but lxml is faster.
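As a rough sketch of what the standard library's etree handles, the following uses a few path forms that xml.etree.ElementTree accepts in find() and findall(). The catalog document is invented for the example. Expressions beyond this kind of subset, such as XPath functions, are generally where lxml's xpath() method comes in.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, made up for this example.
CATALOG = """<catalog>
  <book lang="en"><title>A</title><price>10</price></book>
  <book lang="de"><title>B</title></book>
  <book lang="en"><title>C</title><price>30</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Attribute predicate: books whose lang attribute equals "en".
english = [b.findtext("title") for b in root.findall("./book[@lang='en']")]

# Child-existence predicate: books that have a <price> child.
priced = [b.findtext("title") for b in root.findall("./book[price]")]

# Positional predicate (1-based): the second <book> child.
second = root.find("./book[2]").findtext("title")

# Descendant search: every <title> at any depth below the root.
all_titles = [t.text for t in root.findall(".//title")]

print(english, priced, second, all_titles)
```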
My favorite XML processing library for Python is lxml, which, because it is a wrapper around libxml2, also supports full XPath.
There is also 4Suite which is more of a pure Python solution.