I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself (deprecated since Python 3.3 and removed in 3.9; xml.etree.ElementTree now uses the C accelerator automatically when it is available). In this context, what they chiefly add is even more speed; the ease of programming depends on the API, which ElementTree defines.
First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:
import xml.etree.ElementTree as ET
root = ET.parse('thefile.xml').getroot()
Or any of the many other ways shown at ElementTree. Then do something like:
for type_tag in root.findall('bar/type'):
    value = type_tag.get('foobar')
    print(value)
Output:
1
2
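For trying this out without a file on disk, here is a minimal self-contained sketch. The sample document is an assumption: it is a made-up XML string whose element and attribute names simply match the 'bar/type' path and the 'foobar' attribute used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the original answer); it only
# mirrors the 'bar/type' path and 'foobar' attribute used above.
SAMPLE = """
<foo>
    <bar>
        <type foobar="1"/>
        <type foobar="2"/>
    </bar>
</foo>
"""

# fromstring() returns the root Element directly, like parse(...).getroot().
root = ET.fromstring(SAMPLE)

# Collect the attribute of every <type> element found under <bar>.
values = [type_tag.get('foobar') for type_tag in root.findall('bar/type')]
print(values)
```

Note that get() returns None rather than raising when the attribute is absent, so a missing 'foobar' would show up as None in the list.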
Answer from Alex Martelli on Stack Overflow
minidom is the quickest to get started with and pretty straightforward.
XML:
<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>
Python:
from xml.dom import minidom
dom = minidom.parse('items.xml')
elements = dom.getElementsByTagName('item')
print(f"There are {len(elements)} items:")
for element in elements:
    print(element.attributes['name'].value)
Output:
There are 4 items:
item1
item2
item3
item4
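A variant of the same idea, as a sketch only: it parses from a string with minidom.parseString instead of a file and reads attributes with getAttribute. The shortened sample string and the 'id' attribute lookup are assumptions added for illustration.

```python
from xml.dom import minidom

# Shortened, hypothetical copy of the XML above, held in a string so the
# example runs without an items.xml file.
SAMPLE = """<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
    </items>
</data>"""

dom = minidom.parseString(SAMPLE)
elements = dom.getElementsByTagName('item')

# getAttribute() returns the attribute value as a string.
names = [element.getAttribute('name') for element in elements]
print(names)

# For an attribute that is not present ('id' here is a made-up name),
# getAttribute() returns an empty string instead of raising KeyError.
missing = elements[0].getAttribute('id')
print(repr(missing))
```

Using getAttribute can be more forgiving than element.attributes['name'].value when some elements might lack the attribute.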
Parse XML files in Python (ElementTree) - Geographic Information Systems Stack Exchange
Before I try to answer, a tip. Your exception handler covers up the nature of the problem. Just let the original exception rise up and you'll have more information to share with people who are interested in helping you.
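To illustrate the tip, here is a rough sketch using the standard library's ElementTree. The function name and the malformed sample string are hypothetical, and the question's own code is not shown here, so this is not a reconstruction of it.

```python
import xml.etree.ElementTree as ET

def parse_feed(text):
    # No blanket try/except here: a malformed document raises
    # ET.ParseError, which carries the line and column of the problem.
    return ET.fromstring(text)

# Hypothetical malformed input: <entry> is never closed.
BROKEN = "<feed><entry></feed>"

try:
    parse_feed(BROKEN)
except ET.ParseError as exc:
    # Catch only the specific error, and keep its details.
    message = str(exc)
    line, column = exc.position
    print(f"parse failed at line {line}, column {column}: {message}")
```

The point is that the original exception already says what went wrong and where; a bare except that prints a generic message throws that away.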
I like to use feedparser to parse Atom feeds. It does indeed give you dict-like objects. I submitted a patch to feedparser 4.1 to parse the GeoRSS elements into GeoJSON style dicts. See https://code.google.com/p/feedparser/issues/detail?id=62 and blog post at http://sgillies.net/blog/566/georss-patch-for-universal-feedparser/. You'd use it like this:
>>> import feedparser
>>> feed = feedparser.parse("http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")
>>> feed.entries[0]['where']
{'type': 'Point', 'coordinates': (-122.8282, 38.844700000000003)}
My patched version of 4.1 is in my Dropbox and you can get it using pip.
$ pip install http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz
Or just download and install with "python setup.py install".
It's more comfortable to use lxml for XML processing. Here is an example that fetches the feed and prints earthquake titles and coordinates:
import lxml.etree

feed_url = 'http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml'
# Prefix-to-URI map so the XPath expressions can address namespaced elements.
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

def main():
    doc = lxml.etree.parse(feed_url)
    for entry in doc.xpath('//atom:entry', namespaces=ns):
        # Single-element unpacking raises ValueError unless exactly one node matches.
        [title] = entry.xpath('./atom:title', namespaces=ns)
        [point] = entry.xpath('./georss:point', namespaces=ns)
        print(point.text, title.text)

if __name__ == '__main__':
    main()
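If lxml is not installed, roughly the same namespace handling can be sketched with the standard library's ElementTree. Everything in the sample feed below is made up for illustration (it is not real USGS data), and the sketch assumes each entry has exactly one title and one georss:point in "latitude longitude" order.

```python
import xml.etree.ElementTree as ET

# Same prefix-to-URI mapping as in the lxml example.
NS = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

# Hypothetical, hand-written Atom feed with a single GeoRSS entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>M 1.2, Northern California</title>
    <georss:point>38.8447 -122.8282</georss:point>
  </entry>
</feed>"""

root = ET.fromstring(SAMPLE_FEED)

quakes = []
for entry in root.findall('atom:entry', NS):
    title = entry.findtext('atom:title', namespaces=NS)
    # A GeoRSS-Simple point is "latitude longitude", separated by whitespace.
    lat, lon = map(float, entry.findtext('georss:point', namespaces=NS).split())
    quakes.append((title, lat, lon))

print(quakes)
```

ElementTree supports only a subset of XPath, so lxml remains the better fit for more complex queries.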
Hello, could you please recommend a book or tutorial that explains how to process XML? I have an XML file with an odd format, so I need to master the XML modules in order to parse it correctly. Thanks.