lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.
>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'
Answer from Ryan Ginstrom on Stack Overflow
Which are the best libraries out there to manage XML files? Is there a particular one that you usually use? Why?
https://lxml.de/
In the standard library, ElementTree works fine if you just want to extract data from XML.
If you want to edit and save XML, use lxml. The standard-library implementations are incomplete and cannot reproduce some constructs on output. For example, they move all namespace declarations to the root element, and they will write out XML containing characters that are not allowed in XML.
Also consider defusedxml, which hardens the standard-library parsers against maliciously crafted XML (for example, entity-expansion attacks).
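The namespace point is easy to check with nothing but the standard library. A minimal sketch, assuming a made-up document and the hypothetical namespace URI urn:example:x; it only shows where ElementTree writes the declaration on output and says nothing about how lxml behaves:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the namespace is declared on <a>, not on the root.
SOURCE = '<root><a xmlns:x="urn:example:x"><x:b/></a></root>'


def roundtrip(xml_text):
    """Parse a document and serialize it again with the standard library."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


output = roundtrip(SOURCE)
print(output)
# The declaration now sits on <root>, and the original prefix "x" is gone
# because ElementTree generates its own prefix when it serializes.
```

If the exact placement of declarations or the original prefixes matter to whatever consumes the file, this is the behavior to test for before committing to the standard library.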
You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:
# geetree.py
import xml.etree.ElementTree as ET


class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
    attribute foo, a.foo gets the first subelement foo."""

    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        # Item access maps to XML attributes, so a miss is a KeyError.
        res = self._getattr(name)
        if res is None:
            raise KeyError("No attribute named '%s'" % name)
        return res

    def __getattr__(self, name):
        # Attribute access maps to child elements; raising AttributeError
        # keeps hasattr() and getattr() with a default working.
        res = self._getelem(name)
        if res is None:
            raise AttributeError("No element named '%s'" % name)
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)


class GeeTree(object):
    "Wrapper around an ElementTree."

    def __init__(self, fname):
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        # Only the root element is reachable directly from the tree.
        if self.doc.getroot().tag != name:
            raise AttributeError("No element named '%s'" % name)
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()
You invoke it so:
>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
Personally, I've played with several of the built-in options on an XML-heavy project and have settled on pulldom as the best choice for less complex documents.
Especially for small simple stuff, I like the event-driven theory of parsing rather than setting up a whole slew of callbacks for a relatively simple structure. Here is a good quick discussion of how to use the API.
What I like: you can handle the parsing in a for loop rather than using callbacks. You also delay full parsing (the "pull" part) and only get additional detail when you call expandNode(). This satisfies my general requirement for "responsible" efficiency without sacrificing ease of use and simplicity.
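The for-loop style described above looks roughly like this. A minimal sketch, assuming a made-up catalog document; the element names and the helper item_names are hypothetical, while parseString, START_ELEMENT and expandNode are part of xml.dom.pulldom:

```python
from xml.dom import pulldom

# Hypothetical sample document, used only for illustration.
SAMPLE = """<catalog>
  <item id="1"><name>Widget</name></item>
  <item id="2"><name>Gadget</name></item>
</catalog>"""


def item_names(xml_text):
    """Return (id, name) pairs, expanding one <item> subtree at a time."""
    pairs = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Pull in the rest of this element so it can be used as a DOM node.
            events.expandNode(node)
            name = node.getElementsByTagName("name")[0]
            text = "".join(c.data for c in name.childNodes
                           if c.nodeType == c.TEXT_NODE)
            pairs.append((node.getAttribute("id"), text))
    return pairs


print(item_names(SAMPLE))
```

Only the elements passed to expandNode() are built into DOM subtrees; everything else is seen as a stream of events and then dropped.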
ElementTree has a nice Pythonic API, and it has shipped in the standard library since Python 2.5.
The original implementation is pure Python and, as I say, pretty nice. If you wind up needing more performance, lxml exposes a largely compatible API and uses libxml2 under the hood. In theory you can just swap it in when you discover you need it.
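The swap is usually written as a guarded import. A minimal sketch, assuming lxml may or may not be installed; the helper name first_text is made up for illustration, and it sticks to calls that exist in both libraries:

```python
try:
    from lxml import etree  # third-party, backed by libxml2
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback


def first_text(xml_text, path):
    """Return the text of the first element matching path, or None."""
    root = etree.fromstring(xml_text)
    return root.findtext(path)


print(first_text("<a><b>hello</b></a>", "b"))
```

Code that stays within the shared subset (fromstring, find, findall, findtext, attribute access) tends to run unchanged on either library; lxml-only features such as full XPath would break the fallback.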
The best option from the standard lib is (I think) the xml.etree package.
Assuming that your example tag occurs only once somewhere in the document:
import xml.etree.ElementTree as etree
# On Python 3.3+ the C accelerator is used automatically; the separate
# xml.etree.cElementTree module was removed in Python 3.9.
tree = etree.parse('input.xml')
elem = tree.find('.//tag-Name')  # finds the first occurrence of element tag-Name
elem.text = 'newName'
tree.write('output.xml')
Or if there are multiple occurrences of tag-Name, and you want to change them all if they have "oldName" as content:
import xml.etree.ElementTree as etree
tree = etree.parse('input.xml')
for elem in tree.findall('.//tag-Name'):
    if elem.text == 'oldName':
        elem.text = 'newName'
# some output options for example
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
Python has 'builtin' libraries for working with xml. For this simple task I'd look into minidom. You can find the docs here:
http://docs.python.org/library/xml.dom.minidom.html
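For comparison, here is what the minidom route could look like. A minimal sketch, assuming the task is replacing the text of a tag-Name element as in the ElementTree snippets above; the helper replace_text and the sample document are hypothetical:

```python
from xml.dom import minidom


def replace_text(xml_text, tag, old, new):
    """Replace the text of every <tag> element whose content equals old."""
    doc = minidom.parseString(xml_text)
    for node in doc.getElementsByTagName(tag):
        child = node.firstChild
        # Only touch elements whose first child is a matching text node.
        if (child is not None and child.nodeType == child.TEXT_NODE
                and child.data == old):
            child.data = new
    return doc.documentElement.toxml()


sample = "<c><tag-Name>oldName</tag-Name><tag-Name>keep</tag-Name></c>"
print(replace_text(sample, "tag-Name", "oldName", "newName"))
```

To work with files instead of strings, minidom.parse(filename) reads the document and doc.writexml(file_object) writes it back.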