lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.
>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'
You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:
# geetree.py
import xml.etree.ElementTree as ET


class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
    attribute foo, a.foo gets the first subelement foo."""

    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        # Missing attribute -> KeyError, as for any mapping-style lookup.
        res = self._getattr(name)
        if res is None:
            raise KeyError("No attribute named '%s'" % name)
        return res

    def __getattr__(self, name):
        # Missing subelement -> AttributeError, so hasattr()/getattr() behave.
        res = self._getelem(name)
        if res is None:
            raise AttributeError("No element named '%s'" % name)
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)


class GeeTree(object):
    "Wrapper around an ElementTree."

    def __init__(self, fname):
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        if self.doc.getroot().tag != name:
            raise AttributeError("No element named '%s'" % name)
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()
You invoke it so:
>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.
The available options for that are:
- ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
- cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5. Deprecated as of Python 3.3, where the regular ElementTree module uses the C accelerator automatically; removed in Python 3.9.)
- LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well as XPath, CSS Selectors, and more)
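Here's an example of how to generate your example document using the standard library's ElementTree (on Python 3.3+ it uses the C accelerator automatically; on older 2.x versions you can import xml.etree.cElementTree instead):
import xml.etree.ElementTree as ET

root = ET.Element("root")
doc = ET.SubElement(root, "doc")
# Keyword arguments to SubElement() become XML attributes.
ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"

tree = ET.ElementTree(root)
tree.write("filename.xml")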
I've tested it and it works, but I'm assuming whitespace isn't significant: the document is written out on a single line. If you need "pretty-print" indentation, LXML's tostring() and write() accept pretty_print=True, and the standard library has had xml.etree.ElementTree.indent() since Python 3.9.
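On Python 3.9 or later, the standard library can add the indentation itself with ET.indent(), which inserts whitespace into the tree in place. A minimal sketch, rebuilding the same document and assuming two-space indentation is what you want:

```python
import xml.etree.ElementTree as ET

# Same document as in the example above.
root = ET.Element("root")
doc = ET.SubElement(root, "doc")
ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"

# ET.indent() (Python 3.9+) rewrites the text/tail whitespace in place;
# space="  " means two spaces per nesting level.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Call ET.indent() before tree.write() and filename.xml gets the same layout.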
For further reading, here are some useful links:
- API docs for the implementation in the Python standard library
- Introductory Tutorial (From the original author's site)
- LXML etree tutorial. (With example code for loading the best available option from all major ElementTree implementations)
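The "load the best available option" idea from the LXML tutorial boils down to a try/except around the import. This is a shortened sketch (the tutorial's own snippet tries a longer chain of implementations), and the code after the import sticks to calls that both modules share:

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree
    USING_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree
    USING_LXML = False

# fromstring(), find(), get() and .text exist in both implementations.
root = etree.fromstring(
    "<root><doc><field1 name='blah'>some value1</field1></doc></root>"
)
field = root.find("doc/field1")
print(USING_LXML, field.get("name"), field.text)
```

Anything LXML-specific (XPath via xpath(), pretty_print=True, and so on) then has to be guarded by the USING_LXML flag.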
As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but in the event you're in a situation where you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:
- LXML clearly wins for serializing (generating) XML
- As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.
The lxml library includes a very convenient syntax for XML generation, called the E-factory. Here's how I'd make the example you give:
#!/usr/bin/env python3
import lxml.etree
import lxml.builder

E = lxml.builder.ElementMaker()
ROOT = E.root
DOC = E.doc
FIELD1 = E.field1
FIELD2 = E.field2

the_doc = ROOT(
    DOC(
        FIELD1('some value1', name='blah'),
        FIELD2('some value2', name='asdfasd'),
    )
)

# encoding="unicode" makes tostring() return str instead of bytes.
print(lxml.etree.tostring(the_doc, pretty_print=True, encoding="unicode"))
Output:
<root>
  <doc>
    <field1 name="blah">some value1</field1>
    <field2 name="asdfasd">some value2</field2>
  </doc>
</root>
It also supports adding to an already-made node; for example, after the above you could say:
the_doc.append(FIELD2('another value again', name='hithere'))
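Without lxml, a rough imitation of the E-factory takes only a few lines on top of the standard library. TinyElementMaker below is a hypothetical helper written for illustration, not part of lxml or the stdlib; it handles only tags, keyword attributes, text and child elements (no namespaces, no dict-style attributes):

```python
import xml.etree.ElementTree as ET


class TinyElementMaker:
    """Hypothetical stand-in for lxml.builder.ElementMaker:
    E.tag(*children, **attrib) returns a new ET.Element."""

    def __getattr__(self, tag):
        if tag.startswith("_"):
            # Keep dunder/private lookups (copy, pickle, ...) from becoming tags.
            raise AttributeError(tag)

        def build(*children, **attrib):
            elem = ET.Element(tag, attrib)
            for child in children:
                if isinstance(child, str):
                    # Text goes before the first child, else after the last one.
                    if len(elem):
                        last = elem[-1]
                        last.tail = (last.tail or "") + child
                    else:
                        elem.text = (elem.text or "") + child
                else:
                    elem.append(child)
            return elem

        return build


E = TinyElementMaker()
the_doc = E.root(
    E.doc(
        E.field1("some value1", name="blah"),
        E.field2("some value2", name="asdfasd"),
    )
)
# Appending to an already-made node works as with any Element.
the_doc.append(E.field2("another value again", name="hithere"))
print(ET.tostring(the_doc, encoding="unicode"))
```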
So I have ElementTree 1.2.6 on my box now, and ran the following code against the XML chunk you posted:
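# Standalone ElementTree 1.2.6 was imported as elementtree.ElementTree;
# on Python 2.5+ and 3.x the same API is xml.etree.ElementTree.
import xml.etree.ElementTree as ET
tree = ET.parse("test.xml")
doc = tree.getroot()
thingy = doc.find('timeSeries')
print(thingy.attrib)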
and got the following back:
{'name': 'NWIS Time Series Instantaneous Values'}
It appears to have found the timeSeries element without needing to use numerical indices.
What would be useful now is knowing what you mean when you say "it doesn't work." Since it works for me given the same input, it is unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, backtraces, or anything you can provide to help us help you.
If I understand your question correctly:
for elem in doc.findall('timeSeries/values/value'):
    print(elem.get('dateTime'), elem.text)
or, if you prefer (and if there is only one occurrence of timeSeries/values):
values = doc.find('timeSeries/values')
for value in values:
    print(value.get('dateTime'), value.text)
The findall() method returns a list of all matching elements, whereas find() returns only the first matching element. The first example loops over all the found elements, the second loops over the child elements of the values element, in this case leading to the same result.
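To make the find()/findall() difference concrete, here is a small sketch against a made-up, trimmed-down document with the same shape as the one in the question (the dateTime values and numbers are invented):

```python
import xml.etree.ElementTree as ET

# Made-up sample with the same element structure as the question's XML.
xml_text = """<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values>
      <value dateTime="t1">1.0</value>
      <value dateTime="t2">2.0</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""
doc = ET.fromstring(xml_text)  # fromstring() returns the root element

# findall(): every match, as a list.
pairs = [(v.get("dateTime"), v.text) for v in doc.findall("timeSeries/values/value")]

# find(): only the first match, or None when nothing matches.
first = doc.find("timeSeries/values/value")
missing = doc.find("timeSeries/nonexistent")

print(pairs)
print(first.get("dateTime"), missing)
```

Because find() returns None on a miss, calling .attrib or .text on its result is a common source of "it doesn't work" AttributeErrors.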
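I don't see where the problem with not finding timeSeries comes from, however. Maybe you just forgot the getroot() call? (Note that you don't strictly need it: find() and findall() on the ElementTree object itself search relative to the root element, so tree.findall('timeSeries/values/value') works too. To match at any depth, use a path like .//timeSeries/values; in current versions of ElementTree, paths starting with / emit a FutureWarning and are not treated as absolute XPath.)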
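A quick sketch of searching from the tree object without getroot(), again on a made-up sample (io.BytesIO stands in for the file so the example is self-contained; ET.parse() accepts a file object as well as a filename):

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample standing in for test.xml.
xml_file = io.BytesIO(
    b"<timeSeriesResponse><timeSeries><values>"
    b"<value dateTime='t1'>1.0</value><value dateTime='t2'>2.0</value>"
    b"</values></timeSeries></timeSeriesResponse>"
)
tree = ET.parse(xml_file)

# ElementTree.find()/findall() delegate to the root element,
# so paths are relative to <timeSeriesResponse>.
values = tree.find("timeSeries/values")

# ".//" matches at any depth below the root.
anywhere = tree.findall(".//timeSeries/values/value")

# The root is not its own child, so naming it in the path finds nothing.
root_named = tree.find("timeSeriesResponse")

print(len(values), [v.text for v in anywhere], root_named)
```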