Try this:
from xml.etree import ElementTree as ET
xml = """<root>
<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
<item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""
root = ET.fromstring(xml)
for item in root:
if item.text:
print(item.text)
for ref in item:
print(ref)
if ref.tail:
print(ref.tail)
ElementTrees representation of "mixed content" is based on .text and .tail attributes. The .text of an element represents the text of the element up to the first child element. That child's .tail then contains the text of its parent following it. See the API doc.
Videos
You can iterate over all the "visit" tags directly under an element "element" like this:
for x in element.iter("visit"):
You can find the first direct child of element matching a certain tag with:
element.find( "visits" )
It looks like you will first have to locate the "visits" element, which is the parent of "visit", and then iterate through its "visit" children. Putting those together you'd have something like this:
for patient_element in root:
print patient_element.tag
visits_element = patient_element.find( "visits" )
for visit_element in visits_element.iter("visit"):
print visit_element.tag, visit_element.text
# ... further processing of each visit element here
In general look at the section "Finding interesting elements" in the documentation for xml.etree.ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
This is untested by it should be fairly close to what you want.
for patient in root:
patient_code = patient.find('PatientCharacteristics').find('patientCode')
if patient_code.text == code:
for visit in patient.find('Visits'):
visit_date = visit.find('VisitDate')
if visit_date.text == date:
swol28 = visit.find('DAS').find('Joints').find('SWOL28')
if swol28.text:
visit.find('DAS').find('Joints').set('SWOL28', new_swol28)
To iterate over all nodes, use the iter method on the ElementTree, not the root Element.
The root is an Element, just like the other elements in the tree and only really has context of its own attributes and children. The ElementTree has the context for all Elements.
For example, given this xml
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
You can do the following
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
... print elem
...
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>
Adding to Robert Christie's answer it is possible to iterate over all nodes using fromstring() by converting the Element to an ElementTree:
import xml.etree.ElementTree as ET
e = ET.ElementTree(ET.fromstring(xml_string))
for elt in e.iter():
print "%s: '%s'" % (elt.tag, elt.text)
xml.ElementTree.Element supports the iterator protocol, so you can use list(elem) as follows:
import xml.etree.cElementTree as ET
s = '''
<Node1>
<Node11>
<Node21>
</Node21>
<Node22>
</Node22>
<Node23>
</Node23>
</Node11>
<Node12>
</Node12>
<Node13>
</Node13>
</Node1>
'''
root = ET.fromstring(s)
print root
print list(root)
There are two ways you can go about handling the text nodes. If you really want to keep using dom, you can get rid of the text nodes with a filter:
>>> filter(lambda node: node.nodeType != xml.dom.Node.TEXT_NODE, myNode.childNodes)
[<DOM Element: Node11 at 0x18e64d0>, <DOM Element: Node12 at 0x18e6950>, <DOM Element: Node13 at 0x18e6a70>]
or a list comprehension:
>>> [x for x in myNode.childNodes if x.nodeType != xml.dom.Node.TEXT_NODE]
[<DOM Element: Node11 at 0x18e64d0>, <DOM Element: Node12 at 0x18e6950>, <DOM Element: Node13 at 0x18e6a70>]
If you don't need to keep using dom, I would suggest using ElementTree as Eli Bendersky suggested.