The namespace of an XML document is significant. ElementTree requires tags to be fully qualified to find the right element. Here's an example of three elements with the same tag in different namespaces:
data = '''\
<root xmlns="xyz" xmlns:name="abc">
<object name="one" />
<name:object name="two" />
<object xmlns="def" name="three" />
</root>
'''
Here are the elements that ElementTree sees:
>>> from xml.etree import ElementTree as et
>>> tree = et.fromstring(data)
>>> print(tree.findall('.//*'))
[<Element '{xyz}object' at 0x0000000003B07BD8>,
<Element '{abc}object' at 0x0000000003B07C28>,
<Element '{def}object' at 0x0000000003B07C78>]
So you have it right. Given the default namespace definition of:
<entry xmlns='http://www.w3.org/2005/Atom' ...
To access the 'title' tag, which uses the default namespace:
media['title'] = e.findall('{http://www.w3.org/2005/Atom}title')
To access the 'media:group' tag, refer to the media namespace definition:
<entry ... xmlns:media='http://search.yahoo.com/mrss/' ...
And use:
e.findall('{http://search.yahoo.com/mrss/}group')
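As a rough, self-contained sketch of the two lookups above: the entry below is a hypothetical, trimmed-down sample (the titles and the nested media:title are made up for illustration), and the ATOM and MEDIA constants are just convenience names, not part of any API.

```python
from xml.etree import ElementTree as et

# Hypothetical, trimmed-down Atom entry; a real feed carries many more elements.
entry_xml = '''\
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:media='http://search.yahoo.com/mrss/'>
  <title>Sample title</title>
  <media:group>
    <media:title>Sample media title</media:title>
  </media:group>
</entry>
'''

# Convenience constants (assumed names) holding the fully qualified prefixes.
ATOM = '{http://www.w3.org/2005/Atom}'
MEDIA = '{http://search.yahoo.com/mrss/}'

e = et.fromstring(entry_xml)
title = e.find(ATOM + 'title').text           # default-namespace element
group = e.find(MEDIA + 'group')               # element written as media:group
media_title = group.find(MEDIA + 'title').text
print(title, '|', media_title)
```

Keeping the URIs in constants avoids retyping them in every path, but the lookup itself is still done by full namespace URI, not by prefix.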
Note the different ways a namespace can be specified:
<root xmlns="xyz" xmlns:name="abc"> # default namespace and
# 'abc' namespace with id 'name'.
<object name="one" /> # Uses default namespace 'xyz'.
<name:object name="two" /> # uses 'abc' namespace (specified by id).
<object xmlns="def" name="three" /> # change the default namespace to 'def'.
</root>
To read a specific tag from a specific namespace:
>>> print(tree.find('{abc}object').attrib['name'])
two
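If spelling out the URI in braces gets tedious, find() and findall() also accept a prefix-to-URI mapping as their second argument. Here is a small sketch using the same data; the prefixes 'd', 'n' and 'other' are arbitrary names chosen for this example, and only the URIs have to match the document.

```python
from xml.etree import ElementTree as et

data = '''\
<root xmlns="xyz" xmlns:name="abc">
<object name="one" />
<name:object name="two" />
<object xmlns="def" name="three" />
</root>
'''

tree = et.fromstring(data)

# Arbitrary local prefixes mapped to the namespace URIs used in the document.
ns = {'d': 'xyz', 'n': 'abc', 'other': 'def'}

# Each path is expanded to '{uri}object' before matching.
names = [tree.find(path, ns).get('name')
         for path in ('d:object', 'n:object', 'other:object')]
print(names)
```

The prefixes in the mapping do not need to match the ones written in the XML, which is another way of seeing that only the URI identifies the namespace.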
Note that the namespace IDs (prefixes) are just shortcuts for the namespace URIs. Here's what happens when you dump the parsed XML tree: ElementTree doesn't save the original prefixes and generates its own in the format ns#:
>>> et.dump(tree)
<ns0:root xmlns:ns0="xyz" xmlns:ns1="abc" xmlns:ns2="def">
<ns0:object name="one" />
<ns1:object name="two" />
<ns2:object name="three" />
</ns0:root>
If you want specific shortcuts defined, use register_namespace:
>>> et.register_namespace('','xyz') # default namespace
>>> et.register_namespace('name','abc')
>>> et.register_namespace('custom','def')
>>> et.dump(tree)
<root xmlns="xyz" xmlns:custom="def" xmlns:name="abc">
<object name="one" />
<name:object name="two" />
<custom:object name="three" />
</root>
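The same idea as a script rather than a REPL session, as a minimal sketch: it assumes Python 3 (for encoding='unicode') and uses tostring() instead of dump() so the result can be inspected. Note that register_namespace changes a module-wide registry and only affects serialization; lookups still go by URI.

```python
from xml.etree import ElementTree as et

data = '''\
<root xmlns="xyz" xmlns:name="abc">
<object name="one" />
<name:object name="two" />
<object xmlns="def" name="three" />
</root>
'''

tree = et.fromstring(data)

# Register preferred prefixes globally; this only influences output.
et.register_namespace('', 'xyz')        # default namespace
et.register_namespace('name', 'abc')
et.register_namespace('custom', 'def')

out = et.tostring(tree, encoding='unicode')
print(out)

# Finding elements is unchanged: the URI is still what identifies them.
found = tree.find('{def}object')
print(found.get('name'))
```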
Answer from Mark Tolonen on Stack Overflow
Actually, I tried the following approach using xml.dom.minidom, just in case it helps you.
#!/usr/bin/env python3
from xml.dom.minidom import parseString
from urllib.request import urlopen
import re

def get_video_id(video_url):
    # Everything after "watch?v=" (8 characters) is treated as the video id.
    return re.search(r'watch\?v=.*', video_url).group(0)[8:]

def get_video_feed(video_url):
    video_feed = "http://gdata.youtube.com/feeds/api/videos/" + get_video_id(video_url)
    print(video_feed)
    return urlopen(video_feed).read()

def get_media_info(video_url):
    content = get_video_feed(video_url)
    dom = parseString(content)
    media = {}
    # getElementsByTagName matches the tag name as written, so no namespace URI is needed.
    media['title'] = dom.getElementsByTagName('title')[0].firstChild.nodeValue
    return media

def main():
    video_url = 'http://youtube.com/watch?v=q5sOLzEerwA'
    print(get_media_info(video_url))

if __name__ == '__main__':
    main()
I am trying to understand how to use XML with Python, so I've created a structure to test with: http://pastebin.com/x0cvRA8V. I can't grasp how to get, say, the value of mindmg for the 'Mace' object.
I've started reading the documentation, but I can't work out how to list or get those attributes. I can list that the mace and the short spear have a name, but I cannot list that they have mindmg and maxdmg. Also, how do I select the root by a criterion?
You can access the attribute value like this:
from xml.etree.ElementTree import XML, SubElement, tostring
text = """
<root>
  <phoneNumbers>
    <number topic="sys/phoneNumber/1" update="none" />
    <number topic="sys/phoneNumber/2" update="none" />
    <number topic="sys/phoneNumber/3" update="none" />
  </phoneNumbers>
  <gfenSMSnumbers>
    <number topic="sys2/SMSnumber/1" update="none" />
    <number topic="sys2/SMSnumber/2" update="none" />
  </gfenSMSnumbers>
</root>
"""
elem = XML(text)
for node in elem.find('phoneNumbers'):
    print(node.attrib['topic'])
    # Create sub elements
    if node.attrib['topic'] == "sys/phoneNumber/1":
        tag = SubElement(node, 'TagName')
        tag.attrib['attr'] = 'AttribValue'
print(tostring(elem, encoding='unicode'))
I forgot to say: if your ElementTree version is 1.3 or later, you can use XPath:
elem.find('.//number[@topic="sys/phoneNumber/1"]')
http://effbot.org/zone/element-xpath.htm
Or you can use this simpler version:
for node in elem.findall('.//number'):
    if node.attrib['topic'] == "sys/phoneNumber/1":
        tag = SubElement(node, 'TagName')
        tag.attrib['attr'] = 'AttribValue'
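To see that the XPath predicate and the explicit loop select the same element, here is a small self-contained sketch. It assumes the standard-library xml.etree.ElementTree and a shortened copy of the sample XML; the variable names by_xpath and by_loop are only for illustration.

```python
import xml.etree.ElementTree as ET

text = """
<root>
  <phoneNumbers>
    <number topic="sys/phoneNumber/1" update="none" />
    <number topic="sys/phoneNumber/2" update="none" />
  </phoneNumbers>
</root>
"""

elem = ET.fromstring(text)

# XPath-style predicate: first <number> whose topic attribute matches.
by_xpath = elem.find('.//number[@topic="sys/phoneNumber/1"]')

# Equivalent manual filter over every <number> element.
by_loop = [n for n in elem.findall('.//number')
           if n.get('topic') == 'sys/phoneNumber/1']

print(by_xpath is by_loop[0])
print(by_xpath.attrib)
```

Using n.get('topic') instead of n.attrib['topic'] avoids a KeyError if some element happens to lack the attribute.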
For me, this ElementTree snippet worked to find an element by attribute:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
topic=root.find(".//*[@topic='sys/phoneNumber/1']").text
How would I go about finding all elements whose 'id' attribute contains a certain value?
For example, the id values vary and can be 'foo 1', 'foo 23' or 'foo 9'.
How do I get all the elements whose id contains the word foo, regardless of the number that follows it?
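One possible approach, as a sketch only: ElementTree's limited XPath subset has no contains() function, so the filtering can be done in plain Python instead. The sample document and its tag names below are hypothetical, made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are invented for this example.
xml_text = """
<items>
  <item id="foo 1" />
  <item id="bar 7" />
  <group><item id="foo 23" /></group>
  <item />
</items>
"""

root = ET.fromstring(xml_text)

# iter() walks the whole tree in document order; get() with a default
# tolerates elements that have no id attribute at all.
matches = [el for el in root.iter() if 'foo' in el.get('id', '')]
ids = [el.get('id') for el in matches]
print(ids)
```

If the id must start with the word rather than merely contain it, el.get('id', '').startswith('foo ') would be a stricter test.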
The previous posters have the right of it. The etree documentation can be found here:
https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree
And can help you out. Here's a code sample that might do the trick (partially taken from the above link):
import xml.etree.ElementTree as ET
tree = ET.parse('your_file.xml')
root = tree.getroot()
for group in root.findall('group'):
    title = group.find('title')
    titlephrase = title.find('phrase').text
    for doc in group.findall('document'):
        refid = doc.get('refid')
Or if you want the ID stored in the group tag, you'd use id = group.get('id') instead of searching for all the refids.
Did you have a look at Python's XML etree parser? There are plenty of examples on the web.
With lxml:
import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
textelem = root.find('result/field/value/text')
print(textelem.text)
Edit: But I imagine there could be more than one result...
import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
results = root.findall('result')
textnumbers = [r.find('field/value/text').text for r in results]
BeautifulSoup is the simplest way to parse XML as far as I know...
Assuming you have read the introduction, simply use:
from bs4 import BeautifulSoup
soup = BeautifulSoup('your_XML_string')
print(soup.find('text').string)