You can combine ElementTree's fromstring() function with requests.get() from the requests module to accomplish this.
https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml
fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree.
Install the requests module:
pip install requests
Use requests.get() to fetch your XML file from the URL as a string, then pass that string into the fromstring() function.
import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9
import requests

tree = ET.fromstring(requests.get('http://synd.cricbuzz.com/j2me/1.0/livematches.xml').text)
for child in tree:
    print("%s - %s" % (child.get('srs'), child.get('mchDesc')))
Results:
None - None
India tour of Sri Lanka, 2015 - Cricbuzz Cup - SL vs IND
Australia tour of Ireland, 2015 - IRE vs AUS
New Zealand tour of South Africa, 2015 - RSA vs NZ
Royal London One-Day Cup, 2015 - SUR vs KENT
Royal London One-Day Cup, 2015 - ESS vs YORKS
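One caveat with response.text: requests decodes it using its guessed encoding, which can mangle XML that declares a different one. fromstring() also accepts bytes and honours the document's own encoding declaration, so response.content is a bit safer. A minimal sketch, with a made-up payload standing in for the live feed:

```python
import xml.etree.ElementTree as ET

# hypothetical payload standing in for the live cricbuzz response
payload = (b'<?xml version="1.0" encoding="UTF-8"?>'
           b'<mchdata><match srs="Test Series" mchDesc="A vs B"/></mchdata>')

# fromstring() accepts bytes and reads the encoding declaration itself,
# so with requests prefer response.content over response.text for XML
tree = ET.fromstring(payload)
for child in tree:
    print("%s - %s" % (child.get('srs'), child.get('mchDesc')))
```

With a real request this becomes ET.fromstring(requests.get(url).content).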
Answer from Joe Young on Stack Overflow:
You can parse the text as a string, which creates an Element, and create an ElementTree using that Element.
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(xmlstring))
I just came across this issue and the documentation, while complete, is not very straightforward on the difference in usage between the parse() and fromstring() methods.
If you're using xml.etree.ElementTree.parse to parse from a file, then you can use xml.etree.ElementTree.fromstring to get the root Element of the document. Often you don't actually need an ElementTree.
See xml.etree.ElementTree
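To make the parse()/fromstring() distinction concrete, here is a short sketch (the sample XML string is made up) showing that fromstring() hands you the root Element directly, and that an ElementTree is only needed if some API insists on one:

```python
import xml.etree.ElementTree as ET

xml_text = "<data><item>1</item><item>2</item></data>"  # made-up sample

# fromstring() returns the root Element directly -- no ElementTree involved
root = ET.fromstring(xml_text)
print(root.tag)                      # data
print([item.text for item in root])  # ['1', '2']

# only wrap it in an ElementTree if an API requires one
tree = ET.ElementTree(root)
assert tree.getroot() is root
```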
Before I try to answer, a tip: your exception handler covers up the nature of the problem. Just let the original exception propagate and you'll have more information to share with people who are interested in helping you.
I like to use feedparser to parse Atom feeds. It does indeed give you dict-like objects. I submitted a patch to feedparser 4.1 to parse the GeoRSS elements into GeoJSON style dicts. See https://code.google.com/p/feedparser/issues/detail?id=62 and blog post at http://sgillies.net/blog/566/georss-patch-for-universal-feedparser/. You'd use it like this:
>>> import feedparser
>>> feed = feedparser.parse("http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")
>>> feed.entries[0]['where']
{'type': 'Point', 'coordinates': (-122.8282, 38.844700000000003)}
My patched version of 4.1 is in my Dropbox and you can get it using pip.
$ pip install http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz
Or just download and install with "python setup.py install".
It's more comfortable to use lxml for XML processing. Here is an example that fetches the feed and prints earthquake titles and coordinates:
import lxml.etree
feed_url = 'http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml'
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

def main():
    doc = lxml.etree.parse(feed_url)
    for entry in doc.xpath('//atom:entry', namespaces=ns):
        [title] = entry.xpath('./atom:title', namespaces=ns)
        [point] = entry.xpath('./georss:point', namespaces=ns)
        print(point.text, title.text)

if __name__ == '__main__':
    main()
From ElementTree docs:
We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
and later in the same page, 20.5.1.4. Finding interesting elements:
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Which translates to:
import xml.etree.ElementTree as ET
root = ET.fromstring("""
<root>
<H D="14/11/2017">
<FC>
<F LV="0">The quick</F>
<F LV="1">brown</F>
<F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
<F LV="0">The lazy</F>
<F LV="1">fox</F>
</FC>
</H>
</root>""")
for h in root.iter("H"):
    print(h.attrib["D"])
for f in root.iter("F"):
    print(f.attrib, f.text)
output:
14/11/2017
14/11/2017
{'LV': '0'} The quick
{'LV': '1'} brown
{'LV': '2'} fox
{'LV': '0'} The lazy
{'LV': '1'} fox
You did not specify what exactly you want to use, so I recommend lxml for Python. For getting the values you want, you have several possibilities:
With a loop:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
text = []
for element in root:
    text.append(element.get('D', None))
    for child in element:
        for grandchild in child:
            text.append(grandchild.text)
print(text)
Output: ['14/11/2017', 'The quick', 'brown', 'fox', '14/11/2017', 'The lazy', 'fox']
With xpath:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
D = root.xpath("./H")
F = root.xpath(".//F")
for each in D:
    print(each.get('D', None))
for each in F:
    print(each.text)
Output: 14/11/2017 14/11/2017 The quick brown fox The lazy fox
Both approaches have their own advantages and give you a good starting point. I recommend XPath, since it gives you more freedom when values are missing.
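The same tolerance for missing values is available in the standard library's limited XPath support: .get() takes a default for a missing attribute, and findall() simply returns an empty list where nothing matches. A sketch reusing the sample structure from above, where the second H deliberately lacks its D attribute and F elements:

```python
import xml.etree.ElementTree as ET

# same shape as the earlier sample, with pieces deliberately missing
root = ET.fromstring(
    '<root>'
    '<H D="14/11/2017"><FC><F LV="0">The quick</F></FC></H>'
    '<H><FC/></H>'
    '</root>')

for h in root.findall('./H'):
    print(h.get('D', 'no date'))   # default instead of a KeyError
    for f in h.findall('.//F'):    # empty list for the second H
        print(f.text)
```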
You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can of course use the same syntax yourself too:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
Also see the Parsing XML with Namespaces section of the ElementTree documentation.
As of Python 3.8, the ElementTree library also understands the {*} namespace wildcard, so root.findall('{*}Class') would also work (but don't do that if your document can have multiple namespaces that define the Class element).
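To illustrate, a self-contained sketch (the tiny OWL document below is made up) showing that the prefix form, the {uri}tag form, and the Python 3.8+ wildcard all find the same element:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#">'
    '<owl:Class rdf:about="#Thing"/>'
    '</rdf:RDF>')

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'}
print(len(doc.findall('owl:Class', namespaces)))                  # 1
print(len(doc.findall('{http://www.w3.org/2002/07/owl#}Class')))  # 1
print(len(doc.findall('{*}Class')))                               # 1 (Python 3.8+)
```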
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in .nsmap attribute on elements and generally has superior namespaces support.
Here's how to do this with lxml without having to hard-code the namespaces or scan the text for them (as Martijn Pieters mentions):
from lxml import etree
tree = etree.parse("filename")
root = tree.getroot()
root.findall('owl:Class', root.nsmap)
UPDATE:
5 years later I'm still running into variations of this issue. lxml helps, as I showed above, but not in every case. The commenters may have a valid point regarding this technique when it comes to merging documents, but I think most people are simply having difficulty searching documents.
Here's another case and how I handled it:
<?xml version="1.0" ?>
<Tag1 xmlns="http://www.mynamespace.com/prefix">
    <Tag2>content</Tag2>
</Tag1>
xmlns without a prefix means that unprefixed tags get this default namespace. This means when you search for Tag2, you need to include the namespace to find it. However, lxml creates an nsmap entry with None as the key, and I couldn't find a way to search for it. So, I created a new namespace dictionary like this
namespaces = {}
# response uses a default namespace, and tags don't mention it
# create a new ns map using an identifier of our choice
for k, v in root.nsmap.items():  # .iteritems() on Python 2
    if not k:
        namespaces['myprefix'] = v

e = root.find('myprefix:Tag2', namespaces)
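With the standard library's ElementTree (which has no nsmap attribute) you can get the same effect by hand: map the default namespace to a prefix of your own choosing and use it in searches. A sketch on the same document:

```python
import xml.etree.ElementTree as ET

xml_text = ('<?xml version="1.0" ?>'
            '<Tag1 xmlns="http://www.mynamespace.com/prefix">'
            '<Tag2>content</Tag2></Tag1>')
root = ET.fromstring(xml_text)

# pick any prefix for the default namespace and use it when searching
namespaces = {'myprefix': 'http://www.mynamespace.com/prefix'}
e = root.find('myprefix:Tag2', namespaces)
print(e.text)  # content
```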
ElementTree can be tricky when namespaces are involved. The elements you are looking for are named <gml:lowerCorner> and <gml:upperCorner>. Searching higher up in the XML data, gml is defined as an XML namespace: xmlns:gml="http://www.opengis.net/gml". The way to find a subelement of the XML tree is as follows:
from xml.etree import ElementTree as ET

tree = ET.parse('file.xml')
print(tree.find('.//{http://www.opengis.net/gml}lowerCorner').text)
print(tree.find('.//{http://www.opengis.net/gml}upperCorner').text)
Output
137796 483752
138178 484222
Explanation
Using ElementTree's XPath support, .// selects matching subelements on all levels of the tree. ElementTree uses {url}tag notation for a tag in a specific namespace; gml's URL is http://www.opengis.net/gml. .text retrieves the data in the element.
Note that .// is a shortcut for finding a nested node. The full path of upperCorner in ElementTree's syntax is actually:
{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}Pngformaat/{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}OmsluitendeRechthoek/{http://www.opengis.net/gml}Envelope/{http://www.opengis.net/gml}upperCorner
Using ElementTree is very simple: basically you create an object parsed from a file, find elements by name or path, and get their text or attributes.
In your case it's a bit more complicated because you have namespaces in your file, so we have to transform paths from the form ns:tag to the form {uri}tag. This is the aim of the transform_path function.
NS_MAP = {
    'http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo': 'lev',
    'http://www.opengis.net/gml': 'gml',
}
INV_NS_MAP = {v:k for k, v in NS_MAP.items()} #inverse ns_map
#for python2: INV_NS_MAP = dict((v,k) for k, v in NS_MAP.iteritems())
#ElementTree expects tags in the form {uri}tag, but it would be a pain to write the complete uri for each tag
def transform_path(path):
    parts = []
    for tag in path.split('/'):
        ns, tag = tag.split(':')
        parts.append("{" + INV_NS_MAP[ns] + "}" + tag)
    return '/'.join(parts)  # join to avoid a trailing '/', which find() rejects
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
doc = tree.getroot()
lowerCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:lowerCorner"))
upperCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:upperCorner"))
print(lowerCorner.text)  # print coordinates
print(upperCorner.text)  # print coordinates
#for Python 2: print lowerCorner.text
Running the script on your file will give the following output:
137796 483752
138178 484222