You can use a combination of ElementTree's fromstring() method and the requests module's requests.get() to accomplish this.

https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree.

Install the requests module:

pip install requests

Use requests.get() to fetch the XML from the URL, then pass the response body to fromstring().

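import xml.etree.ElementTree as ET
import requests

# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically.
response = requests.get('http://synd.cricbuzz.com/j2me/1.0/livematches.xml')
# Passing bytes (.content) lets the parser honor the encoding declared in the XML.
tree = ET.fromstring(response.content)
for child in tree:
    print("%s - %s" % (child.get('srs'), child.get('mchDesc')))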

Results:

None - None
India tour of Sri Lanka, 2015 - Cricbuzz Cup - SL vs IND
Australia tour of Ireland, 2015 - IRE vs AUS
New Zealand tour of South Africa, 2015 - RSA vs NZ
Royal London One-Day Cup, 2015 - SUR vs KENT
Royal London One-Day Cup, 2015 - ESS vs YORKS
Answer from Joe Young on Stack Overflow
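
A slightly more defensive variant of the same idea, offered as a sketch rather than as part of the original answer. It swaps requests for the standard library's urllib so nothing has to be installed, and it skips children that lack the two attributes, which is what produces the "None - None" line above. The helper names fetch_xml and describe_matches, the timeout value, and the element names in the inline sample are assumptions for illustration; only the srs and mchDesc attribute names come from the answer.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_xml(url, timeout=10):
    """Download a document and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def describe_matches(xml_bytes):
    """Return 'series - description' strings for children carrying both attributes."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for child in root:
        series = child.get('srs')
        description = child.get('mchDesc')
        # Skip children without these attributes instead of emitting "None - None".
        if series is None or description is None:
            continue
        lines.append("%s - %s" % (series, description))
    return lines


# Made-up sample shaped like the feed above, so the sketch runs offline.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<mchdata>
  <links/>
  <match srs="Australia tour of Ireland, 2015" mchDesc="IRE vs AUS"/>
</mchdata>"""

matches = describe_matches(SAMPLE)
for line in matches:
    print(line)
```

Against a live feed, the call would be describe_matches(fetch_xml(url)), with url being whatever address serves the XML.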
Stack Overflow
stackoverflow.com › questions › 61667873 › python-parsing-xml-data-with-elementtree
Python - Parsing XML data with ElementTree - Stack Overflow
The following uses the built-in urllib module to parse XML from a URL:

from urllib.request import urlopen
import xml.etree.ElementTree as ET

def vatbook_parse(url):
    with urlopen(url) as f:
        tree = ET.parse(f)
    root = tree.getroot()

    # CONDITIONALLY SET SEARCH PATH
    path = './/atcs/booking' if tree.find('atc') is None else './/atc'

    for atcs in root.iterfind(path):
        callsign = atcs.find('callsign')
        name = atcs.find('name')
        time_start = atcs.find('time_start')
        time_end = atcs.find('time_end')
        if callsign is not None:
            print(f"{name.text} booked {callsign.text} from {time_start.text} to {time_end.text}")
Top answer
1 of 4
6

Before I try to answer, a tip. Your exception handler covers up the nature of the problem. Just let the original exception rise up and you'll have more information to share with people who are interested in helping you.

I like to use feedparser to parse Atom feeds. It does indeed give you dict-like objects. I submitted a patch to feedparser 4.1 to parse the GeoRSS elements into GeoJSON style dicts. See https://code.google.com/p/feedparser/issues/detail?id=62 and blog post at http://sgillies.net/blog/566/georss-patch-for-universal-feedparser/. You'd use it like this:

>>> import feedparser
>>> feed = feedparser.parse("http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")
>>> feed.entries[0]['where']
{'type': 'Point', 'coordinates': (-122.8282, 38.844700000000003)}

My patched version of 4.1 is in my Dropbox and you can get it using pip.

$ pip install http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz

Or just download and install with "python setup.py install".

2 of 4
2

It's more comfortable to use lxml for XML processing. Here is an example that fetches the feed and prints earthquake titles and coordinates:

import lxml.etree

feed_url = 'http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml'
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

def main():
    doc = lxml.etree.parse(feed_url)
    for entry in doc.xpath('//atom:entry', namespaces=ns):
        [title] = entry.xpath('./atom:title', namespaces=ns)
        [point] = entry.xpath('./georss:point', namespaces=ns)
        print(point.text, title.text)

if __name__ == '__main__':
    main()
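
For comparison, a rough standard-library equivalent of the lxml example, as a sketch only. The stdlib ElementTree has no xpath() method, so this uses findall() and findtext() with the same namespace dictionary. The inline feed is made up so the code runs offline, and the function name entry_points is an assumption. Unlike the [point] = ... unpacking above, an entry without a georss:point yields None instead of raising.

```python
import xml.etree.ElementTree as ET

NS = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

# Made-up two-entry feed; the second entry deliberately has no georss:point.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>M 1.5, Northern California</title>
    <georss:point>38.8447 -122.8282</georss:point>
  </entry>
  <entry>
    <title>M 2.0, Central Alaska</title>
  </entry>
</feed>"""


def entry_points(feed_xml):
    """Return (title, point) pairs; point is None when the element is absent."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall('atom:entry', NS):
        title = entry.findtext('atom:title', default='', namespaces=NS)
        point = entry.findtext('georss:point', default=None, namespaces=NS)
        results.append((title, point))
    return results


for title, point in entry_points(SAMPLE_FEED):
    print(point, title)
```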
Top answer
1 of 3
13

From ElementTree docs:

We can import this data by reading from a file:

import xml.etree.ElementTree as ET

tree = ET.parse('country_data.xml')
root = tree.getroot()

Or directly from a string:

root = ET.fromstring(country_data_as_string)

and later in the same page, 20.5.1.4. Finding interesting elements:

for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

Which translates to:

import xml.etree.ElementTree as ET

root = ET.fromstring("""
<root>
<H D="14/11/2017">
<FC>
    <F LV="0">The quick</F>
    <F LV="1">brown</F>
    <F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
    <F LV="0">The lazy</F>
    <F LV="1">fox</F>
</FC>
</H>
</root>""")
# root = tree.getroot()
for h in root.iter("H"):
    print(h.attrib["D"])
for f in root.iter("F"):
    print(f.attrib, f.text)

output:

14/11/2017
14/11/2017
{'LV': '0'} The quick
{'LV': '1'} brown
{'LV': '2'} fox
{'LV': '0'} The lazy
{'LV': '1'} fox
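
The two loops above walk every H and every F independently, so the output does not show which F belongs to which H. A small sketch of keeping them grouped, by calling iter() on each H element instead of on the root. The variable names and the compacted copy of the sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Same structure as the document above, written compactly.
DOC = """<root>
<H D="14/11/2017"><FC><F LV="0">The quick</F><F LV="1">brown</F><F LV="2">fox</F></FC></H>
<H D="14/11/2017"><FC><F LV="0">The lazy</F><F LV="1">fox</F></FC></H>
</root>"""

root = ET.fromstring(DOC)

sentences = []
for h in root.iter("H"):
    # iter() on the H element only visits F elements nested under that H.
    words = [f.text for f in h.iter("F")]
    sentences.append((h.get("D"), " ".join(words)))

for date, sentence in sentences:
    print(date, sentence)
```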
2 of 3
4

You did not specify exactly what you want to use, so I recommend lxml for Python. There are several ways to get the values you want:

With a loop:

from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
text = []
for element in root:
    text.append(element.get('D', None))
    for child in element:
        for grandchild in child:
            text.append(grandchild.text)
print(text)

Output: ['14/11/2017', 'The quick', 'brown', 'fox', '14/11/2017', 'The lazy', 'fox']

With xpath:

from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot() 
D = root.xpath("./H")
F = root.xpath(".//F")

for each in D:
  print(each.get('D',None))

for each in F:
  print(each.text)

Output:

14/11/2017
14/11/2017
The quick
brown
fox
The lazy
fox

Both have their own advantages and give you a good starting point. I recommend XPath since it gives you more freedom when values are missing.
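
To make the point about missing values concrete, here is a minimal sketch using the standard library rather than lxml (the stdlib supports only a subset of XPath, through find() and findall()). The sample document is a made-up variation in which one H has no D attribute and one F is empty; the fallback values "unknown" and "" are arbitrary choices.

```python
import xml.etree.ElementTree as ET

# Variation of the sample: the second H has no D attribute, one F has no text.
DOC = """<root>
  <H D="14/11/2017"><FC><F LV="0">The quick</F><F LV="1"/></FC></H>
  <H><FC><F LV="0">The lazy</F></FC></H>
</root>"""

root = ET.fromstring(DOC)

# get() takes a default, so a missing attribute does not raise KeyError.
dates = [h.get("D", "unknown") for h in root.findall("./H")]

# An empty element has text None; substitute an empty string.
texts = [f.text or "" for f in root.findall(".//F")]

print(dates)
print(texts)
```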

Top answer
1 of 8
272

You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary:

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed

root.findall('owl:Class', namespaces)

Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can use the same syntax yourself too, of course:

root.findall('{http://www.w3.org/2002/07/owl#}Class')

Also see the Parsing XML with Namespaces section of the ElementTree documentation.

As of Python 3.8, the ElementTree library also understands the {*} namespace wildcard, so root.findall('{*}Class') would also work (but don't do that if your document can have multiple namespaces that define the Class element).

If you can switch to the lxml library, things are better; that library supports the same ElementTree API, but collects namespaces for you in the .nsmap attribute on elements and generally has superior namespace support.
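
The lookup styles described in this answer can be compared side by side. This is a sketch with a made-up two-class document; the variable names are illustrative, and the {*} form needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'

# Made-up document with two owl:Class elements.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Animal"/>
  <owl:Class rdf:about="#Plant"/>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# 1. Prefix resolved through an explicit namespace dictionary.
by_prefix = root.findall('owl:Class', {'owl': OWL})

# 2. The prefix is arbitrary; only the URI it maps to matters.
by_other_prefix = root.findall('o:Class', {'o': OWL})

# 3. Fully qualified {uri}tag notation, no dictionary needed.
by_qualified = root.findall('{%s}Class' % OWL)

# 4. Namespace wildcard (Python 3.8+); matches Class in any namespace.
by_wildcard = root.findall('{*}Class')

print(len(by_prefix), len(by_other_prefix), len(by_qualified), len(by_wildcard))
```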

2 of 8
69

Here's how to do this with lxml without having to hard-code the namespaces or scan the text for them (as Martijn Pieters mentions):

from lxml import etree
tree = etree.parse("filename")
root = tree.getroot()
root.findall('owl:Class', root.nsmap)

UPDATE:

5 years later I'm still running into variations of this issue. lxml helps as I showed above, but not in every case. The commenters may have a valid point regarding this technique when it comes to merging documents, but I think most people are having difficulty simply searching documents.

Here's another case and how I handled it:

<?xml version="1.0" ?><Tag1 xmlns="http://www.mynamespace.com/prefix">
<Tag2>content</Tag2></Tag1>

xmlns without a prefix means that unprefixed tags get this default namespace. This means when you search for Tag2, you need to include the namespace to find it. However, lxml creates an nsmap entry with None as the key, and I couldn't find a way to search for it. So, I created a new namespace dictionary like this

namespaces = {}
# response uses a default namespace, and tags don't mention it
# create a new ns map using an identifier of our choice
for k, v in root.nsmap.items():  # .iteritems() exists only in Python 2
    if not k:
        namespaces['myprefix'] = v
e = root.find('myprefix:Tag2', namespaces)
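
Without lxml there is no nsmap, but a similar dictionary can be built with the standard library by listening for start-ns events in iterparse(). This is a sketch under that assumption; the helper name collect_namespaces and the fallback prefix myprefix are illustrative choices, and the document is the one from this answer.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" ?><Tag1 xmlns="http://www.mynamespace.com/prefix">
<Tag2>content</Tag2></Tag1>"""


def collect_namespaces(xml_bytes, default_prefix='myprefix'):
    """Map each declared prefix to its URI, naming the default namespace."""
    namespaces = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=('start-ns',))
    for _event, (prefix, uri) in events:
        # The default namespace is reported with an empty prefix.
        namespaces[prefix or default_prefix] = uri
    return namespaces


namespaces = collect_namespaces(DOC)
root = ET.fromstring(DOC)
tag2 = root.find('myprefix:Tag2', namespaces)
print(tag2.text)
```

If a document redeclares the same prefix with different URIs in different places, later declarations overwrite earlier ones in this dictionary, so the approach suits simple documents best.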
Top answer
1 of 2
13

ElementTree can be tricky when namespaces are involved. The elements you are looking for are named <gml:lowerCorner> and <gml:upperCorner>. Searching higher in the XML data, gml is defined as an XML namespace: xmlns:gml="http://www.opengis.net/gml". The way to find a subelement of the XML tree is as follows:

from xml.etree import ElementTree as ET
tree = ET.parse('file.xml')
print(tree.find('.//{http://www.opengis.net/gml}lowerCorner').text)
print(tree.find('.//{http://www.opengis.net/gml}upperCorner').text)

Output

137796 483752
138178 484222

Explanation

Using ElementTree's XPath support, .// selects all subelements on all levels of the tree. The leading dot matters: a path that starts with / is an absolute path, which ElementTree does not fully support. ElementTree uses {url}tag notation for a tag in a specific namespace. gml's URL is http://www.opengis.net/gml. .text retrieves the data in the element.

Note that .// is a shortcut to finding a nested node. The full path of upperCorner in ElementTree's syntax is actually:

{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}Pngformaat/{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}OmsluitendeRechthoek/{http://www.opengis.net/gml}Envelope/{http://www.opengis.net/gml}upperCorner
2 of 2
2

Using ElementTree is simple: you create an object parsed from a file, find elements by name or path, and get their text or attributes.

In your case it's a bit more complicated because you have namespaces in your file, so we have to transform the path from the form ns:tag to the form {uri}tag. This is the aim of the transform_path function:

NS_MAP = {
    'http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo' : 'lev',
    'http://www.opengis.net/gml' : 'gml',
}
INV_NS_MAP = {v:k for k, v in NS_MAP.items()} #inverse ns_map 
#for python2: INV_NS_MAP = dict((v,k) for k, v in NS_MAP.iteritems())

# ElementTree expects tags in the form {uri}tag, but writing the complete URI for each tag would be a pain
def transform_path(path):
    parts = []
    for step in path.split('/'):
        ns, tag = step.split(':')
        parts.append("{" + INV_NS_MAP[ns] + "}" + tag)
    # Join with '/' so the resulting path has no trailing slash
    return '/'.join(parts)

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
doc = tree.getroot()

lowerCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:lowerCorner"))
upperCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:upperCorner"))
print (lowerCorner.text)         # Print coordinates
print (upperCorner.text)         # Print coordinates

#for python2: print elem.text

Running the script with your file will give the following output:

137796 483752
138178 484222
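
As an alternative sketch, find() and findtext() accept a prefix-to-URI dictionary directly, so prefixed paths can be used without rewriting them to {uri}tag form. The inline document below is made up to mirror the path used in this answer (the root element name in particular is an assumption), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    'lev': 'http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo',
    'gml': 'http://www.opengis.net/gml',
}

# Made-up document mirroring the path above; the root element name is assumed.
DOC = """<lev:Leveringsinformatie
    xmlns:lev="http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo"
    xmlns:gml="http://www.opengis.net/gml">
  <lev:Pngformaat>
    <lev:OmsluitendeRechthoek>
      <gml:Envelope>
        <gml:lowerCorner>137796 483752</gml:lowerCorner>
        <gml:upperCorner>138178 484222</gml:upperCorner>
      </gml:Envelope>
    </lev:OmsluitendeRechthoek>
  </lev:Pngformaat>
</lev:Leveringsinformatie>"""

root = ET.fromstring(DOC)
base = 'lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/'

# Prefixes in the path are resolved through the NAMESPACES dictionary.
lower = root.findtext(base + 'gml:lowerCorner', namespaces=NAMESPACES)
upper = root.findtext(base + 'gml:upperCorner', namespaces=NAMESPACES)

# Split "x y" strings into integer coordinate pairs.
lower_xy = tuple(int(v) for v in lower.split())
upper_xy = tuple(int(v) for v in upper.split())
print(lower_xy, upper_xy)
```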
Python Module of the Week
pymotw.com › 2 › xml › etree › ElementTree › parse.html
Parsing XML Documents - Python Module of the Week
March 7, 2010 - The items returned by findall() and iter() are Element objects, each representing a node in the XML parse tree. Each Element has attributes for accessing data pulled out of the XML. This can be illustrated with a somewhat more contrived example input file, data.xml. The "attributes" of a node are available in the attrib property, which acts like a dictionary.

from xml.etree import ElementTree

with open('data.xml', 'rt') as f:
    tree = ElementTree.parse(f)

node = tree.find('./with_attributes')
print(node.tag)
for name, value in sorted(node.attrib.items()):
    print(' %-4s = "%s"' % (name, value))