You can combine ElementTree's fromstring() function with requests.get() from the requests module to accomplish this.
https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml
fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree.
Install the requests module:
pip install requests
Use requests.get() to fetch your XML file from the URL as a string, then pass that string into the fromstring() function.
import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9
import requests

tree = ET.fromstring(requests.get('http://synd.cricbuzz.com/j2me/1.0/livematches.xml').text)
for child in tree:
    print("%s - %s" % (child.get('srs'), child.get('mchDesc')))
Results:
None - None
India tour of Sri Lanka, 2015 - Cricbuzz Cup - SL vs IND
Australia tour of Ireland, 2015 - IRE vs AUS
New Zealand tour of South Africa, 2015 - RSA vs NZ
Royal London One-Day Cup, 2015 - SUR vs KENT
Royal London One-Day Cup, 2015 - ESS vs YORKS
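One caveat with response.text: requests decodes it using its guessed encoding, which can mangle XML that declares a different one. fromstring() also accepts bytes and honours the document's own encoding declaration, so response.content is a bit safer. A minimal sketch, with a made-up payload standing in for the live feed:

```python
import xml.etree.ElementTree as ET

# hypothetical payload standing in for the live cricbuzz response
payload = (b'<?xml version="1.0" encoding="UTF-8"?>'
           b'<mchdata><match srs="Test Series" mchDesc="A vs B"/></mchdata>')

# fromstring() accepts bytes and reads the encoding declaration itself,
# so with requests prefer response.content over response.text for XML
tree = ET.fromstring(payload)
for child in tree:
    print("%s - %s" % (child.get('srs'), child.get('mchDesc')))
```

With a real request this becomes ET.fromstring(requests.get(url).content).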
Answer from Joe Young on Stack Overflow:
You can parse the text as a string, which creates an Element, and create an ElementTree using that Element.
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(xmlstring))
I just came across this issue and the documentation, while complete, is not very straightforward on the difference in usage between the parse() and fromstring() methods.
If you're using xml.etree.ElementTree.parse to parse from a file, then you can use xml.etree.ElementTree.fromstring to get the root Element of the document. Often you don't actually need an ElementTree.
See xml.etree.ElementTree
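To make the parse()/fromstring() distinction concrete, here is a short sketch (the sample XML string is made up) showing that fromstring() hands you the root Element directly, and that an ElementTree is only needed if some API insists on one:

```python
import xml.etree.ElementTree as ET

xml_text = "<data><item>1</item><item>2</item></data>"  # made-up sample

# fromstring() returns the root Element directly -- no ElementTree involved
root = ET.fromstring(xml_text)
print(root.tag)                      # data
print([item.text for item in root])  # ['1', '2']

# only wrap it in an ElementTree if an API requires one
tree = ET.ElementTree(root)
assert tree.getroot() is root
```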
Before I try to answer, a tip: your exception handler covers up the nature of the problem. Just let the original exception propagate and you'll have more information to share with people who are interested in helping you.
I like to use feedparser to parse Atom feeds. It does indeed give you dict-like objects. I submitted a patch to feedparser 4.1 to parse the GeoRSS elements into GeoJSON style dicts. See https://code.google.com/p/feedparser/issues/detail?id=62 and blog post at http://sgillies.net/blog/566/georss-patch-for-universal-feedparser/. You'd use it like this:
>>> import feedparser
>>> feed = feedparser.parse("http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")
>>> feed.entries[0]['where']
{'type': 'Point', 'coordinates': (-122.8282, 38.844700000000003)}
My patched version of 4.1 is in my Dropbox and you can get it using pip.
$ pip install http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz
Or just download and install with "python setup.py install".
It's more comfortable to use lxml for XML processing. Here is an example that fetches the feed and prints earthquake titles and coordinates:
import lxml.etree
feed_url = 'http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml'
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'georss': 'http://www.georss.org/georss',
}

def main():
    doc = lxml.etree.parse(feed_url)
    for entry in doc.xpath('//atom:entry', namespaces=ns):
        [title] = entry.xpath('./atom:title', namespaces=ns)
        [point] = entry.xpath('./georss:point', namespaces=ns)
        print(point.text, title.text)

if __name__ == '__main__':
    main()
From ElementTree docs:
We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
and later in the same page, 20.5.1.4. Finding interesting elements:
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Which translates to:
import xml.etree.ElementTree as ET
root = ET.fromstring("""
<root>
<H D="14/11/2017">
<FC>
<F LV="0">The quick</F>
<F LV="1">brown</F>
<F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
<F LV="0">The lazy</F>
<F LV="1">fox</F>
</FC>
</H>
</root>""")
for h in root.iter("H"):
    print(h.attrib["D"])
for f in root.iter("F"):
    print(f.attrib, f.text)
output:
14/11/2017
14/11/2017
{'LV': '0'} The quick
{'LV': '1'} brown
{'LV': '2'} fox
{'LV': '0'} The lazy
{'LV': '1'} fox
You did not specify what exactly you want to use, so I recommend lxml for Python. For getting the values you want, you have several possibilities:
With a loop:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
text = []
for element in root:
    text.append(element.get('D', None))
    for child in element:
        for grandchild in child:
            text.append(grandchild.text)
print(text)
Output: ['14/11/2017', 'The quick', 'brown', 'fox', '14/11/2017', 'The lazy', 'fox']
With xpath:
from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
D = root.xpath("./H")
F = root.xpath(".//F")
for each in D:
    print(each.get('D', None))
for each in F:
    print(each.text)
Output: 14/11/2017 14/11/2017 The quick brown fox The lazy fox
Both approaches have their own advantages and give you a good starting point. I recommend XPath, since it gives you more freedom when values are missing.
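The same tolerance for missing values is available in the standard library's limited XPath support: .get() takes a default for a missing attribute, and findall() simply returns an empty list where nothing matches. A sketch reusing the sample structure from above, where the second H deliberately lacks its D attribute and F elements:

```python
import xml.etree.ElementTree as ET

# same shape as the earlier sample, with pieces deliberately missing
root = ET.fromstring(
    '<root>'
    '<H D="14/11/2017"><FC><F LV="0">The quick</F></FC></H>'
    '<H><FC/></H>'
    '</root>')

for h in root.findall('./H'):
    print(h.get('D', 'no date'))   # default instead of a KeyError
    for f in h.findall('.//F'):    # empty list for the second H
        print(f.text)
```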
You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can of course use the same syntax yourself too:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
Also see the Parsing XML with Namespaces section of the ElementTree documentation.
As of Python 3.8, the ElementTree library also understands the {*} namespace wildcard, so root.findall('{*}Class') would also work (but don't do that if your document can have multiple namespaces that define the Class element).
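To illustrate, a self-contained sketch (the tiny OWL document below is made up) showing that the prefix form, the {uri}tag form, and the Python 3.8+ wildcard all find the same element:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#">'
    '<owl:Class rdf:about="#Thing"/>'
    '</rdf:RDF>')

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'}
print(len(doc.findall('owl:Class', namespaces)))                  # 1
print(len(doc.findall('{http://www.w3.org/2002/07/owl#}Class')))  # 1
print(len(doc.findall('{*}Class')))                               # 1 (Python 3.8+)
```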
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in .nsmap attribute on elements and generally has superior namespaces support.
Here's how to do this with lxml without having to hard-code the namespaces or scan the text for them (as Martijn Pieters mentions):
from lxml import etree
tree = etree.parse("filename")
root = tree.getroot()
root.findall('owl:Class', root.nsmap)
UPDATE:
5 years later I'm still running into variations of this issue. lxml helps, as I showed above, but not in every case. The commenters may have a valid point regarding this technique when it comes to merging documents, but I think most people are simply having difficulty searching documents.
Here's another case and how I handled it:
<?xml version="1.0" ?>
<Tag1 xmlns="http://www.mynamespace.com/prefix">
    <Tag2>content</Tag2>
</Tag1>
xmlns without a prefix means that unprefixed tags get this default namespace. This means when you search for Tag2, you need to include the namespace to find it. However, lxml creates an nsmap entry with None as the key, and I couldn't find a way to search for it. So, I created a new namespace dictionary like this
namespaces = {}
# response uses a default namespace, and tags don't mention it
# create a new ns map using an identifier of our choice
for k, v in root.nsmap.items():  # .iteritems() on Python 2
    if not k:
        namespaces['myprefix'] = v

e = root.find('myprefix:Tag2', namespaces)
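With the standard library's ElementTree (which has no nsmap attribute) you can get the same effect by hand: map the default namespace to a prefix of your own choosing and use it in searches. A sketch on the same document:

```python
import xml.etree.ElementTree as ET

xml_text = ('<?xml version="1.0" ?>'
            '<Tag1 xmlns="http://www.mynamespace.com/prefix">'
            '<Tag2>content</Tag2></Tag1>')
root = ET.fromstring(xml_text)

# pick any prefix for the default namespace and use it when searching
namespaces = {'myprefix': 'http://www.mynamespace.com/prefix'}
e = root.find('myprefix:Tag2', namespaces)
print(e.text)  # content
```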
ElementTree can be tricky when namespaces are involved. The elements you are looking for are named <gml:lowerCorner> and <gml:upperCorner>. Searching higher up in the XML data, gml is defined as an XML namespace: xmlns:gml="http://www.opengis.net/gml". The way to find a subelement of the XML tree is as follows:
from xml.etree import ElementTree as ET

tree = ET.parse('file.xml')
print(tree.find('.//{http://www.opengis.net/gml}lowerCorner').text)
print(tree.find('.//{http://www.opengis.net/gml}upperCorner').text)
Output
137796 483752
138178 484222
Explanation
Using ElementTree's XPath support, .// selects matching subelements on all levels of the tree. ElementTree uses {url}tag notation for a tag in a specific namespace; gml's URL is http://www.opengis.net/gml. .text retrieves the data in the element.
Note that .// is a shortcut for finding a nested node. The full path of upperCorner in ElementTree's syntax is actually:
{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}Pngformaat/{http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo}OmsluitendeRechthoek/{http://www.opengis.net/gml}Envelope/{http://www.opengis.net/gml}upperCorner
Using ElementTree is very simple: basically you create an object parsed from a file, find elements by name or path, and get their text or attributes.
In your case it's a bit more complicated because you have namespaces in your file, so we have to transform paths from the form ns:tag to the form {uri}tag. This is the aim of the transform_path function.
NS_MAP = {
    'http://www.kadaster.nl/schemas/klic/20080722/leveringsinfo': 'lev',
    'http://www.opengis.net/gml': 'gml',
}
INV_NS_MAP = {v:k for k, v in NS_MAP.items()} #inverse ns_map
#for python2: INV_NS_MAP = dict((v,k) for k, v in NS_MAP.iteritems())
#ElementTree expects tags in the form {uri}tag, but it would be a pain to write the complete uri for each tag
def transform_path(path):
    parts = []
    for tag in path.split('/'):
        ns, tag = tag.split(':')
        parts.append("{" + INV_NS_MAP[ns] + "}" + tag)
    return '/'.join(parts)  # join to avoid a trailing '/', which find() rejects
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
doc = tree.getroot()
lowerCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:lowerCorner"))
upperCorner = doc.find(transform_path("lev:Pngformaat/lev:OmsluitendeRechthoek/gml:Envelope/gml:upperCorner"))
print(lowerCorner.text)  # print coordinates
print(upperCorner.text)  # print coordinates
#for Python 2: print lowerCorner.text
Running the script on your file will give the following output:
137796 483752
138178 484222