Yes. The standard-library package xml.etree provides functions for working with XML (it is also available in Python 2).
The one you are looking for is findall.
For example:
import xml.etree.ElementTree as ET
tree = ET.fromstring(some_xml_data)
all_name_elements = tree.findall('.//name')
With:
In [1]: some_xml_data = "<help><person><name>dean</name></person></help>"
I get the following:
In [10]: tree.findall(".//name")
Out[10]: [<Element 'name' at 0x7ff921edd390>]
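findall returns Element objects, so one more step is needed to get at the text. A minimal sketch reusing the sample string above; the variable name `names` is only illustrative:

```python
import xml.etree.ElementTree as ET

some_xml_data = "<help><person><name>dean</name></person></help>"
tree = ET.fromstring(some_xml_data)

# findall returns Element objects; .text holds the character data of each one
names = [elem.text for elem in tree.findall(".//name")]
print(names)
```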
Answer from Dean Fenster on Stack Overflow
Hello,
I was able to locate a child tag using its name and its parent's attributes, like this:
d=soup.findall(".//*[@attr='value']/childtag[@attr2='value2']")
It works fine, but whenever I remove the child tag name, like this:
d=soup.findall(".//*[@attr='value']/[@attr2='value2']")
I get no result.
Any idea how to find the child tag without naming it (using only its attributes)?
Thanks
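One possible approach, assuming `soup` is an xml.etree.ElementTree element: keep a path step but use the `*` wildcard in place of the tag name, so that only the attribute predicate filters the children. The sample document below is hypothetical; its attribute names just mirror the placeholders in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; attribute names mirror the placeholders in the question
xml_data = """
<root>
  <parent attr="value">
    <childtag attr2="value2">hit</childtag>
    <other attr2="something-else">miss</other>
  </parent>
</root>
"""
root = ET.fromstring(xml_data)

# '*' stands in for the child tag name, so only the attribute predicate filters
matches = root.findall(".//*[@attr='value']/*[@attr2='value2']")
print([m.tag for m in matches])
```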
Since the latter approach produced mismatched pairs, it looks like some codes have no descriptions.
If you want to capture only the diagnosis codes that have descriptions (existing desc nodes), you can enforce this rule with the following XPath expression:
.//diag[name and desc]
The problem, though, is that xml.etree.ElementTree supports only a limited subset of XPath, so for this particular expression to work you need to switch to lxml.etree. In return, you get a performance boost, better memory usage and richer functionality. It's worth it.
You can also simplify the way you extract codes by using findtext() and a dictionary comprehension:
from pprint import pprint
import lxml.etree as ET
data = """your XML here"""
root = ET.fromstring(data)
result = {diag.findtext("name"): diag.findtext("desc")
          for diag in root.xpath(".//diag[name and desc]")}
pprint(result)
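If switching to lxml is not an option, a similar filter can probably be written with the standard library alone. ElementTree does not accept `and` inside a predicate, but its documentation allows one predicate to follow another, so chaining `[name][desc]` is one way to approximate it. This is a sketch on hypothetical sample data, not the answer's own method:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second diag has no desc element
data = """
<root>
  <diag><name>A00</name><desc>Cholera</desc></diag>
  <diag><name>A01</name></diag>
</root>
"""
root = ET.fromstring(data)

# Chained predicates: keep diag elements that have both a name and a desc child
result = {
    diag.findtext("name"): diag.findtext("desc")
    for diag in root.findall(".//diag[name][desc]")
}
print(result)
```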
ElementTree's XPath subset supports selecting the parent element (see https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax). Let's find the name elements and get the parent of each:
>>> [x for x in tree.findall('.//name/..')]
[<Element 'diag' at 0x7f9c8cece278>,
<Element 'diag' at 0x7f9c85a84908>,
<Element 'diag' at 0x7f9c85a93f98>,
<Element 'diag' at 0x7f9c85a8f188>]
Once we have the parent elements, we can get their name and desc elements:
>>> [(x.find('name'), x.find('desc')) for x in tree.findall('.//name/..')]
[(<Element 'name' at 0x7f9c875ba7c8>, <Element 'desc' at 0x7f9c8cedfe58>),
(<Element 'name' at 0x7f9c85a84958>, <Element 'desc' at 0x7f9c85a84f98>),
(<Element 'name' at 0x7f9c85a8f048>, <Element 'desc' at 0x7f9c85a8f098>),
(<Element 'name' at 0x7f9c85a8f1d8>, <Element 'desc' at 0x7f9c85a8f228>)]
And finally:
>>> [(x.find('name').text, x.find('desc').text) for x in tree.findall('.//name/..')]
[('A00', 'Cholera'),
('A00.0', 'Cholera due to Vibrio cholerae 01, biovar cholerae'),
('A00.1', 'Cholera due to Vibrio cholerae 01, biovar eltor'),
('A00.9', 'Cholera, unspecified')]
import xml.etree.ElementTree as ET
tree = ET.parse('./all_foods.xml')
my_text = [item.text for item in tree.iter()]
This gives you a list with the text of every element. If you want only a specific text, you can use:
my_tags = [item.text for item in tree.iter() if item.text == "title1"]
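To filter by tag rather than by text content, iter() also accepts a tag name. A small sketch; the inline XML is a hypothetical stand-in for all_foods.xml, whose real structure is not shown here:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for all_foods.xml
xml_data = (
    "<foods>"
    "<food><title>title1</title><price>5</price></food>"
    "<food><title>title2</title><price>7</price></food>"
    "</foods>"
)
root = ET.fromstring(xml_data)

# iter(tag) walks the whole tree but yields only elements with that tag
titles = [elem.text for elem in root.iter("title")]
print(titles)
```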
Since from your question it sounds like you're looking for a specific key, you can simply use find(<key_name>).text to get the contents of the XML element with that name:
import xml.etree.ElementTree as ET
tree = ET.parse('./all_foods.xml')
root = tree.getroot()
for x in root:
    print(x.find("title").text)
>>>
title1
title2
title3
You could consider using xpath in lxml. Using text() enables you to find 'Panama' as the content of an element quickly. Once you have done that you can navigate to neighbouring information items for the same country.
>>> from lxml import etree
>>> tree = etree.parse('test.xml')
>>> tree.xpath('.//name/text()')
['Liechtenstein', 'Singapore', 'Panama']
>>> for item in tree.xpath('.//name/text()'):
...     if item == 'Panama':
...         for cousins in item.getparent().getparent().getchildren():
...             cousins.text
...
'Panama'
'68'
'2011'
'13600'
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
countries = tree.findall("country")
for country in countries:
    name = country.find("name")
    if name.text == "Panama":
        print(name.text)
Also, please note that your XML is not well formed: there is a ] instead of a > on line 19 of test.xml.
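The loop above can also be expressed as a single path with a `[tag='text']` predicate, which ElementTree's XPath subset supports. A minimal sketch; the inline XML is a hypothetical, well-formed stand-in for test.xml, and `values` is just an illustrative name:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for test.xml
xml_data = """
<data>
  <country><name>Singapore</name><rank>4</rank></country>
  <country><name>Panama</name><rank>68</rank></country>
</data>
"""
root = ET.fromstring(xml_data)

# [name='Panama'] keeps only country elements whose name child has that text
panama = root.find("country[name='Panama']")
if panama is not None:
    # Collect the sibling fields of the matching country
    values = {child.tag: child.text for child in panama}
    print(values)
```

Checking for None before reading the children avoids an AttributeError when no country matches.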