So you need to understand the structure of the XML and then use the actual tags you're looking for instead of 'Data'
item = element.find('Item')
print(item.tag, ":", item.text)
value = element.find('Value')
print(value.tag, ":", value.text)
Your actual problem is that you need to change the import you use.
import xml.etree.ElementTree as ET
https://docs.python.org/2/library/xml.etree.elementtree.html
Edit: with the way that's structured, you can get a list of Data elements by saying
for data in root.findall('Data'):
    item = data.find('Item')
    print(item.tag, ":", item.text)
    value = data.find('Value')
    print(value.tag, ":", value.text)
Note that if the "Data" tag is not at the root level, you need to call find() repeatedly until you reach it. In other words, if the "Data" tags are enclosed in some parent tags, call root.find("Parent Tag") first and then search within the element it returns.
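To make the "find the parent first" idea concrete, here is a minimal sketch. The Report, Section and Group tag names and the sample values are made up for illustration; only Data, Item and Value come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the Data elements sit two levels below the root.
SAMPLE = """<Report>
  <Section>
    <Group>
      <Data><Item>Model</Item><Value>Example Disk</Value></Data>
      <Data><Item>Size</Item><Value>100 GB</Value></Data>
    </Group>
  </Section>
</Report>"""

root = ET.fromstring(SAMPLE)

# Step down one parent at a time until the Data elements are direct children.
section = root.find("Section")
group = section.find("Group")

pairs = []
for data in group.findall("Data"):
    pairs.append((data.find("Item").text, data.find("Value").text))

# A slash-separated path performs the same walk in a single call.
same_pairs = [
    (data.findtext("Item"), data.findtext("Value"))
    for data in root.findall("Section/Group/Data")
]

print(pairs)
```

If any of the parent tags might be missing, check each find() result for None before going further, since find() returns None when nothing matches.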
Edit2: Looked at my own msinfo.nfo file and this worked:
disks = root.find(".//Category[@name='Disks']")
for disk in disks:
    item = disk.find('Item')
    print(item.tag, ":", item.text)
    value = disk.find('Value')
    print(value.tag, ":", value.text)
Note: This uses XPath syntax to find the element, which is only available in ElementTree 1.3 (Python 2.7 and higher). You can also brute-force it by following the structure of the XML and traversing the tree until you get to Disks. The path was System Summary -> Components -> Storage -> Disks, and under Disks were the Data elements, each with Item and Value as children.
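The brute-force traversal could look roughly like the sketch below. The sample XML is a trimmed, hypothetical stand-in for a real msinfo export (the MsInfo root tag and the sample values are assumptions), and find_category is a helper invented for this example, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for an msinfo export.
SAMPLE = """<MsInfo>
  <Category name="System Summary">
    <Category name="Components">
      <Category name="Storage">
        <Category name="Disks">
          <Data><Item>Model</Item><Value>Example Disk</Value></Data>
        </Category>
      </Category>
    </Category>
  </Category>
</MsInfo>"""


def find_category(parent, name):
    """Return the direct child <Category> whose name attribute matches, else None."""
    for child in parent.findall("Category"):
        if child.get("name") == name:
            return child
    return None


root = ET.fromstring(SAMPLE)

# Walk the path one level at a time instead of using an XPath predicate.
node = root
for name in ["System Summary", "Components", "Storage", "Disks"]:
    node = find_category(node, name)
    if node is None:
        raise LookupError("Category not found: " + name)

# The XPath-style lookup from the answer lands on the same element.
via_xpath = root.find(".//Category[@name='Disks']")

rows = [(d.findtext("Item"), d.findtext("Value")) for d in node.findall("Data")]
print(rows)
```

The explicit walk is longer, but it fails with a clear message naming the level that is missing, which can help when the file layout differs between machines.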
Answer from drez90 on Stack Overflow
Here is my code with your sample data. It could be written better, but I think it solves your problem :)
You have to find the root (xml) and then iterate over its text. You can also use other methods, such as iterfind, for a better solution.
from xml.etree import ElementTree

xml_file = "<xml><Item><![CDATA[Model]]></Item><Value><![CDATA[TOSHIB MK1652GSX SCSI Disk Device]]></Value></xml>"
root = ElementTree.fromstring(xml_file)
# itertext() yields the text of every element in document order
for text in root.itertext():
    print(text)
Here is the output:
Model
TOSHIB MK1652GSX SCSI Disk Device
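For the iterfind alternative mentioned in this answer, a minimal sketch using the same sample data might look like this. It selects elements by tag name, so the Item text and the Value text stay separate instead of arriving as one flat stream.

```python
from xml.etree import ElementTree

xml_text = (
    "<xml><Item><![CDATA[Model]]></Item>"
    "<Value><![CDATA[TOSHIB MK1652GSX SCSI Disk Device]]></Value></xml>"
)
root = ElementTree.fromstring(xml_text)

# iterfind() lazily yields the child elements that match the given tag.
items = [element.text for element in root.iterfind("Item")]
values = [element.text for element in root.iterfind("Value")]

for item, value in zip(items, values):
    print(item, ":", value)
```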
You want to use an XML parser like
- ElementTree
- lxml
- minidom
etc. for parsing any kind of XML file. Parsing XML yourself, especially line by line, is error-prone, and using regular expressions for it is broken by design. Don't do that.
Be smart and use an XML parser instead.
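As a small illustration of why line-based scanning is fragile, the sketch below uses a made-up snippet in which one element spans several lines and contains an escaped character. The naive line filter finds nothing, while the parser returns the decoded text.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the Value element is split across lines and escapes "&".
SAMPLE = """<Data>
  <Item>Model</Item>
  <Value>
    Disk &amp; Controller
  </Value>
</Data>"""

# Naive scan: only matches when the opening and closing tags share one line.
line_hits = [
    line for line in SAMPLE.splitlines()
    if "<Value>" in line and "</Value>" in line
]

# The parser does not care about line breaks and decodes &amp; for us.
root = ET.fromstring(SAMPLE)
value = root.findtext("Value").strip()

print(line_hits)
print(value)
```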
You are iterating over a string, not over the file.
If you want to iterate over the lines in a string use str.splitlines:
>>> text ='''first
... second
... '''
>>> for line in text.splitlines():
... print(line)
...
first
second
>>> for char in text:
... print(char)
...
f
i
r
s
t
s
e
c
o
n
d
Anyway, I'd advise you to use an XML parser. The stdlib already provides one and there are plenty of additional libraries around.
>>> code = '''<program new-version="1.1.1.1" name="ProgramName">
... <download-url value="http://website.com/file.exe"/>
... </program>'''
With lxml:
>>> import lxml.etree
>>> lxml.etree.fromstring(code).xpath('//download-url/@value')[0]
'http://website.com/file.exe'
With the built-in xml.etree.ElementTree:
>>> import xml.etree.ElementTree
>>> doc = xml.etree.ElementTree.fromstring(code)
>>> doc.find('.//download-url').attrib['value']
'http://website.com/file.exe'
With the built-in xml.dom.minidom:
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(code)
>>> doc.getElementsByTagName('download-url')[0].getAttribute('value')
u'http://website.com/file.exe'
Which one you pick is entirely up to you. lxml needs to be installed, but is the fastest and most feature-rich library. xml.etree.ElementTree has a funky interface, and its XPath support is limited (how limited depends on the version of the Python standard library). xml.dom.minidom does not support XPath and tends to be slower, but implements the cross-platform DOM.
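To show what the limited XPath support means in practice, here is a short sketch with the same sample document. ElementTree's path language can locate the element, including with an attribute predicate, but as far as I know it cannot select the attribute itself the way lxml's '//download-url/@value' does, so the value is read with get() afterwards.

```python
import xml.etree.ElementTree as ET

code = '''<program new-version="1.1.1.1" name="ProgramName">
    <download-url value="http://website.com/file.exe"/>
</program>'''

doc = ET.fromstring(code)

# Supported: descendant search, here restricted to elements that
# actually carry a "value" attribute.
element = doc.find(".//download-url[@value]")

# The attribute is then read from the element, not selected by the path.
url = element.get("value")

# Attributes on the root element are read the same way.
version = doc.get("new-version")

print(url, version)
```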
from lxml import etree
et = etree.parse("your xml file or url")
value = et.xpath('//download-url/@value')
print("".join(value))
Output: http://website.com/file.exe
You can also use cssselect:
import lxml.html
with open("your xml file", 'r') as f:
    values = f.read()
doc = lxml.html.fromstring(values)
elements = doc.cssselect('document program download-url')  # CSS path found using Firebug
print(elements[0].get('value'))
Output: http://website.com/file.exe
Use ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse('Config.xml')
root = tree.getroot()
print(root.findall('.//Log'))
Output:
pawel@pawel-XPS-15-9570:~/test$ python parse_xml.py
[<Element 'Log' at 0x7fb3f2eee9f
The example below prints the text of each Log element:
import xml.etree.ElementTree as ET
# Raw string so the backslashes in the Windows-style paths are kept literally
xml = r'''<?xml version="1.0" encoding="UTF-8"?>
<Automation_Config>
    <Path>
        <Log>.\SERVER.log</Log>
        <Flag_Path>.\Flag</Flag_Path>
        <files>.\PO</files>
    </Path>
</Automation_Config>'''
root = ET.fromstring(xml)
for idx, log_element in enumerate(root.findall('.//Log')):
    print('{}) Log value: {}'.format(idx, log_element.text))
Output:
0) Log value: .\SERVER.log