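Use the csv module, not lxml, to write rows to the CSV file, and keep using an XML parser to parse and extract content from the XML file (the example below uses the standard library's xml.etree.ElementTree, whose API lxml largely mirrors):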
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('registerreads_EE.xml')
root = tree.getroot()[3]

with open('registerreads_EE.csv', 'w', newline='') as r:
    writer = csv.writer(r)
    writer.writerow(['read', 'date', 'entityid'])  # WRITING HEADERS

    for channel in tree.iter('Channel'):
        for exportrequest in channel.iter('ExportRequest'):
            entityid = exportrequest.attrib.get('EntityID')
            for meterread in channel.iter('Reading'):
                read = meterread.attrib.get('Value')
                date = meterread.attrib.get('ReadingTime')
                # WRITE EACH ROW ITERATIVELY
                writer.writerow([read[:-2], date[:10], entityid])
Answer from Parfait on Stack Overflow
Minor stuff:
- Remove the last import - etree is not used anywhere.
- Merge the two first imports.
Possibly speed-improving stuff:
- Avoid converting the csv.reader output before returning unless absolutely necessary.
- Skip indent unless the output must be readable by a human with a non-formatting editor.
- If you need to indent the output, existing solutions are probably very efficient.
- Use reader.next() (next(reader) in Python 3) to skip the header line in generate_xml, then you don't need to keep checking the value of i.
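The last point can be sketched roughly as follows. This is a minimal sketch for Python 3, where the call is spelled next(reader). The function name read_rows and the sample data are hypothetical and not taken from the reviewed code.

```python
import csv
import io


def read_rows(csv_file):
    """Yield data rows from an open CSV file, skipping the header line."""
    reader = csv.reader(csv_file)
    # Consume the header once; the None default avoids StopIteration on empty input.
    next(reader, None)
    for row in reader:
        yield row


# Hypothetical sample data, used only to exercise the function.
sample = io.StringIO("name,value\na,1\nb,2\n")
rows = list(read_rows(sample))
```

Because the header is consumed before the loop starts, the loop body no longer needs a counter to tell the first line apart from the rest.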
Don't write something like for elem in elem. In larger for loops you will eventually overlook that elem refers to a different object before the loop than inside and after it. Use a separate loop variable instead:
for subelem in elem:
    indent(subelem, level+1)
# after the loop, subelem is the last child
if not subelem.tail or not subelem.tail.strip():
    subelem.tail = i
Since indent(subelem...) already sets the tail, you probably do not need to do that again.
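As one possible "existing solution" for indentation: on Python 3.9 and later, the standard library provides xml.etree.ElementTree.indent, which could stand in for a hand-written helper. The sketch below assumes that Python version, and the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, built only for illustration.
root = ET.Element("rows")
for value in ("1", "2"):
    ET.SubElement(root, "row").text = value

# Adds whitespace to text/tail in place (available from Python 3.9).
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
```

Whether this is faster than a custom helper for a given file would need to be measured.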
The lxml library is capable of very powerful XML parsing, and can be used to iterate over an XML tree to search for specific elements.
from lxml import etree

# Read bytes so lxml can honor any encoding declaration in the file
with open(r'path/to/xml', 'rb') as xml:
    text = xml.read()
tree = etree.fromstring(text)

row = ['', '']
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
        row[0] = item.text
    elif item.tag == 'def':
        row[1] = item.text

line = ','.join(row)

with open(r'path/to/csv', 'a') as out_file:
    out_file.write(line + '\n')
How you build the CSV file is largely based upon preference, but I have provided a trivial example above. If there are multiple <dps-data> tags, you could extract those elements first (which can be done with the same tree.iter method shown above), and then apply the above logic to each of them.
EDIT: I should point out that this particular implementation reads the entire XML file into memory. If you are working with a single 150 MB file at a time, this should not be a problem, but it's just something to be aware of.
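If memory does become a concern, one alternative is to stream the file and handle one <dps-data> entry at a time. The sketch below swaps in the standard library's xml.etree.ElementTree.iterparse and csv.writer rather than the lxml calls used above. The function name xml_to_csv and the sample layout (each <dps-data> containing one <hw> and one <def>) are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_file):
    """Stream <dps-data> entries from xml_source and write one CSV row each."""
    writer = csv.writer(csv_file)  # quotes fields that contain commas
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "dps-data":
            writer.writerow([
                elem.findtext(".//hw", default=""),
                elem.findtext(".//def", default=""),
            ])
            elem.clear()  # drop the processed entry's children and text
            count += 1
    return count


# Hypothetical sample; the real file's layout is an assumption.
sample = io.BytesIO(
    b"<root>"
    b"<dps-data><hw>cat</hw><def>a small, furry animal</def></dps-data>"
    b"<dps-data><hw>dog</hw><def>a loyal animal</def></dps-data>"
    b"</root>"
)
out = io.StringIO(newline="")
n = xml_to_csv(sample, out)
```

Using csv.writer instead of ','.join also keeps a definition that contains a comma inside a single quoted field.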
How about this:
from xml.dom import minidom

xmldoc = minidom.parse('your.xml')
hw_lst = xmldoc.getElementsByTagName('hw')
defu_lst = xmldoc.getElementsByTagName('def')

with open('your.csv', 'a') as out_file:
    for i in range(len(hw_lst)):
        out_file.write('{0}, {1}\n'.format(hw_lst[i].firstChild.data, defu_lst[i].firstChild.data))