Create a CSV file, which is an Excel-friendly format.
import xml.etree.ElementTree as ET
from os import listdir

# Every file in the current directory whose name starts with 'xml'
xml_lst = [f for f in listdir() if f.startswith('xml')]
fields = ['RecordID', 'I_25Hz_1s', 'I_75Hz_2s']  # TODO - add rest of the fields
with open('out.csv', 'w') as f:
    f.write(','.join(fields) + '\n')
    for xml in xml_lst:
        root = ET.parse(xml)
        # Text of the first matching element anywhere in the document, per field
        values = [root.find(f'.//{tag}').text for tag in fields]
        f.write(','.join(values) + '\n')
Output:
RecordID,I_25Hz_1s,I_75Hz_2s
Madird01,56.40,0.36
London01,56.40,0.36
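If a file lacks one of the tags, root.find() returns None and .text raises AttributeError. A value containing a comma would also shift the columns. Below is a minimal sketch of the same approach using the csv module. It assumes hypothetical file names and a hypothetical <rec> root element, because the real file layout isn't shown. Writing with the utf-8-sig encoding adds a byte-order mark, which Excel uses to detect UTF-8.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET


def xml_files_to_csv(xml_paths, fields, out_path):
    """Write one CSV row per XML file, one column per tag in `fields`."""
    # utf-8-sig writes a byte-order mark so Excel recognises the file as UTF-8.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as out:
        writer = csv.writer(out)  # quotes values that contain commas or quotes
        writer.writerow(fields)
        for path in xml_paths:
            root = ET.parse(path).getroot()
            row = []
            for tag in fields:
                node = root.find(f".//{tag}")
                # A missing or empty tag becomes an empty cell instead of an error.
                row.append(node.text if node is not None and node.text else "")
            writer.writerow(row)


# Demo: two hypothetical input files; the second one lacks I_75Hz_2s.
folder = tempfile.mkdtemp()
samples = {
    "xmlfile_1.xml": "<rec><RecordID>Madird01</RecordID>"
                     "<I_25Hz_1s>56.40</I_25Hz_1s><I_75Hz_2s>0.36</I_75Hz_2s></rec>",
    "xmlfile_2.xml": "<rec><RecordID>London01</RecordID>"
                     "<I_25Hz_1s>56.40</I_25Hz_1s></rec>",
}
paths = []
for name, text in samples.items():
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

out_path = os.path.join(folder, "out.csv")
xml_files_to_csv(paths, ["RecordID", "I_25Hz_1s", "I_75Hz_2s"], out_path)

# Read the result back to check it (utf-8-sig strips the byte-order mark).
with open(out_path, newline="", encoding="utf-8-sig") as fh:
    rows = list(csv.reader(fh))
print(rows)
```

In the output, the London01 row has an empty third cell rather than causing a crash.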
Answer from balderman on Stack Overflow
When you need to iterate over files in a folder that have similar names, one way is to build a pattern and use glob. To make sure that a returned path is a file (and not, for example, a directory), you can use isfile().
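Here is a small sketch of that idea. It uses a temporary folder with hypothetical file names. The xmlfile_dir directory also matches the pattern, but isfile() filters it out.

```python
import os
import tempfile
from glob import iglob
from os.path import isfile, join

folder = tempfile.mkdtemp()
# Hypothetical folder contents: two matching files and one non-matching file.
for name in ("xmlfile_1.xml", "xmlfile_2.xml", "other.xml"):
    with open(join(folder, name), "w") as fh:
        fh.write("<r/>")
# A directory whose name also matches the pattern.
os.mkdir(join(folder, "xmlfile_dir"))

# iglob yields every path matching the pattern; isfile() drops the directory.
matches = sorted(
    os.path.basename(p) for p in iglob(join(folder, "xmlfile_*")) if isfile(p)
)
print(matches)  # ['xmlfile_1.xml', 'xmlfile_2.xml']
```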
Regarding the XML, it looks like you basically need to write the value of every terminal (leaf) tag into a column named after that tag. Since you have several files, you can create a tag-to-value dictionary from each file and store them in a ChainMap. After all files are processed, you can use DictWriter to write all the data into the final CSV file.
This method is much safer and more flexible than using static column names. First, the program collects all possible tag (column) names from all files. If an XML file lacks a tag or has some extra tags, it won't throw an exception, and all the data will be saved.
Code:
import xml.etree.ElementTree as ET
from glob import iglob
from os.path import isfile, join
from csv import DictWriter
from collections import ChainMap

xml_root = r"C:\data\Desktop\Blue\XML-files"
pattern = "xmlfile_*"

data = ChainMap()
for filename in iglob(join(xml_root, pattern)):
    if isfile(filename):
        tree = ET.parse(filename)
        root = tree.getroot()
        # Leaf elements only: len(node) == 0 means the element has no children
        temp = {node.tag: node.text for node in root.iter() if len(node) == 0}
        data = data.new_child(temp)

with open(join(xml_root, "data.csv"), "w", newline="") as f:
    # list(data) is the union of the tag names collected from every file
    writer = DictWriter(f, fieldnames=list(data))
    writer.writeheader()
    writer.writerows(data.maps[:-1])  # last is empty dict
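The following in-memory sketch shows how the ChainMap collects columns. It uses two hypothetical documents whose tags only partly overlap and writes to io.StringIO instead of a file. It also reverses data.maps[:-1] so rows come out in the order the documents were read, because new_child() puts the newest map first. That reversal is a small change from the code above.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import ChainMap

# Two hypothetical documents with overlapping but different leaf tags.
docs = [
    "<rec><RecordID>A1</RecordID><price>8.9</price></rec>",
    "<rec><RecordID>B2</RecordID><color>red</color></rec>",
]

data = ChainMap()
for text in docs:
    root = ET.fromstring(text)
    # Leaf elements only: len(node) counts an element's child elements.
    leaves = {node.tag: node.text for node in root.iter() if len(node) == 0}
    data = data.new_child(leaves)  # the newest map goes to the front

fieldnames = list(data)  # union of keys, ordered from the oldest map onward

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)  # restval="" fills gaps
writer.writeheader()
writer.writerows(reversed(data.maps[:-1]))  # skip the initial empty map
print(buf.getvalue())
```

Each row gets an empty cell for any tag its document lacks. No exception is raised.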
Update: if you want to use the xlsx format instead of CSV, you have to use a third-party library (e.g. openpyxl). Example of usage:
from openpyxl import Workbook
...
wb = Workbook(write_only=True)
ws = wb.create_sheet()
ws.append(list(data)) # write header
for row in data.maps[:-1]:
    ws.append([row.get(key, "") for key in data])
wb.save(join(xml_root, "data.xlsx"))
How to parse XML into an excel sheet?
Bear with me, as I'm a novice with Python. Basically, I am trying to take an XML file and drop it into a specific sheet of an existing Excel workbook. I know I have done this successfully before, but I cannot find the file where I did it, nor can I remember how I did it.
When I do it manually, the process is pretty straightforward: download the XML file, open it with Excel, and copy and paste it as text into the sheet. I'm just hoping someone could help me get started here. Thanks so much for your time.
To be more specific, this is the layout of the XML file:
<products>
  <product active="1" on_sale="0" discountable="1">
    <sku>GG1234</sku>
    <name><![CDATA[ Product Name Here ]]></name>
    <description><![CDATA[Product Description Here ]]></description>
    <keywords></keywords>
    <price>8.9</price>
    <stock_quantity>220</stock_quantity>
    <reorder_quantity>0</reorder_quantity>
    <height>4.25</height>
    <length>1.25</length>
    <diameter>2.5</diameter>
    <weight>0.53</weight>
    <color></color>
    <material>Material Here</material>
    <barcode>0000000000</barcode>
    <release_date>2010-02-19</release_date>
    <images>
      <image>/path/path.jpg</image>
      <image>/path/path.jpg</image>
      <image>/path/path.jpg</image>
      <image>/path/path.jpg</image>
    </images>
    <categories>
      <category code="518" video="0" parent="0">Category 1</category>
      <category code="525" video="0" parent="528">Category 2</category>
      <category code="138" video="0" parent="0">Category 3</category>
      <category code="552" video="0" parent="528">Category 4</category>
    </categories>
    <manufacturer code="AC" video="0">Manufact</manufacturer>
    <type code="CL" video="0">Product Type</type>
  </product>
  . . . . .
</products>
What I need is for the following values to populate the top row as the header of the Excel file:
active
on_sale
discountable
sku
name
description
keywords
price
stock_quantity
reorder_quantity
height
length
diameter
weight
color
material
barcode
release_date
image
category
manufacturer
code2
video3
type
code4
video5
Their respective values should then populate the cells going down each column.
Hope that makes sense.
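One way to start is to flatten each <product> into a dict and write the rows with csv.DictWriter. The resulting file opens in Excel. Writing into a sheet of an existing workbook would still need a third-party library such as openpyxl. The flattening scheme below is an assumption, not the only option:

- Product attributes become columns.
- Repeated children (image, category) are joined with "; " into one cell.
- A leaf element's attributes become columns such as manufacturer_code. You could rename these to the question's code2/video3 style.
- Category attributes are dropped in this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's layout (more <product> elements would follow).
xml_text = """<products>
  <product active="1" on_sale="0" discountable="1">
    <sku>GG1234</sku>
    <name><![CDATA[ Product Name Here ]]></name>
    <keywords></keywords>
    <price>8.9</price>
    <images>
      <image>/path/path.jpg</image>
      <image>/path/path.jpg</image>
    </images>
    <categories>
      <category code="518" video="0" parent="0">Category 1</category>
      <category code="525" video="0" parent="528">Category 2</category>
    </categories>
    <manufacturer code="AC" video="0">Manufact</manufacturer>
    <type code="CL" video="0">Product Type</type>
  </product>
</products>"""


def flatten_product(product):
    """Turn one <product> element into a flat dict (hypothetical scheme)."""
    row = dict(product.attrib)  # active, on_sale, discountable
    for child in product:
        if len(child):  # container such as <images> or <categories>
            # Join repeated children into one cell, keyed by the child tag.
            row[child[0].tag] = "; ".join((c.text or "").strip() for c in child)
        else:
            row[child.tag] = (child.text or "").strip()
            # Leaf attributes, e.g. <manufacturer code="AC"> -> manufacturer_code
            for name, value in child.attrib.items():
                row[f"{child.tag}_{name}"] = value
    return row


root = ET.fromstring(xml_text)
rows = [flatten_product(p) for p in root.iter("product")]

# Header = every key seen, in first-seen order.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buf = io.StringIO()  # swap for open("products.csv", "w", newline="")
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

ElementTree returns CDATA content as ordinary text, so the name column holds "Product Name Here" once it is stripped.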
The following should work:
import xml.etree.ElementTree as ET
import arcpy
xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.find(".//agreementid").text
arcpy.AddMessage(agreement)
The root.find() call uses an XPath expression to find the first element named agreementid at any level below the current element. ElementTree supports a limited XPath subset, and the Python docs include a quick cheat sheet for it. If your file has multiple elements with that name, you can use root.findall() and iterate over the results. For example, if there are three elements named agreementid and you know you want the second one, root.findall(".//agreementid")[1] should work.
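The sketch below shows find() against findall() using only the standard library, without arcpy. The <root>, <record> and <details> elements are hypothetical. ".//" searches all descendants in document order, and find() returns None when nothing matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with agreementid at different nesting depths.
xml_text = """<root>
  <record><agreementid>A-1</agreementid></record>
  <record><details><agreementid>A-2</agreementid></details></record>
  <agreementid>A-3</agreementid>
</root>"""
root = ET.fromstring(xml_text)

first = root.find(".//agreementid").text                     # first match only
all_ids = [el.text for el in root.findall(".//agreementid")]  # every match
second = root.findall(".//agreementid")[1].text               # pick by index
missing = root.find(".//no_such_tag")                         # None, not an error

print(first, all_ids, second, missing)
```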
MattDMo has given a sufficient answer to the problem, but I just want to remind you that Python has a csv module. It makes it easier to write comma-separated data, which is typically then read into applications such as databases or spreadsheets.
From the docs:
import csv
# Python 3: open in text mode with newline='' (the old 'wb' mode was Python 2)
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
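The practical advantage over ','.join() is quoting. With the default QUOTE_MINIMAL, csv.writer wraps any field that contains the delimiter in quotes, so a comma inside a value doesn't create an extra column. The sample values here are hypothetical.

```python
import csv
import io

rows = [["sku", "name"], ["GG1234", "Widget, large"]]

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # default dialect: comma delimiter, QUOTE_MINIMAL
print(buf.getvalue())
# sku,name
# GG1234,"Widget, large"

# By contrast, a naive join turns the second row into three fields.
naive = ",".join(rows[1])
```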
I would recommend the pandas read_xml() and to_csv() functions. It's a three-liner:
See the documentation for to_csv and read_xml.
import pandas as pd
df = pd.read_xml('employee.xml')
df.to_csv('out.csv', index=False)
Output (CSV file):
id,name,age,salary,division
303,varma,20,120000,3
304,Cyril,20,900000,3
305,Yojith,20,900000,3
I recommend just using libraries because they're usually very optimised. I'll talk about that later. For now, here's a way that utilises the xml.dom.minidom module, which is a part of the Python standard library, so no additional libraries are required.
Edit: rewrote the last part using the standard CSV library instead of manually writing the file, as suggested by a comment. That makes for 2 Python built-in modules, not 1. The original code for the CSV writing will be at the end of the reply, if you're interested.
from xml.dom import minidom
from csv import DictWriter

# Step 1: Read and parse the XML file
# Write it as a string, or open the file and read it
with open('employees.xml', 'r') as xml_file:
    xml_data = xml_file.read()
dom = minidom.parseString(xml_data)
employees = dom.getElementsByTagName('employee')

# Step 2: Extract the required information
data = []
for employee in employees:
    emp_data = {}
    for child in employee.childNodes:
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            # An empty element has no text node, so fall back to ''
            emp_data[child.tagName] = child.firstChild.data if child.firstChild else ''
    data.append(emp_data)

# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w', newline='') as csv_file:
    fieldnames = ['id', 'name', 'age', 'salary', 'division']
    writer = DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for emp_data in data:
        writer.writerow(emp_data)
Don't reinvent the wheel, just realign it.
— Anthony J. D'Angelo, I think
I recommend NOT using this code. You should really just use lxml. It's extremely simple and easy to use and can handle complex XML structures with nested elements and attributes. Let me know how everything goes!
Original CSV-writing code, without the csv library:
# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w') as f:
    f.write('id,name,age,salary,division\n')
    for emp_data in data:
        f.write(f"{emp_data['id']},{emp_data['name']},{emp_data['age']},{emp_data['salary']},{emp_data['division']}\n")
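For comparison, the same three steps can be sketched with xml.etree.ElementTree from the standard library. This is a swapped-in technique, not what the minidom answer uses. The <employees> root and the inline sample are assumptions built from the pandas output shown earlier. For a real file, replace ET.fromstring(xml_text) with ET.parse('employees.xml').getroot().

```python
import csv
import io
import xml.etree.ElementTree as ET

# Step 1: parse (hypothetical inline sample built from the output shown earlier)
xml_text = """<employees>
  <employee><id>303</id><name>varma</name><age>20</age><salary>120000</salary><division>3</division></employee>
  <employee><id>304</id><name>Cyril</name><age>20</age><salary>900000</salary><division>3</division></employee>
  <employee><id>305</id><name>Yojith</name><age>20</age><salary>900000</salary><division>3</division></employee>
</employees>"""
root = ET.fromstring(xml_text)

# Step 2: one dict per <employee>, child tag -> text ('' for empty elements)
data = [{child.tag: (child.text or "") for child in emp} for emp in root.iter("employee")]

# Step 3: write CSV (StringIO here; use open('output.csv', 'w', newline='') for a file)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "age", "salary", "division"])
writer.writeheader()
writer.writerows(data)
print(buf.getvalue())
```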
Parsing the XML file with Beautiful Soup seems like the easiest route I've seen so far, but I wanted to know if I was missing an obvious solution.