I would recommend pandasread_xml() and to_csv() function, 3-liner:
Compare the documentation: to_csv, read_xml
import pandas as pd
df = pd.read_xml('employee.xml')
df.to_csv('out.csv', index=False)
Output -> (CSV-file):
id,name,age,salary,division
303,varma,20,120000,3
304,Cyril,20,900000,3
305,Yojith,20,900000,3
Answer from Hermann12 on Stack OverflowI would recommend pandasread_xml() and to_csv() function, 3-liner:
Compare the documentation: to_csv, read_xml
import pandas as pd
df = pd.read_xml('employee.xml')
df.to_csv('out.csv', index=False)
Output -> (CSV-file):
id,name,age,salary,division
303,varma,20,120000,3
304,Cyril,20,900000,3
305,Yojith,20,900000,3
I recommend just using libraries because they're usually very optimised. I'll talk about that later. For now, here's a way that utilises the xml.dom.minidom module, which is a part of the Python standard library, so no additional libraries are required.
Edit: rewrote the last part using the standard CSV library instead of manually writing the file, as suggested by a comment. That makes for 2 Python built-in modules, not 1. The original code for the CSV writing will be at the end of the reply, if you're interested.
from xml.dom import minidom
from csv import DictWriter
# Step 1: Read and parse the XML file
# Write it as a string, or open the file and read it
xml_file = open('employees.xml', 'r')
xml_data = xml_file.read()
dom = minidom.parseString(xml_data)
employees = dom.getElementsByTagName('employee')
xml_file.close()
# Step 2: Extract the required information
data = []
for employee in employees:
emp_data = {}
for child in employee.childNodes:
if child.nodeType == minidom.Node.ELEMENT_NODE:
emp_data[child.tagName] = child.firstChild.data
data.append(emp_data)
# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w', newline = '') as csv_file:
fieldnames = ['id', 'name', 'age', 'salary', 'division']
writer = DictWriter(csv_file, fieldnames = fieldnames)
writer.writeheader()
for emp_data in data:
writer.writerow(emp_data)
Don't reinvent the wheel, just realign it.
— Anthony J. D'Angelo, I think
I recommend NOT using this code. You should really just use lxml. It's extremely simple and easy to use and can handle complex XML structures with nested elements and attributes. Let me know how everything goes!
Original CSV write code without CSV library
# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w') as f:
f.write('id,name,age,salary,division\n')
for emp_data in data:
f.write(f"{emp_data['id']},{emp_data['name']},{emp_data['age']},{emp_data['salary']},{emp_data['division']}\n")
pandas - XML to CSV Python - Stack Overflow
Simple CSV to XML Conversion - Python - Stack Overflow
Converting CSV file to XML
Read xml column inside csv file with Python
Videos
Using pandas and BeautifulSoup you can achieve your expected output easily:
#Code:
import pandas as pd
import itertools
from bs4 import BeautifulSoup as b
with open("file.xml", "r") as f: # opening xml file
content = f.read()
soup = b(content, "lxml")
pkgeid = [ values.text for values in soup.findAll("pkgeid")]
pkgname = [ values.text for values in soup.findAll("pkgname")]
time = [ values.text for values in soup.findAll("time")]
oper = [ values.text for values in soup.findAll("oper")]
# For python-3.x use `zip_longest` method
# For python-2.x use 'izip_longest method
data = [item for item in itertools.zip_longest(time, oper, pkgeid, pkgname)]
df = pd.DataFrame(data=data)
df.to_csv("sample.csv",index=False, header=None)
#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,
Using Pandas, parsing all xml fields.
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file.xml")
root = tree.getroot()
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv('file.csv')
A possible solution is to first load the csv into Pandas and then convert it row by row into XML, as so:
import pandas as pd
df = pd.read_csv('untitled.txt', sep='|')
With the sample data (assuming separator and so on) loaded as:
Title Type Format Year Rating Stars \
0 Enemy Behind War,Thriller DVD 2003 PG 10
1 Transformers Anime,Science Fiction DVD 1989 R 9
Description
0 Talk about...
1 A Schientific fiction
And then converting to xml with a custom function:
def convert_row(row):
return """<movietitle="%s">
<type>%s</type>
<format>%s</format>
<year>%s</year>
<rating>%s</rating>
<stars>%s</stars>
<description>%s</description>
</movie>""" % (
row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)
print '\n'.join(df.apply(convert_row, axis=1))
This way you get a string containing the xml:
<movietitle="Enemy Behind">
<type>War,Thriller</type>
<format>DVD</format>
<year>2003</year>
<rating>PG</rating>
<stars>10</stars>
<description>Talk about...</description>
</movie>
<movietitle="Transformers">
<type>Anime,Science Fiction</type>
<format>DVD</format>
<year>1989</year>
<rating>R</rating>
<stars>9</stars>
<description>A Schientific fiction</description>
</movie>
that you can dump in to a file or whatever.
Inspired by this great answer.
Edit: Using the loading method you posted (or a version that actually loads the data to a variable):
import csv
f = open('movies2.csv')
csv_f = csv.reader(f)
data = []
for row in csv_f:
data.append(row)
f.close()
print data[1:]
We get:
[['Enemy Behind', 'War', 'Thriller', 'DVD', '2003', 'PG', '10', 'Talk about...'], ['Transformers', 'Anime', 'Science Fiction', 'DVD', '1989', 'R', '9', 'A Schientific fiction']]
And we can convert to XML with minor modifications:
def convert_row(row):
return """<movietitle="%s">
<type>%s</type>
<format>%s</format>
<year>%s</year>
<rating>%s</rating>
<stars>%s</stars>
<description>%s</description>
</movie>""" % (row[0], row[1], row[2], row[3], row[4], row[5], row[6])
print '\n'.join([convert_row(row) for row in data[1:]])
Getting identical results:
<movietitle="Enemy Behind">
<type>War</type>
<format>Thriller</format>
<year>DVD</year>
<rating>2003</rating>
<stars>PG</stars>
<description>10</description>
</movie>
<movietitle="Transformers">
<type>Anime</type>
<format>Science Fiction</format>
<year>DVD</year>
<rating>1989</rating>
<stars>R</stars>
<description>9</description>
</movie>
I tried to generalize robertoia's function convert_row for any header instead of writing it by hand.
import csv
import pandas as pd
f = open('movies2.csv')
csv_f = csv.reader(f)
data = []
for row in csv_f:
data.append(row)
f.close()
df = pd.read_csv('movies2.csv')
header= list(df.columns)
def convert_row(row):
str_row = """<%s>%s</%s> \n"""*(len(header)-1)
str_row = """<%s>%s""" +"\n"+ str_row + """</%s>"""
var_values = [list_of_elments[k] for k in range(1,len(header)) for list_of_elments in [header,row,header]]
var_values = [header[0],row[0]]+var_values+[header[0]]
var_values =tuple(var_values)
return str_row % var_values
text ="""<collection shelf="New Arrivals">"""+"\n"+'\n'.join([convert_row(row) for row in data[1:]])+"\n" +"</collection >"
print(text)
with open('output.xml', 'w') as myfile:
myfile.write(text)
Of course with pandas now, it is simpler to just use
to_xml() :
df= pd.read_csv('movies2.csv')
with open('outputf.xml', 'w') as myfile:
myfile.write(df.to_xml())
I don't know if this is the right place to post this, but I'm looking for someone who can create a process to convert a daily csv file to xml that can be imported into our billing software.
I'm just looking for a way to find someone to pay to get this set up for me (the company I work for). I've tried searching and have come up empty, so am reaching out to reddit where I get all of my questions answered!
To sum up, I'm looking for where I would search to find someone to write a program / process for me to use to convert a csv file into xml to be able to be imported into a billing software program.
Thanks!