Answer from Hermann12 on Stack Overflow
Top answer
1 of 2
7

I would recommend pandas' read_xml() and to_csv() functions, a 3-liner:

Compare the documentation: to_csv, read_xml

import pandas as pd

df = pd.read_xml('employee.xml')
df.to_csv('out.csv', index=False)

Output -> (CSV-file):

id,name,age,salary,division
303,varma,20,120000,3
304,Cyril,20,900000,3
305,Yojith,20,900000,3
2 of 2
2

I recommend just using libraries because they're usually very optimised. I'll talk about that later. For now, here's a way that utilises the xml.dom.minidom module, which is a part of the Python standard library, so no additional libraries are required.

Edit: rewrote the last part using the standard CSV library instead of manually writing the file, as suggested by a comment. That makes for 2 Python built-in modules, not 1. The original code for the CSV writing will be at the end of the reply, if you're interested.

from xml.dom import minidom
from csv import DictWriter

# Step 1: Read and parse the XML file
# Write it as a string, or open the file and read it
with open('employees.xml', 'r') as xml_file:
    xml_data = xml_file.read()

dom = minidom.parseString(xml_data)
employees = dom.getElementsByTagName('employee')

# Step 2: Extract the required information
data = []
for employee in employees:
    emp_data = {}
    for child in employee.childNodes:
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            emp_data[child.tagName] = child.firstChild.data
    data.append(emp_data)

# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w', newline='') as csv_file:
    fieldnames = ['id', 'name', 'age', 'salary', 'division']
    writer = DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    for emp_data in data:
        writer.writerow(emp_data)


Don't reinvent the wheel, just realign it.

— Anthony J. D'Angelo, I think

I recommend NOT using this code. You should really just use lxml. It's extremely simple and easy to use and can handle complex XML structures with nested elements and attributes. Let me know how everything goes!
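For reference, here is a minimal lxml sketch of the same conversion (a sketch only, assuming lxml is installed; an inline sample stands in for the employee.xml used above):

```python
import csv
from lxml import etree

# Inline sample standing in for employee.xml
xml = b"""<employees>
  <employee><id>303</id><name>varma</name><age>20</age><salary>120000</salary><division>3</division></employee>
  <employee><id>304</id><name>Cyril</name><age>20</age><salary>900000</salary><division>3</division></employee>
</employees>"""

root = etree.fromstring(xml)

fieldnames = ['id', 'name', 'age', 'salary', 'division']
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for emp in root.iter('employee'):
        # Map each child element's tag to its text, one row per <employee>
        writer.writerow({child.tag: child.text for child in emp})
```

The dict comprehension means you don't have to hardcode a lookup for every column; DictWriter matches keys to fieldnames for you.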


Original CSV write code without CSV library
# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w') as f:
    f.write('id,name,age,salary,division\n')
    for emp_data in data:
        f.write(f"{emp_data['id']},{emp_data['name']},{emp_data['age']},{emp_data['salary']},{emp_data['division']}\n")
🌐
Stack Overflow
stackoverflow.com › questions › 74772537 › how-to-convert-xml-file-to-csv-using-python-script
how to convert xml file to csv using python script - Stack Overflow
Top answer
1 of 3
22

This is a namespaced XML document. Therefore you need to address the nodes using their respective namespaces.

The namespaces used in the document are defined at the top:

xmlns:tc2="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:tp1="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
xmlns="http://www.topografix.com/GPX/1/1"

So the first namespace is mapped to the short form tc2, and would be used in an element like <tc2:foobar/>. The last one, which doesn't have a short form after the xmlns, is called the default namespace, and it applies to all elements in the document that don't explicitly use a namespace - so it applies to your <trkpt /> elements as well.

Therefore you would need to write root.iter('{http://www.topografix.com/GPX/1/1}trkpt') to select these elements.

In order to also get time and elevation, you can use trkpt.find() to access these elements below the trkpt node, and then element.text to retrieve those elements' text content (as opposed to attributes like lat and lon). Also, because the time and ele elements also use the default namespace you'll have to use the {namespace}element syntax again to select those nodes.

So you could use something like this:

import csv
import lxml.etree

NS = 'http://www.topografix.com/GPX/1/1'
header = ('lat', 'lon', 'ele', 'time')

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    root = lxml.etree.fromstring(x)  # x holds your GPX document as a string
    for trkpt in root.iter('{%s}trkpt' % NS):
        lat = trkpt.get('lat')
        lon = trkpt.get('lon')
        ele = trkpt.find('{%s}ele' % NS).text
        time = trkpt.find('{%s}time' % NS).text

        row = lat, lon, ele, time
        writer.writerow(row)

For more information on XML namespaces, see the Namespaces section in the lxml tutorial and the Wikipedia article on XML Namespaces. Also see GPS eXchange Format for some details on the .gpx format.
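As a stdlib alternative to the snippet above: xml.etree.ElementTree's find()/findall() accept a prefix-to-URI mapping via their namespaces argument, which avoids spelling out the {namespace}element syntax on every lookup. A small sketch, with an inline GPX fragment standing in for the real file:

```python
import csv
import xml.etree.ElementTree as ET

# Tiny inline GPX sample; a real file would come from ET.parse('track.gpx')
x = """<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="45.4852855" lon="-122.6347885">
      <ele>0.0</ele><time>2013-12-03T21:08:56Z</time>
    </trkpt>
  </trkseg></trk>
</gpx>"""

ns = {'gpx': 'http://www.topografix.com/GPX/1/1'}
root = ET.fromstring(x)

with open('points.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(('lat', 'lon', 'ele', 'time'))
    # findall/find take a prefix->URI dict, so 'gpx:trkpt' resolves to the
    # default-namespaced <trkpt> elements
    for trkpt in root.findall('.//gpx:trkpt', ns):
        writer.writerow((trkpt.get('lat'), trkpt.get('lon'),
                         trkpt.find('gpx:ele', ns).text,
                         trkpt.find('gpx:time', ns).text))
```

Note that iter() does not take a namespaces argument, so with iter() you still need the {namespace}tag form.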

2 of 3
1

Apologies for using already-made tools here, but these did the job with your data:

  1. Convert XML to JSON : http://convertjson.com/xml-to-json.htm
  2. Take that JSON and convert JSON to CSV : https://konklone.io/json/

It worked like a charm with your data.

ele,time,_lat,_lon
0.0000000,2013-12-03T21:08:56Z,45.4852855,-122.6347885
0.0000000,2013-12-03T21:09:00Z,45.4852961,-122.6347926
0.2000000,2013-12-03T21:09:01Z,45.4852982,-122.6347897

So for coding, I reckon XML > JSON > CSV may be a good approach. You may find the relevant scripts pointed to in those links.
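For a scripted version of that route, here's a rough standard-library sketch of the same XML → JSON → CSV pipeline (flat records are assumed; the '_'-prefixed attribute columns mimic the converter's output shown above):

```python
import csv
import json
import xml.etree.ElementTree as ET

# Inline sample standing in for the GPX-like input
xml = """<points>
  <trkpt lat="45.48" lon="-122.63"><ele>0.0</ele><time>2013-12-03T21:08:56Z</time></trkpt>
  <trkpt lat="45.49" lon="-122.64"><ele>0.2</ele><time>2013-12-03T21:09:01Z</time></trkpt>
</points>"""

# Step 1: XML -> list of flat dicts (attributes prefixed with '_')
records = []
for pt in ET.fromstring(xml).iter('trkpt'):
    rec = {child.tag: child.text for child in pt}
    rec.update({'_' + k: v for k, v in pt.attrib.items()})
    records.append(rec)

# Step 2: dicts -> JSON (the intermediate format)
as_json = json.dumps(records)

# Step 3: JSON -> CSV
rows = json.loads(as_json)
with open('converted.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

The JSON step is redundant when everything lives in one script, but it is exactly where you would split the pipeline across the two online tools.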

Top answer
1 of 2
2

Use csv.DictWriter, get values from node.attrib dictionary

Your elements named TrdCaptRpt have attributes; for such a node, its node.attrib holds a dictionary with a key/value pair for each attribute.

csv.DictWriter allows writing data taken from a dictionary.

First some imports (I always use lxml as it is very fast and provides extra features):

from lxml import etree
import csv

Configure file names and fields to use in each record:

xml_fname = "data.xml"
csv_fname = "data.csv"

fields = [
    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat", "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst"]

Read the XML:

xml = etree.parse(xml_fname)

Iterate over elements "TrdCapRpt", write attribute values to CSV file:

with open(csv_fname, "w", newline="") as f:

    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for node in xml.iter("TrdCaptRpt"):
        writer.writerow(node.attrib)

If you prefer using the stdlib xml.etree.ElementTree, you will manage just as easily, because node.attrib is present there too.
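To illustrate, a small stdlib-only sketch of the same approach, with an inline sample standing in for data.xml and a shortened field list:

```python
import csv
import xml.etree.ElementTree as ET

# Inline sample standing in for data.xml
xml = """<FIXML>
  <TrdCaptRpt RptID="1" TrdTyp="0" LastQty="100" LastPx="9.5"/>
  <TrdCaptRpt RptID="2" TrdTyp="1" LastQty="50" LastPx="9.7"/>
</FIXML>"""

fields = ["RptID", "TrdTyp", "LastQty", "LastPx"]

root = ET.fromstring(xml)
with open('trades.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fields, delimiter=';', extrasaction='ignore')
    writer.writeheader()
    # node.attrib is a plain dict in stdlib ElementTree
    for node in root.iter('TrdCaptRpt'):
        writer.writerow(node.attrib)
```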

Reading from multiple element names

In your comments, you noted that you want to export attributes from more element names. This is also possible. To do this, I will modify the example to use xpath (which will probably work only with lxml) and add an extra column "elm_name" to track which element each record was created from:

fields = [
    "elm_name",

    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat", "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst",

    "Typ", "Amt", "Ccy"
]

xml = etree.parse(xml_fname)

with open(csv_fname, "w", newline="") as f:

    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for node in xml.xpath("//*[self::TrdCaptRpt or self::PosRpt or self::Amt]"):
        atts = dict(node.attrib)  # copy, so the element itself is not modified
        atts["elm_name"] = node.tag
        writer.writerow(atts)

The modifications are:

  • fields got an extra "elm_name" field and fields from the other elements (feel free to remove those you are not interested in).
  • iteration over elements now uses xml.xpath. The XPath expression is more complex, so I am not sure whether the stdlib ElementTree supports it.
  • before writing the record, I add the name of the element to the atts dictionary.
Warning: the element Amt is nested inside PosRpt, and this tree structure cannot be represented in flat CSV. The records are written, but they do not carry information about where they come from (apart from the record for the parent element that precedes them).
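If that parent information matters, one workaround (a sketch, using lxml's getparent(), which stdlib ElementTree does not provide) is to record the parent's tag in an extra column; the "parent" column name here is my own invention:

```python
import csv
from lxml import etree

# Inline sample: an Amt element nested inside a PosRpt element
xml = b"""<root>
  <PosRpt BizDt="2020-01-01">
    <Amt Typ="FMTM" Amt="500" Ccy="USD"/>
  </PosRpt>
</root>"""

root = etree.fromstring(xml)
with open('amounts.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, ["parent", "Typ", "Amt", "Ccy"], delimiter=';',
                            extrasaction='ignore')
    writer.writeheader()
    for node in root.xpath('//Amt'):
        row = dict(node.attrib)               # copy, so the tree is not modified
        row["parent"] = node.getparent().tag  # where this record came from
        writer.writerow(row)
```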

2 of 2
0

You should first append each line with all your tags to a list.

my_list = []
for node in tree.iter('TrdCaptRpt'):

    .....

    my_list.append([RptID, TrdTyp, TrdSubTyp, TrdDt, BizDt,
                    MLegRptTyp, MtchStat, MsgEvtSrc, TrdID,
                    LastQty, LastPx, TxnTm, SettlCcy, SettlDt,
                    PxSubTyp, VenueTyp, VenuTyp, OfstInst])

Then write each line to file :

with open('/Users/anantsangar/Desktop/output.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in my_list:
        spamwriter.writerow(row)

Top answer
1 of 2
1

You probably don't need to go through ElementTree; you can feed the xml directly to pandas. If I understand you correctly, this should do it:

df = pd.read_xml(path_to_file, xpath="//*[local-name()='MainVIP']")
df = df.iloc[:,:4]
df

Output from your xml above:

    Date    RegisteredDate  Type    TypeDescription
0   20210616    20210216    YMBA    TYPE OF ENQUIRY
2 of 2
-1

Without any external lib, the code below generates a CSV file.
The idea is to collect the required element data from each MainVIP node and store it in a list of dicts, then loop over the list and write the data to a file.

import xml.etree.ElementTree as ET

xml = ''' <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
            <Level3>
                <ResponseStatus>Success</ResponseStatus>
                <ErrorMessage/>
                <Message>20 alert(s) generated for this period</Message>
                <ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
                <Something1>1</Something1>
                <Something2/>
                <Something3/>
                <Something4/>
                <VIP>
                    <MainVIP>
                        <Date>20210616</Date>
                        <RegisteredDate>20210216</RegisteredDate>
                        <Type>YMBA</Type>
                        <TypeDescription>TYPE OF ENQUIRY</TypeDescription>
                        <BusinessName>COMPANY NAME</BusinessName>
                        <ITNumber>987654321</ITNumber>
                        <RegistrationNumber>123456789</RegistrationNumber>
                        <SubscriberNumber>55889977</SubscriberNumber>
                        <SubscriberReference/>
                        <TicketNumber>1122336655</TicketNumber>
                        <SubscriberName>COMPANY NAME 2 </SubscriberName>
                        <CompletedDate>20210615</CompletedDate>
                    </MainVIP>
                </VIP>
                <Something5/>
                <Something6/>
                <Something7/>
                <Something8/>
                <Something9/>
                <PrincipalSomething10/>
                <PrincipalSomething11/>
                <PrincipalSomething12/>
                <PrincipalSomething13/>
                <Something14/>
                <Something15/>
                <Something16/>
                <Something17/>
                <Something18/>
                <PrincipalSomething19/>
                <PrincipalSomething20/>
            </Level3>
        </Level2>
    </soap:Body>
</soap:Envelope>'''

cols = ['Date', 'RegisteredDate', 'Type',
        'TypeDescription']
rows = []
NS = '{https://xxxxxxxxxx/xxxxxxx}'
root = ET.fromstring(xml)
for vip in root.findall(f'.//{NS}MainVIP'):
    rows.append({c: vip.find(NS+c).text for c in cols})
with open('out.csv','w') as f:
    f.write(','.join(cols) + '\n')
    for row in rows:
        f.write(','.join(row[c] for c in cols) + '\n')

out.csv

Date,RegisteredDate,Type,TypeDescription
20210616,20210216,YMBA,TYPE OF ENQUIRY
Top answer
1 of 1
1

I'd do this in a very explicit way rather than trying to hack xmltodict to fit your needs.

The only downside I see with this approach is a bit of repetition with the hardcoded headers and tags names.

Also, I don't know how regular your input XML is going to be. If it's possible that some of the tags will not be present, then you will need to add some error handling (because node.find will return None, and then .text will raise an AttributeError).
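One way to add that error handling is a small helper (the get_text name is hypothetical) that returns a default instead of raising:

```python
import xml.etree.ElementTree as ET

def get_text(parent, tag, default=''):
    """Return the text of a child tag, or default if the tag is absent or empty."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return default
    return node.text

# Usage:
abc = ET.fromstring('<abc><id>23</id><Name/></abc>')
assert get_text(abc, 'id') == '23'        # present
assert get_text(abc, 'Name') == ''        # empty tag
assert get_text(abc, 'uniqueid') == ''    # missing tag
```

Each `.find(...).text` lookup in the code below would then become `get_text(node, tag)`.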

import csv
import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')  # the source XML file (filename assumed)

rows = []
for abc_node in tree.findall('abc'):
    rate_node = abc_node.find('Rate')
    fee_node = abc_node.find('fee')
    row = {'id': abc_node.find('id').text,
           'uniqueid': abc_node.find('uniqueid').text,
           'Name': abc_node.find('Name').text,
           'rate_mrp': rate_node.find('mrp').text,
           'rate_discount': rate_node.find('discount').text,
           'rate_discountmonths': rate_node.find('discountmonths').text,
           'fee_type': fee_node.find('type').text,
           'fee_minimumfee': fee_node.find('minimumfee').text,
           'fee_maxfee': fee_node.find('maxfee').text}
    rows.append(row)

with open('test.csv', 'w', encoding='utf8') as f:
    headers = ['id', 'uniqueid', 'Name', 'rate_mrp', 'rate_discount', 'rate_discountmonths',
               'fee_type', 'fee_minimumfee', 'fee_maxfee']
    dict_writer = csv.DictWriter(f, fieldnames=headers, lineterminator='\n')
    dict_writer.writeheader()
    dict_writer.writerows(rows)

Output

id,uniqueid,Name,rate_mrp,rate_discount,rate_discountmonths,fee_type,fee_minimumfee,fee_maxfee
23,23_0,,6.40000,10.00%,2,off,"£1,500.75",£10K
35,35_0,,7.90000,5.00%,5,offer,£1k,"£22,000" 

If you want | as delimiter just add delimiter='|' to csv.DictWriter(f, fieldnames=headers, lineterminator='\n')

then the output is

id|uniqueid|Name|rate_mrp|rate_discount|rate_discountmonths|fee_type|fee_minimumfee|fee_maxfee
23|23_0||6.40000|10.00%|2|off|£1,500.75|£10K
35|35_0||7.90000|5.00%|5|offer|£1k|£22,000