The lxml library provides powerful XML parsing and can iterate over an XML tree to search for specific elements.

import csv
from lxml import etree

# Read as bytes: lxml rejects str input that carries an encoding declaration
with open(r'path/to/xml', 'rb') as xml:
    text = xml.read()
tree = etree.fromstring(text)
row = ['', '']
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
        row[0] = item.text
    elif item.tag == 'def':
        row[1] = item.text

# csv.writer quotes any field that contains a comma
with open(r'path/to/csv', 'a', newline='') as f:
    csv.writer(f).writerow(row)

How you build the CSV file is largely a matter of preference; the example above is deliberately trivial. If there are multiple <dps-data> tags, you could extract those elements first (with the same tree.iter method shown above) and then apply the above logic to each of them.

EDIT: I should point out that this particular implementation reads the entire XML file into memory. If you are working with a single 150 MB file at a time, this should not be a problem, but it is something to be aware of.
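If memory does become a concern, or there are many <dps-data> records, one option is to stream the file with iterparse and emit one CSV row per record. This sketch swaps in the standard library's xml.etree.ElementTree (lxml's etree.iterparse follows the same pattern and also accepts a tag= filter). It assumes each <dps-data> contains at most one <hw> and one <def>; the sample input is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def stream_rows(source):
    """Yield one [hw, def] row per <dps-data> element, then discard its contents."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "dps-data":
            yield [elem.findtext(".//hw", default=""),
                   elem.findtext(".//def", default="")]
            elem.clear()  # free the children of the finished <dps-data>

# Hypothetical input with two <dps-data> records
sample = io.BytesIO(
    b"<root>"
    b"<dps-data><hw>apple</hw><def>a fruit, often red</def></dps-data>"
    b"<dps-data><hw>kale</hw><def>a leafy green</def></dps-data>"
    b"</root>"
)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(stream_rows(sample))
print(out.getvalue())
```

With a real file, pass its path to stream_rows instead of the BytesIO object, and write to a file opened with newline=''.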

Answer from VergeA on Stack Overflow
GitHub
github.com › waheed0332 › xml2csv
waheed0332/xml2csv: Python scripts for processing XML documents and converting to CSV. Also works on nested XML files.
Converts XML files into a CSV file and can handle deeply nested XML. The script uses multiprocessing to convert large amounts of data in less time. Install the required libraries with pip install -r requirements.txt before running the script, for example: python xml2csv.py -f ./xml-samples/1.xml -csv out.csv
Top answer
1 of 2

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple XML, try pandas (read_xml was added in pandas 1.3 and uses lxml as its default parser):

import pandas as pd

# From here, it's just a one-liner: each <store> element becomes a row
pd.read_xml('input.xml', xpath='.//store').to_csv('output.csv', index=False)

and that should give you your CSV file.
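For a flat record like this, read_xml roughly turns each matched element's attributes and child-element text into columns. A rough standard-library approximation (not how pandas implements it) might look like the sketch below. The <store> layout and its id/name/city fields are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def flat_records(xml_text, tag):
    """One dict per <tag> element: its attributes plus each child's text."""
    root = ET.fromstring(xml_text)
    records = []
    for el in root.iter(tag):
        rec = dict(el.attrib)
        for child in el:
            rec[child.tag] = (child.text or "").strip()
        records.append(rec)
    return records

# Hypothetical input
sample = """<stores>
  <store id="1"><name>North</name><city>Oslo</city></store>
  <store id="2"><name>South</name><city>Rome</city></store>
</stores>"""

records = flat_records(sample, "store")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

This only handles one level of children; nested structures are where pandas, XSLT, or the explicit approaches below become more useful.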

2 of 2

Since parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comprehension with dictionary merging. Also, use csv.DictWriter to build the CSV from dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

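The code above places the attribute columns last. For a specific column order, rebuild the dictionaries (passing cols as fieldnames would also work on its own, since DictWriter writes columns in fieldnames order):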

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0
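The answer does not show its input file, so here is a self-contained sketch of the same dictionary-merge comprehension run against a small hypothetical document in the http://www.dummytest.org namespace. It shows how tag.split('}')[-1] strips the namespace and how each child's attributes become "<tag> <attr>" columns. Iterating `for el in d` is used here as a shorthand for d.findall("*").

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input shaped like the one the answer appears to assume
SAMPLE = """<root xmlns="http://www.dummytest.org">
  <data>
    <indicator id="AA.BB">various, tests</indicator>
    <country id="MM">test again</country>
    <date>2021</date>
  </data>
</root>"""

def local(tag):
    # '{http://www.dummytest.org}country' -> 'country'
    return tag.split('}')[-1]

nmsp = {"du": "http://www.dummytest.org"}
root = ET.fromstring(SAMPLE)
data = [
    {
        **{local(el.tag): (el.text or "").strip() for el in d},
        **{f"{local(el.tag)} {k}": v for el in d for k, v in el.attrib.items()},
    }
    for d in root.findall(".//du:data", namespaces=nmsp)
]

out = io.StringIO()
dw = csv.DictWriter(out, fieldnames=list(data[0]), lineterminator="\n")
dw.writeheader()
dw.writerows(data)
print(out.getvalue())
```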
Like Geeks
likegeeks.com › home › python › pandas › export xml to csv using python pandas
Export XML to CSV using Python Pandas
December 16, 2023 - Learn how to convert XML to CSV using Pandas in Python, From handling simple to complex nested XML structures efficiently.
YouTube
youtube.com › watch
Convert an XML File to CSV with Python - Supports Nested XML - YouTube
In this video, I show you how to use Python and pandas to convert an XML file to CSV. Nested XML is also supported by using a stylesheet to adjust the file t...
Published   December 12, 2023
Top answer
1 of 2

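Since you have an outer <sensorEvents> element containing three inner <sensorEvents> elements, Item.findall('sensorEvents') matches only the outer one. findall() looks at direct children only, so the inner elements are never reached by that loop.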

This means

    for Sensorevents in Item.findall('sensorEvents'):

will loop only once, over the outer element of

<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>

Then

    avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
    row.append(avgSped_)

    completed_ = Sensorevents.find('sensorEvents').find('completed').text

gets the data for the first inner tag only, because find() returns just the first matching child.

Instead, add a loop over the inner elements:

for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
...
2 of 2

You could also consider using the lxml library, because it lets you search with XPath expressions, which often make for simpler code.

Here, the XPath expression .//sensorEvents/sensorEvents means: find sensorEvents elements anywhere in the document, then take the sensorEvents elements immediately under them.

Once you have these, it's often a simple matter to write expressions for their child elements, as shown.

>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
... 
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')
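The standard library's ElementTree also supports a limited XPath subset that includes this particular path. The sketch below uses it and writes the rows to CSV. The <root>/<item> wrapper around the sample is a hypothetical stand-in for the question's file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input reproducing the nesting shown above
SAMPLE = """<root><item>
  <sensorEvents>
    <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
    <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
    <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
  </sensorEvents>
</item></root>"""

root = ET.fromstring(SAMPLE)
# Same effect as looping over each outer <sensorEvents>, then its inner ones
inner = root.findall(".//sensorEvents/sensorEvents")
rows = [[e.findtext("avgSped"), e.findtext("completed")] for e in inner]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["avgSped", "completed"])
writer.writerows(rows)
print(out.getvalue())
```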
Top answer
1 of 2

Consider XSLT, the special-purpose language designed to transform XML files, which can convert XML directly to CSV (i.e., a text file) without a pandas DataFrame as an intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts, with no for loops or if logic needed in Python. However, because of the complex alignment of products and attributes, the XSLT uses some longer XPath searches.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/PropertySet">
      <xsl:text>ProductId,Product,AttributeId,Attribute&#xa;</xsl:text>
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
       <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                    ancestor::ImpExp/@Object_Num, $delimiter,
                                    '', $delimiter,
                                    '')"/><xsl:text>&#xa;</xsl:text>
    </xsl:if>   
  </xsl:template>

  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,

                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,

                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)    
result = transform(xml)

# OUTPUT TO FILE
with open('Output.csv', 'wb') as f:
    f.write(result)

Output

ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,
2 of 2

You would need to pre-parse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:

import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)
class_defs = {}

# First extract all the CLASS_DEF entries into a dictionary
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')

    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

with open(outFile, 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])

    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')

        if impexp.get('Type') == "PROD_DEF":
            ancestor_name = None  # stays None if the product has no Object_Def
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_name = obj.get('Ancestor_Name')

            # Use the ancestor class's first attribute, or blanks if there is none
            csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [('', '')])[0]))

This would produce new.csv containing:

ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,

If you are using Python 3.x, use:

with open(outFile, 'w', newline='') as f_output:    
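Below is a Python 3 sketch of the same two-pass idea: first build a lookup of class name to attributes, then resolve each product against it. It uses the standard library's ElementTree instead of lxml. Unlike the answer's code, it writes one row per attribute rather than only the first, which matches the XSLT answer's output (Mouse gets both 7010p and 7012j). The trimmed input, including the class name "Mouse Class", is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed input using the element/attribute names above
SAMPLE = """<PropertySet>
  <ImpExp Type="CLASS_DEF" Name="Mouse Class">
    <ListOfObject_Arrt>
      <Object_Arrt Orig_Id="7010p" Attr_Name="O_Portable"/>
      <Object_Arrt Orig_Id="7012j" Attr_Name="O_wireless"/>
    </ListOfObject_Arrt>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Mouse" Object_Num="2987d">
    <ListOfObject_Def><Object_Def Ancestor_Name="Mouse Class"/></ListOfObject_Def>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Speaker" Object_Num="5463g">
    <ListOfObject_Def><Object_Def Ancestor_Name=""/></ListOfObject_Def>
  </ImpExp>
</PropertySet>"""

root = ET.fromstring(SAMPLE)

# Pass 1: class name -> list of (Orig_Id, Attr_Name)
class_defs = {
    imp.get("Name"): [(a.get("Orig_Id"), a.get("Attr_Name")) for a in imp.iter("Object_Arrt")]
    for imp in root.iter("ImpExp") if imp.get("Type") == "CLASS_DEF"
}

# Pass 2: one row per (product, attribute); blanks when the class has none
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["ProductId", "Product", "AttributeId", "Attribute"])
for imp in root.iter("ImpExp"):
    if imp.get("Type") != "PROD_DEF":
        continue
    obj_def = imp.find("ListOfObject_Def/Object_Def")
    ancestor = obj_def.get("Ancestor_Name") if obj_def is not None else None
    for attr_id, attr_name in class_defs.get(ancestor) or [("", "")]:
        writer.writerow([imp.get("Object_Num"), imp.get("Name"), attr_id, attr_name])

print(out.getvalue())
```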
GeeksforGeeks
geeksforgeeks.org › python › convert-xml-to-csv-in-python
Convert XML to CSV in Python - GeeksforGeeks
July 23, 2025 - We used ElementTree to parse and navigate through the XML structure. Data from each record was collected into a list of dictionaries. Finally, we used pandas to create a CSV file from that structured data. To learn about the pandas module in depth, refer to: Python Pandas Tutorial
Syntax Byte
syntaxbytetutorials.com › home › import xml into pandas and convert to csv
Import XML into Pandas and Convert to CSV - Syntax Byte
December 13, 2023 - Use pandas to convert a nested XML file to a CSV in only three lines of Python.
Medium
medium.com › @meiyee715 › converting-xml-to-csv-python-xml-etree-25fec8e72626
Converting XML to CSV: Python xml.etree | by Amy Leong | Medium
October 14, 2023 - Replace path_to_your_xml_file.xml and path_to_output.csv with your desired paths. The provided script is a basic example, and real-world XML files can vary widely in their structure. Depending on the nature of the XML, you may need to account for attributes, nested elements, and other complexities. The beauty of Python ...
Python.org
discuss.python.org › python help
Convert xml to excel/csv - Python Help - Discussions on Python.org
October 15, 2022 - Please help me in converting XML file into excel/csv. Thank you in advance.
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.
Quora
quora.com › How-do-you-convert-XML-to-CSV-in-Python
How to convert XML to CSV in Python - Quora
Answer (1 of 4): In a strict sense? You don’t. CSV is a format (if it can even be called that!) for encoding row-based data. XML is a format for encoding tree-based data. One expects all entries to follow a simple, “all of these entries have the same fields, and a value in those fields”, ...
Top answer
1 of 2

We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since the records reside under two different keys, tag_2 and tag_7, we need to loop over those particular tags to collect all the records and then concatenate the resulting DataFrames.

import pandas as pd
import xmltodict

with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()

dict_xml = xmltodict.parse(str_xml)

df = pd.concat(
    [
        pd.json_normalize(
            dict_xml, 
            record_path=['tag_1', tag, 'date', 'data'],            # path to record list
            meta=[['tag_1', tag, 'date', '@value']])               # path to date
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'})) # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')             # add meta columns
        for tag in ('tag_2', 'tag_7')                              # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)

This creates the following CSV file:

tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173

Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.

def flatten_dict(dict_xml, level_2_tags):
    df = (
        pd.concat([
            pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
            .assign(tag_2=tag)
            for tag in level_2_tags
        ])
        .rename(columns={'@value': 'date'})
        .assign(tag_1='tag_1', data='data')
        .get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
    )
    return df

# test run
flatten_dict(dict_xml, ['tag_2'])           # when there is only tag_2 in level=2

flatten_dict(dict_xml, ['tag_2', 'tag_7'])  # when there are 2 tags in level=2
2 of 2

Given the custom format, it looks like the best option is to use a nested list comprehension:

df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])

Output:

   tag_1  tag_2        date  tag_3    tag_4              tag_5   tag_6
0  tag_1  tag_2  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
1  tag_1  tag_2  06-30-2023  val_3  val_4_2            val_5_1  -0.173
2  tag_1  tag_7  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
3  tag_1  tag_7  06-30-2023  val_3  val_4_2            val_5_1  -0.173

CSV output:

# df.to_csv('file_01.csv', index=False)

tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173
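The nested comprehension does not depend on pandas; only the final DataFrame step does. Here is a sketch that runs it on a hand-written dict shaped like the xmltodict output implied by the answers (a date level with an @value attribute and a list of data records), writing the CSV with csv.DictWriter. The dict is hypothetical and has one fewer tag_7 record than the real file. One caveat: xmltodict returns a single dict rather than a list when there is only one <data> element, so d3['data'] may need wrapping in that case (xmltodict's force_list option exists for this).

```python
import csv
import io

# Hypothetical dict shaped like xmltodict's output for the question's file
dict_xml = {
    "tag_1": {
        "tag_2": {"date": {"@value": "06-30-2023", "data": [
            {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1 & val_5_2", "tag_6": "-0.157"},
            {"tag_3": "val_3", "tag_4": "val_4_2", "tag_5": "val_5_1", "tag_6": "-0.173"},
        ]}},
        "tag_7": {"date": {"@value": "06-30-2023", "data": [
            {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1 & val_5_2", "tag_6": "-0.157"},
        ]}},
    }
}

# Same comprehension as above: one row per record, keyed by its ancestors
rows = [{"tag_1": k1, "tag_2": k2, k3: d3["@value"], **d4}
        for k1, d1 in dict_xml.items()
        for k2, d2 in d1.items()
        for k3, d3 in d2.items()
        for d4 in d3["data"]]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```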
E-iceblue
cdn.e-iceblue.com › Tutorials › Python › Spire.XLS-for-Python › Program-Guide › Conversion › convert-xml-to-csv-in-python.html
How to Convert XML to CSV in Python: A Complete Guide
Converting XML to CSV in Python doesn’t have to be painful. With Spire.XLS for Python, you can automate much of the process, including header generation, handling attributes, and flattening nested nodes.
Stack Overflow
stackoverflow.com › questions › 64934431 › converting-nested-xml-to-csv-using-python-for-any-xml-file
Converting Nested XML to csv using python for any XML file - Stack Overflow
I'm trying to convert a nested XML file into csv format and trying to develop a code in a way that it should be able to read any XML file. So whenever the structure of XML file changes, the same py...
Top answer
1 of 1

Since there's always going to be at least one TaxTotal element, I would create a new CSV row for each one and walk back up the tree for the preceding values.

Here's an example using lxml. I added a function to make it easier to handle empty values, but any additional formatting of values I'll leave up to you.

Python 3.6

from lxml import etree
import csv


def get_value(target_tree, xpath, namespaces):
    try:
        return target_tree.xpath(xpath, namespaces=namespaces)[0].text
    except IndexError:
        return ""


tree = etree.parse("input.xml")

ns = {"cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
      "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
      "i2": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"}

with open("output.csv", "w") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=";", lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    # Header
    csvwriter.writerow(["ID", "/InvoiceLine/ID", "/InvoiceLine/InvoicedQuantity", "/InvoiceLine/LineExtensionAmount",
                        "/InvoiceLine/TaxTotal/TaxAmount", "/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name"])
    for tax_total in tree.xpath("//cac:TaxTotal", namespaces=ns):
        csvwriter.writerow([get_value(tax_total, "/i2:Invoice/cbc:ID", ns),
                            get_value(tax_total, "../cbc:ID", ns),
                            get_value(tax_total, "../cbc:InvoicedQuantity", ns),
                            get_value(tax_total, "../cbc:LineExtensionAmount", ns),
                            get_value(tax_total, "cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxableAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:Percent", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:Name", ns)])

Output (output.csv)

ID;/InvoiceLine/ID;/InvoiceLine/InvoicedQuantity;/InvoiceLine/LineExtensionAmount;/InvoiceLine/TaxTotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name
102165444;1.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;35.60;142.39;35.60;StandardRated;25;63;Moms
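The "../" steps work because lxml elements know their parent. With the standard library's ElementTree they do not, and ".." cannot climb above the element you start from. A common workaround is to build a child-to-parent map once. The sketch below applies the same one-row-per-TaxTotal idea that way, keeping only three of the columns. The trimmed invoice is hypothetical and reuses the UBL cac/cbc namespaces from the answer, but omits the Invoice default namespace.

```python
import csv
import io
import xml.etree.ElementTree as ET

CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
ns = {"cac": CAC, "cbc": CBC}

# Hypothetical, trimmed invoice
SAMPLE = f"""<Invoice xmlns:cac="{CAC}" xmlns:cbc="{CBC}">
  <cbc:ID>102165444</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cac:TaxTotal><cbc:TaxAmount>138.24</cbc:TaxAmount></cac:TaxTotal>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2</cbc:ID>
    <cac:TaxTotal><cbc:TaxAmount>138.24</cbc:TaxAmount></cac:TaxTotal>
    <cac:TaxTotal><cbc:TaxAmount>35.60</cbc:TaxAmount></cac:TaxTotal>
  </cac:InvoiceLine>
</Invoice>"""

root = ET.fromstring(SAMPLE)
# ElementTree elements have no parent pointer, so map each child to its parent
parent_of = {child: parent for parent in root.iter() for child in parent}

rows = []
for tax_total in root.iter(f"{{{CAC}}}TaxTotal"):
    line = parent_of[tax_total]  # the enclosing InvoiceLine
    rows.append([
        root.findtext("cbc:ID", default="", namespaces=ns),
        line.findtext("cbc:ID", default="", namespaces=ns),
        tax_total.findtext("cbc:TaxAmount", default="", namespaces=ns),
    ])

out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(rows)
print(out.getvalue())
```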
Stack Overflow
stackoverflow.com › questions › 44760032 › import-huge-nested-xml-files-into-python-and-convert-them-to-csv
Import huge nested XML files into Python and convert them to CSV - Stack Overflow
Just extract the first few lines of your huge files into new files (and work with those to develop your solution): you can do that from python, or from the shell - e.g.