The lxml library is a powerful XML parser, and it can iterate over an XML tree to find specific elements.

from lxml import etree

with open(r'path/to/xml', 'rb') as xml:
    text = xml.read()  # read bytes so lxml can honor the XML encoding declaration
tree = etree.fromstring(text)
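row = ['', '']
# lxml's iter() accepts several tag names; row ends up holding the last <hw> and <def> found
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
        row[0] = item.text or ''  # empty elements have text None
    elif item.tag == 'def':
        row[1] = item.text or ''

line = ','.join(row)

with open(r'path/to/csv', 'a') as csv:
    csv.write(line + '\n')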

How you build the CSV file is largely a matter of preference; the example above is deliberately trivial. If there are multiple <dps-data> tags, you could extract those elements first (with the same tree.iter method shown above) and then apply the above logic to each of them.
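
As a rough sketch of that per-record approach, the snippet below uses the standard library's xml.etree.ElementTree in place of lxml, and csv.writer in place of the manual join so that commas inside a definition get quoted. The sample document, and the assumption that <hw> and <def> sit somewhere under each <dps-data>, are illustrative only and not taken from the asker's file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file layout is not shown in the question.
xml_text = """<root>
  <dps-data><hw>cat</hw><def>a small, furry animal</def></dps-data>
  <dps-data><hw>dog</hw><def>a loyal animal</def></dps-data>
</root>"""

root = ET.fromstring(xml_text)
buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["hw", "def"])

# One CSV row per <dps-data> record; missing children become empty strings.
for record in root.iter("dps-data"):
    writer.writerow([
        record.findtext(".//hw", default=""),
        record.findtext(".//def", default=""),
    ])

csv_text = buffer.getvalue()
print(csv_text)
```

Writing through csv.writer rather than ','.join keeps the output valid when a field contains a comma or a quote.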

EDIT: I should point out that this particular implementation reads the entire XML file into memory. If you are working with a single 150 MB file at a time, this should not be a problem, but it is something to be aware of.
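
If memory does become a concern, one option is incremental parsing. The following is a minimal sketch using iterparse from xml.etree.ElementTree (lxml offers an iterparse with a similar interface). The <dps-data> record tag, the helper name stream_records, and the in-memory sample are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

def stream_records(source, record_tag="dps-data"):
    """Yield (hw, def) pairs one record at a time instead of building the full tree first."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield (
                elem.findtext(".//hw", default=""),
                elem.findtext(".//def", default=""),
            )
            elem.clear()  # drop the finished record's children to free memory

# Hypothetical sample standing in for a large file on disk.
sample = io.BytesIO(
    b"<root>"
    b"<dps-data><hw>cat</hw><def>a small feline</def></dps-data>"
    b"<dps-data><hw>dog</hw><def>a loyal canine</def></dps-data>"
    b"</root>"
)

rows = list(stream_records(sample))
print(rows)
```

In real use, each yielded pair would be passed straight to a csv.writer rather than collected in a list.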

Answer from VergeA on Stack Overflow
GitHub
github.com › waheed0332 › xml2csv
GitHub - waheed0332/xml2csv: Python scripts for processing XML documents and converting to CSV. Also works on nested xml files. · GitHub
Converts XML files into CSV files; the script can handle deeply nested XML and uses multiprocessing to convert large inputs in less time. Install the required libraries before running the script: pip install -r requirements.txt · python xml2csv.py -f ./xml-samples/1.xml -csv out.csv
Top answer
1 of 2

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:

import pandas as pd

# requires pandas 1.3 or later; from here, it's just a one-liner
pd.read_xml('input.xml', xpath='.//store').to_csv('output.csv', sep=',', index=False, header=True)

and that should get you your csv file.

2 of 2

Since parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comprehension with dictionary merging. Also, use csv.DictWriter to build the CSV from dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

The above adds attributes as the last columns of the CSV. For a specific ordering, re-order the dictionaries:

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0
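
The first version above takes its header from data[0], which assumes every record has the same children. When some fields are optional, one possible approach is to collect the union of keys and let DictWriter fill the gaps with restval. The sample XML below is invented for illustration, including the optional <unit> element; only the namespace URI and tag names are borrowed from the answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the second record has a <unit> that the first lacks.
xml_text = """<root xmlns="http://www.dummytest.org">
  <data><indicator id="AA.BB">various, tests</indicator><date>2021</date></data>
  <data><indicator id="XX.YY">testing, cases</indicator><date>2020</date><unit>USD</unit></data>
</root>"""

nmsp = {"du": "http://www.dummytest.org"}
root = ET.fromstring(xml_text)

data = []
for d in root.findall(".//du:data", namespaces=nmsp):
    row = {}
    for el in d.findall("*"):
        tag = el.tag.split("}")[-1]          # drop the {namespace} prefix
        row[tag] = (el.text or "").strip()   # element text
        for k, v in el.attrib.items():       # element attributes, e.g. "indicator id"
            row[f"{tag} {k}"] = v
    data.append(row)

# Union of keys in first-seen order, so late-appearing fields still get a column.
fieldnames = list(dict.fromkeys(key for row in data for key in row))

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(data)
csv_text = buffer.getvalue()
print(csv_text)
```

With fieldnames taken only from the first record, DictWriter would raise a ValueError on the second row because of the unexpected "unit" key.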
Like Geeks
likegeeks.com › home › python › pandas › export xml to csv using python pandas
Export XML to CSV using Python Pandas
December 16, 2023 - Learn how to convert XML to CSV using Pandas in Python, from simple to complex nested XML structures.
Top answer
1 of 2

Consider XSLT, a special-purpose language designed to transform XML files. It can convert XML directly to CSV (i.e., a text file) without the pandas DataFrame intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts, and it does so without for loops or if logic. However, because of the complex alignment of products and attributes, the XSLT below uses some longer XPath expressions.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/PropertySet">
      <xsl:text>ProductId,Product,AttributeId,Attribute&#xa;</xsl:text>
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
       <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                    ancestor::ImpExp/@Object_Num, $delimiter,
                                    '', $delimiter,
                                    '')"/><xsl:text>&#xa;</xsl:text>
    </xsl:if>   
  </xsl:template>

  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,

                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,

                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)    
result = transform(xml)

# OUTPUT TO FILE
with open('Output.csv', 'wb') as f:
    f.write(result)

Output

ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,
2 of 2

You would need to preparse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:

import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)
class_defs = {}

# First extract all the CLASS_DEF entries into a dictionary
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')

    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

with open(outFile, 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])

    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')

        if impexp.get('Type') == "PROD_DEF":
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_num = obj.get('Ancestor_Num')
                    ancestor_name = obj.get('Ancestor_Name')

            csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [['', '']])[0]))

This would produce new.csv containing:

ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,

If you are using Python 3.x, use:

with open(outFile, 'w', newline='') as f_output:    
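
For reference, here is a self-contained Python 3 sketch of the same two-pass lookup, using the standard library's ElementTree instead of lxml. Unlike the code above, it writes one row per attribute rather than only the first. The sample XML is a guess at the structure, reconstructed from the element and attribute names used in this answer, and the class name "Portable" is made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample reconstructed from the names used in the answer.
SAMPLE = """<PropertySet>
  <ImpExp Type="CLASS_DEF" Name="Portable">
    <ListOfObject_Arrt>
      <Object_Arrt Orig_Id="7010p" Attr_Name="O_Portable"/>
      <Object_Arrt Orig_Id="7012j" Attr_Name="O_wireless"/>
    </ListOfObject_Arrt>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Mouse" Object_Num="2987d">
    <ListOfObject_Def><Object_Def Ancestor_Name="Portable"/></ListOfObject_Def>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Speaker" Object_Num="5463g">
    <ListOfObject_Def><Object_Def Ancestor_Name=""/></ListOfObject_Def>
  </ImpExp>
</PropertySet>"""

root = ET.fromstring(SAMPLE)

# Pass 1: map each CLASS_DEF name to its list of (attribute id, attribute name) pairs.
class_defs = {}
for impexp in root.iter("ImpExp"):
    if impexp.get("Type") == "CLASS_DEF":
        class_defs[impexp.get("Name")] = [
            (obj.get("Orig_Id"), obj.get("Attr_Name"))
            for obj in impexp.iter("Object_Arrt")
        ]

# Pass 2: one row per (product, attribute); products with no matching class get blanks.
buffer = io.StringIO()  # stands in for open(outFile, 'w', newline='')
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ProductId", "Product", "AttributeId", "Attribute"])
for impexp in root.iter("ImpExp"):
    if impexp.get("Type") != "PROD_DEF":
        continue
    for obj_def in impexp.iter("Object_Def"):
        attrs = class_defs.get(obj_def.get("Ancestor_Name"), [("", "")])
        for attr_id, attr_name in attrs:
            writer.writerow([impexp.get("Object_Num"), impexp.get("Name"), attr_id, attr_name])

csv_text = buffer.getvalue()
print(csv_text)
```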
Top answer
1 of 2

Since you have an outer <sensorEvents> tag containing three child <sensorEvents> tags, findall('sensorEvents') on the item matches only the outer <sensorEvents>, not the children nested inside it.

This means

    for Sensorevents in Item.findall('sensorEvents'):

Will loop only once per

<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>

Then

    avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
    row.append(avgSped_)

    completed_ = Sensorevents.find('sensorEvents').find('completed').text

Gets the data for the first tag only.

You should try

for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
...
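
A runnable version of that nested loop might look like the sketch below. The <root> and <item> wrappers are assumptions based on the root.findall('item') call; the sensor values come from the XML shown above.

```python
import xml.etree.ElementTree as ET

# The <root>/<item> wrapper is assumed; the inner data is from the question.
xml_text = """<root><item>
<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>
</item></root>"""

root = ET.fromstring(xml_text)
rows = []
for item in root.findall("item"):
    for outer in item.findall("sensorEvents"):        # the single wrapper element
        for inner in outer.findall("sensorEvents"):   # the three actual events
            rows.append((inner.findtext("avgSped"), inner.findtext("completed")))

print(rows)
```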
2 of 2

You could also consider using the lxml library because with it you can search by xpath expressions which often make for simpler code.

Here, the xpath expression .//sensorEvents/sensorEvents says look for sensorEvents elements anywhere in the document and then look for the sensorEvents elements immediately under these.

Once you have these it's often a simple matter to write expressions for attributes of the elements, as shown.

>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
... 
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')
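
If installing lxml is not an option, the same path expression also appears to work with the limited XPath subset in the standard library's ElementTree, via findall instead of xpath. This is a small sketch under that assumption; the <root>/<item> wrapper and the shortened sample are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with two inner events.
xml_text = (
    "<root><item><sensorEvents>"
    "<sensorEvents><avgSped>48.5</avgSped><completed>true</completed></sensorEvents>"
    "<sensorEvents><avgSped>39.5</avgSped><completed>false</completed></sensorEvents>"
    "</sensorEvents></item></root>"
)

root = ET.fromstring(xml_text)

# Select sensorEvents elements that are direct children of another sensorEvents.
pairs = [
    (event.findtext("avgSped"), event.findtext("completed"))
    for event in root.findall(".//sensorEvents/sensorEvents")
]
print(pairs)
```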
Quora
quora.com › How-do-you-convert-XML-to-CSV-in-Python
How to convert XML to CSV in Python - Quora
Answer (1 of 4): In a strict sense? You don’t. CSV is a format (if it can even be called that!) for encoding row-based data. XML is a format for encoding tree-based data. One expects all entries to follow a simple, “all of these entries have the same fields, and a value in those fields”, ...
YouTube
youtube.com › watch
Convert an XML File to CSV with Python - Supports Nested XML - YouTube
In this video, I show you how to use Python and pandas to convert an XML file to CSV. Nested XML is also supported by using a stylesheet to adjust the file t...
Published   May 24, 2018
GeeksforGeeks
geeksforgeeks.org › python › convert-xml-to-csv-in-python
Convert XML to CSV in Python - GeeksforGeeks
July 23, 2025 - We used ElementTree to parse and navigate through the XML structure. Data from each record was collected into a list of dictionaries. Finally, we used pandas to create a CSV file from that structured data. To learn about the pandas module in depth, refer to: Python Pandas Tutorial
Medium
medium.com › @meiyee715 › converting-xml-to-csv-python-xml-etree-25fec8e72626
Converting XML to CSV: Python xml.etree | by Amy Leong | Medium
October 14, 2023 - Replace path_to_your_xml_file.xml and path_to_output.csv with your desired paths. The provided script is a basic example, and real-world XML files can vary widely in their structure. Depending on the nature of the XML, you may need to account for attributes, nested elements, and other complexities. The beauty of Python ...
Syntax Byte
syntaxbytetutorials.com › home › import xml into pandas and convert to csv
Import XML into Pandas and Convert to CSV - Syntax Byte
August 21, 2024 - Use pandas to convert a nested XML file to a CSV in only three lines of Python.
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.
Python.org
discuss.python.org › python help
Convert xml to excel/csv - Python Help - Discussions on Python.org
October 15, 2022 - Please help me in converting XML file into excel/csv. Thank you in advance.
YouTube
youtube.com › watch
Converting Nested XML to CSV in Python Dynamically: A Comprehensive Guide! - YouTube
Learn how to dynamically convert nested XML data to CSV in Python, handling arrays and extra tags without breaking a sweat!---This video is based on the ques...
Published   November 13, 2024
Top answer
1 of 2

We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since records reside under two different keys: tag_2 and tag_7, we need to loop over those particular tags to get all the records, then concatenate the dataframes.

import pandas as pd
import xmltodict

with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()

dict_xml = xmltodict.parse(str_xml)

df = pd.concat(
    [
        pd.json_normalize(
            dict_xml, 
            record_path=['tag_1', tag, 'date', 'data'],            # path to record list
            meta=[['tag_1', tag, 'date', '@value']])               # path to date
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'})) # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')             # add meta columns
        for tag in ('tag_2', 'tag_7')                              # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)

This creates the following CSV file:

tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173

Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.

def flatten_dict(dict_xml, level_2_tags):
    df = (
        pd.concat([
            pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
            .assign(tag_2=tag)
            for tag in level_2_tags
        ])
        .rename(columns={'@value': 'date'})
        .assign(tag_1='tag_1', data='data')
        .get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
    )
    return df

# test run
flatten_dict(dict_xml, ['tag_2'])           # when there is only tag_2 in level=2

flatten_dict(dict_xml, ['tag_2', 'tag_7'])  # when there are 2 tags in level=2
2 of 2

Given the custom format, it looks like the best option is to use a nested list comprehension:

df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])

Output:

   tag_1  tag_2        date  tag_3    tag_4              tag_5   tag_6
0  tag_1  tag_2  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
1  tag_1  tag_2  06-30-2023  val_3  val_4_2            val_5_1  -0.173
2  tag_1  tag_7  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
3  tag_1  tag_7  06-30-2023  val_3  val_4_2            val_5_1  -0.173

CSV output:

# df.to_csv('file_01.csv', index=False)

tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173
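
One caveat with this comprehension: xmltodict returns a dict rather than a list when a parent has only a single child of a given name, so iterating d3['data'] would then walk over keys instead of records. The sketch below wraps such values in a list and writes the CSV with the standard csv module instead of pandas. The dictionary literal is an assumed stand-in for the xmltodict.parse() output, and as_list is a hypothetical helper. Passing force_list=('data',) to xmltodict.parse is another way to get lists consistently.

```python
import csv
import io

# Assumed shape of the parsed document; tag_7 has only one <data> child,
# which xmltodict would hand back as a dict instead of a list.
dict_xml = {
    "tag_1": {
        "tag_2": {"date": {"@value": "06-30-2023", "data": [
            {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1 & val_5_2", "tag_6": "-0.157"},
            {"tag_3": "val_3", "tag_4": "val_4_2", "tag_5": "val_5_1", "tag_6": "-0.173"},
        ]}},
        "tag_7": {"date": {"@value": "06-30-2023", "data":
            {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1", "tag_6": "-0.157"},
        }},
    }
}

def as_list(value):
    """Wrap a lone record in a list so single and repeated children look alike."""
    return value if isinstance(value, list) else [value]

rows = [
    {"tag_1": k1, "tag_2": k2, k3: d3["@value"], **d4}
    for k1, d1 in dict_xml.items()
    for k2, d2 in d1.items()
    for k3, d3 in d2.items()
    for d4 in as_list(d3["data"])
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```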
Stack Overflow
stackoverflow.com › questions › 44760032 › import-huge-nested-xml-files-into-python-and-convert-them-to-csv
Import huge nested XML files into Python and convert them to CSV - Stack Overflow
Just extract the first few lines of your huge files into new files (and work with those to develop your solution): you can do that from python, or from the shell - e.g.
Top answer
1 of 1

Normally, the XML nodes that hold a value should become the columns. In your XML example, "child", "child2", "childid", and so on should be columns.

Based on the above xml I've made this piece of code that should be sufficiently generic to accommodate similar examples.

import pandas as pd
import xml.etree.ElementTree as Xet

def getData(root, rows, columns, rowcount, name=None):
    # Column names are built from the full path (parent|child) so that nodes with the
    # same tag under different parents do not share a column; for example, a node
    # named "name" could sit under both group and secondgroup.
    if name is not None:
        name = "{0}|{1}".format(name, root.tag)
    else:
        name = root.tag

    for item in root:
        if len(item) == 0: # leaf node: it holds a value
            colName = "{0}|{1}".format(name, item.tag)
            # colName = item.tag # uncomment to use the short tag name instead of the full path; ex: groupsize instead of root|group|grouplist|groupzone|groupsize
            value = (item.text or "").strip() # empty elements have item.text set to None
            if colName not in columns:
                columns.append(colName) # save the column to a list
                rowcount.append(0) # save the row on which we add the value for this column
                rows[0].update({colName: value}) # a first occurrence always goes on row 0
            else:
                repeatPosition = columns.index(colName) # get the column position for the repeated item
                rowcount[repeatPosition] = rowcount[repeatPosition] + 1 # increase row count
                if len(rows) <= max(rowcount):
                    rows.append({}) # add a new row based on row count
                rows[rowcount[repeatPosition]].update({colName: value}) # add the value on the new row

        getData(item, rows, columns, rowcount, name) # recursive call to walk through each list of elements


xmlParse = Xet.parse('example.xml')
root = xmlParse.getroot()

rows = [{}] # adding at least one row from the start and will add additional rows as we go along
columns = [] # holds the names of the columns
rowcount = [] # holds the rows on which we add each element value; ex: 
getData(root, rows, columns, rowcount)

df = pd.DataFrame(rows, columns=columns)
print(df)
df.to_csv('parse.csv')

Running this code produces the following CSV:

,root|child,root|child2,root|anotherchild|childid,root|anotherchild|childname,root|group|groupid,root|group|grouplist|groupzone|groupname,root|group|grouplist|groupzone|groupsize,root|secondgroup|secondgroupid,root|secondgroup|secondgrouptitle,root|secondgroup|secondgrouplist|secondgroupzone|secondgroupsub|secondsub,root|secondgroup|secondgrouplist|secondgroupzone|secondgroupsub|secondsubid,root|secondgroup|secondgrouplist|secondgroupzone|secondgroupname,root|secondgroup|secondgrouplist|secondgroupzone|secondgroupsize,root|child3
0,child-val,child2-val2,another child 45,another child name,groupid-123,first,4,secondgroupid-42,second group title,v1,12,third,4,val3
1,,,,,,second,6,,,v2,1,fourth,6,
2,,,,,,third,8,,,v3,45,tenth,10,

Hopefully this should get you started in the right direction.

Medium
medium.com › @haniyaali1230129 › from-complex-xml-to-structured-csv-parsing-xml-in-python-0a18b26c8224
From Complex XML to Structured CSV— Parsing XML in Python | by Haniya Maqsood | Medium
February 2, 2024 - Handling large and complex XML files in text editors can be extremely challenging, especially files larger than 12GB, which often fail to open properly. I encountered this problem frequently early in my career. So I wrote a Python script to convert complex and nested XML tags into a clean CSV file, ...
e-iceblue
e-iceblue.com › Tutorials › Python › Spire.XLS-for-Python › Program-Guide › Conversion › convert-xml-to-csv-in-python.html
How to Convert XML to CSV in Python: A Complete Guide
November 20, 2019 - Hierarchical data – XML allows nesting (e.g., <reviews> inside <book>), while CSV is flat. Attributes vs. elements – Data may be stored as an attribute (isbn) or as a tag (title). Optional fields – Not all <book> elements may contain the same tags, which can lead to missing values in the CSV.