convert nested xml to csv python

stackoverflow.com › questions › 39576683 › convert-deeply-nested-xml-to-csv-in-python

Convert Deeply Nested XML to CSV in Python - Stack Overflow

github.com › waheed0332 › xml2csv

Answer 1 of 3

The lxml library is capable of very powerful XML parsing, and can be used to iterate over an XML tree to search for specific elements.

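import csv
from lxml import etree

# parse() reads the file itself, so an <?xml ... encoding=...?> declaration is fine
tree = etree.parse(r'path/to/xml')
rows = []
headword = ''
# iter() with several tag names visits the matching elements in document order
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
        headword = item.text or ''
    elif item.tag == 'def':
        # one CSV row per definition, paired with the most recent headword
        rows.append([headword, item.text or ''])

# csv.writer quotes fields that contain commas; a plain ','.join() would not
with open(r'path/to/csv', 'a', newline='') as csv_file:
    csv.writer(csv_file).writerows(rows)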
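For a very large dictionary file, the same hw/def pairing can be streamed with the standard library's xml.etree.ElementTree.iterparse instead of loading the whole tree. This is a minimal sketch, not part of the answer above: the sample XML, the entry wrapper element and the helper name hw_def_rows are hypothetical, and it assumes each def belongs to the most recent hw.

```python
import csv
import io
import xml.etree.ElementTree as ET


def hw_def_rows(source):
    """Yield [headword, definition] rows while streaming through the XML.

    Hypothetical helper: assumes each <def> belongs to the most recent <hw>.
    """
    headword = ''
    for _, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'hw':
            headword = (elem.text or '').strip()
        elif elem.tag == 'def':
            yield [headword, (elem.text or '').strip()]
        elif elem.tag == 'entry':
            # hypothetical record wrapper: drop its finished children to save memory
            elem.clear()


# Hypothetical sample input
SAMPLE = b"""<dict>
  <entry><hw>apple</hw><def>a fruit</def></entry>
  <entry><hw>run</hw><def>to move fast</def><def>to operate, as a machine</def></entry>
</dict>"""

out = io.StringIO()
csv.writer(out).writerows(hw_def_rows(io.BytesIO(SAMPLE)))
print(out.getvalue())
```

The definition containing a comma comes out quoted, so the CSV still has two columns per row.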

Answer 2 of 3

How about this:

import csv
from xml.dom import minidom

xmldoc = minidom.parse('your.xml')
hw_lst = xmldoc.getElementsByTagName('hw')
defu_lst = xmldoc.getElementsByTagName('def')

with open('your.csv', 'a', newline='') as out_file:
    writer = csv.writer(out_file)  # quotes values that contain commas
    # pairs the n-th <hw> with the n-th <def>; assumes they occur one-to-one
    for hw, defu in zip(hw_lst, defu_lst):
        writer.writerow([hw.firstChild.data, defu.firstChild.data])
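Pairing the n-th hw with the n-th def breaks as soon as one entry lacks a definition, because every later pair shifts by one. A sketch of a more robust variant, still using minidom, walks each record element instead. The entry tag name, the sample and the helper names are hypothetical; substitute whatever element wraps each hw/def pair in your file.

```python
import csv
import io
from xml.dom import Node, minidom


def text_of(node):
    """Concatenate the text children of a DOM element ('' if it is empty)."""
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType == Node.TEXT_NODE).strip()


def rows_by_entry(xml_string, record_tag='entry'):
    """One CSV row per record element, so a missing <def> cannot shift the pairing.

    'entry' is a hypothetical record tag.
    """
    doc = minidom.parseString(xml_string)
    for record in doc.getElementsByTagName(record_tag):
        hws = record.getElementsByTagName('hw')
        defs = record.getElementsByTagName('def')
        yield [text_of(hws[0]) if hws else '', text_of(defs[0]) if defs else '']


# Hypothetical sample: the first entry has no <def>
SAMPLE = "<d><entry><hw>cat</hw></entry><entry><hw>dog</hw><def>a pet</def></entry></d>"

out = io.StringIO()
csv.writer(out).writerows(rows_by_entry(SAMPLE))
print(out.getvalue())
```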

GitHub

GitHub - waheed0332/xml2csv: Python scripts for processing XML documents and converting to CSV. Also works on nested xml files. · GitHub

Converts XML files into a CSV file; this script is capable of converting extremely nested XML files. It uses multiprocessing to convert large amounts of data in less time. Install the required libraries with the following command before running the script:

pip install -r requirements.txt

Then run, for example:

python xml2csv.py -f ./xml-samples/1.xml -csv out.csv

Starred by 23 users

Forked by 7 users

Languages Python

Videos

12:42

Convert XML to CSV in Python | Python Tutorial - YouTube

Convert XML to CSV in Python | Full Source Code | Complete Tutorial ...

October 14, 2021

15:02

Create DataFrame from Nested XML | Spark DataFrame Practical | ...

XML formatter: XML to CSV Python - YouTube

stackoverflow.com › questions › 74194876 › how-can-we-convert-a-nested-xml-to-csv-in-python-dynamically-nested-xml-may-con

How can we convert a nested XML to CSV in Python Dynamically, Nested XML may contain array of values as well? - Stack Overflow

likegeeks.com › home › python › pandas › export xml to csv using python pandas

Answer 1 of 2

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:

import pandas as pd

# from here, it's just a one-liner (pd.read_xml needs pandas 1.3 or later)
pd.read_xml('input.xml', xpath='.//store').to_csv('output.csv', index=False)

and that should get you your csv file.
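If pandas (or lxml, which read_xml uses as its default parser) is not available, a rough standard-library approximation for shallow records turns each element matched by the XPath into one row made of its attributes plus the text of its direct children. This is a sketch, not a reimplementation of read_xml: the stores sample and the shallow_records name are made up, deeper nesting is ignored, and ElementTree only supports a limited XPath subset.

```python
import csv
import io
import xml.etree.ElementTree as ET


def shallow_records(xml_text, path):
    """Turn each element matched by `path` into a dict of attributes + child text.

    Rough stand-in for pandas.read_xml on flat records; grandchildren are ignored.
    """
    root = ET.fromstring(xml_text)
    for el in root.findall(path):
        row = dict(el.attrib)
        for child in el:
            row[child.tag] = (child.text or '').strip()
        yield row


# Hypothetical sample input
SAMPLE = """<stores>
  <store id="1"><city>Oslo</city><open>yes</open></store>
  <store id="2"><city>Lima</city><open>no</open></store>
</stores>"""

rows = list(shallow_records(SAMPLE, './/store'))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```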

Answer 2 of 2

Because parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comprehension that merges the dictionaries. Also, use csv.DictWriter to build the CSV from dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

The code above puts the attributes in the last columns of the CSV. For a specific column order, reorder the dictionaries:

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0
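Two details of csv.DictWriter matter here. It writes columns in fieldnames order, so rebuilding each dictionary is not needed for the ordering itself; and by default (extrasaction='raise') it raises ValueError when a row has a key missing from fieldnames, which can happen when the header is taken from data[0] and a later record carries an extra attribute. A hedged sketch that builds the header from every record; the helper name write_dict_rows and the sample records are hypothetical.

```python
import csv
import io


def write_dict_rows(rows, fh, preferred=()):
    """Write dicts whose keys may differ from record to record.

    Columns named in `preferred` come first; any other key seen in *any*
    record is appended in first-seen order, and missing values become ''.
    """
    fieldnames = list(preferred)
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


records = [
    {"indicator": "a", "date": "2021"},
    {"indicator": "b", "date": "2020", "unit": "kg"},  # key absent from record 0
]
out = io.StringIO()
cols = write_dict_rows(records, out, preferred=["date"])
print(out.getvalue())
```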

Like Geeks

Export XML to CSV using Python Pandas

December 16, 2023 - You can use the xml.etree.ElementTree module to parse the XML file. It iterates over each node, extracts the relevant data, and stores it in a dictionary. Finally, the dictionary is converted into a DataFrame and exported to a CSV file.

Convert an XML File to CSV with Python - Supports Nested XML - YouTube

youtube.com › watch

09:37

In this video, I show you how to use Python and pandas to convert an XML file to CSV. Nested XML is also supported by using a stylesheet to adjust the file t...

Published November 13, 2024

Quora

quora.com › How-do-you-convert-XML-to-CSV-in-Python

How to convert XML to CSV in Python - Quora

Answer (1 of 4): In a strict sense? You don’t. CSV is a format (if it can even be called that!) for encoding row-based data. XML is a format for encoding tree-based data. One expects all entries to follow a simple, “all of these entries have the same fields, and a value in those fields”, ...
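The usual way around that mismatch is to pick one repeated element as the row unit and copy its ancestors' fields onto every row, i.e. denormalize the tree. A small standard-library sketch; the orders document and the order_item_rows name are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: one <order> with several <item> children
SAMPLE = """<orders>
  <order id="7" customer="Ann"><item sku="A1" qty="2"/><item sku="B2" qty="1"/></order>
  <order id="8" customer="Bo"><item sku="C3" qty="5"/></order>
</orders>"""


def order_item_rows(xml_text):
    """One CSV row per <item>, with the parent <order> fields repeated."""
    for order in ET.fromstring(xml_text).iter('order'):
        parent = {'order_id': order.get('id'), 'customer': order.get('customer')}
        for item in order.findall('item'):
            yield {**parent, 'sku': item.get('sku'), 'qty': item.get('qty')}


out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['order_id', 'customer', 'sku', 'qty'])
writer.writeheader()
writer.writerows(order_item_rows(SAMPLE))
print(out.getvalue())
```

Choosing a different row unit (for example one row per order, with items joined into one cell) gives a different but equally valid CSV; the right choice depends on how the file will be consumed.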

Python Pool

pythonpool.com › home › blog › xml to csv conversion using python

XML to CSV Conversion Using Python - Python Pool

June 14, 2021 - In this article, we learned about the conversion of Python XML and CSV format. We saw examples for each and further learned a python implementation of converting an XML file to a CSV file in Python.

stackoverflow.com › questions › 70599829 › transform-nested-xml

python - Transform Nested XML - Stack Overflow

medium.com › @meiyee715 › converting-xml-to-csv-python-xml-etree-25fec8e72626

Answer 1 of 1

For nested XML, you can use the iterparse() function to iterate over all elements in the XML. You then need logic that handles each element according to its tag and adds its value to a dictionary, which is exported as a row.

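import xml.etree.ElementTree as ET

# iterparse reports each element when its end tag is read (the default 'end' event)
for _, elem in ET.iterparse('file.xml'):
    if len(elem) == 0:  # leaf element: show its text as well
        print(f'{elem.tag} {elem.attrib} text={elem.text}')
    else:
        print(f'{elem.tag} {elem.attrib}')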

To build a CSV row from the element text, you can do something like the following. If, for example, each "test2" element wraps one record, then its end tag (iterparse reports 'end' events by default, so this arrives after the record's children) can be used to write the record as a new row and clear the dictionary for the next record.

If you want to output all or some of the attributes, you need to add a few lines of code for that. If an attribute has the same name as an element, or several elements have the same attribute (e.g. ID), your code needs to handle those name collisions.

import xml.etree.ElementTree as ET
import re
import csv

with open("out.csv", "w", newline='') as fout:
    header = ['test3','test4','test7','test9','test13','test14','test17','test18','test19','Comments']
    csvout = csv.DictWriter(fout, fieldnames=header)
    csvout.writeheader()
    row = {}
    for _, elem in ET.iterparse('test.xml'):
        # strip the namespace from the element tag name; e.g. {Test.xsd}test14 > test14
        tag = re.sub("^{.*?}", "", elem.tag)
        if tag == 'test2':
            if len(row) != 0:
                print(row)
                csvout.writerow(row)
                row = {}
        if len(elem) == 0:
            row[tag] = elem.text

Output:

{'test3': 'Something Something', 'test4': 'AA', 'Comments': 'BB', 'test7': '123 street', 'test9': 'test work', 'test14': '746745636', 'test13': 'Some date'}
{'test3': 'None test', 'test4': 'Someone', 'Comments': 'Some comment', 'test7': '5634643643', 'test17': 'Some Info', 'test19': 'Somewhere', 'test18': '63243333', 'test14': '456436436346', 'test13': '54234532452345'}

CSV Output:

test3,test4,test7,test9,test13,test14,test17,test18,test19,Comments
Something Something,AA,123 street,test work,Some date,746745636,,,,BB
None test,Someone,5634643643,,54234532452345,456436436346,Some Info,63243333,Somewhere,Some comment
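For the attribute case mentioned earlier, one option is to store each attribute under an element@attribute key so that, for example, two elements' ID attributes land in different columns. A sketch under that assumption; the @ naming convention and the namespaced sample are made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a default namespace plus clashing ID attributes
SAMPLE = b"""<root xmlns="Test.xsd">
  <test2><test3 ID="1">Something</test3><test4 ID="2">AA</test4></test2>
  <test2><test3 ID="3">Other</test3></test2>
</root>"""


def local(name):
    """Strip a '{namespace}' prefix, e.g. '{Test.xsd}test3' -> 'test3'."""
    return re.sub(r'^\{.*?\}', '', name)


rows, row = [], {}
for _, elem in ET.iterparse(io.BytesIO(SAMPLE)):
    tag = local(elem.tag)
    if tag == 'test2':  # end tag of one record: its leaves are already in `row`
        if row:
            rows.append(row)
        row = {}
    elif len(elem) == 0:  # leaf element: keep its text and its attributes
        row[tag] = (elem.text or '').strip()
        for attr, value in elem.attrib.items():
            row[f'{tag}@{local(attr)}'] = value

# header = every key seen in any record, in first-seen order
fieldnames = list(dict.fromkeys(key for r in rows for key in r))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```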

Update:

If you want to handle duplicate tags by collecting their values in a list, try something like this:

if len(elem) == 0:
    text = elem.text
    old = row.get(tag)
    if old is None:
        # first occurrence
        row[tag] = text
    elif isinstance(old, str):
        # second occurrence > create list
        row[tag] = [old, text]
    else:
        old.append(text)
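Note that csv.DictWriter converts non-string values with str(), so a list collected this way would be written as its Python repr, e.g. ['111', '222']. A sketch that joins list values into one cell before writing; the | separator, the phone column and the sample row are arbitrary choices for illustration.

```python
import csv
import io


def join_lists(row, sep='|'):
    """Collapse list values (from repeated tags) into one separator-joined cell."""
    return {
        key: sep.join('' if v is None else v for v in value) if isinstance(value, list) else value
        for key, value in row.items()
    }


# Hypothetical row: <phone> appeared three times, the last one empty
row = {'test3': 'x', 'phone': ['111', '222', None]}
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['test3', 'phone'])
writer.writeheader()
writer.writerow(join_lists(row))
print(out.getvalue())
```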


Medium

Converting XML to CSV: Python xml.etree | by Amy Leong | Medium

October 14, 2023 - Replace path_to_your_xml_file.xml and path_to_output.csv with your desired paths. The provided script is a basic example, and real-world XML files can vary widely in their structure. Depending on the nature of the XML, you may need to account for attributes, nested elements, and other complexities. The beauty of Python ...

stackoverflow.com › questions › 48941023 › create-a-dataframe-from-nested-xml-and-generate-a-csv

python - Create a dataframe from nested xml and generate a csv - Stack Overflow

e-iceblue.com › Tutorials › Python › Spire.XLS-for-Python › Program-Guide › Conversion › convert-xml-to-csv-in-python.html

Answer 1 of 2

Consider XSLT, the special-purpose language designed to transform XML files, which can convert XML directly to CSV (i.e., a text file) without a pandas DataFrame as an intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts, and does so without for loops or if logic. However, because of the complex alignment between products and attributes, the XSLT uses some longer XPath searches.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/PropertySet">
      <xsl:text>ProductId,Product,AttributeId,Attribute&#xa;</xsl:text>
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
       <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                    ancestor::ImpExp/@Object_Num, $delimiter,
                                    '', $delimiter,
                                    '')"/><xsl:text>&#xa;</xsl:text>
    </xsl:if>   
  </xsl:template>

  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,

                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,

                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)    
result = transform(xml)

# OUTPUT TO FILE
with open('Output.csv', 'wb') as f:
    # bytes() serializes the text output of the transformation
    f.write(bytes(result))

Output

ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,

Answer 2 of 2

You would need to preparse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:

import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)
class_defs = {}

# First extract all the CLASS_DEF entries into a dictionary
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')

    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

with open(outFile, 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])

    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')

        if impexp.get('Type') == "PROD_DEF":
            # reset per product so a product without an Object_Def
            # cannot reuse the previous product's ancestor
            ancestor_name = None
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_name = obj.get('Ancestor_Name')

            # only the first attribute of the ancestor class is written
            csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [['', '']])[0]))

This would produce new.csv containing:

ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,

If you are using Python 3.x, use:

with open(outFile, 'w', newline='') as f_output:
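Because that answer indexes [0], only the first attribute of each ancestor class is written, which is why Mouse appears once in new.csv but twice (7010p and 7012j) in the XSLT output. A hedged standard-library sketch of the same two-pass lookup that writes one row per attribute; the trimmed-down sample is invented to mirror the question's ImpExp structure, and deeper ancestor chains are not handled.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Minimal, hypothetical input shaped like the question's ImpExp records
SAMPLE = """<PropertySet>
  <ImpExp Type="CLASS_DEF" Name="Portable">
    <ListOfObject_Arrt>
      <Object_Arrt Orig_Id="7010p" Attr_Name="O_Portable"/>
      <Object_Arrt Orig_Id="7012j" Attr_Name="O_wireless"/>
    </ListOfObject_Arrt>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Mouse" Object_Num="2987d">
    <ListOfObject_Def><Object_Def Ancestor_Name="Portable"/></ListOfObject_Def>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Speaker" Object_Num="5463g">
    <ListOfObject_Def><Object_Def Ancestor_Name=""/></ListOfObject_Def>
  </ImpExp>
</PropertySet>"""

root = ET.fromstring(SAMPLE)

# Pass 1: class name -> list of (Orig_Id, Attr_Name)
class_defs = {
    imp.get('Name'): [(a.get('Orig_Id'), a.get('Attr_Name')) for a in imp.iter('Object_Arrt')]
    for imp in root.iter('ImpExp') if imp.get('Type') == 'CLASS_DEF'
}

# Pass 2: one row per (product, attribute); products without attributes get blanks
rows = []
for imp in root.iter('ImpExp'):
    if imp.get('Type') != 'PROD_DEF':
        continue
    for obj in imp.iter('Object_Def'):
        attrs = class_defs.get(obj.get('Ancestor_Name')) or [('', '')]
        for attr_id, attr_name in attrs:
            rows.append([imp.get('Object_Num'), imp.get('Name'), attr_id, attr_name])

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])
writer.writerows(rows)
print(out.getvalue())
```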

e-iceblue

How to Convert XML to CSV in Python: A Complete Guide

Converting XML to CSV in Python doesn’t have to be painful. With Spire.XLS for Python, you can automate much of the process, including header generation, handling attributes, and flattening nested nodes.

Medium

medium.com › @ayushnandanwar003 › simplifying-data-conversion-a-comprehensive-guide-to-converting-xml-to-csv-in-python-b22c24b02628

Simplifying Data Conversion: A Comprehensive Guide to Converting XML to CSV in Python | by Ayush Nandanwar | Medium

April 26, 2024 - In this comprehensive guide, we’ll delve deeper into the process of converting XML data into CSV format using Python.

SysTools Group

systoolsgroup.com › home › how to convert xml to csv file? 5 easy methods

Convert XML to CSV Format in Bulk Using the Best Five Methods

October 8, 2025 - Do you also have nested and large XML files, and you are confused about how to convert them into CSV format? If it sounds right and accurate, then you have opened the right blog. Users can easily convert XML to CSV format using Notepad++, Python Script, Microsoft Excel, and an alternate automatic ...

Teleport

goteleport.com › resources › tools › xml-to-csv-converter

XML to CSV Converter | Instantly Transform XML to CSV | Teleport

<records>
  <record>
    <field1>Value1</field1>
    <field2>Value2</field2>
    <nested>
      <subfield1>SubValue1</subfield1>
      <subfield2>SubValue2</subfield2>
    </nested>
  </record>
  <record>
    <field1>Value3</field1>
    <field2>Value4</field2>
    <nested>
      <subfield1>SubValue3</subfield1>
      <subfield2>SubValue4</subfield2>
    </nested>
  </record>
</records>

The corresponding CSV format would look like this:

As you can see, manually converting XML to CSV can quickly become challenging and prone to mistakes. Let's take a look at an automated way to simplify this process and guarantee greater accuracy in your data handling. Here's a straightforward Python script demonstrating how to automate the conversion process:
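A standard-library sketch (not Teleport's script) that flattens that records sample: each leaf becomes a column named by its path under record. The underscore-joined column names such as nested_subfield1 are an arbitrary choice, and this version assumes no repeated sibling tags, since those would overwrite each other.

```python
import csv
import io
import xml.etree.ElementTree as ET

SAMPLE = """<records>
  <record><field1>Value1</field1><field2>Value2</field2>
    <nested><subfield1>SubValue1</subfield1><subfield2>SubValue2</subfield2></nested>
  </record>
  <record><field1>Value3</field1><field2>Value4</field2>
    <nested><subfield1>SubValue3</subfield1><subfield2>SubValue4</subfield2></nested>
  </record>
</records>"""


def flatten(elem, prefix=''):
    """Map every leaf under `elem` to a column named by its path, e.g. 'nested_subfield1'."""
    row = {}
    for child in elem:
        name = f'{prefix}{child.tag}'
        if len(child):  # has children: recurse, extending the column prefix
            row.update(flatten(child, prefix=f'{name}_'))
        else:
            row[name] = (child.text or '').strip()
    return row


rows = [flatten(rec) for rec in ET.fromstring(SAMPLE).findall('record')]
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```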

Converting Nested XML to CSV in Python Dynamically: A Comprehensive Guide! - YouTube

youtube.com › watch

01:53

Learn how to dynamically convert nested XML data to CSV in Python, handling arrays and extra tags without breaking a sweat!---This video is based on the ques...

Published March 22, 2025

Views 10

stackoverflow.com › questions › 77683586 › how-to-convert-nested-xml-to-csv-using-python

pandas - How to convert nested xml to csv using python - Stack Overflow

saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python

Answer 1 of 2

We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since records reside under two different keys: tag_2 and tag_7, we need to loop over those particular tags to get all the records, then concatenate the dataframes.

import pandas as pd
import xmltodict

with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()

dict_xml = xmltodict.parse(str_xml)

df = pd.concat(
    [
        pd.json_normalize(
            dict_xml, 
            record_path=['tag_1', tag, 'date', 'data'],            # path to record list
            meta=[['tag_1', tag, 'date', '@value']])               # path to date
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'})) # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')             # add meta columns
        for tag in ('tag_2', 'tag_7')                              # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)

This creates the following CSV file:

tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173

Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.

def flatten_dict(dict_xml, level_2_tags):
    df = (
        pd.concat([
            pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
            .assign(tag_2=tag)
            for tag in level_2_tags
        ])
        .rename(columns={'@value': 'date'})
        .assign(tag_1='tag_1', data='data')
        .get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
    )
    return df

# test run
flatten_dict(dict_xml, ['tag_2'])           # when there is only tag_2 in level=2

flatten_dict(dict_xml, ['tag_2', 'tag_7'])  # when there are 2 tags in level=2
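What record_path and meta do in these calls can be sketched in plain Python: walk down to the list of records, then copy one value from elsewhere in the tree onto each record. This is a simplified illustration, not pandas' implementation; the helper names are hypothetical, and dict_xml here is a hand-written stand-in shaped after the key paths used in this answer (record_path=['tag_1', tag, 'date', 'data'], meta path ending in '@value').

```python
def get_path(doc, path):
    """Follow a list of keys down a nested dict."""
    for key in path:
        doc = doc[key]
    return doc


def normalize(doc, record_path, meta_path, meta_name):
    """Simplified stand-in for json_normalize(doc, record_path, meta=[meta_path])."""
    meta_value = get_path(doc, meta_path)
    return [{**record, meta_name: meta_value} for record in get_path(doc, record_path)]


# Hand-written stand-in for the xmltodict output
records = [{'tag_3': 'val_3', 'tag_6': '-0.157'}, {'tag_3': 'val_3', 'tag_6': '-0.173'}]
dict_xml = {'tag_1': {tag: {'date': {'@value': '06-30-2023', 'data': records}}
                      for tag in ('tag_2', 'tag_7')}}

rows = []
for tag in ('tag_2', 'tag_7'):
    for row in normalize(dict_xml, ['tag_1', tag, 'date', 'data'],
                         ['tag_1', tag, 'date', '@value'], 'date'):
        rows.append({'tag_1': 'tag_1', 'tag_2': tag, **row})
print(rows)
```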

Answer 2 of 2

Given the custom format, it looks like the best option is to use a nested list comprehension:

df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])

Output:

   tag_1  tag_2        date  tag_3    tag_4              tag_5   tag_6
0  tag_1  tag_2  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
1  tag_1  tag_2  06-30-2023  val_3  val_4_2            val_5_1  -0.173
2  tag_1  tag_7  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
3  tag_1  tag_7  06-30-2023  val_3  val_4_2            val_5_1  -0.173

CSV output:

# df.to_csv('file_01.csv', index=False)

tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173
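One caveat for both answers: xmltodict returns a dict, not a list, when an element such as data occurs only once. pd.json_normalize expects a list (or null) at the end of record_path, and the comprehension's for d4 in d3['data'] would iterate over the dict's keys, so **d4 fails. A small normalizing helper (the name as_list is hypothetical) avoids that; xmltodict's force_list option is another way to get lists at parse time. The two dictionaries below are hand-written stand-ins.

```python
def as_list(value):
    """Return value as a list: None -> [], a single dict -> [dict], a list unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def rows_from(dict_xml):
    """The nested comprehension from the answer, with 'data' normalized to a list."""
    return [{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
            for k1, d1 in dict_xml.items()
            for k2, d2 in d1.items()
            for k3, d3 in d2.items()
            for d4 in as_list(d3.get('data'))]


# Hand-written stand-ins: a single <data> record (dict) vs. repeated ones (list)
single = {'tag_1': {'tag_2': {'date': {'@value': '06-30-2023',
                                       'data': {'tag_3': 'val_3', 'tag_6': '-0.157'}}}}}
repeated = {'tag_1': {'tag_2': {'date': {'@value': '06-30-2023',
                                         'data': [{'tag_3': 'val_3'}, {'tag_3': 'val_3b'}]}}}}

print(rows_from(single))
print(rows_from(repeated))
```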

Saturn Cloud

Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog

February 9, 2026 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.

stackoverflow.com › questions › 66986927 › python-flatten-xml-to-csv-with-nested-child-tags

Python : Flatten xml to csv with nested child tags - Stack Overflow