python nested xml to csv

stackoverflow.com › questions › 39576683 › convert-deeply-nested-xml-to-csv-in-python

Convert Deeply Nested XML to CSV in Python - Stack Overflow

github.com › waheed0332 › xml2csv

1 of 3

The lxml library is capable of very powerful XML parsing, and can be used to iterate over an XML tree to search for specific elements.

import csv
from lxml import etree

# Parse straight from the file so the encoding in the XML declaration is honored
tree = etree.parse(r'path/to/xml')

with open(r'path/to/csv', 'a', newline='') as out_file:
    writer = csv.writer(out_file)  # handles quoting of commas inside values
    row = ['', '']
    for item in tree.iter('hw', 'def'):
        if item.tag == 'hw':
            row[0] = item.text
        elif item.tag == 'def':
            row[1] = item.text
            # Assumes each 'def' follows its 'hw'; write one row per pair
            writer.writerow(row)

2 of 3

How about this:

from xml.dom import minidom

xmldoc = minidom.parse('your.xml')
hw_lst = xmldoc.getElementsByTagName('hw')
defu_lst = xmldoc.getElementsByTagName('def')

with open('your.csv', 'a') as out_file:
    for i in range(len(hw_lst)):
        out_file.write('{0}, {1}\n'.format(hw_lst[i].firstChild.data, defu_lst[i].firstChild.data))
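The index-based pairing above assumes every hw has a def at the same position in the document. A minimal sketch, assuming each pair sits inside a wrapping element (the `entry` tag and the sample text here are hypothetical, not from the original question), pairs the values per entry and lets `csv.writer` handle quoting:

```python
import csv
import io
from xml.dom import minidom

# Hypothetical sample; the <entry> wrapper is an assumption for illustration.
SAMPLE = """<dict>
  <entry><hw>apple</hw><def>a round fruit, usually red</def></entry>
  <entry><hw>pear</hw></entry>
</dict>"""


def first_text(parent, tag):
    """Return the text of the first <tag> under parent, or '' if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.data
    return ""


def entries_to_rows(xml_text):
    """Build one [hw, def] row per <entry>, so a missing def cannot shift later rows."""
    doc = minidom.parseString(xml_text)
    return [[first_text(entry, "hw"), first_text(entry, "def")]
            for entry in doc.getElementsByTagName("entry")]


rows = entries_to_rows(SAMPLE)
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Pairing inside each entry keeps a headword without a definition from being matched with the next entry's definition.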

GitHub

GitHub - waheed0332/xml2csv: Python scripts for processing XML documents and converting to CSV. Also works on nested xml files. · GitHub

Converts XML files into CSV files. The script can convert deeply nested XML files and uses multiprocessing to convert large inputs in less time. Install the required libraries before running the script: pip install -r requirements.txt · python xml2csv.py -f ./xml-samples/1.xml -csv out.csv


Videos

12:42

YouTube

Convert XML to CSV in Python | Python Tutorial - YouTube

Convert XML to CSV in Python | Full Source Code | Complete Tutorial ...

October 14, 2021

15:02

YouTube

Create DataFrame from Nested XML | Spark DataFrame Practical | ...

XML formatter: XML to CSV Python - YouTube

stackoverflow.com › questions › 74194876 › how-can-we-convert-a-nested-xml-to-csv-in-python-dynamically-nested-xml-may-con

How can we convert a nested XML to CSV in Python Dynamically, Nested XML may contain array of values as well? - Stack Overflow

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:

import pandas as pd

#from here, it's just a one liner
pd.read_xml('input.xml', xpath='.//store').to_csv('output.csv', sep=',', index=False, header=True)

and that should get you your csv file.

likegeeks.com › home › python › pandas › export xml to csv using python pandas

Because parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comprehension with a dictionary merge. Also, use csv.DictWriter to build the CSV from dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

The code above places the attributes in the last columns of the CSV. For a specific ordering, re-order the dictionaries:

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0
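One caveat with `list(data[0].keys())`: the header comes from the first record only, and `csv.DictWriter` raises `ValueError` by default when a later row has a key that is not in `fieldnames`. A minimal sketch of one way around this, using hypothetical records and a hypothetical helper name (`union_fieldnames`), collects the union of keys in first-seen order:

```python
import csv
import io


def union_fieldnames(records):
    """Collect every key across all records, keeping first-seen order."""
    seen = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


# Hypothetical records: the second one carries a key the first one lacks.
data = [
    {"indicator": "various, tests", "date": "2021"},
    {"indicator": "testing, cases", "date": "2020", "unit": "kg"},
]

fieldnames = union_fieldnames(data)
out = io.StringIO()
# restval fills the columns a record does not have
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(data)
print(out.getvalue())
```

The same `fieldnames` list can be re-ordered by hand, as in the `cols` example above, once all keys are known.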

Like Geeks

Export XML to CSV using Python Pandas

December 16, 2023 - Learn how to convert XML to CSV using Pandas in Python, from simple files to complex nested XML structures.

YouTube

youtube.com › watch

Convert an XML File to CSV with Python - Supports Nested XML - YouTube

09:37

In this video, I show you how to use Python and pandas to convert an XML file to CSV. Nested XML is also supported by using a stylesheet to adjust the file t...

Published December 12, 2023

stackoverflow.com › questions › 45472242 › how-can-i-parse-nested-xml-with-the-same-name-of-childs-to-csv

python - How can I parse nested xml (with the same name of childs) to CSV? - Stack Overflow

Since you have an outer <sensorEvents> tag containing 3 child <sensorEvents> tags with the same name, a search for 'sensorEvents' on the item matches only the outer <sensorEvents>, not its children.

This means

    for Sensorevents in Item.findall('sensorEvents'):

will loop only once per outer block such as:

<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>

Then

    avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
    row.append(avgSped_)

    completed_ = Sensorevents.find('sensorEvents').find('completed').text

gets the data for the first inner <sensorEvents> only.

You should try

for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
...

You could also consider using the lxml library because with it you can search by xpath expressions which often make for simpler code.

Here, the xpath expression .//sensorEvents/sensorEvents says look for sensorEvents elements anywhere in the document and then look for the sensorEvents elements immediately under these.

Once you have these it's often a simple matter to write expressions for attributes of the elements, as shown.

>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
... 
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')
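The same distinction can be reproduced with the standard library's `xml.etree.ElementTree`, which accepts the path `sensorEvents/sensorEvents` in `findall`. This is a minimal sketch on a sample shaped like the snippet above; the `root` wrapper and the shortened values are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Sample shaped like the snippet above; <root> and the shortened values are hypothetical.
SAMPLE = """<root><item>
  <sensorEvents>
    <sensorEvents><avgSped>48.5</avgSped><completed>true</completed></sensorEvents>
    <sensorEvents><avgSped>39.5</avgSped><completed>true</completed></sensorEvents>
    <sensorEvents><avgSped>41.4</avgSped><completed>true</completed></sensorEvents>
  </sensorEvents>
</item></root>"""

root = ET.fromstring(SAMPLE)
item = root.find("item")

# Direct children only: this matches the single outer <sensorEvents>.
outer = item.findall("sensorEvents")

# Outer element, then its same-named children: this matches the three inner ones.
inner = item.findall("sensorEvents/sensorEvents")

rows = [(e.findtext("avgSped"), e.findtext("completed")) for e in inner]
print(len(outer), rows)
```

Each tuple in `rows` corresponds to one CSV row, so the list can be passed directly to `csv.writer(...).writerows`.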

stackoverflow.com › questions › 48941023 › create-a-dataframe-from-nested-xml-and-generate-a-csv

python - Create a dataframe from nested xml and generate a csv - Stack Overflow

Consider XSLT, the special-purpose language designed to transform XML files, which can directly convert XML to CSV (i.e., a text file) without the pandas dataframe intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts and do so without for loops or if logic. However, due to the complex alignment of products and attributes, some longer XPath searches are used with XSLT.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/PropertySet">
      <xsl:text>ProductId,Product,AttributeId,Attribute&#xa;</xsl:text>
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
       <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                    ancestor::ImpExp/@Object_Num, $delimiter,
                                    '', $delimiter,
                                    '')"/><xsl:text>&#xa;</xsl:text>
    </xsl:if>   
  </xsl:template>

  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,

                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,

                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)    
result = transform(xml)

# OUTPUT TO FILE
with open('Output.csv', 'wb') as f:
    f.write(result)

Output

ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,

geeksforgeeks.org › python › convert-xml-to-csv-in-python

You would need to preparse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:

import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)
class_defs = {}

# First extract all the CLASS_DEF entries into a dictionary
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')

    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

with open(outFile, 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])

    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')

        if impexp.get('Type') == "PROD_DEF":
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_num = obj.get('Ancestor_Num')
                    ancestor_name = obj.get('Ancestor_Name')

            csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [['', '']])[0]))

This would produce new.csv containing:

ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,

If you are using Python 3.x, use:

with open(outFile, 'w', newline='') as f_output:
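Note that the code above writes only the first attribute of each class (the `[0]` index), which is why Mouse appears once in its output. A minimal sketch of the same two-pass lookup that emits one row per attribute, using only the standard library, could look like this. The sample XML and the class names `LP_Class` and `O_Class` are hypothetical, assembled from the element and attribute names used in the code above:

```python
import xml.etree.ElementTree as ET

# Hypothetical input built from the tag and attribute names in the code above.
SAMPLE = """<PropertySet>
  <ImpExp Type="CLASS_DEF" Name="LP_Class">
    <ListOfObject_Arrt>
      <Object_Arrt Orig_Id="6666p" Attr_Name="LP_Portable"/>
    </ListOfObject_Arrt>
  </ImpExp>
  <ImpExp Type="CLASS_DEF" Name="O_Class">
    <ListOfObject_Arrt>
      <Object_Arrt Orig_Id="7010p" Attr_Name="O_Portable"/>
      <Object_Arrt Orig_Id="7012j" Attr_Name="O_wireless"/>
    </ListOfObject_Arrt>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Laptop" Object_Num="2008a">
    <ListOfObject_Def><Object_Def Ancestor_Name="LP_Class"/></ListOfObject_Def>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Mouse" Object_Num="2987d">
    <ListOfObject_Def><Object_Def Ancestor_Name="O_Class"/></ListOfObject_Def>
  </ImpExp>
  <ImpExp Type="PROD_DEF" Name="Speaker" Object_Num="5463g">
    <ListOfObject_Def><Object_Def Ancestor_Name=""/></ListOfObject_Def>
  </ImpExp>
</PropertySet>"""


def products_to_rows(xml_text):
    root = ET.fromstring(xml_text)

    # Pass 1: map each class name to its list of (Orig_Id, Attr_Name) pairs.
    class_defs = {}
    for impexp in root.iter("ImpExp"):
        if impexp.get("Type") == "CLASS_DEF":
            class_defs[impexp.get("Name")] = [
                (attr.get("Orig_Id"), attr.get("Attr_Name"))
                for attr in impexp.iter("Object_Arrt")
            ]

    # Pass 2: one row per product attribute, or one blank-padded row if none.
    rows = []
    for impexp in root.iter("ImpExp"):
        if impexp.get("Type") != "PROD_DEF":
            continue
        for obj in impexp.iter("Object_Def"):
            attrs = class_defs.get(obj.get("Ancestor_Name")) or [("", "")]
            for orig_id, attr_name in attrs:
                rows.append([impexp.get("Object_Num"), impexp.get("Name"),
                             orig_id, attr_name])
    return rows


rows = products_to_rows(SAMPLE)
for row in rows:
    print(",".join(row))
```

Building the dictionary first means the product pass never has to search the tree again, whatever order the CLASS_DEF and PROD_DEF entries appear in.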

GeeksforGeeks

Convert XML to CSV in Python - GeeksforGeeks

July 23, 2025 - We used ElementTree to parse and navigate through the XML structure. Data from each record was collected into a list of dictionaries. Finally, we used pandas to create a CSV file from that structured data. To learn about the pandas module in depth, refer to: Python Pandas Tutorial


Syntax Byte

syntaxbytetutorials.com › home › import xml into pandas and convert to csv

Import XML into Pandas and Convert to CSV - Syntax Byte

December 13, 2023 - Use pandas to convert a nested XML file to a CSV in only three lines of Python.

Medium

medium.com › @meiyee715 › converting-xml-to-csv-python-xml-etree-25fec8e72626

Converting XML to CSV: Python xml.etree | by Amy Leong | Medium

October 14, 2023 - Replace path_to_your_xml_file.xml and path_to_output.csv with your desired paths. The provided script is a basic example, and real-world XML files can vary widely in their structure. Depending on the nature of the XML, you may need to account for attributes, nested elements, and other complexities. The beauty of Python ...

Python.org

discuss.python.org › python help

Convert xml to excel/csv - Python Help - Discussions on Python.org

October 15, 2022 - Please help me in converting XML file into excel/csv. Thank you in advance.

Saturn Cloud

saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python

Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog

December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.

Quora

quora.com › How-do-you-convert-XML-to-CSV-in-Python

How to convert XML to CSV in Python - Quora

Answer (1 of 4): In a strict sense? You don’t. CSV is a format (if it can even be called that!) for encoding row-based data. XML is a format for encoding tree-based data. One expects all entries to follow a simple, “all of these entries have the same fields, and a value in those fields”, ...
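To make the tree-versus-rows mismatch concrete, here is a minimal sketch of one possible flattening convention: nested elements become dotted column names, attributes get an `@` prefix, and repeated siblings get an index. The function name `flatten`, the naming scheme, and the sample record are all assumptions for illustration; other conventions are equally valid.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Flatten one element into {dotted.path: text}; repeated tags get an index."""
    row = {}
    for name, value in element.attrib.items():
        row[f"{prefix}@{name}"] = value

    # Count sibling tags so repeated ones can be numbered.
    counts = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    seen = {}
    for child in element:
        key = prefix + child.tag
        if counts[child.tag] > 1:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            key = f"{key}[{seen[child.tag]}]"
        if len(child) or child.attrib:
            row.update(flatten(child, key + "."))
        text = (child.text or "").strip()
        if text:
            row[key] = text
    return row


# Hypothetical record used only to show the resulting column names.
SAMPLE = ('<order id="7"><customer><name>Ann</name></customer>'
          '<item>pen</item><item>ink</item></order>')

flat = flatten(ET.fromstring(SAMPLE))
print(flat)
```

Every record flattened this way yields a dictionary, so records with different shapes simply produce different column sets that have to be merged before writing the CSV.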

stackoverflow.com › questions › 77683586 › how-to-convert-nested-xml-to-csv-using-python

pandas - How to convert nested xml to csv using python - Stack Overflow

We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since records reside under two different keys: tag_2 and tag_7, we need to loop over those particular tags to get all the records, then concatenate the dataframes.

import pandas as pd
import xmltodict

with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()

dict_xml = xmltodict.parse(str_xml)

df = pd.concat(
    [
        pd.json_normalize(
            dict_xml, 
            record_path=['tag_1', tag, 'date', 'data'],            # path to record list
            meta=[['tag_1', tag, 'date', '@value']])               # path to date
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'})) # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')             # add meta columns
        for tag in ('tag_2', 'tag_7')                              # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)

This creates the following CSV file:

tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173

Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.

def flatten_dict(dict_xml, level_2_tags):
    df = (
        pd.concat([
            pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
            .assign(tag_2=tag)
            for tag in level_2_tags
        ])
        .rename(columns={'@value': 'date'})
        .assign(tag_1='tag_1', data='data')
        .get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
    )
    return df

# test run
flatten_dict(dict_xml, ['tag_2'])           # when there is only tag_2 in level=2

flatten_dict(dict_xml, ['tag_2', 'tag_7'])  # when there are 2 tags in level=2

cdn.e-iceblue.com › Tutorials › Python › Spire.XLS-for-Python › Program-Guide › Conversion › convert-xml-to-csv-in-python.html

Given the custom format, it looks like the best option is to use a nested list comprehension:

df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])

Output:

   tag_1  tag_2        date  tag_3    tag_4              tag_5   tag_6
0  tag_1  tag_2  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
1  tag_1  tag_2  06-30-2023  val_3  val_4_2            val_5_1  -0.173
2  tag_1  tag_7  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
3  tag_1  tag_7  06-30-2023  val_3  val_4_2            val_5_1  -0.173

CSV output:

# df.to_csv('file_01.csv', index=False)

tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173
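The comprehension does not depend on pandas. As a minimal sketch, the same flattening can feed `csv.DictWriter` directly. The `dict_xml` literal below is a hand-written stand-in for the `xmltodict` output, with its shape inferred from the comprehension and its values taken from the output above:

```python
import csv
import io

# Stand-in for xmltodict.parse(...); the shape is inferred, not taken from the real file.
records = [
    {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1 & val_5_2", "tag_6": "-0.157"},
    {"tag_3": "val_3", "tag_4": "val_4_2", "tag_5": "val_5_1", "tag_6": "-0.173"},
]
dict_xml = {
    "tag_1": {
        "tag_2": {"date": {"@value": "06-30-2023", "data": records}},
        "tag_7": {"date": {"@value": "06-30-2023", "data": records}},
    }
}

# Same nested comprehension as above: one output dict per innermost record.
rows = [
    {"tag_1": k1, "tag_2": k2, k3: d3["@value"], **d4}
    for k1, d1 in dict_xml.items()
    for k2, d2 in d1.items()
    for k3, d3 in d2.items()
    for d4 in d3["data"]
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

One thing to watch with real input: `xmltodict` returns a single dict rather than a list when an element occurs only once, so `d3['data']` may need wrapping in a list (or the `force_list` option) before iterating.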

E-iceblue

How to Convert XML to CSV in Python: A Complete Guide

Converting XML to CSV in Python doesn’t have to be painful. With Spire.XLS for Python, you can automate much of the process, including header generation, handling attributes, and flattening nested nodes.

stackoverflow.com › questions › 64934431 › converting-nested-xml-to-csv-using-python-for-any-xml-file

Converting Nested XML to csv using python for any XML file - Stack Overflow

I'm trying to convert a nested XML file into csv format and trying to develop a code in a way that it should be able to read any XML file. So whenever the structure of XML file changes, the same py...

stackoverflow.com › questions › 51042841 › python-create-csv-from-xml-with-different-nested-elements

Python: create csv from xml with different nested elements - Stack Overflow

medium.com › @ayushnandanwar003 › simplifying-data-conversion-a-comprehensive-guide-to-converting-xml-to-csv-in-python-b22c24b02628

1 of 1

Since there's always going to be at least one TaxTotal element, I would create a new csv row for each one and go back up the tree for the preceding values.

Here's an example using lxml. I added a function to make it easier to handle empty values, but any additional formatting of values I'll leave up to you.

Python 3.6

from lxml import etree
import csv


def get_value(target_tree, xpath, namespaces):
    try:
        return target_tree.xpath(xpath, namespaces=namespaces)[0].text
    except IndexError:
        return ""


tree = etree.parse("input.xml")

ns = {"cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
      "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
      "i2": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"}

with open("output.csv", "w", newline="") as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=";", lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    # Header
    csvwriter.writerow(["ID", "/InvoiceLine/ID", "/InvoiceLine/InvoicedQuantity", "/InvoiceLine/LineExtensionAmount",
                        "/InvoiceLine/TaxTotal/TaxAmount", "/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID",
                        "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name"])
    for tax_total in tree.xpath("//cac:TaxTotal", namespaces=ns):
        csvwriter.writerow([get_value(tax_total, "/i2:Invoice/cbc:ID", ns),
                            get_value(tax_total, "../cbc:ID", ns),
                            get_value(tax_total, "../cbc:InvoicedQuantity", ns),
                            get_value(tax_total, "../cbc:LineExtensionAmount", ns),
                            get_value(tax_total, "cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxableAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cbc:TaxAmount", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:Percent", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:ID", ns),
                            get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:Name", ns)])

Output (output.csv)

ID;/InvoiceLine/ID;/InvoiceLine/InvoicedQuantity;/InvoiceLine/LineExtensionAmount;/InvoiceLine/TaxTotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name
102165444;1.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
102165444;2.0000;1.0000;142.3900;35.60;142.39;35.60;StandardRated;25;63;Moms
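The standard library's `xml.etree.ElementTree` keeps no parent pointers, so the `../` steps above have no direct equivalent there. A minimal sketch of the same "one row per TaxTotal" idea walks down from each invoice line instead, keeping the parent values in scope. The trimmed sample invoice is hypothetical, and it assumes `InvoiceLine` elements sit directly under the root:

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "i2": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
}

# Hypothetical, heavily trimmed invoice for illustration only.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>102165444</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>1.0000</cbc:ID>
    <cac:TaxTotal><cbc:TaxAmount>138.24</cbc:TaxAmount></cac:TaxTotal>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2.0000</cbc:ID>
    <cac:TaxTotal><cbc:TaxAmount>138.24</cbc:TaxAmount></cac:TaxTotal>
    <cac:TaxTotal><cbc:TaxAmount>35.60</cbc:TaxAmount></cac:TaxTotal>
  </cac:InvoiceLine>
</Invoice>"""


def invoice_rows(xml_text):
    """Return one [invoice id, line id, tax amount] row per TaxTotal element."""
    root = ET.fromstring(xml_text)
    invoice_id = root.findtext("cbc:ID", default="", namespaces=NS)
    rows = []
    for line in root.findall("cac:InvoiceLine", NS):
        line_id = line.findtext("cbc:ID", default="", namespaces=NS)
        for tax_total in line.findall("cac:TaxTotal", NS):
            amount = tax_total.findtext("cbc:TaxAmount", default="", namespaces=NS)
            rows.append([invoice_id, line_id, amount])
    return rows


rows = invoice_rows(SAMPLE)
out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(rows)
print(out.getvalue())
```

`findtext` with `default=""` plays the role of the `get_value` helper above: a missing element yields an empty string instead of raising.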

Medium

Simplifying Data Conversion: A Comprehensive Guide to Converting XML to CSV in Python | by Ayush Nandanwar | Medium

April 26, 2024 - In this comprehensive guide, we’ll delve deeper into the process of converting XML data into CSV format using Python.