nested xml to csv python

stackoverflow.com › questions › 39576683 › convert-deeply-nested-xml-to-csv-in-python

Convert Deeply Nested XML to CSV in Python - Stack Overflow

1 of 3

The lxml library is capable of very powerful XML parsing, and can be used to iterate over an XML tree to search for specific elements.

from lxml import etree

with open(r'path/to/xml', 'r') as xml:
    text = xml.read()
tree = lxml.etree.fromstring(text)
row = ['', '']
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
       row[0] = item.text
    elif item.tag == 'def':
       row[1] = item.text

line = ','.join(row)

with open(r'path/to/csv', 'a') as csv:
     csv.write(line + '\n')

2 of 3

How about this:

from xml.dom import minidom

xmldoc = minidom.parse('your.xml')
hw_lst = xmldoc.getElementsByTagName('hw')
defu_lst = xmldoc.getElementsByTagName('def')

with open('your.csv', 'a') as out_file:
    for i in range(len(hw_lst)):
        out_file.write('{0}, {1}\n'.format(hw_lst[i].firstChild.data, defu_lst[i].firstChild.data))

Videos

12:42

Convert XML to CSV in Python | Python Tutorial - YouTube

Convert XML to CSV in Python | Full Source Code | Complete Tutorial ...

October 14, 2021

15:02

Create DataFrame from Nested XML | Spark DataFrame Practical | ...

XML formatter: XML to CSV Python - YouTube

stackoverflow.com › questions › 74194876 › how-can-we-convert-a-nested-xml-to-csv-in-python-dynamically-nested-xml-may-con

How can we convert a nested XML to CSV in Python Dynamically, Nested XML may contain array of values as well? - Stack Overflow

ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:

import pandas as pd

#from here, it's just a one liner
pd.read_xml('input.xml',xpath='.//store').to_csv('output.csv',sep=',', index = None, header=True)

and that should get you your csv file.

likegeeks.com › home › python › pandas › export xml to csv using python pandas

Given parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comphrehension with dictionary merge. Also, use csv.DictWriter to build CSV via dictionaries:

from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}

data = [
     {
       **{el.tag.split('}')[-1]: (el.text.strip() if el.text is not None else None) for el in d.findall("*")},
       **{f"{el.tag.split('}')[-1]} {k}":v for el in d.findall("*") for k,v in el.attrib.items()},
       **d.attrib
     }     
     for d in tree.findall(".//du:data", namespaces=nmsp)    
]

dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD

While above will add attributes to last columns of CSV. For specific ordering, re-order the dictionaries:

data = [ ... ]

cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

data = [
    {k: d[k] for k in cols} for d in data
]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    
    dw.writerows(data)

Output

indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0

Like Geeks

Export XML to CSV using Python Pandas

December 16, 2023 - Learn how to convert XML to CSV using Pandas in Python, From handling simple to complex nested XML structures efficiently.

stackoverflow.com › questions › 48941023 › create-a-dataframe-from-nested-xml-and-generate-a-csv

python - Create a dataframe from nested xml and generate a csv - Stack Overflow

Consider XSLT, the special purpose language designed to transform XML files and can directly convert XML to CSV (i.e., text file) without the pandas dataframe intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts and do so without for loops or if logic. However, due to the complex alignment of product and attributes, some longer XPath searches are used with XSLT.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/PropertySet">
      <xsl:text>ProductId,Product,AttributeId,Attribute&#xa;</xsl:text>
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
      <xsl:apply-templates select="*"/>
  </xsl:template>

  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
       <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                    ancestor::ImpExp/@Object_Num, $delimiter,
                                    '', $delimiter,
                                    '')"/><xsl:text>&#xa;</xsl:text>
    </xsl:if>   
  </xsl:template>

  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,

                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,

                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)    
result = transform(xml)

# OUTPUT TO FILE
with open('Output.csv', 'wb') as f:
    f.write(result)

Output

ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,

You would need to preparse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:

import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)
class_defs = {}

# First extract all the CLASS_DEF entries into a dictionary
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')

    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

with open(outFile, 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])

    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')

        if impexp.get('Type') == "PROD_DEF":
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_num = obj.get('Ancestor_Num')
                    ancestor_name = obj.get('Ancestor_Name')

            csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [['', '']])[0]))

This would produce new.csv containing:

ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,

If you are using Python 3.x, use:

with open(outFile, 'w', newline='') as f_output:

stackoverflow.com › questions › 45472242 › how-can-i-parse-nested-xml-with-the-same-name-of-childs-to-csv

python - How can I parse nested xml (with the same name of childs) to CSV? - Stack Overflow

Since you have a <sensorEvents> tag containing 3 <sensorEvents>, the first <sensorEvents> shadows the children <sensorEvents> in <sensorEvents>.

This means

    for Sensorevents in Item.findall('sensorEvents'):

Will loop only once per

<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>

Then

    avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
    row.append(avgSped_)

    completed_ = Sensorevents.find('sensorEvents').find('completed').text

Gets the data for the first tag only.

You should try

for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
...

quora.com › How-do-you-convert-XML-to-CSV-in-Python

You could also consider using the lxml library because with it you can search by xpath expressions which often make for simpler code.

Here, the xpath expression .//sensorEvents/sensorEvents says look for sensorEvents elements anywhere in the document and then look for the sensorEvents elements immediately under these.

Once you have these it's often a simple matter to write expressions for attributes of the elements, as shown.

>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
... 
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')

Quora

How to convert XML to CSV in Python - Quora

Answer (1 of 4): In a strict sense? You don’t. CSV is a format (if it can even be called that!) for encoding row-based data. XML is a format for encoding tree-based data. One expects all entries to follow a simple, “all of these entries have the same fields, and a value in those fields”, ...

Convert an XML File to CSV with Python - Supports Nested XML - YouTube

youtube.com › watch

09:37

In this video, I show you how to use Python and pandas to convert an XML file to CSV. Nested XML is also supported by using a stylesheet to adjust the file t...

Published May 24, 2018

Find elsewhere

Google Bing Mojeek

GeeksforGeeks

geeksforgeeks.org › python › convert-xml-to-csv-in-python

Convert XML to CSV in Python - GeeksforGeeks

July 23, 2025 - We used ElementTree to parse and navigate through the XML structure. Data from each record was collected into a list of dictionaries. Finally, we used pandas to create a CSV file from that structured data. To learn about the pandas module in depth, refer to: Python Pandas Tutorial

Medium

medium.com › @meiyee715 › converting-xml-to-csv-python-xml-etree-25fec8e72626

Converting XML to CSV: Python xml.etree | by Amy Leong | Medium

October 14, 2023 - Replace path_to_your_xml_file.xml and path_to_output.csv with your desired paths. The provided script is a basic example, and real-world XML files can vary widely in their structure. Depending on the nature of the XML, you may need to account for attributes, nested elements, and other complexities. The beauty of Python ...

Syntax Byte

syntaxbytetutorials.com › home › import xml into pandas and convert to csv

Import XML into Pandas and Convert to CSV - Syntax Byte

August 21, 2024 - Use pandas to convert a nested XML file to a CSV in only three lines of Python.

Saturn Cloud

saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python

Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog

December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.

Python.org

discuss.python.org › python help

Convert xml to excel/csv - Python Help - Discussions on Python.org

October 15, 2022 - Please help me in converting XML file into excel/csv. Thank you in advance.

Medium

medium.com › @ayushnandanwar003 › simplifying-data-conversion-a-comprehensive-guide-to-converting-xml-to-csv-in-python-b22c24b02628

Simplifying Data Conversion: A Comprehensive Guide to Converting XML to CSV in Python | by Ayush Nandanwar | Medium

April 26, 2024 - In this comprehensive guide, we’ll delve deeper into the process of converting XML data into CSV format using Python.

Converting Nested XML to CSV in Python Dynamically: A Comprehensive Guide! - YouTube

youtube.com › watch

01:53

Learn how to dynamically convert nested XML data to CSV in Python, handling arrays and extra tags without breaking a sweat!---This video is based on the ques...

Published November 13, 2024

Views 10

stackoverflow.com › questions › 77683586 › how-to-convert-nested-xml-to-csv-using-python

pandas - How to convert nested xml to csv using python - Stack Overflow

We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since records reside under two different keys: tag_2 and tag_7, we need to loop over those particular tags to get all the records, then concatenate the dataframes.

import pandas as pd
import xmltodict

with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()

dict_xml = xmltodict.parse(str_xml)

df = pd.concat(
    [
        pd.json_normalize(
            dict_xml, 
            record_path=['tag_1', tag, 'date', 'data'],            # path to record list
            meta=[['tag_1', tag, 'date', '@value']])               # path to date
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'})) # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')             # add meta columns
        for tag in ('tag_2', 'tag_7')                              # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)

This creates the following CSV file:

tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173

Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.

def flatten_dict(dict_xml, level_2_tags):
    df = (
        pd.concat([
            pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
            .assign(tag_2=tag)
            for tag in level_2_tags
        ])
        .rename(columns={'@value': 'date'})
        .assign(tag_1='tag_1', data='data')
        .get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
    )
    return df

# test run
flatten_dict(dict_xml, ['tag_2'])           # when there is only tag_2 in level=2

flatten_dict(dict_xml, ['tag_2', 'tag_7'])  # when there are 2 tags in level=2

Given the custom format, it looks like the best option is to use a nested list comprehension:

df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])

Output:

   tag_1  tag_2        date  tag_3    tag_4              tag_5   tag_6
0  tag_1  tag_2  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
1  tag_1  tag_2  06-30-2023  val_3  val_4_2            val_5_1  -0.173
2  tag_1  tag_7  06-30-2023  val_3    val_4  val_5_1 & val_5_2  -0.157
3  tag_1  tag_7  06-30-2023  val_3  val_4_2            val_5_1  -0.173

CSV output:

# df.to_csv('file_01.csv', index=False)

tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173

stackoverflow.com › questions › 44760032 › import-huge-nested-xml-files-into-python-and-convert-them-to-csv

Import huge nested XML files into Python and convert them to CSV - Stack Overflow

Just extract the first few lines of your huge files into new files (and work with those to develop your solution): you can do that from python, or from the shell - e.g.

stackoverflow.com › questions › 66986927 › python-flatten-xml-to-csv-with-nested-child-tags

Python : Flatten xml to csv with nested child tags - Stack Overflow