Consider XSLT, the special-purpose language designed to transform XML files, which can directly convert XML to CSV (i.e., a text file) without the pandas dataframe intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts and do so without for loops or if logic. However, due to the complex alignment of products and attributes, some longer XPath searches are used in the XSLT.
XSLT (save as .xsl file, a special .xml file)
<!-- XSLT 1.0 stylesheet: flattens the product/attribute XML directly to CSV
     text (run via lxml's XSLT support from Python). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit raw text (CSV), not XML. -->
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>
  <!-- Column separator; can be overridden when invoking the transform. -->
  <xsl:param name="delimiter">,</xsl:param>

  <!-- Document root: print the CSV header line once, then walk children. -->
  <xsl:template match="/PropertySet">
    <xsl:text>ProductId,Product,AttributeId,Attribute
</xsl:text>
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Structural wrapper elements: contribute no output, just recurse. -->
  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Attribute list: emit one row per Object_Arrt child. Additionally, when
       there is no Object_Arrt child and the sibling Object_Def has an empty
       Ancestor_Name, emit a row with blank attribute columns (apparently for
       entries that have no class attributes - e.g. the Speaker output row). -->
  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
      <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                   ancestor::ImpExp/@Object_Num, $delimiter,
                                   '', $delimiter,
                                   '')"/><xsl:text>
</xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- One CSV row per attribute: the owning class name ($attrName) is used
       to find the PROD_DEF ImpExp whose Object_Def/@Ancestor_Name matches,
       pairing product columns with this attribute's Orig_Id/Attr_Name. -->
  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,
                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,
                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)
result = transform(xml)

# OUTPUT TO FILE
# `result` is an lxml _XSLTResultTree, not bytes; passing it straight to
# f.write() raises TypeError. bytes(result) serializes the text output.
with open('Output.csv', 'wb') as f:
    f.write(bytes(result))
Output
ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,
Answer from Parfait on Stack Overflow
Consider XSLT, the special-purpose language designed to transform XML files, which can directly convert XML to CSV (i.e., a text file) without the pandas dataframe intermediary. Python's third-party module lxml (which you are already using) can run XSLT 1.0 scripts and do so without for loops or if logic. However, due to the complex alignment of products and attributes, some longer XPath searches are used in the XSLT.
XSLT (save as .xsl file, a special .xml file)
<!-- XSLT 1.0 stylesheet: flattens the product/attribute XML directly to CSV
     text (run via lxml's XSLT support from Python). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit raw text (CSV), not XML. -->
  <xsl:output indent="no" method="text"/>
  <xsl:strip-space elements="*"/>
  <!-- Column separator; can be overridden when invoking the transform. -->
  <xsl:param name="delimiter">,</xsl:param>

  <!-- Document root: print the CSV header line once, then walk children. -->
  <xsl:template match="/PropertySet">
    <xsl:text>ProductId,Product,AttributeId,Attribute
</xsl:text>
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Structural wrapper elements: contribute no output, just recurse. -->
  <xsl:template match="PropertySet|Message|ListOf_Class_Def|ListOf_Prod_Def|ImpExp">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Attribute list: emit one row per Object_Arrt child. Additionally, when
       there is no Object_Arrt child and the sibling Object_Def has an empty
       Ancestor_Name, emit a row with blank attribute columns (apparently for
       entries that have no class attributes - e.g. the Speaker output row). -->
  <xsl:template match="ListOfObject_Arrt">
    <xsl:apply-templates select="Object_Arrt"/>
    <xsl:if test="name(*) != 'Object_Arrt' and preceding-sibling::ListOfObject_Def/Object_Def/@Ancestor_Name = ''">
      <xsl:value-of select="concat(ancestor::ImpExp/@Name, $delimiter,
                                   ancestor::ImpExp/@Object_Num, $delimiter,
                                   '', $delimiter,
                                   '')"/><xsl:text>
</xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- One CSV row per attribute: the owning class name ($attrName) is used
       to find the PROD_DEF ImpExp whose Object_Def/@Ancestor_Name matches,
       pairing product columns with this attribute's Orig_Id/Attr_Name. -->
  <xsl:template match="Object_Arrt">
    <xsl:variable name="attrName" select="ancestor::ImpExp/@Name"/>
    <xsl:value-of select="concat(/PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Name, $delimiter,
                                 /PropertySet/PropertySet/Message[@IntObjectName='Prod Def']/ListOf_Prod_Def/
                                 ImpExp[ListOfObject_Def/Object_Def/@Ancestor_Name = $attrName]/@Object_Num, $delimiter,
                                 @Orig_Id, $delimiter,
                                 @Attr_Name)"/><xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as et

# LOAD XML AND XSL
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# RUN TRANSFORMATION
transform = et.XSLT(xsl)
result = transform(xml)

# OUTPUT TO FILE
# `result` is an lxml _XSLTResultTree, not bytes; passing it straight to
# f.write() raises TypeError. bytes(result) serializes the text output.
with open('Output.csv', 'wb') as f:
    f.write(bytes(result))
Output
ProductId,Product,AttributeId,Attribute
Laptop,2008a,6666p,LP_Portable
Mouse,2987d,7010p,O_Portable
Mouse,2987d,7012j,O_wireless
Speaker,5463g,,
You would need to preparse all of the CLASS_DEF entries into a dictionary. These can then be looked up when processing the PROD_DEF entries:
import csv
from lxml import etree

inFile = "./newm.xml"
outFile = "./new.csv"

tree = etree.parse(inFile)

# First pass: map each CLASS_DEF name to its list of (Orig_Id, Attr_Name)
# pairs so that PROD_DEF rows can be joined by Ancestor_Name afterwards.
class_defs = {}
for impexp in tree.iter("ImpExp"):
    name = impexp.get('Name')
    if impexp.get('Type') == "CLASS_DEF":
        for list_of_object_arrt in impexp.findall('ListOfObject_Arrt'):
            class_defs[name] = [(obj.get('Orig_Id'), obj.get('Attr_Name')) for obj in list_of_object_arrt]

# Second pass: one CSV row per PROD_DEF object definition, joined to the
# FIRST attribute pair of its ancestor class (blank columns when absent).
# Fixed: csv in Python 3 requires text mode with newline='' - the original
# 'wb' binary mode raised TypeError (the trailing note in the answer already
# pointed this out).
with open(outFile, 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['ProductId', 'Product', 'AttributeId', 'Attribute'])
    for impexp in tree.iter("ImpExp"):
        object_num = impexp.get('Object_Num')
        name = impexp.get('Name')
        if impexp.get('Type') == "PROD_DEF":
            for list_of_object_def in impexp.findall('ListOfObject_Def'):
                for obj in list_of_object_def:
                    ancestor_name = obj.get('Ancestor_Name')
                    csv_output.writerow([object_num, name] + list(class_defs.get(ancestor_name, [['', '']])[0]))
This would produce new.csv containing:
ProductId,Product,AttributeId,Attribute
2008a,Laptop,6666p,LP_Portable
2987d,Mouse,7010p,O_Portable
5463g,Speaker,,
If you are using Python 3.x, use:
with open(outFile, 'w', newline='') as f_output:
ElementTree is not really the best tool for what I believe you're trying to do. Since you have well-formed, relatively simple xml, try using pandas:
import pandas as pd

# Parse every <store> node into a flat DataFrame, then dump it as CSV.
store_df = pd.read_xml('input.xml', xpath='.//store')
store_df.to_csv('output.csv', sep=',', index=None, header=True)
and that should get you your csv file.
Given that parsing element values and their corresponding attributes involves a second layer of iteration, consider a nested list/dict comprehension with a dictionary merge. Also, use csv.DictWriter to build the CSV via dictionaries:
from csv import DictWriter
import xml.etree.ElementTree as ET

ifilepath = "Input.xml"

tree = ET.parse(ifilepath)
nmsp = {"du": "http://www.dummytest.org"}


def _row(node):
    """Flatten one <data> node into a single dict: child texts first, then
    child attributes keyed as "<tag> <attr>", then the node's own attributes
    (merge order matters for the resulting CSV column order)."""
    texts = {}
    attrs = {}
    for child in node.findall("*"):
        tag = child.tag.split('}')[-1]
        texts[tag] = child.text.strip() if child.text is not None else None
        for key, val in child.attrib.items():
            attrs[f"{tag} {key}"] = val
    return {**texts, **attrs, **node.attrib}


data = [_row(d) for d in tree.findall(".//du:data", namespaces=nmsp)]

# Field names come from the first record; presumably all records share the
# same keys - otherwise DictWriter raises ValueError on extra keys.
dkeys = list(data[0].keys())

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=dkeys)
    dw.writeheader()
    dw.writerows(data)
Output
indicator,country,date,value,unit,obs_status,decimal,indicator id,country id
"various, tests",test again,2021,1234567,,,0,AA.BB,MM
"testing, cases",coverage test,2020,3456223,,,0,XX.YY,DD
While above will add attributes to last columns of CSV. For specific ordering, re-order the dictionaries:
data = [ ... ]
# Fixed column order for the CSV output.
cols = ["indicator id", "indicator", "country id", "country", "date", "value", "unit", "obs_status", "decimal"]

# Rebuild every row dict so its keys follow the desired column order.
data = [{key: row[key] for key in cols} for row in data]

with open("DummyXMLtoCSV.csv", "w", newline="") as f:
    dw = DictWriter(f, fieldnames=cols)
    dw.writeheader()
    dw.writerows(data)
Output
indicator id,indicator,country id,country,date,value,unit,obs_status,decimal
AA.BB,"various, tests",MM,test again,2021,1234567,,,0
XX.YY,"testing, cases",DD,coverage test,2020,3456223,,,0
We can use pd.json_normalize() to flatten the dictionary created from the XML. However, since records reside under two different keys: tag_2 and tag_7, we need to loop over those particular tags to get all the records, then concatenate the dataframes.
import pandas as pd
import xmltodict

# Read the raw XML and convert it to a nested dict with xmltodict.
with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()
dict_xml = xmltodict.parse(str_xml)

# Records live under two different level-2 keys (tag_2 and tag_7), so
# normalize each separately and concatenate the resulting frames.
df = pd.concat(
    [
        pd.json_normalize(
            dict_xml,
            record_path=['tag_1', tag, 'date', 'data'],  # path to record list
            meta=[['tag_1', tag, 'date', '@value']])  # path to date
        # The meta column's auto-generated name is the last column; rename it.
        .pipe(lambda x: x.rename(columns={x.columns[-1]: 'date'}))  # rename date column
        .assign(tag_1='tag_1', tag_2=tag, data='data')  # add meta columns
        for tag in ('tag_2', 'tag_7')  # loop over tags
    ]
)[['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6']]
df.to_csv('file_01.csv', index=False)
This creates the following CSV file:
tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
Perhaps a more maintainable way is to normalize the relevant sub-dictionary under each level 2 key. Note that in the code below, the record_path and meta paths are no longer lists.
def flatten_dict(dict_xml, level_2_tags):
    """Normalize the 'date'/'data' records found under each level-2 tag of
    *dict_xml* into a single DataFrame with a fixed column order."""
    frames = []
    for tag in level_2_tags:
        # Flatten this tag's record list; the '@value' date rides along as meta.
        part = pd.json_normalize(dict_xml['tag_1'][tag]['date'], 'data', '@value')
        frames.append(part.assign(tag_2=tag))
    combined = pd.concat(frames)
    combined = combined.rename(columns={'@value': 'date'})
    combined = combined.assign(tag_1='tag_1', data='data')
    # .get() selects the fixed column order (returns None if any are missing).
    return combined.get(['tag_1', 'tag_2', 'date', 'data', 'tag_3', 'tag_4', 'tag_5', 'tag_6'])
# test run
flatten_dict(dict_xml, ['tag_2']) # when there is only tag_2 in level=2
flatten_dict(dict_xml, ['tag_2', 'tag_7']) # when there are 2 tags in level=2
Given the custom format, it looks like the best option is to use a nested list comprehension:
# Walk three dict levels (root tag -> level-2 tag -> date entry) and emit one
# row per record of the innermost 'data' list. k3 (e.g. 'date') becomes a
# column holding that entry's '@value'; each record dict d4 is splatted in.
# Assumes dict_xml was produced by xmltodict.parse as shown earlier.
df = pd.DataFrame([{'tag_1': k1, 'tag_2': k2, k3: d3['@value'], **d4}
                   for k1, d1 in dict_xml.items()
                   for k2, d2 in d1.items()
                   for k3, d3 in d2.items()
                   for d4 in d3['data']])
Output:
tag_1 tag_2 date tag_3 tag_4 tag_5 tag_6
0 tag_1 tag_2 06-30-2023 val_3 val_4 val_5_1 & val_5_2 -0.157
1 tag_1 tag_2 06-30-2023 val_3 val_4_2 val_5_1 -0.173
2 tag_1 tag_7 06-30-2023 val_3 val_4 val_5_1 & val_5_2 -0.157
3 tag_1 tag_7 06-30-2023 val_3 val_4_2 val_5_1 -0.173
CSV output:
# df.to_csv('file_01.csv', index=False)
tag_1,tag_2,date,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,val_3,val_4_2,val_5_1,-0.173
Using pandas and BeautifulSoup you can achieve your expected output easily:
#Code:
# Code:
import pandas as pd
import itertools
from bs4 import BeautifulSoup as b

# Read the raw XML once, then hand it to BeautifulSoup (lxml backend).
with open("file.xml", "r") as f:
    content = f.read()
soup = b(content, "lxml")


def _texts(tag_name):
    """Collect the text of every occurrence of *tag_name* in document order."""
    return [node.text for node in soup.findAll(tag_name)]


pkgeid = _texts("pkgeid")
pkgname = _texts("pkgname")
time = _texts("time")
oper = _texts("oper")

# zip_longest pads the shorter columns with None so the row counts line up.
# For Python 2.x use itertools.izip_longest instead.
data = list(itertools.zip_longest(time, oper, pkgeid, pkgname))

df = pd.DataFrame(data=data)
df.to_csv("sample.csv", index=False, header=None)
#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,
Using Pandas, parsing all xml fields.
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

# Build one dict per first-level record element: child tag -> child text.
records = []
for record in root:
    records.append({child.tag: child.text for child in record})

df = pd.DataFrame.from_dict(records)
df.to_csv('file.csv')
The lxml library is capable of very powerful XML parsing, and can be used to iterate over an XML tree to search for specific elements.
from lxml import etree

with open(r'path/to/xml', 'r') as xml:
    text = xml.read()

# Fixed: the module was imported as `etree`, so the call is etree.fromstring;
# the original `lxml.etree.fromstring` raised NameError (`lxml` is unbound).
tree = etree.fromstring(text)

row = ['', '']
for item in tree.iter('hw', 'def'):
    if item.tag == 'hw':
        row[0] = item.text
    elif item.tag == 'def':
        row[1] = item.text
        # A <def> completes the current pair, so append one CSV line.
        # (Assumes each <hw> is followed by its <def> - confirm against data.)
        line = ','.join(row)
        with open(r'path/to/csv', 'a') as csv:
            csv.write(line + '\n')
How you build the CSV file is largely based upon preference, but I have provided a trivial example above. If there are multiple <dps-data> tags, you could extract those elements first (which can be done with the same tree.iter method shown above), and then apply the above logic to each of them.
EDIT: I should point out that this particular implementation reads the entire XML file into memory. If you are working with a single 150mb file at a time, this should not be a problem, but it's just something to be aware of.
How about this:
from xml.dom import minidom

xmldoc = minidom.parse('your.xml')
hw_lst = xmldoc.getElementsByTagName('hw')
defu_lst = xmldoc.getElementsByTagName('def')

# Append one "hw, def" line per index. Indexing defu_lst raises IndexError
# when there are fewer <def> than <hw> nodes, same as the original loop.
with open('your.csv', 'a') as out_file:
    for i in range(len(hw_lst)):
        headword = hw_lst[i].firstChild.data
        definition = defu_lst[i].firstChild.data
        out_file.write('{0}, {1}\n'.format(headword, definition))
While XML as a data format can take many forms from flat to deeply nested, data frames must adhere to a single structure of two dimensions: row by column. Hence, as noted in docs, pandas.read_xml, is a convenience method best for flatter, shallow XML files. You can use xpath to traverse different areas of the document, not just the default /*.
However, you can use XSLT 1.0 (special purpose language designed to transform XML files) with the default parser, lxml, to transform any XML to the needed flat format of data frame. Below stylesheet will restyle the <slike> node for comma-separated text of its children <slika>:
XSLT (save as .xsl file, a special .xml file)
<!-- Identity-transform stylesheet: copies the document unchanged except for
     <slike>, whose children are collapsed to comma-separated text so that
     pandas.read_xml can ingest it as one flat column. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for <slike>: join each child's text with commas, skipping
       the separator after the last child. -->
  <xsl:template match="slike">
    <xsl:copy>
      <xsl:for-each select="*">
        <xsl:value-of select="text()"/>
        <xsl:if test="position() != last()">
          <xsl:text>,</xsl:text>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Online Demo
Python
# Run the XML through the stylesheet above so nested <slike> arrives as a
# single comma-separated string column.
artikal_df = pd.read_xml("my_filename.xml", stylesheet="my_style.xsl")

# CONVERT COMMA-SEPARATED VALUES TO EMBEDDED LISTS
artikal_df["slike"] = artikal_df["slike"].str.split(',')

# PREFIX PARENT NODE NAME
artikal_df = artikal_df.add_prefix('artikal_')

artikal_df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 12 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 artikal_id 2 non-null int64
# 1 artikal_sifra 2 non-null int64
# 2 artikal_barKod 2 non-null int64
# 3 artikal_naziv 2 non-null object
# 4 artikal_kategorija1 2 non-null object
# 5 artikal_kategorija2 2 non-null object
# 6 artikal_kategorija3 2 non-null object
# 7 artikal_vpCena 2 non-null float64
# 8 artikal_mpCena 2 non-null float64
# 9 artikal_dostupan 2 non-null int64
# 10 artikal_opis 0 non-null float64
# 11 artikal_slike 2 non-null object
# dtypes: float64(3), int64(4), object(5)
# memory usage: 320.0+ bytes
You start by reading the xml file and also making a placeholder file for you to write the output in a csv format (or any other text format - you might have to tweak the code a bit).
Then you specify the names of the columns in your final dataframe (after you have parsed the XML file). But this information is already in your XML file anyway, so you just need to make sure you understand the contents.
Lastly, loop over the entries and find the keywords (column names) to read and write to the csv.
Once done, you can read the csv using pd.read_csv('output.csv').
import xml.etree.ElementTree as ET
import csv

# Load and parse the XML file
tree = ET.parse('your_xml_file.xml')
root = tree.getroot()

# The CSV handle is managed by a context manager so it is closed even if
# an error occurs while looping over the records.
with open('output.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)

    # Write header row
    header = ['column1', 'column2', 'column3', 'column4', 'column5']
    csv_writer.writerow(header)

    # Emit one row per <main_identifier>; a missing child becomes ''.
    # (`record` replaces the original loop variable `id`, which shadowed
    # the builtin of the same name.)
    for record in root.findall('.//main_identifier'):
        # Fixed: the guard originally probed record.find('column') - a tag
        # that does not exist - so column1 was always written as ''.
        column1_text = record.find('column1').text if record.find('column1') is not None else ''
        column2_text = record.find('.//column2').text if record.find('.//column2') is not None else ''
        column3_text = record.find('.//column3').text if record.find('.//column3') is not None else ''
        # Fixed: was assigned to `column4` but written out as `column4_text`,
        # which raised NameError at runtime.
        column4_text = record.find('.//column4').text if record.find('.//column4') is not None else ''
        column5_text = record.find('.//column5').text if record.find('.//column5') is not None else ''

        # Write data to CSV
        csv_writer.writerow([column1_text, column2_text, column3_text, column4_text, column5_text])