While XML as a data format can take many forms, from flat to deeply nested, data frames must adhere to a single two-dimensional structure: row by column. Hence, as noted in the docs, pandas.read_xml is a convenience method best suited for flatter, shallow XML files. You can use the xpath argument to traverse different areas of the document, not just the default /*.
However, you can use XSLT 1.0 (a special-purpose language designed to transform XML files) with the default parser, lxml, to transform any XML into the flat format a data frame requires. The stylesheet below restyles the <slike> node into comma-separated text of its children <slika>:
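As a quick illustration of the xpath argument, here is a minimal sketch (the <library>/<shelf>/<book> element names are invented for illustration):

```python
import io
import pandas as pd

# Hypothetical nested document; xpath targets the <book> nodes
# instead of the default /* (the children of the root).
xml = io.StringIO("""<?xml version="1.0"?>
<library>
  <shelf>
    <book><title>A</title><pages>100</pages></book>
    <book><title>B</title><pages>200</pages></book>
  </shelf>
</library>""")

df = pd.read_xml(xml, xpath=".//book")
print(df)
```

Each matched <book> becomes a row and its child elements become columns.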
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="slike">
        <xsl:copy>
            <xsl:for-each select="*">
                <xsl:value-of select="text()"/>
                <xsl:if test="position() != last()">
                    <xsl:text>,</xsl:text>
                </xsl:if>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
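To preview what the transform does outside of pandas, lxml can apply the stylesheet directly. A sketch: the <slike>/<slika> names follow the stylesheet above, while the surrounding <artikli>/<artikal> element names are invented sample data.

```python
from lxml import etree

# Hypothetical sample document with a nested <slike> list
xml = etree.XML(
    "<artikli>"
    "<artikal><id>1</id>"
    "<slike><slika>a.jpg</slika><slika>b.jpg</slika></slike>"
    "</artikal>"
    "</artikli>"
)

# Condensed copy of the stylesheet above
xsl = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>
  <xsl:template match="slike">
    <xsl:copy>
      <xsl:for-each select="*">
        <xsl:value-of select="text()"/>
        <xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>""")

result = etree.XSLT(xsl)(xml)
print(etree.tostring(result).decode())
```

The <slike> node now carries the flat text "a.jpg,b.jpg", which read_xml can place in a single column.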
Python
artikal_df = pd.read_xml("my_filename.xml", stylesheet="my_style.xsl")
# CONVERT COMMA-SEPARATED VALUES TO EMBEDDED LISTS
artikal_df["slike"] = artikal_df["slike"].str.split(',')
# PREFIX PARENT NODE NAME
artikal_df = artikal_df.add_prefix('artikal_')
artikal_df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 12 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 artikal_id 2 non-null int64
# 1 artikal_sifra 2 non-null int64
# 2 artikal_barKod 2 non-null int64
# 3 artikal_naziv 2 non-null object
# 4 artikal_kategorija1 2 non-null object
# 5 artikal_kategorija2 2 non-null object
# 6 artikal_kategorija3 2 non-null object
# 7 artikal_vpCena 2 non-null float64
# 8 artikal_mpCena 2 non-null float64
# 9 artikal_dostupan 2 non-null int64
# 10 artikal_opis 0 non-null float64
# 11 artikal_slike 2 non-null object
# dtypes: float64(3), int64(4), object(5)
# memory usage: 320.0+ bytes
Answer from Parfait on Stack Overflow
You start by reading the XML file and also making a placeholder file to write the output to in CSV format (or any other text format; you may have to tweak the code a bit).
Then you specify the names of the columns in your final dataframe (after you have parsed the XML file). This information is already in your XML file anyway, so you just need to make sure you understand the contents.
Lastly, loop over the entries and find the keywords (column names) to read and write to the CSV.
Once done, you can read the CSV using pd.read_csv('output.csv').
import xml.etree.ElementTree as ET
import csv
# Load and parse the XML file
tree = ET.parse('your_xml_file.xml')
root = tree.getroot()
# Define the CSV file and writer
csv_file = open('output.csv', 'w', newline='', encoding='utf-8')
csv_writer = csv.writer(csv_file)
# Write header row
header = ['column1', 'column2', 'column3', 'column4', 'column5']
csv_writer.writerow(header)
# Extract data and write to CSV
for entry in root.findall('.//main_identifier'):
    column1_text = entry.find('.//column1').text if entry.find('.//column1') is not None else ''
    column2_text = entry.find('.//column2').text if entry.find('.//column2') is not None else ''
    column3_text = entry.find('.//column3').text if entry.find('.//column3') is not None else ''
    column4_text = entry.find('.//column4').text if entry.find('.//column4') is not None else ''
    column5_text = entry.find('.//column5').text if entry.find('.//column5') is not None else ''
    # Write data to CSV
    csv_writer.writerow([column1_text, column2_text, column3_text, column4_text, column5_text])
# Close the CSV file
csv_file.close()
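Reading the result back in is then a one-liner. A sketch, with an in-memory buffer standing in for the output.csv produced above:

```python
import io
import pandas as pd

# Stand-in for the output.csv written by the loop above
buf = io.StringIO("column1,column2,column3,column4,column5\na,b,c,d,e\n")
df = pd.read_csv(buf)
print(df)
```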
I've made a package for a similar use case. It could work here too.
pip install pandas_read_xml
You can then do something like:
import pandas_read_xml as pdx
df = pdx.read_xml('filename.xml', ['data'])
To flatten, you could
df = pdx.flatten(df)
or
df = pdx.fully_flatten(df)
You'll need a recursive function to flatten rows, and a mechanism for dealing with duplicate data.
This is messy and depending on the data and nesting, you may end up with rather strange dataframes.
import xml.etree.ElementTree as et
from collections import defaultdict
import pandas as pd
def flatten_xml(node, key_prefix=()):
    """
    Walk an XML node, generating tuples of key parts and values.
    """
    # Copy tag content if any
    text = (node.text or '').strip()
    if text:
        yield key_prefix, text

    # Copy attributes
    for attr, value in node.items():
        yield key_prefix + (attr,), value

    # Recurse into children
    for child in node:
        yield from flatten_xml(child, key_prefix + (child.tag,))


def dictify_key_pairs(pairs, key_sep='-'):
    """
    Dictify key pairs from flatten_xml, taking care of duplicate keys.
    """
    out = {}

    # Group by candidate key.
    key_map = defaultdict(list)
    for key_parts, value in pairs:
        key_map[key_sep.join(key_parts)].append(value)

    # Figure out the final dict with suffixes if required.
    for key, values in key_map.items():
        if len(values) == 1:  # No need to suffix keys.
            out[key] = values[0]
        else:  # More than one value for this key.
            for suffix, value in enumerate(values, 1):
                out[f'{key}{key_sep}{suffix}'] = value
    return out
# Parse XML with etree
tree = et.XML("""<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
        <neighbor2 name="Italy" direction="S"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
        <cities>
            <city name="Chargin" population="1234" />
            <city name="Firin" population="4567" />
        </cities>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
""")
# Generate flat rows out of the root nodes in the tree
rows = [dictify_key_pairs(flatten_xml(row)) for row in tree]
df = pd.DataFrame(rows)
print(df)
outputs
name rank year gdppc neighbor-name-1 neighbor-name-2 neighbor-direction-1 neighbor-direction-2 neighbor2-name neighbor2-direction neighbor-name neighbor-direction cities-city-name-1 cities-city-name-2 cities-city-population-1 cities-city-population-2
0 Liechtenstein 1 2008 141100 Austria Switzerland E W Italy S NaN NaN NaN NaN NaN NaN
1 Singapore 4 2011 59900 NaN NaN NaN NaN NaN NaN Malaysia N Chargin Firin 1234 4567
2 Panama 68 2011 13600 Costa Rica Colombia W E NaN NaN NaN NaN NaN NaN NaN NaN
You want to append the text values from the ItemNr elements (which are under the shop element) to the items list, not the XML Element Python objects, which is what you were doing.
The following code was working for me:
items.append([item_nr_element.text for item_nr_element in node])
(Note: node.getchildren() was removed in Python 3.9; iterating over the node directly yields its children.)
I hope this is the expected output:
import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
shops_items = []
all_shops_items = []
for ashop in root.iter('shop'):
    items = []
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        items.append(anitem.text)
    shops_items = [shop_Nr, items]
    all_shops_items.append(shops_items)

df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)
Output:
SHOP_NUMBER ITEM_NUMBER
0 01 [1001, 1002, 1003, 1004, 1010]
1 02 [1002, 1006, 1005]
2 03 [1009, 1006, 1005, 1002]
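As an alternative to re-looping, the list column from the first version can be expanded into one row per item with DataFrame.explode. A sketch, with inline sample data standing in for the parsed result:

```python
import pandas as pd

# Stand-in for the DataFrame with a list-valued ITEM_NUMBER column
df = pd.DataFrame({
    "SHOP_NUMBER": ["01", "02"],
    "ITEM_NUMBER": [["1001", "1002"], ["1002", "1006"]],
})

# explode turns each list element into its own row,
# repeating the other column values
long_df = df.explode("ITEM_NUMBER", ignore_index=True)
print(long_df)
```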
If you want shops with individual items :
import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
shops_items = []
all_shops_items = []
for ashop in root.iter('shop'):
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        item_Nr = anitem.text
        shops_items = [shop_Nr, item_Nr]
        all_shops_items.append(shops_items)

df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)
output:
SHOP_NUMBER ITEM_NUMBER
0 01 1001
1 01 1002
2 01 1003
3 01 1004
4 01 1010
5 02 1002
6 02 1006
7 02 1005
8 03 1009
9 03 1006
10 03 1005
11 03 1002
I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.
The file looks roughly like this
<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
    <!--Build 18.0.1.69-->
    <columns>
        <column friendlyName="time" name="time" />
        <column friendlyName="Direction" name="Direction" />
        <column friendlyName="SQL" name="SQL" />
        <column friendlyName="ProcessID" name="ProcessID" />
        <column friendlyName="ThreadID" name="ThreadID" />
        <column friendlyName="TimeSpan" name="TimeSpan" />
        <column friendlyName="User" name="User" />
        <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
        <column friendlyName="HTTPForward" name="HTTPForward" />
        <column friendlyName="SessionID" name="SessionID" />
        <column friendlyName="SessionGUID" name="SessionGUID" />
        <column friendlyName="Datasource" name="Datasource" />
        <column friendlyName="Sequence" name="Sequence" />
        <column friendlyName="LocalSequence" name="LocalSequence" />
        <column friendlyName="Message" name="Message" />
        <column friendlyName="AppPoolName" name="AppPoolName" />
    </columns>
    <rows>
        <row>
            <col name="time">11/14/2022 23:31:12</col>
            <col name="TimeSpan">0 ms</col>
            <col name="ThreadID">0x00000025</col>
            <col name="User">USERNAME</col>
            <col name="HTTPSessionID"></col>
            <col name="HTTPForward">20.186.0.0</col>
            <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
            <col name="SessionID">6096783</col>
            <col name="Datasource">datasourceName</col>
            <col name="AppPoolName">C 1801AppServer Ext</col>
            <col name="Direction">Out</col>
            <col name="sql">UPDATE SET </col>
            <col name="Sequence">236419</col>
            <col name="LocalSequence">103825</col>
        </row>
        <row>
            <col name="time">11/14/2022 23:31:12</col>
            <col name="TimeSpan">N/A</col>
            <col name="ThreadID">0x00000025</col>
            <col name="User">USERNAME</col>
            <col name="HTTPSessionID"></col>
            <col name="HTTPForward">20.186.0.0</col>
            <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
            <col name="SessionID">6096783</col>
            <col name="Datasource">datasourceName</col>
            <col name="AppPoolName">C 1801AppServer Ext</col>
            <col name="Direction">In</col>
            <col name="sql">UPDATE SET</col>
            <col name="Sequence">236420</col>
            <col name="LocalSequence">103826</col>
        </row>
    </rows>
</diagnosticsLog>

I want to convert that so the column names become the DataFrame columns and each <row> becomes a row. I'm at a loss on how to do this.
You can easily use xml (from the Python standard library) to convert to a pandas.DataFrame. Here's what I would do (when reading from a file replace xml_data with the name of your file or file object):
import pandas as pd
import xml.etree.ElementTree as ET
import io
def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict
xml_data = io.StringIO(u'''YOUR XML STRING HERE''')
etree = ET.parse(xml_data) #create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))
If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:
def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row
and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))
Have a look at the ElementTree tutorial provided in the xml library documentation.
As of pandas v1.3, you can simply use:
pandas.read_xml(path_or_file)
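A minimal sketch (the <data>/<row> element names are invented sample data):

```python
import io
import pandas as pd

# By default, each child of the root becomes a DataFrame row
# and its child elements become columns.
xml = io.StringIO("""<?xml version="1.0"?>
<data>
  <row><a>1</a><b>x</b></row>
  <row><a>2</a><b>y</b></row>
</data>""")

df = pd.read_xml(xml)
print(df)
```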