
How to read nested xml file with python pandas?

stackoverflow.com › questions › 71361593 › how-to-read-nested-xml-file-with-python-pandas

While XML as a data format can take many forms, from flat to deeply nested, data frames must adhere to a single two-dimensional structure: rows by columns. Hence, as noted in the docs, pandas.read_xml is a convenience method best suited for flatter, shallow XML files. You can use the xpath argument to traverse different areas of the document, not just the default ./*.
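A minimal sketch (not part of the answer) of what changing xpath does. It assumes a small hypothetical document: the <artikal>, <id>, <naziv>, <vpCena> and <mpCena> tags are borrowed from the question's columns, but the <cene> wrapper and all values are invented. parser="etree" is used only so the sketch runs without lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: tag names borrowed from the question, <cene> wrapper and values invented
XML = """<artikli>
  <artikal>
    <id>1</id>
    <naziv>Item A</naziv>
    <cene><vpCena>10.5</vpCena><mpCena>12.6</mpCena></cene>
  </artikal>
  <artikal>
    <id>2</id>
    <naziv>Item B</naziv>
    <cene><vpCena>7.25</vpCena><mpCena>8.7</mpCena></cene>
  </artikal>
</artikli>"""

# Selecting <artikal>: only direct children become columns, so the nested
# <cene> container has no text of its own and comes back as NaN
artikal_df = pd.read_xml(StringIO(XML), xpath="./artikal", parser="etree")

# Selecting the deeper <cene> nodes returns their children as columns instead
cene_df = pd.read_xml(StringIO(XML), xpath=".//cene", parser="etree")

print(artikal_df)
print(cene_df)
```

The second frame loses the link back to each parent <artikal>. This is why a transformation step such as XSLT becomes attractive for deeper documents.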

However, you can use XSLT 1.0 (a special-purpose language designed to transform XML documents) with the default parser, lxml, to transform any XML into the flat shape a data frame needs. The stylesheet below rewrites the <slike> node as comma-separated text of its <slika> children:

XSLT (save as an .xsl file, which is itself a special kind of XML file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>
    
    <xsl:template match="slike">
     <xsl:copy>
       <xsl:for-each select="*">
         <xsl:value-of select="text()"/>
         <xsl:if test="position() != last()">
            <xsl:text>,</xsl:text>
         </xsl:if>
       </xsl:for-each>
     </xsl:copy>
    </xsl:template>  
</xsl:stylesheet>


Python

import pandas as pd

# stylesheet is only supported with the default lxml parser
artikal_df = pd.read_xml("my_filename.xml", stylesheet="my_style.xsl")

# CONVERT COMMA-SEPARATED VALUES TO EMBEDDED LISTS
artikal_df["slike"] = artikal_df["slike"].str.split(',')

# PREFIX PARENT NODE NAME
artikal_df = artikal_df.add_prefix('artikal_')

artikal_df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 12 columns):
#  #   Column               Non-Null Count  Dtype  
# ---  ------               --------------  -----  
#  0   artikal_id           2 non-null      int64  
#  1   artikal_sifra        2 non-null      int64  
#  2   artikal_barKod       2 non-null      int64  
#  3   artikal_naziv        2 non-null      object 
#  4   artikal_kategorija1  2 non-null      object 
#  5   artikal_kategorija2  2 non-null      object 
#  6   artikal_kategorija3  2 non-null      object 
#  7   artikal_vpCena       2 non-null      float64
#  8   artikal_mpCena       2 non-null      float64
#  9   artikal_dostupan     2 non-null      int64  
#  10  artikal_opis         0 non-null      float64
#  11  artikal_slike        2 non-null      object 
# dtypes: float64(3), int64(4), object(5)
# memory usage: 320.0+ bytes

Answer from Parfait on Stack Overflow
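The stylesheet argument requires lxml. If lxml is not available, a similar flattening can be done in plain Python. The sketch below swaps in an ElementTree approach and is not the answer's XSLT method. The sample document and its values are hypothetical, and artikal_records is a made-up helper name. Instead of producing comma-separated text and splitting it afterwards, it builds the <slike> lists directly.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical two-record sample with a few of the question's fields
XML = """<artikli>
  <artikal>
    <id>1</id>
    <naziv>Item A</naziv>
    <slike><slika>a1.jpg</slika><slika>a2.jpg</slika></slike>
  </artikal>
  <artikal>
    <id>2</id>
    <naziv>Item B</naziv>
    <slike><slika>b1.jpg</slika></slike>
  </artikal>
</artikli>"""


def artikal_records(xml_text):
    """Hypothetical helper: yield one flat dict per <artikal>, with <slike> as a list."""
    root = ET.fromstring(xml_text)
    for artikal in root.iter("artikal"):
        record = {}
        for child in artikal:
            if child.tag == "slike":
                # Nested container: collect the text of every <slika> child
                record[child.tag] = [s.text for s in child.findall("slika")]
            else:
                record[child.tag] = child.text
        yield record


# Prefix the parent node name, as in the answer
artikal_df = pd.DataFrame(list(artikal_records(XML))).add_prefix("artikal_")
print(artikal_df)
```

All values stay strings here. read_xml would have inferred numeric dtypes, so you may want astype or pd.to_numeric afterwards.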

Second answer (score 0)

Start by parsing the XML file and opening an output file to write the results in CSV format (or any other text format, though you might have to tweak the code a bit).

Next, specify the column names for your final data frame. This information is already in your XML file as tag names, so you just need to make sure you understand its contents.

Finally, loop over the entries, find each keyword (column name), and write its text to the CSV.

Once done, you can read the CSV using pd.read_csv('output.csv').

import csv
import xml.etree.ElementTree as ET

import pandas as pd

# Load and parse the XML file
tree = ET.parse('your_xml_file.xml')
root = tree.getroot()

# Column names; these should match the tag names in your XML file
header = ['column1', 'column2', 'column3', 'column4', 'column5']

# Open the CSV file; the with-block closes it automatically
with open('output.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(header)  # header row

    # Each main_identifier element becomes one CSV row
    for entry in root.findall('.//main_identifier'):
        row = []
        for tag in header:
            # Search all descendants for the tag; write '' if it is missing or empty
            elem = entry.find(f'.//{tag}')
            row.append((elem.text or '') if elem is not None else '')
        csv_writer.writerow(row)

# Read the CSV back into a data frame
df = pd.read_csv('output.csv')
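The following is a hedged in-memory variant that checks the steps end to end without touching the file system. Here io.StringIO stands in for output.csv. The <record>, <name>, <meta>, <title> and <year> tags are hypothetical placeholders for main_identifier and the column tags.

```python
import csv
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical input: <record> plays the role of main_identifier,
# and <title>/<year> sit one level down inside <meta>
XML = """<root>
  <record><name>alpha</name><meta><title>First</title><year>2020</year></meta></record>
  <record><name>beta</name><meta><title>Second</title></meta></record>
</root>"""

header = ["name", "title", "year"]  # column names = tag names to look up

root = ET.fromstring(XML)
buffer = io.StringIO()  # in-memory stand-in for output.csv
writer = csv.writer(buffer)
writer.writerow(header)
for record in root.findall(".//record"):
    row = []
    for tag in header:
        elem = record.find(f".//{tag}")  # search all descendants, not only children
        row.append((elem.text or "") if elem is not None else "")
    writer.writerow(row)

buffer.seek(0)  # rewind before reading the CSV text back
df = pd.read_csv(buffer)
print(df)
```

The missing <year> in the second record is written as an empty string, which read_csv then reads back as NaN.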

Pandas

pandas.pydata.org › docs › reference › api › pandas.read_xml.html

pandas.read_xml — pandas documentation

pandas.read_xml(path_or_buffer, *, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, dtype=None, converters=None, parse_dates=None, encoding='utf-8', parser='lxml', stylesheet=None, iterparse=None, compression='infer', storage_options=None, dtype_backend=<no_default>)

Discussions

nested xml to dataframe - Data Science Stack Exchange

I am trying to convert the sample XML file below to a pandas dataframe. I have multiple XML files that I will loop over to add all the XML data into a single dataframe once I succeed with this.

datascience.stackexchange.com

August 23, 2022

python - Parsing nested children nodes using pandas.read_xml - Stack Overflow

I would like to import an XML file with a nested structure into a pandas dataframe. I include a sample XML.

stackoverflow.com

BUG: iterparse on read_xml overwrites nested child elements

Pandas version checks: I have checked that this issue has not already been reported. I have confirmed this bug exists on the latest version of pandas. I have confirmed this bug exists on the main br...

github.com


February 5, 2023

Pandas dataframe to nested xml

xml is part of the standard library. You have a nice column-naming convention, and we could be smarter by using the dot to work out the parent automatically, but it is best to put it together manually per row using the apply function:

import xml.etree.ElementTree as ET
import pandas as pd

def build_item_xml(row):
    item1 = ET.SubElement(items, 'Item')
    descriptors = ET.SubElement(item1, 'Descriptors')
    barcode = ET.SubElement(descriptors, 'Barcode')
    barcode.text = row["Descriptors.Barcode"]
    pricing = ET.SubElement(item1, 'Pricing')
    packetcost = ET.SubElement(pricing, 'PackCost')
    packetcost.text = str(row["Pricing.PackCost"])  # cast, as otherwise: cannot serialize 0.5625 (type float)
    # etc.
    # add other attributes here
    # always return a result
    return row

# mock dataframe with 2 rows based on columns supplied
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.SupplierCode": ["030701791", "030701792"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Descriptors.CommodityGroup": [1, 2],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.CostPricePerUnit": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
    "Pricing.ValidFrom": [44193, 44194],
    "Sizing.Packsize": [1, 2],
})

# https://docs.python.org/3/library/xml.etree.elementtree.html#building-xml-documents
items = ET.Element('Items')  # root element that build_item_xml appends to
df = df.apply(build_item_xml, axis=1)  # this calls build_item_xml per row
ET.dump(items)

r/learnpython


January 3, 2021
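The reply mentions, but does not implement, using the dot in the column names to work out the parent element automatically. Below is a minimal sketch of that idea. It assumes every column name is a dot-separated path of valid XML tag names. df_to_nested_xml and its root_tag/row_tag defaults are hypothetical, and the data is a subset of the reply's mock frame.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def df_to_nested_xml(df, root_tag="Items", row_tag="Item"):
    """Hypothetical helper: 'Pricing.PackCost' becomes <Pricing><PackCost>...</PackCost></Pricing>."""
    root = ET.Element(root_tag)
    for record in df.to_dict(orient="records"):
        item = ET.SubElement(root, row_tag)
        for column, value in record.items():
            *parents, leaf = column.split(".")
            parent = item
            for name in parents:
                # Reuse an existing parent element with this tag, or create it
                child = parent.find(name)
                parent = child if child is not None else ET.SubElement(parent, name)
            ET.SubElement(parent, leaf).text = str(value)  # ElementTree text must be str
    return root


df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Sizing.Packsize": [1, 2],
})
items = df_to_nested_xml(df)
print(ET.tostring(items, encoding="unicode"))
```

Compared with the per-row apply function, this needs no hand-written SubElement calls per column. It relies entirely on the naming convention, though, so a column without a dot lands directly under <Item>.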

Videos

05:45

YouTube

Transforming nested xml to pandas dataframe - YouTube

Convert an XML File to CSV with Python - Supports Nested XML - YouTube

December 12, 2023

05:55

YouTube

Read Nested XML File - YouTube

pandas.pydata.org › pandas-docs › version › 1.4 › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 1.4.4 documentation

pandas.read_xml(path_or_buffer, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, encoding='utf-8', parser='lxml', stylesheet=None, compression='infer', storage_options=None)

pandas

pandas.pydata.org › pandas-docs › dev › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 3.0.0rc1+103.gaf9e3f0ca6 documentation

pandas.read_xml(path_or_buffer, *, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, dtype=None, converters=None, parse_dates=None, encoding='utf-8', parser='lxml', stylesheet=None, iterparse=None, compression='infer', storage_options=None, dtype_backend=<no_default>)

GeeksforGeeks

geeksforgeeks.org › python › how-to-create-pandas-dataframe-from-nested-xml

How to create Pandas DataFrame from nested XML? - GeeksforGeeks

July 23, 2025 - Parse or read the XML file using the ElementTree.parse() function and get the root element. Iterate through the root node to get the child nodes' attributes ('SL NO' here) and extract the text values of each attribute (here foodItem, price, quantity, ...

PyPI

pypi.org › project › pandas-read-xml

pandas-read-xml


Stack Exchange

datascience.stackexchange.com › questions › 113782 › nested-xml-to-dataframe

nested xml to dataframe - Data Science Stack Exchange

August 23, 2022

import pandas as pd
import xml.etree.ElementTree as ETree

# Read the whole file into a string; the with-block closes the file
with open('path/xml_file.xml', 'r') as f:
    xml_data = f.read()

def xml2df(xml_data):
    root = ETree.XML(xml_data)
    all_records = []
    # One record per child of the root; keys are the grandchildren's tags
    for child in root:
        record = {}
        for subchild in child:
            record[subchild.tag] = subchild.text
        all_records.append(record)
    return pd.DataFrame(all_records)

df = xml2df(xml_data)
print(df.shape)
print(df.head())


Medium

medium.com › @whyamit101 › understanding-pandas-xml-and-its-structure-50ac94e748b7

Understanding pandas XML and Its Structure | by why amit | Medium

April 12, 2025 - How does pandas handle nested XML data? Pandas can read nested XML data, but you may need to preprocess it into a flat structure before importing it into a DataFrame.

Stack Abuse

stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas

Reading and Writing XML Files in Python with Pandas

August 21, 2024 - Like we've done before, we read the XML contents into a variable. We pass this data to the parse() method, which returns a dictionary of the XML data. It will be a nested dictionary that has the elements and sub-elements of the XML file. We can loop through the elements and write them into a data ...
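The snippet does not show which parse() it uses. As a substitute, here is a standard-library sketch of the same dictionary-based idea. It converts each record to a nested dict with a hypothetical element_to_dict helper and then flattens the result with pandas.json_normalize. The sample document is also hypothetical. One caveat: repeated sibling tags overwrite each other in this simple helper, so lists such as several <slika> elements would need extra handling.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample document
XML = """<artikli>
  <artikal>
    <id>1</id>
    <cene><vpCena>10.5</vpCena><mpCena>12.6</mpCena></cene>
  </artikal>
  <artikal>
    <id>2</id>
    <cene><vpCena>7.25</vpCena><mpCena>8.7</mpCena></cene>
  </artikal>
</artikli>"""


def element_to_dict(elem):
    """Hypothetical helper: a leaf maps to its text, any other element to a dict of its children."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}


root = ET.fromstring(XML)
records = [element_to_dict(artikal) for artikal in root.findall("artikal")]

# json_normalize flattens nested dicts into dotted column names such as "cene.vpCena"
df = pd.json_normalize(records)
print(df)
```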

Pandas

pandas.pydata.org › pandas-docs › version › 2.0 › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 2.0.3 documentation

pandas.read_xml(path_or_buffer, *, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, dtype=None, converters=None, parse_dates=None, encoding='utf-8', parser='lxml', stylesheet=None, iterparse=None, compression='infer', storage_options=None, dtype_backend=_NoDefault.no_default)

Pandas

pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 2.2.2 documentation

pandas.read_xml(path_or_buffer, *, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, dtype=None, converters=None, parse_dates=None, encoding='utf-8', parser='lxml', stylesheet=None, iterparse=None, compression='infer', storage_options=None, dtype_backend=<no_default>)

TutorialsPoint

tutorialspoint.com › python_pandas › python_pandas_parsing_xml_file.htm

Python Pandas - Parsing XML File

This example shows how to parse a nested XML structure representing a bookstore. Each <book> node has child elements like title, author, year, and price. By using the xpath parameter we can easily locate and extract these <book> nodes and their contents into a DataFrame. import pandas as pd from io import StringIO # Create a String representing XML data xml = """<?xml version="1.0" encoding="UTF-8"?> <bookstore> <book category="cooking"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="children"> <title lang="en">Harry Potter</title> <author>J K.
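The snippet above is cut off, so here is a complete minimal sketch of the same xpath idea. The first <book> reuses the values visible in the snippet. The second book is a hypothetical placeholder. parser="etree" is chosen only so that it runs without lxml.

```python
from io import StringIO

import pandas as pd

# First <book> from the snippet above; the second is a hypothetical placeholder
xml = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="example">
    <title lang="en">Placeholder Title</title>
    <author>Placeholder Author</author>
    <year>2000</year>
    <price>10.00</price>
  </book>
</bookstore>"""

# The selected <book> nodes' attribute (category) and the text of their
# child elements (title, author, year, price) all become columns
books = pd.read_xml(StringIO(xml), xpath="./book", parser="etree")
print(books)
```

Attributes on the child elements, such as lang on <title>, are not picked up at this xpath level. Selecting .//title would expose them.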

Pandas

pandas.pydata.org › docs › dev › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 3.0.0.dev0+2687.g00a7c41157 documentation

pandas.read_xml(path_or_buffer, *, xpath='./*', namespaces=None, elems_only=False, attrs_only=False, names=None, dtype=None, converters=None, parse_dates=None, encoding='utf-8', parser='lxml', stylesheet=None, iterparse=None, compression='infer', storage_options=None, dtype_backend=<no_default>)

Stack Overflow

stackoverflow.com › questions › 76848009 › parsing-nested-children-nodes-using-pandas-read-xml

python - Parsing nested children nodes using pandas.read_xml - Stack Overflow