While XML as a data format can take many forms, from flat to deeply nested, data frames must adhere to a single two-dimensional structure: row by column. Hence, as noted in the docs, pandas.read_xml is a convenience method best suited to flatter, shallow XML files. You can use xpath to traverse different areas of the document, not just the default ./*.

However, you can use XSLT 1.0 (a special-purpose language designed to transform XML files) with the default parser, lxml, to transform any XML into the flat format a data frame needs. The stylesheet below rewrites the <slike> node as comma-separated text of its <slika> children:

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>
    
    <xsl:template match="slike">
     <xsl:copy>
       <xsl:for-each select="*">
         <xsl:value-of select="text()"/>
         <xsl:if test="position() != last()">
            <xsl:text>,</xsl:text>
         </xsl:if>
       </xsl:for-each>
     </xsl:copy>
    </xsl:template>  
</xsl:stylesheet>


Python

import pandas as pd

artikal_df = pd.read_xml("my_filename.xml", stylesheet="my_style.xsl")

# CONVERT COMMA-SEPARATED VALUES TO EMBEDDED LISTS
artikal_df["slike"] = artikal_df["slike"].str.split(',')

# PREFIX PARENT NODE NAME
artikal_df = artikal_df.add_prefix('artikal_')

artikal_df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 12 columns):
#  #   Column               Non-Null Count  Dtype  
# ---  ------               --------------  -----  
#  0   artikal_id           2 non-null      int64  
#  1   artikal_sifra        2 non-null      int64  
#  2   artikal_barKod       2 non-null      int64  
#  3   artikal_naziv        2 non-null      object 
#  4   artikal_kategorija1  2 non-null      object 
#  5   artikal_kategorija2  2 non-null      object 
#  6   artikal_kategorija3  2 non-null      object 
#  7   artikal_vpCena       2 non-null      float64
#  8   artikal_mpCena       2 non-null      float64
#  9   artikal_dostupan     2 non-null      int64  
#  10  artikal_opis         0 non-null      float64
#  11  artikal_slike        2 non-null      object 
# dtypes: float64(3), int64(4), object(5)
# memory usage: 320.0+ bytes
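
To check what the stylesheet produces before wiring it into read_xml, the same flattening can be sketched with the standard library alone. This is only an illustration of the result, not the answer's method: ElementTree stands in for XSLT, and the sample document is hypothetical (the tag names artikal, slike and slika come from the answer; the root name artikli, the helper flatten_slike and all values are assumptions).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: root tag and values are assumptions for illustration
SAMPLE_XML = """
<artikli>
  <artikal>
    <id>1</id>
    <naziv>Stolica</naziv>
    <slike>
      <slika>a.jpg</slika>
      <slika>b.jpg</slika>
    </slike>
  </artikal>
  <artikal>
    <id>2</id>
    <naziv>Sto</naziv>
    <slike>
      <slika>c.jpg</slika>
    </slike>
  </artikal>
</artikli>
"""


def flatten_slike(xml_text):
    """Return one flat dict per <artikal>, joining <slika> texts with commas."""
    root = ET.fromstring(xml_text)
    rows = []
    for artikal in root.findall("artikal"):
        row = {}
        for child in artikal:
            if child.tag == "slike":
                # Same effect as the stylesheet's for-each over the children
                row[child.tag] = ",".join((s.text or "") for s in child)
            else:
                row[child.tag] = child.text
        rows.append(row)
    return rows


rows = flatten_slike(SAMPLE_XML)
for row in rows:
    print(row)
```

Each row now holds a single comma-separated string under slike, which is the shape that str.split(',') in the Python step above turns back into a list.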
Answer from Parfait on Stack Overflow
2 of 2
0

You start by reading the XML file and creating a placeholder file to write the output to in CSV format (or any other text format, though you might have to tweak the code a bit).

Then you specify the column names for your final dataframe. This information is already in your XML file, so you just need to make sure you understand its contents.

Lastly, loop over the entries, look up each keyword (column name), and write its value to the CSV.

Once done, you can read the CSV using pd.read_csv('output.csv').

import csv
import xml.etree.ElementTree as ET

# Load and parse the XML file
tree = ET.parse('your_xml_file.xml')
root = tree.getroot()

# Column names; each must match a tag name inside every record element
header = ['column1', 'column2', 'column3', 'column4', 'column5']

# Open the CSV file in a context manager so it is closed even on error
with open('output.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)

    # Write header row
    csv_writer.writerow(header)

    # Extract data and write one CSV row per record element
    for record in root.findall('.//main_identifier'):
        # findtext returns '' when the tag is missing from this record
        row = [record.findtext(f'.//{column}', default='') for column in header]
        csv_writer.writerow(row)
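
To try the loop above without any files on disk, here is a minimal in-memory sketch. The tag names main_identifier and column1 to column3 reuse the answer's placeholders; the sample values, the wrapping root tag and the helper xml_to_csv_text are hypothetical and only there for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; tag names reuse the answer's placeholders
SAMPLE_XML = """
<root>
  <main_identifier>
    <column1>a</column1>
    <nested><column2>b</column2></nested>
  </main_identifier>
  <main_identifier>
    <column1>c</column1>
    <column3>d</column3>
  </main_identifier>
</root>
"""


def xml_to_csv_text(xml_text, columns, record_tag="main_identifier"):
    """Write one CSV row per record element and return the CSV as a string."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(columns)
    for record in root.findall(f".//{record_tag}"):
        # Missing tags become empty strings instead of raising AttributeError
        writer.writerow([record.findtext(f".//{c}", default="") for c in columns])
    return buffer.getvalue()


csv_text = xml_to_csv_text(SAMPLE_XML, ["column1", "column2", "column3"])
print(csv_text)
```

The returned text can be passed to pd.read_csv(io.StringIO(csv_text)) in place of a file path, which skips the intermediate output.csv when the data fits in memory.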