You can use xml.etree.ElementTree (from the Python standard library) to parse the XML and build a pandas.DataFrame from the result. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change the last line from doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree))).
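A self-contained run of the two generators might look like the sketch below. The sample XML (a `library` root holding two `author` elements) and all of its values are invented for illustration; only the tag names `author` and `document` come from the answer.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: tag names follow the answer, values are made up.
XML_TEXT = """<library>
  <author name="Ann" country="UK">
    <document index="1">First text</document>
    <document index="2">Second text</document>
  </author>
  <author name="Bob" country="US">
    <document index="1">Third text</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with its author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(XML_TEXT))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row carries the author's attributes next to the document's own, so the author columns repeat for every document by the same author.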

Have a look at the ElementTree tutorial in the xml.etree.ElementTree documentation.

Answer from JaminSore on Stack Overflow
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 3.0.1 documentation
Read XML document into a DataFrame object. ... String path, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file.
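For flat documents, `pandas.read_xml` can often replace hand-written parsing. The sketch below is a minimal example under a few assumptions: the sample XML is invented, the document is wrapped in `io.StringIO` so it is passed as a file-like object, and `parser="etree"` is chosen so that only the standard-library parser is needed rather than lxml.

```python
import io

import pandas as pd

# Made-up sample document: each <row> element becomes one DataFrame row.
XML_TEXT = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides>0</sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3</sides></row>
</data>"""

# xpath selects the repeating elements; their children become the columns.
shapes_df = pd.read_xml(io.StringIO(XML_TEXT), xpath="./row", parser="etree")
print(shapes_df)
```

Deeply nested documents usually still need a custom ElementTree walk, because `read_xml` only looks at the selected nodes and their immediate children and attributes.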
Discussions

nested xml to dataframe - Data Science Stack Exchange
I am trying to convert the sample XML file below to a pandas dataframe. I have multiple XML files, which I will loop over to add all the XML data to a single dataframe once I succeed with this one. More on datascience.stackexchange.com
🌐 datascience.stackexchange.com
August 23, 2022
Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then pass the resulting list of dictionaries to the DataFrame constructor. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
# (DataFrame.from_dict rejects columns= with its default orient,
# so the plain constructor is used instead)
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and stores the data for each row in a dictionary keyed by column name. The list of dictionaries is then used to create the DataFrame, which has the column names as its columns and one row per row element. More on reddit.com
🌐 r/learnpython
December 9, 2022
Pandas dataframe to nested xml
xml is part of the standard library. You have a nice column naming convention, and we could be smarter and use the dot to work out the parent automatically, but it is simplest to put the XML together manually, row by row, using the apply function:

import xml.etree.ElementTree as ET
import pandas as pd

def build_item_xml(row):
    item1 = ET.SubElement(items, 'Item')
    descriptors = ET.SubElement(item1, 'Descriptors')
    barcode = ET.SubElement(descriptors, 'Barcode')
    barcode.text = row["Descriptors.Barcode"]
    pricing = ET.SubElement(item1, 'Pricing')
    packetcost = ET.SubElement(pricing, 'PackCost')
    # cast to str, as without it: cannot serialize 0.5625 (type float)
    packetcost.text = str(row["Pricing.PackCost"])
    # etc
    # add other attributes here
    # always return a result
    return row

# mock dataframe with 2 rows based on columns supplied
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.SupplierCode": ["030701791", "030701792"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Descriptors.CommodityGroup": [1, 2],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.CostPricePerUnit": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
    "Pricing.ValidFrom": [44193, 44194],
    "Sizing.Packsize": [1, 2],
})

# https://docs.python.org/3/library/xml.etree.elementtree.html#building-xml-documents
items = ET.Element('Items')
df = df.apply(build_item_xml, axis=1)  # this calls build_item_xml per row
ET.dump(items)

More on reddit.com
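The idea of using the dot in each column name to work out the parent automatically could be sketched roughly as follows. The helper name `frame_to_nested_xml` and its `root_tag` and `row_tag` parameters are hypothetical, and the sketch assumes every column name contains exactly one dot ("Parent.Child").

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Two-row mock frame; column names follow the "Parent.Child" convention.
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
})


def frame_to_nested_xml(frame, root_tag="Items", row_tag="Item"):
    """Build one <row_tag> per row, grouping columns under their dotted parent.

    Hypothetical helper: assumes each column name has exactly one dot.
    """
    root = ET.Element(root_tag)
    for record in frame.to_dict(orient="records"):
        item = ET.SubElement(root, row_tag)
        parents = {}
        for column, value in record.items():
            parent_tag, _, child_tag = column.partition(".")
            # Create each parent element once per row, on first use.
            if parent_tag not in parents:
                parents[parent_tag] = ET.SubElement(item, parent_tag)
            # ElementTree can only serialize strings, so cast every value.
            ET.SubElement(parents[parent_tag], child_tag).text = str(value)
    return root


items = frame_to_nested_xml(df)
xml_text = ET.tostring(items, encoding="unicode")
print(xml_text)
```

Compared with the hand-written version, this trades explicit control over each element for less code per column, so it only fits when the naming convention is applied consistently.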
🌐 r/learnpython
January 3, 2021
ElementTree and deeply nested XML
Subreddit for posting questions and asking for general advice about all topics related to learning python. ... Does anyone know how I can parse the data in the example XML below by grabbing data from the tag to the tag? More on reddit.com
🌐 r/learnpython
July 28, 2016
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this:

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that into a DataFrame in which the column names become the columns and each row element becomes a row. I'm at a loss on how to do this.
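One way to handle a layout like the one above is sketched below, using a trimmed copy of the log with fewer columns. The case-insensitive lookup is an assumption made because the sample declares "SQL" in the columns block but uses "sql" inside the rows; the variable names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed copy of the log: fewer columns, same layout.
XML_TEXT = """<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>"""

root = ET.parse(io.StringIO(XML_TEXT)).getroot()

# Map lower-cased internal names to friendly names, so that "sql" in a
# row still lands in the declared "SQL" column.
friendly = {
    col.attrib["name"].lower(): col.attrib["friendlyName"]
    for col in root.find("columns")
}

records = []
for row in root.find("rows"):
    record = {}
    for col in row:
        key = col.attrib["name"]
        # Fall back to the raw name if it was never declared.
        record[friendly.get(key.lower(), key)] = col.text
    records.append(record)

# Passing the declared columns keeps their order and leaves columns that
# never appear in a row (here "Message") empty.
log_df = pd.DataFrame(records, columns=list(friendly.values()))
print(log_df)
```

Every value arrives as a string, so columns such as the timestamp would still need an explicit conversion afterwards.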

Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 - We can try to convert this code to a more useful and versatile function, without having to hard-code any values: import pandas as pd import xml.etree.ElementTree as et def parse_XML(xml_file, df_cols): """Parse the input XML file and store the result in a pandas DataFrame with the given columns.
GeeksforGeeks
geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python
Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks
March 21, 2024 - Now we have extracted the data from the XML file using the BeautifulSoup into the DataFrame and it is stored as ‘df’. To see the DataFrame we use the print statement to print it. ... # Python program to convert xml # structure into dataframes using beautifulsoup # Import libraries from bs4 import BeautifulSoup import pandas as pd # Open XML file file = open("gfg.xml", 'r') # Read the contents of that file contents = file.read() soup = BeautifulSoup(contents, 'xml') # Extracting the data authors = soup.find_all('author') titles = soup.find_all('title') prices = soup.find_all('price') pubdat
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - We can directly use objectify.parse() and give it the path to the XML file. To get the root element, we will use getroot() on the parsed XML data. Now we can loop through the children elements of the root node and write them into a Python list. Like before, we'll create a DataFrame using the ...
Find elsewhere
Stack Exchange
datascience.stackexchange.com › questions › 113782 › nested-xml-to-dataframe
nested xml to dataframe - Data Science Stack Exchange
August 23, 2022 -

import glob
import xml.etree.ElementTree as ET

filenames = glob.glob("/annotations/*.xml")
filenames = [item.replace("\\", "/") for item in filenames]

class XML2DataFrame:
    def __init__(self, filename):
        self.root = ET.parse(filename).getroot()

    def parse_root(self, root):
        return [self.parse_element(child) for child in iter(root)]

    def parse_element(self, element, parsed=None):
        if parsed is None:
            parsed = dict()
        # copy the element's attributes into the shared dict
        for key in element.keys():
            parsed[key] = element.attrib.get(key)
        if element.text:
            parsed[element.tag] = element.text
        # recurse into the children, accumulating into the same dict
        for child in list(element):
            self.parse_element(child, parsed)
        return parsed

def new_format(file):
    return "'{}'".format
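A similar recursive flattening can be sketched as a standalone function. This is a variation on the approach rather than the code from the question: the sample XML is invented (the real annotation files are not shown), attributes are prefixed with "@", and nested tags are joined with dots so that equal tag names at different depths do not overwrite each other.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Invented annotation-style sample; the structure is an assumption.
XML_TEXT = """<annotations>
  <image id="1" name="a.png">
    <size><width>640</width><height>480</height></size>
    <label>cat</label>
  </image>
  <image id="2" name="b.png">
    <size><width>800</width><height>600</height></size>
    <label>dog</label>
  </image>
</annotations>"""


def flatten(element, path="", parsed=None):
    """Collect attributes and text of an element and all its descendants."""
    if parsed is None:
        parsed = {}
    for key, value in element.attrib.items():
        parsed[f"{path}@{key}"] = value
    # Skip whitespace-only text, which is just indentation between tags.
    text = (element.text or "").strip()
    if text:
        parsed[path or element.tag] = text
    for child in element:
        child_path = f"{path}.{child.tag}" if path else child.tag
        flatten(child, child_path, parsed)
    return parsed


root = ET.parse(io.StringIO(XML_TEXT)).getroot()
# One row per direct child of the root element.
rows = [flatten(child) for child in root]
nested_df = pd.DataFrame(rows)
print(nested_df)
```

Repeated sibling tags under one parent would still overwrite each other in this sketch; such data needs one row per repeated element instead.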
Saturn Cloud
saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide
Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog
November 15, 2023 - And there you have it! Your XML data is now in a Python DataFrame, ready for analysis. Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.
Python Forum
python-forum.io › thread-22948.html
Parse XML String in Pandas Dataframe
December 4, 2019 - Here is my situation: I have a pandas dataframe that contains one column with an xml string for each row. I need to be able to parse the xml string for each row to see the data elements of the xml file. All the code I have been able to find is code ...
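For the situation described here, with one XML string per row, a minimal sketch could parse each string with `ET.fromstring` and join the extracted fields back onto the frame. The column names `order_id` and `payload`, the helper `xml_to_dict`, and the sample documents are all invented for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Invented frame: one small XML document (as a string) per row.
df = pd.DataFrame({
    "order_id": [1, 2],
    "payload": [
        "<order><customer>Ann</customer><total>9.50</total></order>",
        "<order><customer>Bob</customer><total>12.00</total></order>",
    ],
})


def xml_to_dict(xml_string):
    """Map the tag of each direct child of the root to its text."""
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}


# Build the parsed columns with the same index, then join them back.
parsed = pd.DataFrame([xml_to_dict(s) for s in df["payload"]], index=df.index)
result = df.drop(columns="payload").join(parsed)
print(result)
```

Rows whose XML is malformed would raise `xml.etree.ElementTree.ParseError`, so real data may need a try/except around the parsing call.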
PyPI
pypi.org › project › xml-to-df
xml-to-df · PyPI
Package to convert xml to Pandas dataframe (flattens each and every xml element to dataframe column)
pip install xml-to-df
Published Jan 06, 2021
Version 0.0.6
Kaggle
kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe
Conversion of the XML file to a Pandas Dataframe | Kaggle
April 15, 2020 - Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources
GitHub
github.com › selimonat › xml2dataframe
GitHub - selimonat/xml2dataframe: Reads xml data into a python DataFrame.
Reads xml data into a python DataFrame. Converts an XML file to Pandas Data Frame.
Author   selimonat
YouTube
youtube.com › watch
Transforming Nested XML to Pandas DataFrame - YouTube
Hello and welcome to this tutorial. In this tutorial, you will learn how to transform XML documents to pandas data frames using Python and the element tree l...
Published   October 21, 2023
Educative
educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup
How to convert XML to a DataFrame using BeautifulSoup
Here, we pass data and the data file format xml to the BeautifulSoup function. Step 4: We search the data. ... Step 5: We get the text data from XML. ... Step 6: We create and print the DataFrame.
PyPI
pypi.org › project › pandas-read-xml
pandas-read-xml
GeeksforGeeks
geeksforgeeks.org › python › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? - GeeksforGeeks
July 23, 2025 -

import xml.etree.ElementTree as ETree
import pandas as pd

# give the path where you saved the xml file
# inside the quotes
xmldata = "C:\\ProgramData\\Microsoft\\Windows\\Start Menu\\Programs\\Anaconda3(64-bit)\\xmltopandas.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()

# print(root)
store_items = []
all_items = []
for storeno in root.iter('store'):
    store_Nr = storeno.attrib.get('slNo')
    itemsF = storeno.find('foodItem').text
    price = storeno.find('price').text
    quan = storeno.find('quantity').text
    dis = storeno.find('discount').text
    store_items = [store_Nr, itemsF, price, quan, dis]
    all_items.append(store_items)

xmlToDf = pd.DataFrame(all_items, columns=[
    'SL No', 'ITEM_NUMBER', 'PRICE', 'QUANTITY', 'DISCOUNT'])
print(xmlToDf.to_string(index=False))

...

Note: The XML file should be saved in the same directory or folder where your Python code is saved.
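A variant of this loop that runs without an external file and tolerates missing child elements might look like the following sketch. The sample XML and its values are invented; only the `store` tag, the `slNo` attribute, and the four child tag names come from the snippet. `findtext` is used because `find(...).text` raises `AttributeError` when a child is absent.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Invented sample; the second store deliberately has no <discount>.
XML_TEXT = """<stores>
  <store slNo="1">
    <foodItem>Pancakes</foodItem>
    <price>3.50</price>
    <quantity>2</quantity>
    <discount>0</discount>
  </store>
  <store slNo="2">
    <foodItem>Waffles</foodItem>
    <price>4.00</price>
    <quantity>1</quantity>
  </store>
</stores>"""

FIELDS = ["foodItem", "price", "quantity", "discount"]

root = ET.parse(io.StringIO(XML_TEXT)).getroot()

rows = []
for store in root.iter("store"):
    row = {"slNo": store.attrib.get("slNo")}
    for field in FIELDS:
        # findtext returns None instead of raising when the child is missing.
        row[field] = store.findtext(field)
    rows.append(row)

stores_df = pd.DataFrame(rows)
print(stores_df.to_string(index=False))
```

Building a list of dicts rather than a list of lists also keeps each value tied to its column name, so the column order cannot drift out of sync with the data.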
Medium
medium.com › @whyamit101 › understanding-pandas-xml-and-its-structure-50ac94e748b7
Understanding pandas XML and Its Structure | by why amit | Medium
April 12, 2025 - This code snippet does just that. With pd.read_xml(), pandas makes it straightforward to import XML data, transforming it into a DataFrame that you can manipulate just like any other dataset.
Plain English
python.plainenglish.io › saving-xml-content-to-a-pandas-dataframe-using-xmltodict-b6fab32a5100
Saving XML content to a pandas DataFrame using xmltodict | by Liam Connors | Python in Plain English
June 7, 2021 - I am sure there are many ways to get data from XML into a DataFrame, but I found xmltodict offered a straightforward way for what I wanted to achieve. Also, working with xmltodict to convert to dictionaries is something I found very helpful for understanding more about the process of extracting specific data from lists of dictionaries. ... New Python ...
AskPython
askpython.com › home › pandas dataframe.to_xml – render a dataframe to an xml document
Pandas DataFrame.to_xml - Render a DataFrame to an XML Document - AskPython
February 15, 2023 - Install lxml, which provides safe and convenient access to the libxml2 and libxslt libraries using the ElementTree API. It extends the ElementTree API significantly to support XPath, RelaxNG, XML Schema, XSLT, C14N, and more. ... After importing the Pandas module, let’s write the simple code to render the DataFrame to an XML file.
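Rendering a DataFrame to XML can be sketched as below. The frame and the `root_name` and `row_name` values are made up, and `parser="etree"` is an assumption chosen so the example runs on the standard library alone; the default parser is lxml. The result is parsed back only to show the structure that was produced.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Small invented frame to render.
df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, 0]})

# With no path given, to_xml returns the document as a string.
xml_text = df.to_xml(
    index=False,
    root_name="shapes",
    row_name="shape_row",
    parser="etree",
)
print(xml_text)

# Round trip: each DataFrame row became one <shape_row> element.
root = ET.fromstring(xml_text)
shape_names = [row.findtext("shape") for row in root.findall("shape_row")]
```

Passing a file path as the first argument writes the document to disk instead of returning it.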