You can use the xml package from the Python standard library (specifically xml.etree.ElementTree) to convert the XML into a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.
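
To make the approach above concrete, here is a minimal end-to-end sketch. The sample XML (a `library` root holding `author` and `document` elements, with `name`, `country` and `title` attributes) is a hypothetical layout invented for illustration; only the two generators come from the answer.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: element and attribute names are assumptions.
SAMPLE_XML = """<library>
  <author name="Ann" country="US">
    <document title="First">alpha</document>
    <document title="Second">beta</document>
  </author>
  <author name="Bob" country="UK">
    <document title="Third">gamma</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with its author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row carries the author's attributes alongside the document's own, so the frame is already denormalized and ready for filtering or grouping by author.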

Answer from JaminSore on Stack Overflow
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas documentation - PyData |
Valid URL schemes include http, ftp, s3, and file. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI. There is no need to include all namespaces in XML, only the ones used in xpath expression.
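
As a rough illustration of the `xpath` and `parser` parameters described above, the sketch below reads a small in-memory document with the standard-library `etree` parser. The `catalog`/`book` layout is a made-up example, and `pd.read_xml` is assumed to be available (pandas 1.3 or later).

```python
import io

import pandas as pd

# Hypothetical XML: the <catalog>/<book> layout is an assumption for illustration.
SAMPLE_XML = """<catalog>
  <book id="b1"><title>Intro</title><price>10.5</price></book>
  <book id="b2"><title>Advanced</title><price>20.0</price></book>
</catalog>"""

# xpath selects the repeating row elements; parser="etree" avoids needing lxml.
books_df = pd.read_xml(io.StringIO(SAMPLE_XML), xpath=".//book", parser="etree")
print(books_df)
```

By default both attributes and child elements of each selected node become columns, so `id`, `title` and `price` all appear in the result.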
Discussions

Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then pass the resulting list of row dictionaries to the DataFrame constructor. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and stores the data for each row in a dictionary keyed by column name. The list of dictionaries is then passed to the DataFrame constructor, with columns fixing the column order. Keys are matched to the column names exactly, so a value named 'sql' does not fill a column declared as 'SQL'. The resulting DataFrame has the column names as the columns and each row of data as a row. More on reddit.com
🌐 r/learnpython
8
3
December 9, 2022
python - How to read XML file into Pandas Dataframe - Stack Overflow
I have a xml file: 'product.xml' that I want to read using pandas, here is an example of the sample file: 32... More on stackoverflow.com
🌐 stackoverflow.com
ElementTree and deeply nested XML
Subreddit for posting questions and asking for general advice about all topics related to learning python. ... Does anyone know how I can parse the data in the example XML below by grabbing data from the tag to the tag? More on reddit.com
🌐 r/learnpython
1
3
July 28, 2016
Trouble parsing huge XML file
I used a SAX parser to parse that document into a database for a university project few years ago. I don't know if it will fit into a dict or DataFrame, though. More on reddit.com
🌐 r/Python
9
2
September 28, 2015
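
For files too large to load at once, one option is incremental parsing. The sketch below uses `xml.etree.ElementTree.iterparse` rather than the SAX parser mentioned above; the `dump`/`record` layout and the `iter_records` helper are hypothetical names chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a very large file.
SAMPLE_XML = b"""<dump>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</dump>"""


def iter_records(source, tag):
    """Stream elements named `tag`, yielding a dict of child tag -> text."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            # Release this element's children and text once it has been read.
            elem.clear()


records = list(iter_records(io.BytesIO(SAMPLE_XML), "record"))
print(records)
```

In real use the generator would be consumed in batches (for example, written to a database or appended to chunked DataFrames) instead of being collected into one list.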
🌐
PyPI
pypi.org › project › pandas-read-xml
pandas-read-xml
🌐
Pandas
pandas.pydata.org › pandas-docs › version › 1.4 › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 1.4.4 documentation
Encoding of XML document. parser{‘lxml’,’etree’}, default ‘lxml’ · Parser module to use for retrieval of data. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’ more complex XPath searches and ability to use XSLT stylesheet are supported. ... A URL, file-like object, or a raw string containing an XSLT script.
🌐
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - Like we've done before, we read the XML contents into a variable. We give this data in the parse() method which returns a dictionary of the XML data. It will be a nested dictionary that has elements and sub-elements of the XML file. We can loop through the elements and write them into a data ...
🌐
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.
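
One way to get that shape, sketched on a trimmed-down copy of the file above: read the declared columns first, then turn each `row` into a dictionary keyed by column name. Matching names case-insensitively is an assumption added here because the sample declares `SQL` but the rows use `sql`; the variable names are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed-down, hypothetical copy of the log file shown above.
SAMPLE_XML = """<diagnosticsLog type="db-profile">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
    </row>
  </rows>
</diagnosticsLog>"""

root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()

# Map lower-cased internal names to the declared friendly names.
friendly = {c.get("name").lower(): c.get("friendlyName") for c in root.iter("column")}
columns = list(friendly.values())

rows = []
for row in root.iter("row"):
    # Fall back to the raw name if a <col> was never declared in <columns>.
    rows.append(
        {friendly.get(col.get("name").lower(), col.get("name")): col.text for col in row}
    )

# Passing columns= fixes the column order; values missing from a row become NaN.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Columns that never appear in any row (such as `Message` in the full file) would simply come out empty.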

🌐
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 - Code to Generate Big Data — https://medium.com/@sounder.rahul/python-faker-to-generate-data-for-marketing-domain-complex-xml-file-163a0649db4e · import xml.etree.ElementTree as ET import pandas as pd # Function to parse XML and create Pandas DataFrame def parse_xml_to_dataframe(file_path): tree = ET.parse(file_path) root = tree.getroot() # Extract data from XML into a list of dictionaries data = [] for transaction in root.findall("transaction"): record = { "id": transaction.find("id").text, "date": transaction.find("date").text, "type": transaction.find("type").text, "amount": float(transac
Find elsewhere
🌐
Medium
medium.com › @whyamit101 › understanding-pandas-xml-and-its-structure-50ac94e748b7
Understanding pandas XML and Its Structure | by why amit | Medium
April 12, 2025 - How do I read XML files using pandas? You can read XML files using the pd.read_xml() function in pandas.
🌐
TutorialsPoint
tutorialspoint.com › python_pandas › python_pandas_read_xml_method.htm
Pandas DataFrame read_xml() Method
January 2, 2025 - The Python Pandas library provides the read_xml() method to read data from an XML document and convert it into a Pandas DataFrame object. This method is a powerful tool for handling structured XML data in tabular form, enabling users to process and
🌐
Pandas
pandas.pydata.org › pandas-docs › version › 2.0 › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.0.3 documentation
Encoding of XML document. parser{‘lxml’,’etree’}, default ‘lxml’ · Parser module to use for retrieval of data. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’ more complex XPath searches and ability to use XSLT stylesheet are supported. ... A URL, file-like object, or a raw string containing an XSLT script.
🌐
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 - We can try to convert this code ... as et def parse_XML(xml_file, df_cols): """Parse the input XML file and store the result in a pandas DataFrame with the given columns....
🌐
Like Geeks
likegeeks.com › home › python › pandas › parsing xml files into dataframes using pandas read_xml
Parsing XML Files into DataFrames using Pandas read_xml
October 16, 2023 - dtype_backend: Backend data type applied to the resulting DataFrame (“numpy_nullable” or “pyarrow”). Whether you have an XML file on your local disk, a URL that returns XML data, or a file-like object, read_xml can read XML from these sources.
🌐
YouTube
youtube.com › watch
How to convert an XML file to python pandas dataframe - reading xml with python - YouTube
Working with pandas and Python is not so hard, but working with XML can sometimes be hard. In this video tutorial I will show you how you can read an XML file into ...
Published   April 20, 2019
🌐
DataScientYst
datascientyst.com › read-xml-file-python-pandas
How to Read XML File with Python and Pandas
October 13, 2022 - In this quick tutorial, we'll cover how to read or convert an XML file to a Pandas DataFrame or Python data structure. Since version 1.3, Pandas offers an elegant solution for reading XML files: pd.read_xml(). The short solution is: df = pd.read_xml('sitemap.xml') With the single line
🌐
Pandas
pandas.pydata.org › pandas-docs › version › 1.5 › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 1.5.2 documentation
Encoding of XML document. parser{‘lxml’,’etree’}, default ‘lxml’ · Parser module to use for retrieval of data. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’ more complex XPath searches and ability to use XSLT stylesheet are supported. ... A URL, file-like object, or a raw string containing an XSLT script.
🌐
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.2.2 documentation - PyData |
Encoding of XML document. parser{‘lxml’,’etree’}, default ‘lxml’ · Parser module to use for retrieval of data. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’ more complex XPath searches and ability to use XSLT stylesheet are supported. ... A URL, file-like object, or a raw string containing an XSLT script.
🌐
Saturn Cloud
saturncloud.io › blog › how-to-parse-xml-with-python-pandas
How to Parse XML with Python Pandas | Saturn Cloud Blog
January 18, 2024 - Solution: Use namespace-aware parsing libraries or remove namespaces from the XML file if not needed. Parsing XML with Python Pandas opens up opportunities for efficiently handling structured data. Whether you choose the simplicity of xml.etree.ElementTree or the convenience of xmltodict, understanding the methods, best practices, and potential pitfalls is crucial for successful XML parsing.
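
A small sketch of namespace-aware parsing with `xml.etree.ElementTree`, assuming a sitemap-style document. The `sm` prefix is an arbitrary label chosen here; only the namespace URI has to match the one declared in the file.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sitemap-style sample with a default namespace.
SAMPLE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>2024-01-02</lastmod></url>
</urlset>"""

# Prefix -> URI mapping; the prefix is local to this script.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()
rows = [
    {
        "loc": url.findtext("sm:loc", namespaces=ns),
        "lastmod": url.findtext("sm:lastmod", namespaces=ns),
    }
    for url in root.findall("sm:url", ns)
]
sitemap_df = pd.DataFrame(rows)
print(sitemap_df)
```

Without the namespace mapping, `root.findall("url")` would return an empty list, because every element in the document lives in the sitemap namespace.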
🌐
Finxter
blog.finxter.com › reading-and-writing-xml-with-pandas
Reading and Writing XML with Pandas – Be on the Right Side of Change
November 11, 2021 - First, we import the Pandas library. Then, we create a Pandas data frame and assign it to the variable “df”. We do this by applying the read_xml() function in which we put in the path of the XML file as a string.
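
As a hedged sketch of the read/write round trip, the snippet below writes a small frame with `DataFrame.to_xml` and reads it back with `pd.read_xml`. The frame contents are invented, an in-memory string stands in for a file path, and `parser="etree"` is used on the assumption that lxml may not be installed.

```python
import io

import pandas as pd

# Hypothetical data used only to demonstrate the round trip.
df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, 0]})

# With no path given, to_xml returns the document as a string.
xml_text = df.to_xml(index=False, parser="etree")

# Wrap the string in a file-like object before reading it back.
restored = pd.read_xml(io.StringIO(xml_text), parser="etree")
print(restored)
```

Passing `index=False` keeps the row index out of the XML, so the restored frame has the same two columns as the original.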