You can use xml.etree.ElementTree (from the Python standard library) to parse the XML and build a pandas.DataFrame from it. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.
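
To make the two generators above concrete, here is a minimal end-to-end sketch. The sample XML (a library root with author and document elements, plus the attribute names) is a hypothetical input invented for illustration, since the original question's file is not shown here; iter_docs and iter_author follow the answer's code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input: tag and attribute names are assumptions.
SAMPLE_XML = """<library>
  <author name="Ann" country="NZ">
    <document id="1" year="2001">first text</document>
    <document id="2" year="2005">second text</document>
  </author>
  <author name="Bob" country="UK">
    <document id="3" year="2010">third text</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


tree = ET.parse(io.StringIO(SAMPLE_XML))
rows = list(iter_author(tree))
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) then gives one row per document. All values arrive as strings, so numeric columns such as year would still need converting.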

Answer from JaminSore on Stack Overflow
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 3.0.1 documentation - PyData |
Convert a JSON string to pandas object. read_html · Read HTML tables into a list of DataFrame objects. Notes · This method is best designed to import shallow XML documents in following format which is the ideal fit for the two-dimensions of a DataFrame (row by column). <root> <row> <column1>data</column1> <column2>data</column2> <column3>data</column3> ... </row> <row> ... </row> ... </root> As a file format, XML documents can be designed any way including layout of elements and attributes as long as it conforms to W3C specifications.
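For the shallow root/row/column layout described in that note, a minimal sketch of a pandas.read_xml call might look like the following. The tag names and values are made up for illustration, and parser="etree" is chosen here only so the example does not depend on lxml being installed.

```python
import io

import pandas as pd

# Hypothetical shallow document: tag names and values are assumptions.
xml_text = """<root>
  <row><name>widget</name><price>2.5</price><qty>4</qty></row>
  <row><name>gadget</name><price>10.0</price><qty>1</qty></row>
</root>"""

# Each child of the root becomes a row; each grandchild becomes a column.
# parser="etree" uses the standard library parser instead of lxml.
df = pd.read_xml(io.StringIO(xml_text), parser="etree")
print(df)
```

Wrapping the text in io.StringIO keeps the call working on pandas versions that no longer accept a literal XML string as input.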
🌐
PyPI
pypi.org › project › pandas-read-xml
pandas-read-xml
Discussions

python - How to read XML file into Pandas Dataframe - Stack Overflow
I have an XML file, 'product.xml', that I want to read using pandas. Here is an example of the sample file: 32... More on stackoverflow.com
🌐 stackoverflow.com
Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then pass the resulting list of dictionaries to the DataFrame constructor. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and stores each row as a dictionary keyed by column name. The list of dictionaries is then passed to the DataFrame constructor together with the column names. The resulting DataFrame has the column names as the columns and each row of data as a row. Keys that do not match a column name exactly are dropped, and a column with no matching key is filled with NaN. More on reddit.com
🌐 r/learnpython
December 9, 2022
ElementTree and deeply nested XML
Subreddit for posting questions and asking for general advice about all topics related to learning python · Does anyone know how I can parse the data in the example XML below by grabbing data from the tag to the tag? I can grab data from one or the other but not both ... More on reddit.com
🌐 r/learnpython
July 28, 2016
Trouble parsing huge XML file
I used a SAX parser to parse that document into a database for a university project few years ago. I don't know if it will fit into a dict or DataFrame, though. More on reddit.com
🌐 r/Python
September 22, 2015
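For files too large to load as a whole tree, one option in the same spirit as the SAX approach mentioned above is xml.etree.ElementTree.iterparse, which hands over elements as they are completed. This is a swapped-in technique rather than the commenter's SAX code, and the log/entry layout is a hypothetical example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # The element is complete at its "end" event, so its children
            # can be read safely here.
            yield {child.tag: child.text for child in elem}
            # Drop the children just read to limit memory use. The emptied
            # element itself stays attached to its parent.
            elem.clear()


# Hypothetical sample standing in for a much larger file.
sample = io.BytesIO(
    b"<log>"
    b"<entry><id>1</id><msg>start</msg></entry>"
    b"<entry><id>2</id><msg>stop</msg></entry>"
    b"</log>"
)
records = list(iter_records(sample, "entry"))
print(records)
```

Because iter_records is a generator, the records can also be consumed in batches (for example, written to a database) instead of being collected into one list or DataFrame.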
🌐
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - Like we've done before, we read the XML contents into a variable. We give this data in the parse() method which returns a dictionary of the XML data. It will be a nested dictionary that has elements and sub-elements of the XML file. We can loop through the elements and write them into a data ...
🌐
Like Geeks
likegeeks.com › home › python › pandas › parsing xml files into dataframes using pandas read_xml
Parsing XML Files into DataFrames using Pandas read_xml
October 16, 2023 - dtype_backend: Back-end data type applied to the resulting DataFrame (“numpy_nullable” or “pyarrow”). Whether you have an XML file on your local disk, a URL that returns XML data, or a file-like object, read_xml can read XML from these sources.
Find elsewhere
🌐
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.2.2 documentation - PyData |
Encoding of XML document. parser{‘lxml’,’etree’}, default ‘lxml’ · Parser module to use for retrieval of data. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’ more complex XPath searches and ability to use XSLT stylesheet are supported. ... A URL, file-like object, or a raw string containing an XSLT script.
🌐
DataScientYst
datascientyst.com › read-xml-file-python-pandas
How to Read XML File with Python and Pandas
October 13, 2022 - In this quick tutorial, we'll cover how to read or convert an XML file to a Pandas DataFrame or Python data structure. Since version 1.3, Pandas offers an elegant solution for reading XML files: pd.read_xml(). The short solution is: df = pd.read_xml('sitemap.xml') With the single line
🌐
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 - Code to Generate Big Data: https://medium.com/@sounder.rahul/python-faker-to-generate-data-for-marketing-domain-complex-xml-file-163a0649db4e

import xml.etree.ElementTree as ET
import pandas as pd

# Function to parse XML and create Pandas DataFrame
def parse_xml_to_dataframe(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Extract data from XML into a list of dictionaries
    data = []
    for transaction in root.findall("transaction"):
        record = {
            "id": transaction.find("id").text,
            "date": transaction.find("date").text,
            "type": transaction.find("type").text,
            "amount": float(transac
🌐
TutorialsPoint
tutorialspoint.com › python_pandas › python_pandas_read_xml_method.htm
Pandas DataFrame read_xml() Method
January 2, 2025 - The Python Pandas library provides the read_xml() method to read data from an XML document and convert it into a Pandas DataFrame object. This method is a powerful tool for handling structured XML data in tabular form, enabling users to process and
🌐
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.
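
One possible way to get there with only the standard library is sketched below on a trimmed copy of the layout above. The case-insensitive name matching is an assumption added because the sample declares a column "SQL" while the rows use "sql"; the second row deliberately omits that cell to show how gaps are handled.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the layout from the post.
xml_text = """<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
    </row>
  </rows>
</diagnosticsLog>"""

root = ET.fromstring(xml_text)

# Column order comes from the <columns> element.
columns = [c.attrib["friendlyName"] for c in root.find("columns")]
# Assumption: match names case-insensitively ("SQL" vs "sql" in the sample).
by_lower = {c.attrib["name"].lower(): c.attrib["friendlyName"]
            for c in root.find("columns")}

rows = []
for row in root.find("rows"):
    record = dict.fromkeys(columns)  # cells missing from a row stay None
    for col in row:
        key = by_lower.get(col.attrib["name"].lower())
        if key is not None:
            record[key] = col.text
    rows.append(record)

print(rows)
```

From here, pd.DataFrame(rows, columns=columns) should give one DataFrame row per row element, with the declared column order preserved.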

🌐
GitHub
github.com › minchulkim87 › pandas_read_xml
GitHub - minchulkim87/pandas_read_xml
January 25, 2024 - ... You will need to identify the ... 'second-tag', 'the-tag-you-want-as-root']) By default, pandas-read-xml will treat the root tag as being the "rows" of the pandas dataframe....
🌐
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 - We can try to convert this code ... as et def parse_XML(xml_file, df_cols): """Parse the input XML file and store the result in a pandas DataFrame with the given columns....
🌐
Pandas How To
pandashowto.com › pandas how to › data input and output › how to read and process xml data in pandas
How to Read and Process XML Data in Pandas – Pandas How To
3 weeks ago - Not all XML files are flat or uniform. Nested elements or attributes may require additional preprocessing. In such cases, combining Pandas with Python’s xml.etree.ElementTree library can help flatten the data before loading it into a DataFrame:
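As a rough illustration of that flattening step, the sketch below turns each nested record into a flat dictionary with dotted keys. The orders/order/customer layout and the flatten helper are hypothetical, not taken from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested input: tag and attribute names are assumptions.
xml_text = """<orders>
  <order id="1">
    <customer><name>Ann</name><city>Oslo</city></customer>
    <total>19.5</total>
  </order>
  <order id="2">
    <customer><name>Bob</name><city>Lima</city></customer>
    <total>7.0</total>
  </order>
</orders>"""


def flatten(elem, prefix=""):
    """Flatten elem into {dotted.path: text}.

    Attributes of elem and of nested non-leaf elements are stored as
    '<prefix>@name'. Attributes of leaf elements are ignored in this
    sketch, and repeated sibling tags overwrite each other.
    """
    flat = {f"{prefix}@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        path = f"{prefix}{child.tag}"
        if len(child):  # nested element: recurse with a longer prefix
            flat.update(flatten(child, prefix=path + "."))
        else:           # leaf element: keep its text
            flat[path] = child.text
    return flat


root = ET.fromstring(xml_text)
rows = [flatten(order) for order in root.iter("order")]
print(rows)
```

The list of flat dictionaries can then be handed to pd.DataFrame(rows). Values are still strings at that point, so a column like total would need a numeric conversion afterwards.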
🌐
Medium
medium.com › @whyamit101 › understanding-pandas-xml-and-its-structure-50ac94e748b7
Understanding pandas XML and Its Structure | by why amit | Medium
April 12, 2025 - How does pandas handle nested XML data? Pandas can read nested XML data, but you may need to preprocess it into a flat structure before importing it into a DataFrame. In conclusion, working with pandas xml is a powerful skill that bridges the ...