You can use xml.etree.ElementTree (from the Python standard library) to parse the XML and build a pandas.DataFrame from the result. Here's what I would do (when reading from a file, replace xml_data with the name of your file or a file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

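def iter_docs(author):
    # Yield one dict per <document>: the author's attributes,
    # the document's own attributes, and the document text under 'data'.
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()  # copy so every row gets its own dict
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict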

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))
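
As a quick check of the two generators above, here is a minimal sketch run against made-up XML. The sample document, its attribute names (name, title), and the to_dataframe helper are assumptions for illustration and are not part of the original question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: two authors, three documents in total.
SAMPLE_XML = u'''<library>
  <author name="Ann">
    <document title="First">alpha</document>
    <document title="Second">beta</document>
  </author>
  <author name="Bob">
    <document title="Third">gamma</document>
  </author>
</library>'''


def iter_docs(author):
    """Yield one dict per <document> under the given author element."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found in the tree."""
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row


def to_dataframe(rows):
    """Build a DataFrame from the row dicts (pandas is imported lazily)."""
    import pandas as pd
    return pd.DataFrame(rows)


etree = ET.parse(io.StringIO(SAMPLE_XML))
rows = list(iter_author(etree))
```

Calling to_dataframe(rows) should then give one row per document, with the columns name, title, and data.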

Have a look at the ElementTree tutorial provided in the xml library documentation.

Answer from JaminSore on Stack Overflow
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 - import pandas as pd import xml.etree.ElementTree as et def parse_XML(xml_file, df_cols): """Parse the input XML file and store the result in a pandas DataFrame with the given columns.
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.
Saturn Cloud
saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide
Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog
November 15, 2023 - Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 3.0.1 documentation - PyData |
This method is best designed to import shallow XML documents in the following format, which is the ideal fit for the two dimensions of a DataFrame (row by column):
<root>
  <row>
    <column1>data</column1>
    <column2>data</column2>
    <column3>data</column3>
    ...
  </row>
  <row> ... </row>
  ...
</root>
As a file format, XML documents can be designed any way, including the layout of elements and attributes, as long as they conform to W3C specifications.
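
For XML that already has this shallow row/column shape, a minimal sketch of calling pandas.read_xml might look like the following. The sample values are invented, and parser="etree" is chosen here only so the example does not depend on lxml, which read_xml uses by default.

```python
import io

import pandas as pd

# Invented sample in the shallow <root>/<row>/<columnN> layout.
xml = '''<root>
  <row><column1>1</column1><column2>a</column2><column3>x</column3></row>
  <row><column1>2</column1><column2>b</column2><column3>y</column3></row>
</root>'''

# Wrap the string in StringIO so read_xml receives a file-like object.
df = pd.read_xml(io.StringIO(xml), parser="etree")
```

Each child of the root becomes a row and each grandchild tag becomes a column. Deeper or attribute-heavy documents usually need an explicit xpath argument or manual parsing instead.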
YouTube
youtube.com › watch
Convert XML to DataFrame in Python using pandas - Part #2 - YouTube
This demo explains everything you need to successfully apply the steps in your project. Setup on Windows: python -m pip install -U pip setuptools; pip3 install ju...
Published   December 1, 2021
GeeksforGeeks
geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python
Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks
March 21, 2024 - Now we have extracted the data from the XML file using the BeautifulSoup into the DataFrame and it is stored as ‘df’. To see the DataFrame we use the print statement to print it. ... # Python program to convert xml # structure into dataframes using beautifulsoup # Import libraries from bs4 import BeautifulSoup import pandas as pd # Open XML file file = open("gfg.xml", 'r') # Read the contents of that file contents = file.read() soup = BeautifulSoup(contents, 'xml') # Extracting the data authors = soup.find_all('author') titles = soup.find_all('title') prices = soup.find_all('price') pubdat
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want the names listed under <columns> to become the DataFrame columns and each <row> to become one row of the DataFrame. I'm at a loss on how to do this.
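
One way to approach this, as a sketch using only xml.etree.ElementTree for the parsing: read the header names from <columns>, then turn each <row> into a dict keyed by the name attribute of its <col> children. The sample is a trimmed version of the file in the question; the helper names (parse_log, to_dataframe) and the case-insensitive name matching (the header says SQL but the rows say sql) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same layout as the log file in the question.
SAMPLE = '''<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>'''


def parse_log(xml_text):
    """Return (column names, list of row dicts) for a diagnosticsLog document."""
    # Encode to bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Header names in document order.
    columns = [c.get("name") for c in root.findall("./columns/column")]
    # Map lower-cased names to the header spelling so "sql" lands under "SQL".
    canonical = {name.lower(): name for name in columns}
    rows = []
    for row in root.findall("./rows/row"):
        record = {name: None for name in columns}  # absent cols stay None
        for col in row.findall("col"):
            name = col.get("name")
            record[canonical.get(name.lower(), name)] = col.text
        rows.append(record)
    return columns, rows


def to_dataframe(columns, rows):
    """Build the DataFrame; keys not listed in columns are dropped by pandas."""
    import pandas as pd
    return pd.DataFrame(rows, columns=columns)


columns, rows = parse_log(SAMPLE)
```

Calling to_dataframe(columns, rows) should give one DataFrame row per <row> element, with columns that never appear in a row (Message in this sample) left empty.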

Educative
educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup
How to convert XML to a DataFrame using BeautifulSoup
Here, we pass data and the data file format xml to the BeautifulSoup function. Step 4: We search the data. ... Step 5: We get the text data from XML. ... Step 6: We create and print the DataFrame.
Delft Stack
delftstack.com › home › howto › python pandas › convert xml file to python nice pandas dataframe
How to Convert XML File to Pandas DataFrame | Delft Stack
February 2, 2024 - Assume that this data is saved in an XML file called students.xml. import xml.etree.ElementTree as et xtree = et.parse("/content/drive/MyDrive/ABC/student.xml") xroot = xtree.getroot() We can now loop over the tree, grabbing each student element, its name property, and its sub-elements to generate our dataframe.
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - from lxml import objectify import ... 0 1.0 7020000.0 35237.0 1 3.0 10000000.0 32238.0 2 nan 4128000.0 44699.0 · The xmltodict module converts the XML data into a Python dictionary as the name suggests....
Table Convert
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
January 11, 2019 - It provides powerful data manipulation, cleaning, and analysis capabilities, widely used in data science, machine learning, and business intelligence. An indispensable tool for Python developers and data analysts. How to use the Convert XML to Pandas DataFrame Online for free? Upload your XML file, paste data, or extract from web pages using our free online table converter.
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 - import xml.etree.ElementTree as ET import pandas as pd # Function to parse XML and create Pandas DataFrame def parse_xml_to_dataframe(file_path): tree = ET.parse(file_path) root = tree.getroot() # Extract data from XML into a list of dictionaries data = [] for transaction in root.findall("transaction"): record = { "id": transaction.find("id").text, "date": transaction.find("date").text, "type": transaction.find("type").text, "amount": float(transaction.find("amount").text), "account": transaction.find("account").text } data.append(record) # Create a Pandas DataFrame from the list of dictionari
Kaggle
kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe
Conversion of the XML file to a Pandas Dataframe
GeeksforGeeks
geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? | GeeksforGeeks
April 28, 2021 - In this article, we will learn how to create Pandas DataFrame from nested XML. We will use the xml.etree.ElementTree module, which is a built-in module in Python for parsing or reading information from the XML file.
Medium
medium.com › @curiouskhanna › how-to-convert-xml-into-data-frame-using-python-507d5b0d1831
How to convert xml into data frame using python? | by Shubham Khanna | Medium
December 21, 2022 - import pandas as pd # Read the ... XML string that you want to convert into a dataframe, you can use the from_string() method of read_xml() to parse the string....
GitHub
github.com › aadiby › xml2df
GitHub - aadiby/xml2df: Convert XML file to a pandas dataframe. This package flattens the XML structure and creates a list of dictionaries that is then transformed to a dataframe.
October 8, 2025 - This package flattens the XML structure and creates a list of dictionaries that is then transformed to a dataframe. ... The code is available in the xml2df.py Running the file will allow you to process the example.xml file: ... # batched elements is a list containing all the nodes who's children are of the same instance batched_elements = ["publish_date", "author_details"] df_result = xml2df(document, "book", batched_elements) print(df_result.head(5)) The file contains several functions to process your XML file.
Author   aadiby
Blogger
timhomelab.blogspot.com › 2014 › 01 › how-to-read-xml-file-into-dataframe.html
lab notebook: How to read XML file into pandas dataframe using lxml
March 5, 2021 - path = 'file_path' xml = objectify.parse(open(path)) Get the root node: root = xml.getroot() Now we can access child nodes, and with root.getchildren()[0].getchildren() we're able to get the actual content of the first child node as a simple Python list: [1, 'First'] Now we obviously want to convert this data into a data frame. Let's import pandas: import pandas as pd Prepare an empty data frame that will hold our data: df = pd.DataFrame(columns=('id', 'name')) Now we go through our XML file, appending data to this dataframe: for i in range(0,4): obj = root.getchildren()[i].getchildren() row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text])) row_s = pd.Series(row) row_s.name = i df = df.append(row_s) (the name of the Series object serves as an index element while appending the object to the DataFrame) And here is our fresh dataframe: id name 0 1 First 1 2 Second 2 3 Third 3 4 Fourth