You can use xml.etree.ElementTree (from the Python standard library) to convert the XML to a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or a file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.

Answer from JaminSore on Stack Overflow
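As a worked illustration of the generator approach in the answer above, here is a minimal sketch run against a small made-up document. The `<library>` root, the attribute names, and the text values are assumptions for demonstration only; the two generators follow the answer, with `yield from` used as shorthand for the inner loop.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample document: element and attribute names are assumptions.
XML_TEXT = """<library>
  <author name="Ann" country="NZ">
    <document index="1">first text</document>
    <document index="2">second text</document>
  </author>
  <author name="Bob" country="UK">
    <document index="1">third text</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the parent <author> attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(XML_TEXT))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row carries the author's attributes alongside the document's own attributes and text, so a document attribute with the same name as an author attribute would overwrite it.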
Saturn Cloud
saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide
Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog
November 15, 2023 - And there you have it! Your XML data is now in a Python DataFrame, ready for analysis. Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.
Discussions

Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, you can build a list of dictionaries and pass it to the DataFrame constructor. First, you will need to use the ElementTree module to parse the XML file and extract the relevant data. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code will parse the XML file and extract the data for each row and column, storing each row in a dictionary. The list of dictionaries is then used to create the DataFrame, which will have the column names as the columns and each row of data as a row.
r/learnpython
December 9, 2022
Pandas XML?
Hello again. It may be simpler to skip the pandas xml methods and use something else. Even reading this data is a problem:

>>> pd.read_xml("merge.xml.1678125433.bak").T
                          0                             1
Company                   x                          None
SenderName                y                          None
SenderPhone               z                          None
TransferDate              a                          None
BrandAAIAID               b                          None
DocumentTitle             c                          None
DocFormNumber           2.0                           NaN
EffectiveDate    2023-02-22                          None
SubmissionType         FULL                          None
MapperCompany             d                          None
MapperContact             e                          None
MapperPhone               f                          None
MapperEmail               g                          None
VcdbVersionDate  2023-01-26                          None
QdbVersionDate   2023-01-26                          None
PcdbVersionDate  2023-01-26                          None
action                 None                             A
id                      NaN                           1.0
BaseVehicle             NaN                           NaN
BodyType                NaN                           NaN
EngineBase              NaN                           NaN
Note                   None  WITHOUT AUTO LEVELING SYSTEM
Qty                     NaN                           1.0
PartType                NaN                           NaN
Position                NaN                           NaN
Part                    NaN                      701940.0

See how there is only a single id - all of the other id values in your data are lost. It looks like xmltodict may be the best choice here.

import xmltodict
from pathlib import Path

xmldict = xmltodict.parse(Path("merge.xml").read_bytes())
df = pd.json_normalize(xmldict).explode("ACES.App")
df["Aces.App.Part"] = pd.json_normalize(df["ACES.App"]).set_index(df.index)["Part"]

I'm guessing this is the Part ID used in your unique checks. This will give you:

ACES.@version                 4.2                                                  4.2
ACES.Header.Company           x                                                    x
ACES.Header.SenderName        y                                                    y
ACES.Header.SenderPhone       z                                                    z
ACES.Header.TransferDate      a                                                    a
ACES.Header.BrandAAIAID       b                                                    b
ACES.Header.DocumentTitle     c                                                    c
ACES.Header.DocFormNumber     2.0                                                  2.0
ACES.Header.EffectiveDate     2023-02-22                                           2023-02-22
ACES.Header.SubmissionType    FULL                                                 FULL
ACES.Header.MapperCompany     d                                                    d
ACES.Header.MapperContact     e                                                    e
ACES.Header.MapperPhone       f                                                    f
ACES.Header.MapperEmail       g                                                    g
ACES.Header.VcdbVersionDate   2023-01-26                                           2023-01-26
ACES.Header.QdbVersionDate    2023-01-26                                           2023-01-26
ACES.Header.PcdbVersionDate   2023-01-26                                           2023-01-26
ACES.App                      {'@action': 'A', '@id': '1', 'BaseVehicle': {'...    {'@action': 'B', '@id': '2', 'BaseVehicle': {'...
Aces.App.Part                 701940                                               12345

To go back to XML - you "reverse" the "json_normalize":

def df_to_xmltodict(df, sep="."):
    result = {"ACES": {"Header": {}, "App": []}}
    for _, row in df.iterrows():
        new_row = {}
        for name, value in row.items():
            keys = name.split(sep)
            count = len(keys) - 1
            parent = new_row
            for index, key in enumerate(keys):
                if index == count:
                    parent[key] = value
                else:
                    parent.setdefault(key, {})
                    parent = parent[key]
        result["ACES"]["App"].append(new_row["ACES"]["App"])
        del new_row["ACES"]["App"]
        del new_row["Aces"]
        for key, value in new_row.items():
            result["ACES"].update(value)
    return result

You can then give this to xmltodict.unparse to go back to XML:

>>> print(xmltodict.unparse(df_to_xmltodict(df), pretty=True))
x y z a b c 2.0 2023-02-22 FULL d e f g 2023-01-26 2023-01-26 2023-01-26 WITHOUT AUTO LEVELING SYSTEM 1 701940 YOLO 1 12345
r/learnpython
March 6, 2023
People also ask

How to use the Convert XML to Pandas DataFrame Online for free?
Upload your XML file, paste data, or extract from web pages using our free online table converter. Convert XML to a Pandas DataFrame instantly with real-time preview and advanced editing. This XML to Pandas DataFrame converter lets you copy or download your Pandas DataFrame output right away.
tableconvert.com
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
What is Pandas DataFrame format?
Pandas is the most popular data analysis library in Python, with DataFrame being its core data structure. It provides powerful data manipulation, cleaning, and analysis capabilities, widely used in data science, machine learning, and business intelligence. An indispensable tool for Python developers and data analysts.
What is XML format?
XML (eXtensible Markup Language) is a standard format for enterprise-level data exchange and configuration management, with strict syntax rules and strong validation mechanisms. It is widely used in web services, configuration files, document storage, and system integration. It supports namespaces, schema validation, and XSLT transformation, which makes it a common source of tabular data in enterprise applications.
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 -

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("students.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    s_name = node.attrib.get("name")
    # findtext returns None when the child element is missing
    s_mail = node.findtext("email")
    s_grade = node.findtext("grade")
    s_age = node.findtext("age")
    rows.append({"name": s_name, "email": s_mail, "grade": s_grade, "age": s_age})

out_df = pd.DataFrame(rows, columns=df_cols)

The downside to this approach is that you need to know the structure of the XML file in advance, and you have to hard-code column names accordingly. We can try to convert this code to a more useful and versatile function, without having to hard-code any values:
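The article's own function is not included in this excerpt. One way such a function might look is sketched below: it takes every attribute and every child element of each record as a column, so no names are hard-coded. The function name `xml_to_df` and the sample data are hypothetical, and the sketch assumes a flat layout in which each child of the root is one record.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def xml_to_df(source):
    """Build one row per child of the root: attributes plus child-element text."""
    root = ET.parse(source).getroot()
    rows = []
    for node in root:
        row = dict(node.attrib)      # attributes such as name="..."
        for child in node:           # sub-elements such as <email>
            row[child.tag] = child.text
        rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical stand-in for students.xml; the second record has no <age>.
sample = io.StringIO("""<students>
  <student name="Alice"><email>alice@example.com</email><grade>A</grade><age>18</age></student>
  <student name="Bob"><email>bob@example.com</email><grade>B</grade></student>
</students>""")

out_df = xml_to_df(sample)
print(out_df)
```

Records that lack an element simply get a missing value in that column. All values stay as strings, so numeric columns would still need an explicit conversion afterwards.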
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>

<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">

  <!--Build 18.0.1.69-->

  <columns>

    <column friendlyName="time" name="time" />

    <column friendlyName="Direction" name="Direction" />

    <column friendlyName="SQL" name="SQL" />

    <column friendlyName="ProcessID" name="ProcessID" />

    <column friendlyName="ThreadID" name="ThreadID" />


    <column friendlyName="TimeSpan" name="TimeSpan" />

    <column friendlyName="User" name="User" />

    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />

    <column friendlyName="HTTPForward" name="HTTPForward" />

    <column friendlyName="SessionID" name="SessionID" />


    <column friendlyName="SessionGUID" name="SessionGUID" />

    <column friendlyName="Datasource" name="Datasource" />

    <column friendlyName="Sequence" name="Sequence" />

    <column friendlyName="LocalSequence" name="LocalSequence" />

    <column friendlyName="Message" name="Message" />

    <column friendlyName="AppPoolName" name="AppPoolName" />

  </columns>

  <rows>

    <row>

      <col name="time">11/14/2022 23:31:12</col>

      <col name="TimeSpan">0 ms</col>

      <col name="ThreadID">0x00000025</col>

      <col name="User">USERNAME</col>

      <col name="HTTPSessionID"></col>

      <col name="HTTPForward">20.186.0.0</col>

      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>

      <col name="SessionID">6096783</col>

      <col name="Datasource">datasourceName</col>

      <col name="AppPoolName">C 1801AppServer Ext</col>

      <col name="Direction">Out</col>

      <col name="sql">UPDATE SET </col>

      <col name="Sequence">236419</col>

      <col name="LocalSequence">103825</col>

    </row>

    <row>

      <col name="time">11/14/2022 23:31:12</col>

      <col name="TimeSpan">N/A</col>

      <col name="ThreadID">0x00000025</col>

      <col name="User">USERNAME</col>

      <col name="HTTPSessionID"></col>

      <col name="HTTPForward">20.186.0.0</col>

      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>

      <col name="SessionID">6096783</col>

      <col name="Datasource">datasourceName</col>

      <col name="AppPoolName">C 1801AppServer Ext</col>

      <col name="Direction">In</col>

      <col name="sql">UPDATE SET</col>

      <col name="Sequence">236420</col>

      <col name="LocalSequence">103826</col>

    </row>

  </rows>

</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.
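For a file shaped like the one in the question, a minimal sketch could read the declared columns first and then turn each `<row>` into a dictionary. The XML below is a shortened, made-up copy of that layout. Matching names case-insensitively is an assumption, added because the sample declares `name="SQL"` under `<columns>` but uses `name="sql"` inside the rows.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened copy of the layout from the question (values are made up).
XML_TEXT = """<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
    </row>
  </rows>
</diagnosticsLog>"""

root = ET.fromstring(XML_TEXT)

# Map lower-cased names to the declared friendlyName so "sql" lands in "SQL".
friendly = {
    c.attrib["name"].lower(): c.attrib["friendlyName"]
    for c in root.find("columns")
}

records = []
for row in root.find("rows"):
    records.append({
        friendly.get(col.attrib["name"].lower(), col.attrib["name"]): col.text
        for col in row
    })

# Passing columns= keeps the declared order and fills absent cells with NaN.
df = pd.DataFrame(records, columns=list(friendly.values()))
print(df)
```

When reading from disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.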

GeeksforGeeks
geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python
Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks
March 21, 2024 - Now we have extracted the data from the XML file using the BeautifulSoup into the DataFrame and it is stored as ‘df’. To see the DataFrame we use the print statement to print it. ...

# Python program to convert xml
# structure into dataframes using beautifulsoup

# Import libraries
from bs4 import BeautifulSoup
import pandas as pd

# Open XML file
file = open("gfg.xml", 'r')

# Read the contents of that file
contents = file.read()

soup = BeautifulSoup(contents, 'xml')

# Extracting the data
authors = soup.find_all('author')
titles = soup.find_all('title')
prices = soup.find_all('price')
pubdat
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.
Find elsewhere
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 3.0.1 documentation - PyData |
The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI.
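To illustrate the xpath parameter described above, here is a minimal sketch that selects nested `<item>` nodes with the built-in etree parser, so lxml is not needed. The catalog document and its tag names are made up. The relative path `.//item` is chosen on the assumption that the etree parser's limited XPath support handles it; namespaces are not shown.

```python
import io

import pandas as pd

# Hypothetical catalog; tag and attribute names are assumptions.
XML_TEXT = """<catalog>
  <meta><generated>2024-01-01</generated></meta>
  <items>
    <item id="1"><name>pen</name><price>1.5</price></item>
    <item id="2"><name>ink</name><price>4.0</price></item>
  </items>
</catalog>"""

# Wrap the literal in StringIO and pick every <item> at any depth.
df = pd.read_xml(io.StringIO(XML_TEXT), xpath=".//item", parser="etree")
print(df)
```

Each selected node becomes one row, with its attributes and its child elements as columns. The `<meta>` block is ignored because the XPath does not match it.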
YouTube
youtube.com › watch
Convert XML to DataFrame in Python using pandas - Part #2 - YouTube
This demo explains everything you need to successfully apply the steps in your project. Setup on Windows: python -m pip install -U pip setuptools, then pip3 install ju...
Published   December 4, 2019
Delft Stack
delftstack.com › home › howto › python pandas › convert xml file to python nice pandas dataframe
How to Convert XML File to Pandas DataFrame | Delft Stack
October 16, 2023 - This tutorial shows how to convert an XML file into a clean pandas DataFrame. The library used for this is xml.etree.ElementTree.
Table Convert
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
March 21, 2024 - Generate standard Pandas DataFrame code with support for data type specifications, index settings, and data operations. Generated code can be directly executed in Python environment for data analysis and processing. Extract tables from any website with one click. Convert to 30+ formats including Excel, CSV, JSON instantly - no copy-pasting required. Converting XML ...
Kaggle
kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe
Conversion of the XML file to a Pandas Dataframe
Educative
educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup
How to convert XML to a DataFrame using BeautifulSoup
October 10, 2023 - Step 5: We get the text data from XML. ... Step 6: We create and print the DataFrame.
YouTube
youtube.com › watch
Convert XML to Pandas DataFrame in Python - YouTube
Learn how to convert XML data to a Pandas DataFrame in Python with this easy-to-follow tutorial. Start optimizing your data analysis process today!
Published   October 8, 2025
GeeksforGeeks
geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? | GeeksforGeeks
May 22, 2017 - In this article, we will learn how to create Pandas DataFrame from nested XML. We will use the xml.etree.ElementTree module, which is a built-in module in Python for parsing or reading information from the XML file.
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - from lxml import objectify import ... 0 1.0 7020000.0 35237.0 1 3.0 10000000.0 32238.0 2 nan 4128000.0 44699.0 · The xmltodict module converts the XML data into a Python dictionary as the name suggests....
Stack Overflow
stackoverflow.com › questions › 54566730 › convert-xml-data-into-dataframe
python - convert xml data into dataframe - Stack Overflow
import xml.etree.ElementTree as ET
import pandas as pd

def getMetrics(file_name):
    tree = ET.parse(file_name)
    root = tree.getroot()
    result = []
    for setnode in root.iter('Set'):
        node = setnode.attrib["Parameter"]
        for ifnode in setnode:
            if "Parameter" in ifnode.attrib:
                result.append(dict(node=node, parameter=ifnode.attrib.get("Parameter")))
    return result

df = pd.DataFrame(getMetrics('sample.xml'))
print(df)

...

<?xml version="1.0" encoding="UTF-8"?>
<Rules>
  <Set Parameter="4" To="90">
    <If Parameter="1087" EqualsTo="90" />
  </Set>
  <Set Parameter="5" To="-5">
    <If Parameter="1087" EqualsTo="87" />
  </
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.2.2 documentation - PyData |
December 21, 2022 - Deprecated since version 2.1.0: Passing xml literal strings is deprecated. Wrap literal xml input in io.StringIO or io.BytesIO instead. ... The XPath to parse required set of nodes for migration to DataFrame.``XPath`` should return a collection of elements and not a single element.
GitHub
gist.github.com › mattmc3 › 712f280ec81044ec7bd12a6dda560787
Python: Import XML to Pandas dataframe, and then dataframe to Sqlite database · GitHub
April 28, 2021 - Python: Import XML to Pandas dataframe, and then dataframe to Sqlite database - import_xml_to_dataframe_to_sql.py
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
January 11, 2019 - Code to Generate Big Data: https://medium.com/@sounder.rahul/python-faker-to-generate-data-for-marketing-domain-complex-xml-file-163a0649db4e

import xml.etree.ElementTree as ET
import pandas as pd

# Function to parse XML and create Pandas DataFrame
def parse_xml_to_dataframe(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Extract data from XML into a list of dictionaries
    data = []
    for transaction in root.findall("transaction"):
        record = {
            "id": transaction.find("id").text,
            "date": transaction.find("date").text,
            "type": transaction.find("type").text,
            "amount": float(transac