You can use the xml package from the Python standard library to convert XML to a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or a file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data) #create an ElementTree object 
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.
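The question's XML is not reproduced here, so the run below uses a small made-up document. Its authors/documents/document layout and all attribute names are assumptions chosen to fit the two generators above. It shows that each row combines an author's attributes with one document's attributes and text, and that attributes missing for an author become NaN:

```python
import io
import xml.etree.ElementTree as ET
import pandas as pd

def iter_docs(author):
    # Start each row from the author's attributes, then add the document's
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

def iter_author(etree):
    # Walk every <author> element, wherever it sits in the tree
    for author in etree.iter('author'):
        yield from iter_docs(author)

# Hypothetical sample input; element and attribute names are assumptions
xml_data = io.StringIO('''<authors>
  <author type="XXX" language="EN" gender="xx" web="foobar.com">
    <documents count="2">
      <document KEY="k1">first text</document>
      <document KEY="k2">second text</document>
    </documents>
  </author>
  <author type="YYY" language="DE">
    <documents count="1">
      <document KEY="k3">third text</document>
    </documents>
  </author>
</authors>''')

etree = ET.parse(xml_data)
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

The second author has no gender or web attribute, so those cells come out as NaN.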

Answer from JaminSore on Stack Overflow
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml - pandas documentation - PyData
String path, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI.
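The xpath and namespaces parameters described above work together: the prefix used in xpath must be a key in the namespaces dict. The sketch below uses a made-up namespaced document (the prefix doc and the URI are placeholders) and requires read_xml, which was added in pandas 1.3. It uses parser="etree" so lxml does not need to be installed, and it keeps the XPath simple because etree supports only a limited subset:

```python
from io import StringIO
import pandas as pd

# Made-up document whose elements live in a prefixed namespace
xml = """<doc:data xmlns:doc="https://example.com">
  <doc:row><doc:shape>square</doc:shape><doc:sides>4</doc:sides></doc:row>
  <doc:row><doc:shape>triangle</doc:shape><doc:sides>3</doc:sides></doc:row>
</doc:data>"""

df = pd.read_xml(
    StringIO(xml),                              # literal XML wrapped in a file-like object
    xpath=".//doc:row",                         # select the repeating row elements
    namespaces={"doc": "https://example.com"},  # prefix -> URI used in the xpath
    parser="etree",                             # standard-library parser, no lxml needed
)
print(df)
```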
Discussions

Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then build the DataFrame from a list of per-row dictionaries. (DataFrame.from_dict is not suitable here: with its default orient='columns' it raises a ValueError if you also pass columns=.) Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
# ('name' is the attribute that the <col> elements refer to)
columns = [col.attrib['name'] for col in root.find('columns')]

# Build one dictionary per 'row' element
data = []
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and stores the data for each row in a dictionary keyed by column name. The DataFrame then takes its columns, in order, from the <columns> header and has one row per <row> element. Declared columns that never appear in a row (ProcessID and Message in the sample) are filled with NaN. Note that the sample rows use name="sql" while the header declares name="SQL", so those values will not land in the SQL column unless the names are matched case-insensitively.
๐ŸŒ r/learnpython
8
3
December 9, 2022
python - How to read XML file into Pandas Dataframe - Stack Overflow
I have an XML file, 'product.xml', that I want to read using pandas. Here is an example of the sample file: 32...
stackoverflow.com
elementtree - Read XML file to Pandas DataFrame - Stack Overflow
Can someone please help convert the following XML file to a Pandas DataFrame:
stackoverflow.com
October 24, 2018
How to load 85.6 GB of XML data into a dataframe
That's very large for a single dataset. I doubt your PC has that much RAM - most of us have more like 16GB or 32GB of RAM, so even if you were able to load the data it'd exceed your RAM capacity and your program would be extremely slow as it constantly swaps to disk. Also, there's definitely no way this is going to work if you're not running a 64-bit version of Python. If you're running a 32-bit version of Python (still pretty common) then you can't load anything larger than around 2 GB. But even if you are running 64-bit Python, that's unreasonably large to try to read into RAM. The typical solution in these cases is to preprocess your data and split it into smaller files, each of a reasonable size. Machine learning packages like TensorFlow are designed to work this way - rather than loading all of your training data into RAM at once, you load a chunk at a time and train on that, then you unload that and load more. You might want to first find a tool to split your xml file into smaller chunks.
๐ŸŒ r/learnprogramming
7
6
September 26, 2021
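The chunked approach described above can also be applied directly while reading. Instead of splitting the file first, you can stream it with the standard library's ET.iterparse and turn every N records into a small DataFrame. The sketch below assumes each record is a flat element (hypothetically named row) whose children hold the values. It removes each finished record from its parent so that memory stays bounded, and the function name iter_row_chunks is illustrative:

```python
import io
import xml.etree.ElementTree as ET
import pandas as pd

def iter_row_chunks(source, row_tag, chunk_size=10_000):
    """Stream <row_tag> records from source, yielding DataFrames of <= chunk_size rows."""
    rows = []
    stack = []  # currently open elements, so we know each record's parent
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem is now complete
        if elem.tag == row_tag:
            # One dict per record: child tag -> child text
            rows.append({child.tag: child.text for child in elem})
            if stack:
                stack[-1].remove(elem)  # detach the finished record so it can be freed
            if len(rows) >= chunk_size:
                yield pd.DataFrame(rows)
                rows = []
    if rows:
        yield pd.DataFrame(rows)

# Usage on a tiny in-memory document; a real run would pass a file path
sample = io.BytesIO(
    b"<rows>" + b"".join(b"<row><id>%d</id></row>" % i for i in range(5)) + b"</rows>"
)
chunks = list(iter_row_chunks(sample, "row", chunk_size=2))
```

Each chunk can be processed and discarded, or written to disk, before the next one is read. Only one chunk is held in memory at a time, rather than the whole file.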
PyPI
pypi.org › project › pandas-read-xml
pandas-read-xml
Pandas
pandas.pydata.org › pandas-docs › version › 1.4 › reference › api › pandas.read_xml.html
pandas.read_xml - pandas 1.4.4 documentation
String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. ... The XPath to parse required set of nodes for migration to DataFrame.
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 - Each iteration will return a set of data that can be thought of as an observation in a pandas DataFrame; we can build this procedure as follows:

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("students.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    s_name = node.attrib.get("name")
    # Check that the child element exists (not the parent node) before reading .text
    s_mail = node.find("email").text if node.find("email") is not None else None
    s_grade = node.find("grade").text if node.find("grade") is not None else None
    s_age = node.find("age").text if node.find("age") is not None else None
    rows.append({"name": s_name, "email": s_mail, "grade": s_grade, "age": s_age})

out_df = pd.DataFrame(rows, columns=df_cols)
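The per-field None checks in the loop above can be shortened with Element.findtext(). It returns its default (None) when the child element is missing, but returns an empty string when the child exists with no text. The sketch below runs on a made-up students document in which one student has no grade; the data and the helper name students_to_frame are hypothetical:

```python
import io
import xml.etree.ElementTree as et
import pandas as pd

# Hypothetical students.xml content; the second student has no <grade>
STUDENTS = """<students>
  <student name="john"><email>john@example.com</email><grade>A</grade><age>16</age></student>
  <student name="alice"><email>alice@example.com</email><age>17</age></student>
</students>"""

def students_to_frame(source, fields=("email", "grade", "age")):
    rows = []
    for node in et.parse(source).getroot():
        row = {"name": node.attrib.get("name")}
        # findtext() returns None (its default) when the child element is missing
        row.update({field: node.findtext(field) for field in fields})
        rows.append(row)
    return pd.DataFrame(rows, columns=["name", *fields])

out_df = students_to_frame(io.StringIO(STUDENTS))
print(out_df)
```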
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I can never remember all the goofy intricacies of dealing with it.

The file looks roughly like this:

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert this so that the declared column names become the DataFrame's columns and each <row> becomes one row. I'm at a loss on how to do this.
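In the sample above, the rows use name="sql" while the header declares name="SQL", so a plain name match would leave the SQL column empty. The sketch below is one way to handle that under a stated assumption: column names differ only by letter case. It keeps the header's column order and maps each row's names back to the declared spelling. The trimmed sample and the function name diagnostics_to_frame are illustrative only:

```python
import io
import xml.etree.ElementTree as ET
import pandas as pd

def diagnostics_to_frame(source):
    root = ET.parse(source).getroot()
    # Declared column names, in header order
    columns = [col.attrib["name"] for col in root.find("columns")]
    # Map lower-cased names back to the declared spelling (e.g. "sql" -> "SQL")
    canonical = {name.lower(): name for name in columns}
    records = []
    for row in root.find("rows"):
        record = {}
        for col in row:
            name = col.attrib["name"]
            record[canonical.get(name.lower(), name)] = col.text
        records.append(record)
    # columns= fixes the order; undeclared names are dropped, missing ones become NaN
    return pd.DataFrame(records, columns=columns)

# Trimmed version of the sample document (XML declaration omitted)
SAMPLE = """<diagnosticsLog type="db-profile">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>"""

df = diagnostics_to_frame(io.StringIO(SAMPLE))
print(df)
```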

pandas
pandas.pydata.org › pandas-docs › dev › reference › api › pandas.read_xml.html
pandas.read_xml - pandas 3.0.0rc1+103.gaf9e3f0ca6 documentation
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - Like we've done before, we read the XML contents into a variable. We pass this data to the parse() method, which returns a dictionary of the XML data. It will be a nested dictionary that has elements and sub-elements of the XML file. We can loop through the elements and write them into a data ...
Like Geeks
likegeeks.com › home › python › pandas › parsing xml files into dataframes using pandas read_xml
Parsing XML Files into DataFrames using Pandas read_xml
October 16, 2023 - Here's how to read them:

from io import StringIO
import pandas as pd

xml_data = """
<data>
  <row>
    <name>John</name>
    <age>28</age>
  </row>
  <row>
    <name>Jane</name>
    <age>24</age>
  </row>
</data>
"""
data_io = StringIO(xml_data)
df = pd.read_xml(data_io)
print(df)

... In this example, we simulate a file-like object using StringIO and then read the XML data from it into a DataFrame. Pandas' read_xml provides the flexibility to choose among different XML parsers.
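read_xml chooses its parser through the parser argument. The default, "lxml", requires the third-party lxml package. The "etree" option uses the standard library and, as the documentation excerpts above note, supports only limited XPath. The sketch below falls back to "etree" when lxml is missing; pick_parser is a hypothetical helper, not part of pandas:

```python
from io import StringIO
import pandas as pd

xml_data = """<data>
  <row><name>John</name><age>28</age></row>
  <row><name>Jane</name><age>24</age></row>
</data>"""

def pick_parser():
    # Prefer lxml (read_xml's default); fall back to the stdlib-based "etree" parser
    try:
        import lxml  # noqa: F401
    except ImportError:
        return "etree"
    return "lxml"

df = pd.read_xml(StringIO(xml_data), parser=pick_parser())
print(df)
```

Both parsers handle a flat document like this one in the same way. The choice mainly matters for complex XPath expressions or XSLT, which need lxml.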
TutorialsPoint
tutorialspoint.com › python_pandas › python_pandas_read_xml_method.htm
Pandas DataFrame read_xml() Method
January 2, 2025 - The Python Pandas library provides the read_xml() method to read data from an XML document and convert it into a Pandas DataFrame object. This method is a powerful tool for handling structured XML data in tabular form, enabling users to process and
Pandas
pandas.pydata.org › docs › dev › reference › api › pandas.read_xml.html
pandas.read_xml - pandas 3.0.0.dev0+2687.g00a7c41157 documentation
String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. Deprecated since version 2.1.0: Passing xml literal strings is deprecated. Wrap literal xml input in io.StringIO or io.BytesIO instead. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element.
Plus2Net
plus2net.com › python › pandas-read_xml.php
read_xml() Function: Read Data from XML Files into Pandas DataFrame
Use the xpath parameter with the namespace prefix (e.g., .//ns:Student) to target specific elements. ... The resulting DataFrame includes attributes (id, name, class) and text content (Passed or Failed). When the XML contains nested tags, such as scores grouped within a ` ` tag, `read_xml()` can parse these nested elements:

import pandas as pd
df = pd.read_xml('nested_student.xml')
print(df)
Top answer

If the data is simple, like this, you can do something like:

from lxml import objectify
import pandas as pd
xml = objectify.parse('Document1.xml')
root = xml.getroot()

bathrooms = [child.text for child in root['bathrooms'].getchildren()]
price = [child.text for child in root['price'].getchildren()]
property_id = [child.text for child in root['property_id'].getchildren()]

data = [bathrooms, price, property_id]
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']

    bathrooms   price      property_id
0   1.0        7020000.0    35237.0
1   3.0        10000000.0   32238.0
2   nan        4128000.0    44699.0

If it is more complex, a loop is better. You can do something like:

from lxml import objectify
import pandas as pd

xml = objectify.parse('Document1.xml')
root = xml.getroot()

# One list of values per field element (bathrooms, price, property_id)
data = [[child.text for child in field.getchildren()] for field in root.getchildren()]

# Transpose so that each field becomes a column
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']
Another answer

I found another really easy way to solve this question. Reference: https://www.youtube.com/watch?v=WVrg5-cjr5k

import xml.etree.ElementTree as ET
import pandas as pd

# Save your XML to text.xml first (e.g. in a text editor), then read it as text
with open('text.xml', 'r', encoding='utf8') as f:
    tt = f.read()


def xml2df(xml_data):
    root = ET.XML(xml_data)
    all_records = []
    for child in root:
        # One record per child of the root: {sub-element tag: text}
        record = {}
        for sub_child in child:
            record[sub_child.tag] = sub_child.text
        all_records.append(record)
    return pd.DataFrame(all_records)


df_xml1 = xml2df(tt)
print(df_xml1)

To better understand ET, you can use the following code to see what is inside your XML:

import xml.etree.ElementTree as ET

with open('text.xml', 'r', encoding='utf8') as f:
    tt = f.read()

root = ET.XML(tt)

print(type(root))
print(root[0])
# f-string avoids a TypeError when an element has no text (ele.text is None)
for ele in root[0]:
    print(f"{ele.tag}////{ele.text}")

print(root[0][0].tag)

Running the program produces the following output:

C:\Users\username\Documents\pycode\Scripts\python.exe C:/Users/username/PycharmProjects/DestinationLight/try.py
      n35237      n32238     n44699
0        1.0         3.0        nan
1  7020000.0  10000000.0  4128000.0
2    35237.0     32238.0    44699.0

<class 'xml.etree.ElementTree.Element'>
<Element 'bathrooms' at 0x00000285006B6180>
n35237////1.0
n32238////3.0
n44699////nan
n35237

Process finished with exit code 0
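Both answers imply that the file has one element per field (bathrooms, price, property_id), with one child per property (n35237, n32238, n44699). That layout is inferred from the printed output above; the question itself does not show it. Under that assumption, the standard-library sketch below puts properties in the rows and fields in the columns directly, so no transpose or manual column renaming is needed. The XML string is a hypothetical reconstruction and fields_to_frame is an illustrative name:

```python
import io
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical reconstruction of Document1.xml, inferred from the output above
XML = """<root>
  <bathrooms><n35237>1.0</n35237><n32238>3.0</n32238><n44699>nan</n44699></bathrooms>
  <price><n35237>7020000.0</n35237><n32238>10000000.0</n32238><n44699>4128000.0</n44699></price>
  <property_id><n35237>35237.0</n35237><n32238>32238.0</n32238><n44699>44699.0</n44699></property_id>
</root>"""

def fields_to_frame(source):
    root = ET.parse(source).getroot()
    # {field tag: {property tag: text}}; a dict of dicts makes the outer keys
    # the columns and the inner keys the row index
    columns = {field.tag: {item.tag: item.text for item in field} for field in root}
    # Convert the numeric strings (including "nan") to floats
    return pd.DataFrame(columns).astype(float)

df = fields_to_frame(io.StringIO(XML))
print(df)
```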
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml - pandas 2.2.2 documentation - PyData
String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. Deprecated since version 2.1.0: Passing xml literal strings is deprecated. Wrap literal xml input in io.StringIO or io.BytesIO instead. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element.
Blogger
timhomelab.blogspot.com › 2014 › 01 › how-to-read-xml-file-into-dataframe.html
lab notebook: How to read XML file into pandas dataframe using lxml
January 22, 2014 - The post reads each record's id and name with lxml.objectify and appends one pandas Series per record to an empty DataFrame with df.append(), using each Series' name as its index label. DataFrame.append was removed in pandas 2.0, so on current versions collect the rows in a list and build the DataFrame once:

from lxml import objectify
import pandas as pd

path = 'file_path'
xml = objectify.parse(path)
root = xml.getroot()

rows = []
for i in range(0, 4):
    obj = root.getchildren()[i].getchildren()
    rows.append(dict(zip(['id', 'name'], [obj[0].text, obj[1].text])))

# The list position becomes the index, as row_s.name = i did in the original
df = pd.DataFrame(rows, columns=['id', 'name'])
GeeksforGeeks
geeksforgeeks.org › python › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? - GeeksforGeeks
July 23, 2025 - Parse or read the XML file using ElementTree.parse( ) function and get the root element. Iterate through the root node to get the child nodes attributes 'SL NO' (here) and extract the text values of each attribute (here foodItem, price, quantity, ...
Pandas
pandas.pydata.org › pandas-docs › version › 2.0 › reference › api › pandas.read_xml.html
pandas.read_xml - pandas 2.0.3 documentation
String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. ... The XPath to parse required set of nodes for migration to DataFrame.
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain - Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 -

import xml.etree.ElementTree as ET
import pandas as pd

# Function to parse XML and create Pandas DataFrame
def parse_xml_to_dataframe(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()
    # Extract data from XML into a list of dictionaries
    data = []
    for transaction in root.findall("transaction"):
        record = {
            "id": transaction.find("id").text,
            "date": transaction.find("date").text,
            "type": transaction.find("type").text,
            "amount": float(transaction.find("amount").text),
            "account": transaction.find("account").text
        }
        data.append(record)
    # Create a Pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(data)
    return df

# Path to the XML file
file_path = "finance_transactions.xml"
# Parse XML and create a DataFrame
df = parse_xml_to_dataframe(file_path)
print("Pandas DataFrame:")
print(df)