You can use the xml package (from the Python standard library) to build a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data) #create an ElementTree object 
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.
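
To see the two generators working together, here is a minimal, self-contained sketch. The sample XML (a hypothetical `<library>` root holding two `<author>` elements with `name` and `country` attributes) is invented for illustration and is not the asker's data.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: element and attribute names are illustrative only.
SAMPLE_XML = """<library>
  <author name="Ann" country="UK">
    <document id="1">First text</document>
    <document id="2">Second text</document>
  </author>
  <author name="Bob" country="US">
    <document id="3">Third text</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the parent author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row carries the author's attributes alongside the document's own, so the frame has one row per `<document>` and the columns `name`, `country`, `id` and `data`.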

Answer from JaminSore on Stack Overflow
🌐
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 -

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("students.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    s_name = node.attrib.get("name")
    # Check the child element itself: find() returns None when it is missing
    s_mail = node.find("email").text if node.find("email") is not None else None
    s_grade = node.find("grade").text if node.find("grade") is not None else None
    s_age = node.find("age").text if node.find("age") is not None else None

    rows.append({"name": s_name, "email": s_mail, "grade": s_grade, "age": s_age})

out_df = pd.DataFrame(rows, columns=df_cols)

The downside to this approach is that you need to know the structure
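
A missing child element is the usual failure point in loops like this one: `find` returns `None`, and reading `.text` from it raises `AttributeError`. A minimal sketch of one way around that uses `Element.findtext`, which returns `None` when the child is absent. The sample XML below is invented, and the `<student>` layout is only assumed from the column names in the snippet.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Invented sample data; the second student has no <grade> element on purpose.
STUDENTS_XML = """<students>
  <student name="Alice"><email>alice@example.com</email><grade>A</grade><age>20</age></student>
  <student name="Bob"><email>bob@example.com</email><age>22</age></student>
</students>"""

df_cols = ["name", "email", "grade", "age"]
rows = []
for node in ET.parse(io.StringIO(STUDENTS_XML)).getroot():
    row = {"name": node.attrib.get("name")}
    for field in df_cols[1:]:
        # findtext returns None instead of raising when the child is missing.
        row[field] = node.findtext(field)
    rows.append(row)

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)
```

All values arrive as strings (or missing), so numeric columns such as `age` still need an explicit conversion afterwards.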
Discussions

Parsing XML into a Pandas dataframe
To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the XML file and extract the relevant data, then pass the result to the DataFrame constructor. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and extracts the data for each row and column, storing each row in a dictionary. The list of dictionaries is then used to create the DataFrame, which has the column names as its columns and one row for each row of data. More on reddit.com
🌐 r/learnpython
8
3
December 9, 2022
Making XML to Dataframe for Pandas Python - Stack Overflow
I want to convert an XML file to a DataFrame so I can do data analysis through pandas in my Python notebook. However, all the solutions I find on this website give me errors. The following is a short More on stackoverflow.com
🌐 stackoverflow.com
May 29, 2017
Pandas dataframe to nested xml
xml is part of the standard library. You have a nice column name convention, and we could think about being smarter by using the dot to work out the parent automatically, though it is best to put it together manually per row using the apply function:

import io
import xml.etree.ElementTree as ET
import pandas as pd

def build_item_xml(row):
    item1 = ET.SubElement(items, 'Item')
    descriptors = ET.SubElement(item1, 'Descriptors')
    barcode = ET.SubElement(descriptors, 'Barcode')
    barcode.text = row["Descriptors.Barcode"]
    pricing = ET.SubElement(item1, 'Pricing')
    packetcost = ET.SubElement(pricing, 'PackCost')
    # cast to str; otherwise: cannot serialize 0.5625 (type float)
    packetcost.text = str(row["Pricing.PackCost"])
    # etc
    # add other attributes here
    # always return a result
    return row

# mock dataframe with 2 rows based on columns supplied
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.SupplierCode": ["030701791", "030701792"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Descriptors.CommodityGroup": [1, 2],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.CostPricePerUnit": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
    "Pricing.ValidFrom": [44193, 44194],
    "Sizing.Packsize": [1, 2],
})

# https://docs.python.org/3/library/xml.etree.elementtree.html#building-xml-documents
items = ET.Element('Items')
df = df.apply(build_item_xml, axis=1)  # this calls build_item_xml per row
ET.dump(items)

More on reddit.com
🌐 r/learnpython
3
1
January 3, 2021
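
The idea of using the dot in each column name to work out the parent element automatically can be sketched as follows. This is only one possible reading of that suggestion: the function name `frame_to_nested_xml` and its `root_tag` / `row_tag` parameters are hypothetical, and the sketch handles a single level of nesting (`Parent.Child`) only.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def frame_to_nested_xml(df, root_tag="Items", row_tag="Item"):
    """Build an XML tree where 'Parent.Child' columns become <Parent><Child> nodes."""
    root = ET.Element(root_tag)
    for record in df.to_dict("records"):
        item = ET.SubElement(root, row_tag)
        parents = {}  # one parent element per prefix, created on first use
        for column, value in record.items():
            parent_name, _, child_name = column.partition(".")
            if not child_name:
                # No dot in the column name: attach directly under the row element.
                ET.SubElement(item, parent_name).text = str(value)
                continue
            if parent_name not in parents:
                parents[parent_name] = ET.SubElement(item, parent_name)
            # ElementTree can only serialize strings, so cast every value.
            ET.SubElement(parents[parent_name], child_name).text = str(value)
    return root


# Mock data reusing a few of the column names from the post above.
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Sizing.Packsize": [1, 2],
})

items = frame_to_nested_xml(df)
xml_text = ET.tostring(items, encoding="unicode")
print(xml_text)
```

Iterating over plain records instead of calling `apply` keeps the function free of side effects on a global `items` element, at the cost of being less explicit about which columns end up where.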
XML to Dataframe/database
You want a dataframe from each type? You can pass xpath= to select certain nodes.

race_df = pandas.read_xml(xml, xpath='//race')
nomination_df = pandas.read_xml(xml, xpath='//nomination')
...

More on reddit.com
🌐 r/learnpython
2
2
January 27, 2021
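
A runnable version of the one-frame-per-node-type idea might look like the sketch below. The sample document is invented; only the `race` and `nomination` element names come from the thread. It uses `parser="etree"` so that no extra install is needed, and because the standard-library parser supports only a limited XPath subset, the paths are written in the relative `.//race` form.

```python
import io

import pandas as pd

# Invented sample document; element names mirror the 'race' / 'nomination' nodes.
XML = """<meeting>
  <race><id>1</id><name>Derby</name></race>
  <race><id>2</id><name>Oaks</name></race>
  <nomination><race_id>1</race_id><horse>Comet</horse></nomination>
  <nomination><race_id>1</race_id><horse>Blaze</horse></nomination>
  <nomination><race_id>2</race_id><horse>Storm</horse></nomination>
</meeting>"""

# Wrap the literal string in StringIO and select one node type per call.
race_df = pd.read_xml(io.StringIO(XML), xpath=".//race", parser="etree")
nomination_df = pd.read_xml(io.StringIO(XML), xpath=".//nomination", parser="etree")

print(race_df)
print(nomination_df)
```

The two frames can then be joined on `id` / `race_id` with `DataFrame.merge` if a single table is wanted.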
People also ask

What is XML format?
XML (eXtensible Markup Language) is a standard format for enterprise-level data exchange and configuration management, with strict syntax specifications and powerful validation mechanisms. It is widely used in web services, configuration files, document storage, and system integration. It supports namespaces, schema validation, and XSLT transformation, making it an important format for tabular data in enterprise applications.
🌐
tableconvert.com
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 3.0.1 documentation - PyData |
The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI.
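
As a small illustration of the `xpath` and `namespaces` parameters described above, here is a sketch with an invented document in which every element lives in a hypothetical `https://example.com` namespace. It uses `parser="etree"` to stay within the standard library, so the XPath is kept to a simple relative expression.

```python
import io

import pandas as pd

# Invented sample: the "doc" prefix and its URI are placeholders.
XML = """<doc:data xmlns:doc="https://example.com">
  <doc:row><doc:shape>square</doc:shape><doc:sides>4</doc:sides></doc:row>
  <doc:row><doc:shape>triangle</doc:shape><doc:sides>3</doc:sides></doc:row>
</doc:data>"""

df = pd.read_xml(
    io.StringIO(XML),
    xpath=".//doc:row",                         # selects a collection of elements
    namespaces={"doc": "https://example.com"},  # prefix -> URI
    parser="etree",                             # standard-library parser
)
print(df)
```

The prefix used in `xpath` only has to match a key of the `namespaces` dict; it does not have to be the same prefix the document itself uses.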
🌐
Table Convert
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
January 11, 2019 - Convert XML to PandasDataFrame online with our free online table converter. XML to PandasDataFrame converter: convert XML to PandasDataFrame in seconds — paste, edit, and download PandasDataFrame. Need to convert XML to PandasDataFrame for an API, spreadsheet, or documentation?
🌐
YouTube
youtube.com › watch
How to convert an XML file to python pandas dataframe - reading xml with python - YouTube
Working with pandas and Python is not so hard, but working with XML can sometimes be hard. In this video tutorial I will show you how you can read an XML file into ...
Published   November 17, 2024
🌐
Saturn Cloud
saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide
Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog
November 15, 2023 - Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.
🌐
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - By understanding these common errors and implementing appropriate safeguards, you can enhance the robustness and reliability of your XML to DataFrame conversion process. With this script, you can easily convert any complex XML file into a Pandas DataFrame or CSV file.
🌐
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.
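
One possible way to get there, sketched under a few assumptions: the XML below is an abbreviated copy of the log (fewer columns, same structure), and because the rows use `name="sql"` while the header declares `name="SQL"`, the sketch matches names case-insensitively. Whether that matching is appropriate for the real file is a guess.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Abbreviated copy of the log above (fewer columns, same structure).
LOG_XML = """<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>"""

# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(LOG_XML.encode("utf-8"))

# Map lower-cased internal names to the declared friendly names.
friendly = {c.get("name").lower(): c.get("friendlyName") for c in root.find("columns")}
columns = list(friendly.values())

# One dict per <row>, keyed by friendly name (falling back to the raw name).
records = [
    {friendly.get(col.get("name").lower(), col.get("name")): col.text for col in row}
    for row in root.find("rows")
]

# Declared columns that no row populates (here: Message) come out as NaN.
df = pd.DataFrame(records, columns=columns)
print(df)
```

For the real file, `ET.parse("path/to/file.xml").getroot()` would replace the `ET.fromstring` call; the path is a placeholder.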

🌐
Oracle
blog.toadworld.com › home › python for data science – importing xml to pandas dataframe
Python for Data Science – Importing XML to Pandas DataFrame - The Quest Blog
January 2, 2025 -

import pandas as pd
import xml.etree.ElementTree as et

parsedXML = et.parse("test.xml")
dfcols = ['name', 'email', 'phone', 'street']

def getvalueofnode(node):
    """Return the node's text, or None if the node is missing."""
    return node.text if node is not None else None

rows = []
for node in parsedXML.getroot():
    name = node.attrib.get('name')
    email = node.find('email')
    phone = node.find('phone')
    street = node.find('address/street')
    rows.append([name, getvalueofnode(email), getvalueofnode(phone), getvalueofnode(street)])

# Build the DataFrame once from the collected rows
df = pd.DataFrame(rows, columns=dfcols)
print(df)

Output:

     name             email     phone        street
0  gokhan  gokhan@gmail.com  555-1234          None
1    mike    mike@gmail.com      None          None
2    john    john@gmail.com  555-4567          None
3   david              None  555-6472  Fifth Avenue
🌐
Kaggle
kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe
Conversion of the XML file to a Pandas Dataframe
🌐
GeeksforGeeks
geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? | GeeksforGeeks
April 28, 2021 - List to Dataframe Example

# Simple list
data = [1, 2, 3, 4, 5]
# Convert to DataFrame
df = pd.DataFrame(data, columns=['Numbers'])

Here we will discuss diffe ... Python lists are one of the most versatile data structures, offering a range of built-in functions for efficient data manipulation. When working with Pandas, we often need to extract entire rows from a DataFrame and store them in a list for further processing.
🌐
Stack Overflow
stackoverflow.com › questions › 44231166 › making-xml-to-dataframe-for-pandas-python
Making XML to Dataframe for Pandas Python - Stack Overflow
May 29, 2017 -

import xml.etree.ElementTree as ET
from lxml import etree
import pandas as pd

xml_data = 'Kandidatenlijsten_TK2017_Amsterdam.eml'

def xml2df(xml_data):
    tree = ET.parse(xml_data)
    root = tree.getroot()
    all_records = []
    headers = []
    for i, child in enumerate(root):
        record = []
        for subchild in child:
            record.append(subchild.text)
            if subchild.tag not in headers:
                headers.append(subchild.tag)
        all_records.append(record)
    return pd.DataFrame(all_records, columns=headers)
🌐
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 -

import xml.etree.ElementTree as ET
import pandas as pd

# Function to parse XML and create Pandas DataFrame
def parse_xml_to_dataframe(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Extract data from XML into a list of dictionaries
    data = []
    for transaction in root.findall("transaction"):
        record = {
            "id": transaction.find("id").text,
            "date": transaction.find("date").text,
            "type": transaction.find("type").text,
            "amount": float(transaction.find("amount").text),
            "account": transaction.find("account").text
        }
        data.append(record)

    # Create a Pandas DataFrame from the list of dictionari
🌐
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.2.2 documentation - PyData |
Deprecated since version 2.1.0: Passing xml literal strings is deprecated. Wrap literal xml input in io.StringIO or io.BytesIO instead. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element.
🌐
Like Geeks
likegeeks.com › home › python › pandas › parsing xml files into dataframes using pandas read_xml
Parsing XML Files into DataFrames using Pandas read_xml
May 13, 2020 - Here’s how to read them:

from io import StringIO
import pandas as pd

xml_data = """
<data>
  <row>
    <name>John</name>
    <age>28</age>
  </row>
  <row>
    <name>Jane</name>
    <age>24</age>
  </row>
</data>
"""
data_io = StringIO(xml_data)
df = pd.read_xml(data_io)
print(df)

... In this example, we simulate a file-like object using StringIO and then read the XML data from it into a DataFrame. Pandas’ read_xml provides the flexibility to choose among different XML parsers.
🌐
Blogger
timhomelab.blogspot.com › 2014 › 01 › how-to-read-xml-file-into-dataframe.html
lab notebook: How to read XML file into pandas dataframe using lxml
September 19, 2023 -

root = xml.getroot()

Now we can access child nodes, and with

root.getchildren()[0].getchildren()

we're able to get the actual content of the first child node as a simple Python list: [1, 'First']. Now we obviously want to convert this data into a data frame. Let's import pandas:

import pandas as pd

Prepare an empty data frame that will hold our data:

df = pd.DataFrame(columns=('id', 'name'))

Now we go through our XML file, appending data to this data frame:

for i in range(0, 4):
    obj = root.getchildren()[i].getchildren()
    row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    # Turn the Series into a one-row frame and add it to the data frame
    df = pd.concat([df, row_s.to_frame().T])

(the name of the Series object serves as an index element while appending the object to the DataFrame)

And here is our fresh data frame:

  id    name
0  1   First
1  2  Second
2  3   Third
3  4  Fourth
🌐
Pandas
pandas.pydata.org › pandas-docs › version › 1.4 › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 1.4.4 documentation
String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file. ... The XPath to parse required set of nodes for migration to DataFrame.
🌐
Educative
educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup
How to convert XML to a DataFrame using BeautifulSoup
In the code snippet given above, we import the BeautifulSoup and Pandas libraries. Step 2: We read the XML file.
🌐
PyPI
pypi.org › project › xml-to-df
xml-to-df
April 4, 2023