You can use the xml package from the Python standard library (specifically xml.etree.ElementTree) to convert XML to a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or a file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data) #create an ElementTree object 
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))
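
As a concrete illustration of the two generators above, here is a minimal, self-contained sketch. The sample XML is invented for the example (the root tag `authors` and the attributes `name`, `type`, and `year` are assumptions, not taken from the question); only the `author` and `document` tags follow the code above.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: two authors, each with one or more documents.
SAMPLE_XML = u"""<authors>
  <author name="Ada" type="person">
    <document year="1843">Notes</document>
    <document year="1844">Letters</document>
  </author>
  <author name="Alan" type="person">
    <document year="1936">Computable Numbers</document>
  </author>
</authors>"""


def iter_docs(author):
    """Yield one dict per <document>, merged with its author's attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict["data"] = doc.text
        yield doc_dict


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row carries the author's attributes plus the document's own attributes and text, so the frame has one row per document rather than one per author.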

Have a look at the ElementTree tutorial provided in the xml library documentation.

Answer from JaminSore on Stack Overflow
🌐
Medium
medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c
From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium
August 25, 2019 -

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("students.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    s_name = node.attrib.get("name")
    # findtext() returns None when the child element is missing
    s_mail = node.findtext("email")
    s_grade = node.findtext("grade")
    s_age = node.findtext("age")
    rows.append({"name": s_name, "email": s_mail, "grade": s_grade, "age": s_age})

out_df = pd.DataFrame(rows, columns=df_cols)

The downside to this approach is that you need to know the structure
🌐
Pandas
pandas.pydata.org › docs › reference › api › pandas.read_xml.html
pandas.read_xml — pandas documentation - PyData |
The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI.
🌐
Saturn Cloud
saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide
Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog
November 15, 2023 - Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.
🌐
Table Convert
tableconvert.com › home › convert xml to pandas dataframe online
Convert XML to Pandas DataFrame Online - Table Convert
January 11, 2019 - Convert XML to PandasDataFrame online with our free online table converter. XML to PandasDataFrame converter: convert XML to PandasDataFrame in seconds — paste, edit, and download PandasDataFrame. Need to convert XML to PandasDataFrame for an API, spreadsheet, or documentation?
🌐
Saturn Cloud
saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python
Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog
December 28, 2023 - With this script, you can easily convert any complex XML file into a Pandas DataFrame or CSV file. This will make your data easier to work with and allow you to leverage the powerful data analysis capabilities of Python and Pandas.
🌐
YouTube
youtube.com › watch
Convert XML to Pandas DataFrame in Python - YouTube
Learn how to convert XML data to a Pandas DataFrame in Python with this easy-to-follow tutorial. Start optimizing your data analysis process today!----------...
Published   April 22, 2024
🌐
Delft Stack
delftstack.com › home › howto › python pandas › convert xml file to python nice pandas dataframe
How to Convert XML File to Pandas DataFrame | Delft Stack
February 2, 2024 - This process may be built as follows: import pandas as pd import xml.etree.ElementTree as et xtree = et.parse("/content/drive/MyDrive/ABC/student.xml") xroot = xtree.getroot() df_cols = ["name", "email", "grade", "age"] rows = [] for node in ...
🌐
Reddit
reddit.com › r/learnpython › parsing xml into a pandas dataframe
r/learnpython on Reddit: Parsing XML into a Pandas dataframe
December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.
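
One possible approach to the layout above, sketched with the standard library: read the declared names from `<columns>`, turn each `<row>` into a dict keyed by the `name` attribute of its `<col>` children, then let pandas align the columns. The XML below is a shortened stand-in for the posted file, and the case-insensitive name matching is an assumption made because the sample declares `SQL` but the rows use `sql`.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened, hypothetical stand-in for the log file shown above.
LOG_XML = """<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>"""

root = ET.fromstring(LOG_XML)

# Declared column names, in document order.
declared = [c.get("name") for c in root.findall("./columns/column")]
# Map lower-cased names to the declared spelling ("sql" -> "SQL").
canonical = {name.lower(): name for name in declared}

records = []
for row in root.findall("./rows/row"):
    record = {}
    for col in row.findall("col"):
        name = col.get("name")
        record[canonical.get(name.lower(), name)] = col.text
    records.append(record)

# reindex() orders the columns as declared and adds declared columns
# that never appear in any row (filled with NaN).
df = pd.DataFrame(records).reindex(columns=declared)
print(df)
```

Note that `reindex(columns=declared)` also drops any `<col>` whose name is not declared in `<columns>`; omit that call if undeclared columns should be kept.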

🌐
GeeksforGeeks
geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml
How to create Pandas DataFrame from nested XML? | GeeksforGeeks
April 28, 2021 - Note: The XML file should be saved in the same directory or folder where your Python code is saved. ... We can also pass the XML content as a string inside triple quotes. In that case, we need to use the fromstring() function to read the string. Get the root using the 'tag' object and follow the same steps to convert it to a DataFrame as mentioned above. ... import xml.etree.ElementTree as ETree import pandas as pd xmldata = '''<?xml version="1.0" encoding="UTF-8"?> <Food> <Info> <Msg>Food Store items.</Msg> </Info> <store slNo="1"> <foodItem>meat</foodItem> <price>200</price> <quantity>1kg</quan
🌐
GeeksforGeeks
geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python
Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks
March 21, 2024 - We are going to extract the data from an XML file using this library, and then we will convert the extracted data into a DataFrame. To convert it into a DataFrame, we need to install the pandas library.
🌐
Stack Overflow
stackoverflow.com › questions › 63286268 › how-to-convert-a-large-xml-file-to-pandas-dataframe
python - How to convert a large XML file to Pandas DataFrame? - Stack Overflow
December 12, 2023 - I have created the following function, which converts an XML file to a DataFrame. It works well for files smaller than 1 GB; for anything larger, it exhausts the RAM (13 GB on Google Colab) and crashes. The same happens if I try it locally in a Jupyter Notebook (4 GB laptop RAM). Is there a way to optimize the code? ...

# Libraries
import pandas as pd
import xml.etree.cElementTree as ET

# Function to convert XML file to Pandas DataFrame
def xml2df(file_path):
    # Parsing XML file and obtaining root
    tree = ET.parse(file_path)
    root = tree.getroot()

    dict_list = []
    for _, elem in ET.iterparse(file_path, events=("end",)):
        if elem.tag == "row":
            dict_list.append(elem.attrib)  # PARSE ALL ATTRIBUTES
            elem.clear()

    df = pd.DataFrame(dict_list)
    return df
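
A possible lower-memory variant of the function above, offered as a sketch rather than a verified fix for the asker's file. Two things stand out in the posted code: `ET.parse(file_path)` loads the whole tree before `iterparse` even starts, and `xml.etree.cElementTree` was removed in Python 3.9. The version below drops the full parse and clears the root after each row. The name `xml2df_streaming`, the `row_tag` parameter, and the sample input are assumptions for illustration, and the sketch assumes the `<row>` elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def xml2df_streaming(source, row_tag="row"):
    """Build a DataFrame from the attributes of every <row_tag> element.

    `source` is a filename or a binary file object.
    """
    rows = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == row_tag:
            rows.append(dict(elem.attrib))  # copy before discarding the element
            root.clear()  # drop processed children so the tree does not grow
    return pd.DataFrame(rows)


# Tiny hypothetical input standing in for a multi-gigabyte file.
sample = io.BytesIO(b'<rows><row Id="1" Score="5"/><row Id="2" Score="7"/></rows>')
df = xml2df_streaming(sample)
print(df)
```

This only bounds the size of the XML tree; the list of row dicts and the resulting DataFrame still have to fit in memory, so a very large file may additionally need to be processed in batches.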
🌐
Like Geeks
likegeeks.com › home › python › pandas › parsing xml files into dataframes using pandas read_xml
Parsing XML Files into DataFrames using Pandas read_xml
October 16, 2023 - dtype_backend: Back-end data type applied to the resulting DataFrame (“numpy_nullable” or “pyarrow”). Whether you have an XML file on your local disk, a URL that returns XML data, or a file-like object, read_xml can read XML from these sources. Let’s start with the most common scenario: reading from a local XML file.

import pandas as pd

xml_data = """
<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
"""

with open("shapes.xml", "w") as file:
    file.write(xml_data)

df = pd.read_xml("shapes.xml")
print(df)
🌐
Medium
medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149
Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium
December 2, 2024 - import xml.etree.ElementTree as ET import pandas as pd # Function to parse XML and create Pandas DataFrame def parse_xml_to_dataframe(file_path): tree = ET.parse(file_path) root = tree.getroot() # Extract data from XML into a list of dictionaries ...
🌐
Medium
medium.com › @curiouskhanna › how-to-convert-xml-into-data-frame-using-python-507d5b0d1831
How to convert xml into data frame using python? | by Shubham Khanna | Medium
December 21, 2022 - Here’s an example of how you can use read_xml() to read an XML file and convert it into a dataframe:

import pandas as pd

# Read the XML file into a dataframe
df = pd.read_xml('file.xml')

# Print the dataframe
print(df)
🌐
Stack Overflow
stackoverflow.com › questions › 44231166 › making-xml-to-dataframe-for-pandas-python
Making XML to Dataframe for Pandas Python - Stack Overflow
October 24, 2018 -

import xml.etree.ElementTree as ET
from lxml import etree
import pandas as pd

xml_data = 'Kandidatenlijsten_TK2017_Amsterdam.eml'

def xml2df(xml_data):
    tree = ET.parse(xml_data)
    root = tree.getroot()
    all_records = []
    headers = []
    for i, child in enumerate(root):
        record = []
        for subchild in child:
            record.append(subchild.text)
            if subchild.tag not in headers:
                headers.append(subchild.tag)
        all_records.append(record)
    return pd.DataFrame(all_records, columns=headers)

This gives the error: AssertionError: 3 columns passed, passed data had 2 columns · Thank you Kindly, Kind regards. ... Using this code: austintaylor.io/lxml/python/pandas/xml/dataframe/2016/07/08/… error i get is : AssertionError: 3 columns passed, passed data had 2 columns
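
The error above is consistent with records that hold fewer values than the accumulated `headers` list, which happens when children of the root do not all have the same sub-elements. That cannot be confirmed without the file, so the following is only a sketch of one way around it: key each value by its tag so pandas aligns the columns itself. The sample XML and its tag names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def xml2df(source):
    """One row per child of the root, one column per distinct sub-element tag."""
    root = ET.parse(source).getroot()
    # Keying each value by its tag keeps values aligned with their columns
    # even when a record is missing some sub-elements.
    records = [{sub.tag: sub.text for sub in child} for child in root]
    return pd.DataFrame(records)


# Hypothetical input: the second record has no <party> element.
sample = io.StringIO(
    "<list>"
    "<candidate><name>A</name><party>X</party><city>Amsterdam</city></candidate>"
    "<candidate><name>B</name><city>Utrecht</city></candidate>"
    "</list>"
)
df = xml2df(sample)
print(df)
```

Missing sub-elements show up as NaN instead of shifting the remaining values into the wrong columns. If a tag repeats within one record, the dict keeps only the last occurrence.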
🌐
Pandas
pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html
pandas.read_xml — pandas 2.2.2 documentation - PyData |
October 16, 2023 - Deprecated since version 2.1.0: Passing xml literal strings is deprecated. Wrap literal xml input in io.StringIO or io.BytesIO instead. ... The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element.
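
A minimal sketch of what the deprecation note above asks for: wrap the literal XML in `io.StringIO` before handing it to `read_xml`. The sample data reuses the shapes example quoted earlier on this page; choosing `parser="etree"` is an assumption made here so the example runs without lxml installed.

```python
import io

import pandas as pd

xml_text = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# Wrap the literal in StringIO instead of passing the string itself.
# xpath selects the collection of <row> elements under the root;
# parser="etree" uses the standard library, so lxml is not required.
df = pd.read_xml(io.StringIO(xml_text), xpath="./row", parser="etree")
print(df)
```

Each selected `<row>` becomes one DataFrame row, with its child elements as columns.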
🌐
Stack Abuse
stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas
Reading and Writing XML Files in Python with Pandas
August 21, 2024 - We will then iterate over the DataFrame and write the data with appropriate opening and closing tags of XML in the data list. Once that's complete, we iterate over the list once more to write the data into the XML file. Here's the code that shows the use of write(): import pandas as pd df = pd.Da...
🌐
Kaggle
kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe
Conversion of the XML file to a Pandas Dataframe