xml to dataframe python

How to convert an XML file to nice pandas dataframe?

stackoverflow.com › questions › 28259301 › how-to-convert-an-xml-file-to-nice-pandas-dataframe

You can use xml.etree.ElementTree (from the Python standard library) to convert XML to a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO('''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        yield from iter_docs(author)

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.

Answer from JaminSore on Stack Overflow
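To see both generators working together, here is a minimal runnable sketch. The XML is a hypothetical stand-in for "YOUR XML STRING HERE" (two <author> elements, each with <document> children); the generators are the ones from the answer, with iter_author written using yield from:

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical stand-in for "YOUR XML STRING HERE": two authors under one root.
xml_data = io.StringIO("""<corpus>
  <author name="A" language="EN">
    <documents>
      <document key="d1">first text</document>
      <document key="d2">second text</document>
    </documents>
  </author>
  <author name="B" language="FR">
    <documents>
      <document key="d3">third text</document>
    </documents>
  </author>
</corpus>""")


def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()  # start from the author's attributes
        doc_dict.update(doc.attrib)    # add the document's own attributes
        doc_dict["data"] = doc.text    # the document's text content
        yield doc_dict


def iter_author(etree):
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(xml_data)
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each <document> becomes one row carrying its author's attributes, so author-level values repeat across that author's documents.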

Another answer on the same question points out that, as of pandas 1.3, you can simply use:

pandas.read_xml(path_or_file)
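A minimal sketch of pandas.read_xml on an inline document (the sample data is hypothetical). read_xml uses the lxml parser by default; passing parser="etree" uses the standard library instead, at the cost of more limited XPath support. The literal XML is wrapped in StringIO because recent pandas versions deprecate passing a raw XML string:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: each <row> under the root becomes one DataFrame row.
xml = """<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
</data>"""

# parser="etree" avoids the optional lxml dependency (the default is "lxml").
df = pd.read_xml(StringIO(xml), parser="etree")
print(df)
```

With the default xpath="./*", every child of the root becomes a row and its child elements (and attributes) become columns; the empty <sides/> comes back as a missing value.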

Pandas

pandas.pydata.org › docs › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 3.0.1 documentation

Read XML document into a DataFrame object. ... String path, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file.

Discussions

nested xml to dataframe - Data Science Stack Exchange

I am trying to convert the sample XML file below to a pandas DataFrame. I have multiple XML files, which I will loop over to add all the XML data into a single DataFrame once I succeed with this one.

datascience.stackexchange.com

August 23, 2022

Pandas dataframe to nested xml

xml is part of the standard library. You have a nice column-naming convention, and we could be smarter and use the dot to work out the parent element automatically, but it is best to put each row together manually using the apply function:

import xml.etree.ElementTree as ET
import pandas as pd

def build_item_xml(row):
    item1 = ET.SubElement(items, 'Item')
    descriptors = ET.SubElement(item1, 'Descriptors')
    barcode = ET.SubElement(descriptors, 'Barcode')
    barcode.text = row["Descriptors.Barcode"]
    pricing = ET.SubElement(item1, 'Pricing')
    packetcost = ET.SubElement(pricing, 'PackCost')
    # cast, otherwise: cannot serialize 0.5625 (type float)
    packetcost.text = str(row["Pricing.PackCost"])
    # etc.: add other attributes here
    # always return a result
    return row

# mock dataframe with 2 rows based on the columns supplied
df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Descriptors.SupplierCode": ["030701791", "030701792"],
    "Descriptors.Description": ["Daily Express (Mon)", "Daily Express (Tues)"],
    "Descriptors.CommodityGroup": [1, 2],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.CostPricePerUnit": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
    "Pricing.ValidFrom": [44193, 44194],
    "Sizing.Packsize": [1, 2],
})

# https://docs.python.org/3/library/xml.etree.elementtree.html#building-xml-documents
items = ET.Element('Items')
df = df.apply(build_item_xml, axis=1)  # this calls build_item_xml once per row
ET.dump(items)

r/learnpython

January 3, 2021

ElementTree and deeply nested XML

Does anyone know how I can parse the data in the example XML below by grabbing data from the tag to the tag?

r/learnpython

July 28, 2016
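The r/learnpython reply on building nested XML from a DataFrame mentions using the dot in the column names to work out the parent element automatically. A minimal sketch of that idea follows; the function name df_to_nested_xml and the Items/Item defaults are assumptions, and it assumes each dot-separated part of a column name is a plain tag name:

```python
import xml.etree.ElementTree as ET

import pandas as pd


def df_to_nested_xml(df, root_tag="Items", row_tag="Item"):
    """Hypothetical helper: one <row_tag> per row, nesting columns on dots.

    "Pricing.PackCost" becomes <Pricing><PackCost>value</PackCost></Pricing>;
    columns that share a prefix share one parent element.
    """
    root = ET.Element(root_tag)
    for record in df.to_dict(orient="records"):
        item = ET.SubElement(root, row_tag)
        for column, value in record.items():
            *parents, leaf = column.split(".")
            parent = item
            for tag in parents:
                # Reuse an existing child with this tag, or create it.
                child = parent.find(tag)
                if child is None:
                    child = ET.SubElement(parent, tag)
                parent = child
            # ElementTree only serializes strings, so cast every value.
            ET.SubElement(parent, leaf).text = str(value)
    return root


df = pd.DataFrame({
    "Descriptors.Barcode": ["9770307017919", "9770307017920"],
    "Pricing.PackCost": [0.5625, 0.5626],
    "Pricing.RetailPrice": [0.75, 0.75],
})
items = df_to_nested_xml(df)
ET.dump(items)
```

Because every value goes through str(), missing values are written as the text "nan"; fill or drop them first if that matters.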

Videos

youtube.com

How to Convert an XML File to a Pandas DataFrame in Python

11:51

YouTube

Convert XML to Pandas DataFrame in Python - YouTube

April 22, 2024

youtube.com

Converting Nested XML Data to a Pandas DataFrame

youtube.com

How to Convert XML Data to a Pandas DataFrame in Python

youtube.com

Extracting XML Elements to Pandas DataFrame

reddit.com › r/learnpython › parsing xml into a pandas dataframe

r/learnpython on Reddit: Parsing XML into a Pandas dataframe

December 9, 2022

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this:

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that into a DataFrame where the column names are the columns and each <row> is a row. I'm at a loss on how to do this.

Top answer

To parse the XML file into a pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then build the DataFrame from the collected rows. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame using the column names and data
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and extracts the data for each row and column, storing each row in a dictionary. The list of dictionaries is then passed to the DataFrame constructor together with the column names, so the DataFrame has the column names as its columns and each <row> as a row. (DataFrame.from_dict is not a fit here: it raises a ValueError when columns is passed with its default orient='columns'.)
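One caveat with the sample file: the header declares the column SQL, but the rows use name="sql", so with exact-name matching the SQL column comes out empty. A minimal sketch that matches cell names to the header case-insensitively, using a trimmed copy of the sample (three columns, two rows); the function name diagnostics_log_to_df is hypothetical:

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed copy of the sample from the question (three columns, two rows).
SAMPLE = """<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Sequence" name="Sequence" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
    </row>
  </rows>
</diagnosticsLog>"""


def diagnostics_log_to_df(source):
    """Hypothetical helper: <diagnosticsLog> path or file object -> DataFrame."""
    root = ET.parse(source).getroot()
    # The header order defines the DataFrame's column order.
    columns = [column.attrib["name"] for column in root.find("columns")]
    # Map lower-cased names to the header spelling, so "sql" lands in "SQL".
    canonical = {name.lower(): name for name in columns}
    records = []
    for row in root.find("rows"):
        record = {}
        for cell in row:
            name = cell.attrib["name"]
            record[canonical.get(name.lower(), name)] = cell.text
        records.append(record)
    # Header columns with no cells become missing values; cells whose name
    # is not in the header are dropped by the columns= argument.
    return pd.DataFrame(records, columns=columns)


df = diagnostics_log_to_df(io.StringIO(SAMPLE))
print(df)
```

All values come back as strings; convert types afterwards (for example with pd.to_numeric or pd.to_datetime) as needed.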

Medium

medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c

From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium

August 25, 2019 - We can try to convert this code to a more useful and versatile function, without having to hard-code any values:

import pandas as pd
import xml.etree.ElementTree as et

def parse_XML(xml_file, df_cols):
    """Parse the input XML file and store the result in a pandas DataFrame with the given columns.
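The article's function body is cut off in this snippet, so the following is not its code. It is a hypothetical helper in the same spirit (columns passed in rather than hard-coded), assuming each child of the root is one record and each requested column is either an attribute of that record or the tag of one of its child elements:

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def parse_xml_records(xml_file, df_cols):
    """Hypothetical helper: one DataFrame row per child of the root element.

    Each requested column is taken from the record's attribute of that name
    if it has one, otherwise from the text of its first child element with
    that tag; anything missing becomes a missing value.
    """
    root = ET.parse(xml_file).getroot()
    rows = []
    for node in root:
        row = {}
        for col in df_cols:
            if col in node.attrib:
                row[col] = node.attrib[col]
            else:
                child = node.find(col)
                row[col] = child.text if child is not None else None
        rows.append(row)
    return pd.DataFrame(rows, columns=df_cols)


# Hypothetical sample data.
xml_file = io.StringIO("""<students>
  <student name="John"><email>john@example.com</email><grade>A</grade></student>
  <student name="Jane"><email>jane@example.com</email></student>
</students>""")
students = parse_xml_records(xml_file, ["name", "email", "grade"])
print(students)
```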

GeeksforGeeks

geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python

Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks

March 21, 2024 - Now we have extracted the data from the XML file using BeautifulSoup into a DataFrame, stored as 'df'. To see the DataFrame, we print it. ...

# Python program to convert xml
# structure into dataframes using beautifulsoup

# Import libraries
from bs4 import BeautifulSoup
import pandas as pd

# Open XML file
file = open("gfg.xml", 'r')

# Read the contents of that file
contents = file.read()

soup = BeautifulSoup(contents, 'xml')

# Extracting the data
authors = soup.find_all('author')
titles = soup.find_all('title')
prices = soup.find_all('price')
pubdat

Stack Abuse

stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas

Reading and Writing XML Files in Python with Pandas

August 21, 2024 - We can directly use objectify.parse() and give it the path to the XML file. To get the root element, we will use getroot() on the parsed XML data. Now we can loop through the children elements of the root node and write them into a Python list. Like before, we'll create a DataFrame using the ...

Stack Exchange

datascience.stackexchange.com › questions › 113782 › nested-xml-to-dataframe

nested xml to dataframe - Data Science Stack Exchange

August 23, 2022 -

filenames = glob.glob("/annotations/*.xml")
filenames = [item.replace("\\", "/") for item in filenames]

class XML2DataFrame:
    def __init__(self, filename):
        self.root = ET.parse(filename).getroot()

    def parse_root(self, root):
        return [self.parse_element(child) for child in iter(root)]

    def parse_element(self, element, parsed=None):
        if parsed is None:
            parsed = dict()
        for key in element.keys():
            parsed[key] = element.attrib.get(key)
        if element.text:
            parsed[element.tag] = element.text
        for child in list(element):
            self.parse_element(child, parsed)
        return parsed

def new_format(file):
    return "'{}'".format
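The core of the XML2DataFrame approach is the recursive parse_element, which flattens an element's attributes, its text, and all of its descendants into one dict per child of the root. A minimal self-contained sketch of that technique follows (not the answer's full code, whose tail is cut off). One deliberate difference: it skips whitespace-only text, which the `if element.text:` check would otherwise record for indented container elements. The annotation-style sample is hypothetical:

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def parse_element(element, parsed=None):
    """Flatten an element's attributes and text, recursively, into one dict."""
    if parsed is None:
        parsed = {}
    parsed.update(element.attrib)
    # Skip whitespace-only text (indentation between child elements).
    if element.text and element.text.strip():
        parsed[element.tag] = element.text.strip()
    for child in element:
        parse_element(child, parsed)
    return parsed


def xml_to_df(source):
    """One row per child of the root; repeated keys keep the last value."""
    root = ET.parse(source).getroot()
    return pd.DataFrame([parse_element(child) for child in root])


# Hypothetical annotation-style sample.
sample = io.StringIO("""<annotations>
  <object id="1">
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin></bndbox>
  </object>
  <object id="2">
    <name>dog</name>
    <bndbox><xmin>30</xmin><ymin>40</ymin></bndbox>
  </object>
</annotations>""")
objects_df = xml_to_df(sample)
print(objects_df)
```

Because everything lands in one flat dict, two descendants with the same tag (or an attribute and a tag with the same name) overwrite each other; that is the trade-off of this flattening style.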

Saturn Cloud

saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide

Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog

November 15, 2023 - And there you have it! Your XML data is now in a Python DataFrame, ready for analysis. Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.

Saturn Cloud

saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python

Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog

December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.

Python Forum

python-forum.io › thread-22948.html

Parse XML String in Pandas Dataframe

December 4, 2019 - Here is my situation: I have a pandas dataframe that contains one column with an xml string for each row. I need to be able to parse the xml string for each row to see the data elements of the xml file. All the code I have been able to find is code ...
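For the situation described in the Python Forum thread (one column holding an XML string per row), one approach is to parse each string with ET.fromstring and turn the direct children of its root into columns. A minimal sketch, assuming each string has a flat structure of tag/text children; the column names id and payload and the sample strings are hypothetical:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical frame: one XML string per row in the "payload" column.
df = pd.DataFrame({
    "id": [1, 2],
    "payload": [
        "<order><item>pen</item><qty>3</qty></order>",
        "<order><item>ink</item><qty>1</qty></order>",
    ],
})

# Parse each string and map every direct child of its root to tag -> text.
records = [
    {child.tag: child.text for child in ET.fromstring(xml_string)}
    for xml_string in df["payload"]
]
parsed = pd.DataFrame(records, index=df.index)

# Replace the XML column with one column per child element.
result = df.drop(columns="payload").join(parsed)
print(result)
```

Building the records first and constructing one DataFrame keeps the original index, so join lines the parsed columns up with their source rows.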

PyPI

pypi.org › project › xml-to-df

xml-to-df · PyPI

Package to convert xml to Pandas dataframe (flattens each and every xml element to dataframe column)

pip install xml-to-df

Published Jan 06, 2021

Version 0.0.6

Homepage https://github.com/PraveenKumar-21/xml_to_df

Kaggle

kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe

Conversion of the XML file to a Pandas Dataframe | Kaggle

April 15, 2020 - Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

GitHub

github.com › selimonat › xml2dataframe

GitHub - selimonat/xml2dataframe: Reads xml data into a python DataFrame.

Reads xml data into a python DataFrame. Converts an XML file to Pandas Data Frame.

Author selimonat

YouTube

youtube.com › watch

Transforming Nested XML to Pandas DataFrame - YouTube

12:57

Hello and welcome to this tutorial. In this tutorial, you will learn how to transform XML documents to pandas data frames using Python and the element tree l...

Published October 21, 2023

Educative

educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup

How to convert XML to a DataFrame using BeautifulSoup

Here, we pass data and the data file format xml to the BeautifulSoup function. Step 4: We search the data. ... Step 5: We get the text data from XML. ... Step 6: We create and print the DataFrame.

PyPI

pypi.org › project › pandas-read-xml

pandas-read-xml

GeeksforGeeks

geeksforgeeks.org › python › how-to-create-pandas-dataframe-from-nested-xml

How to create Pandas DataFrame from nested XML? - GeeksforGeeks

July 23, 2025 -

import xml.etree.ElementTree as ETree
import pandas as pd

# give the path where you saved the xml file
# inside the quotes
xmldata = "C:\\ProgramData\\Microsoft\\Windows\\Start Menu\\Programs\\Anaconda3(64-bit)\\xmltopandas.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()

# print(root)
store_items = []
all_items = []

for storeno in root.iter('store'):
    store_Nr = storeno.attrib.get('slNo')
    itemsF = storeno.find('foodItem').text
    price = storeno.find('price').text
    quan = storeno.find('quantity').text
    dis = storeno.find('discount').text

    store_items = [store_Nr, itemsF, price, quan, dis]
    all_items.append(store_items)

xmlToDf = pd.DataFrame(all_items, columns=[
    'SL No', 'ITEM_NUMBER', 'PRICE', 'QUANTITY', 'DISCOUNT'])

print(xmlToDf.to_string(index=False))

... Note: The XML file should be saved in the same directory or folder as your Python code.
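For a structure like the one the GeeksforGeeks loop expects (<store slNo="..."> elements with foodItem, price, quantity and discount children), pandas.read_xml with an xpath can do the same extraction in one call. A minimal sketch; the sample data is hypothetical, and parser="etree" is used so lxml is not required:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the shape the GeeksforGeeks loop expects.
xml = """<stores>
  <store slNo="1">
    <foodItem>meat</foodItem>
    <price>200</price>
    <quantity>1kg</quantity>
    <discount>7%</discount>
  </store>
  <store slNo="2">
    <foodItem>fish</foodItem>
    <price>150</price>
    <quantity>1kg</quantity>
    <discount>5%</discount>
  </store>
</stores>"""

# xpath selects every <store>; its attributes and child elements become columns.
df = pd.read_xml(StringIO(xml), xpath=".//store", parser="etree")
df = df.rename(columns={
    "slNo": "SL No",
    "foodItem": "ITEM_NUMBER",
    "price": "PRICE",
    "quantity": "QUANTITY",
    "discount": "DISCOUNT",
})
print(df.to_string(index=False))
```

Unlike the manual loop, read_xml also infers column types, so PRICE comes back as integers rather than strings.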

Medium

medium.com › @whyamit101 › understanding-pandas-xml-and-its-structure-50ac94e748b7

Understanding pandas XML and Its Structure | by why amit | Medium

April 12, 2025 - This code snippet does just that. With pd.read_xml(), pandas makes it straightforward to import XML data, transforming it into a DataFrame that you can manipulate just like any other dataset.

Plain English

python.plainenglish.io › saving-xml-content-to-a-pandas-dataframe-using-xmltodict-b6fab32a5100

Saving XML content to a pandas DataFrame using xmltodict | by Liam Connors | Python in Plain English

June 7, 2021 - I am sure there are many ways to get data from XML into a DataFrame, but I found xmltodict offered a straightforward way for what I wanted to achieve. Also, working with xmltodict to convert to dictionaries is something I found very helpful for understanding more about the process of extracting specific data from lists of dictionaries.

AskPython

askpython.com › home › pandas dataframe.to_xml – render a dataframe to an xml document

Pandas DataFrame.to_xml - Render a DataFrame to an XML Document - AskPython

February 15, 2023 - Install lxml, which provides safe and convenient access to these libraries using the ElementTree API. It extends the ElementTree API significantly to support XPath, RelaxNG, XML Schema, XSLT, C14N, and more. ... After importing the pandas module, let's write simple code to render the DataFrame to an XML file.
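DataFrame.to_xml is the writer counterpart of read_xml (both arrived in pandas 1.3). A minimal round-trip sketch on hypothetical data; like read_xml, it defaults to the lxml parser, and parser="etree" switches to the standard library:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, 0]})

# With no path given, to_xml returns the XML as a string. By default each
# row becomes a <row> element under a <data> root.
xml = df.to_xml(index=False, parser="etree")
print(xml)

# Read it back to check the round trip.
back = pd.read_xml(StringIO(xml), parser="etree")
print(back)
```

Pass a file path as the first argument instead to write the XML to disk; root_name and row_name change the <data> and <row> tag names.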