python convert xml to dataframe

How to convert an XML file to nice pandas dataframe?

stackoverflow.com › questions › 28259301 › how-to-convert-an-xml-file-to-nice-pandas-dataframe

You can use the xml package from the Python standard library to build a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

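def iter_docs(author):
    # Yield one dict per <document>, merged with the parent <author> attributes
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)  # document attributes win on a name clash
        doc_dict['data'] = doc.text
        yield doc_dict

# Wrap the XML text in a file-like object; ET.parse() also accepts a file name
xml_data = io.StringIO('''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))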

If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        yield from iter_docs(author)

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))

Have a look at the ElementTree tutorial provided in the xml library documentation.
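As a minimal sketch of how the two generators behave end to end, the block below runs them on a small made-up XML string. Only the element names author and document come from the answer; the library root element, the attribute names, and the sample values are assumptions chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: the <library> root and the attribute names are
# invented for this sketch; only <author>/<document> come from the answer.
SAMPLE_XML = """\
<library>
  <author name="Ann">
    <document title="First" year="2001">text one</document>
    <document title="Second" year="2002">text two</document>
  </author>
  <author name="Bob">
    <document title="Third" year="2003">text three</document>
  </author>
</library>
"""


def iter_docs(author):
    """Yield one dict per <document>, merged with the parent <author> attributes."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        row = author_attr.copy()
        row.update(doc.attrib)  # document attributes win on a name clash
        row["data"] = doc.text
        yield row


def iter_author(etree):
    """Yield document rows for every <author> found anywhere in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


etree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each document becomes one row, and the author attributes are repeated on every row that belongs to that author. All values arrive as strings, so numeric columns such as year would still need converting (for example with pd.to_numeric).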

Answer from JaminSore on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 28259301 › how-to-convert-an-xml-file-to-nice-pandas-dataframe

python - How to convert an XML file to nice pandas dataframe? - Stack Overflow

Top answer

1 of 5

61


2 of 5

33

As of pandas v1.3, you can simply use:

pandas.read_xml(path_or_file)
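A minimal sketch of that call, assuming a flat file where each child of the root is one record. The file name shapes.xml and its contents are invented for illustration. The sketch passes parser="etree" so that it runs on the standard library alone; the default parser, "lxml", needs the third-party lxml package.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file contents, invented for this sketch.
SAMPLE_XML = """\
<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3</sides></row>
</data>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "shapes.xml"
    path.write_text(SAMPLE_XML, encoding="utf-8")
    # Each child of the root becomes a row; its child elements become columns.
    df = pd.read_xml(path, parser="etree")

print(df)
```

A record that lacks a field (the circle has no sides element here) gets a missing value in that column rather than raising an error.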

Saturn Cloud

saturncloud.io › blog › converting-xml-to-python-dataframe-a-comprehensive-guide

Converting XML to Python DataFrame: A Guide | Saturn Cloud Blog

November 15, 2023 - And there you have it! Your XML data is now in a Python DataFrame, ready for analysis. Converting XML to a Python DataFrame can be a bit tricky, but with the right approach, it becomes a straightforward task. This guide has shown you how to parse an XML file, extract the necessary data, and convert it into a DataFrame using pandas.

Discussions


Pandas XML?

Hello again. It may be simpler to skip the pandas xml methods and use something else. Even reading this data is a problem:

>>> pd.read_xml("merge.xml.1678125433.bak").T
                          0                             1
Company                   x                          None
SenderName                y                          None
SenderPhone               z                          None
TransferDate              a                          None
BrandAAIAID               b                          None
DocumentTitle             c                          None
DocFormNumber           2.0                           NaN
EffectiveDate    2023-02-22                          None
SubmissionType         FULL                          None
MapperCompany             d                          None
MapperContact             e                          None
MapperPhone               f                          None
MapperEmail               g                          None
VcdbVersionDate  2023-01-26                          None
QdbVersionDate   2023-01-26                          None
PcdbVersionDate  2023-01-26                          None
action                 None                             A
id                      NaN                           1.0
BaseVehicle             NaN                           NaN
BodyType                NaN                           NaN
EngineBase              NaN                           NaN
Note                   None  WITHOUT AUTO LEVELING SYSTEM
Qty                     NaN                           1.0
PartType                NaN                           NaN
Position                NaN                           NaN
Part                    NaN                      701940.0

See how there is only a single id - all of the other id values in your data are lost. It looks like xmltodict may be the best choice here.

import xmltodict
from pathlib import Path

xmldict = xmltodict.parse(Path("merge.xml").read_bytes())
df = pd.json_normalize(xmldict).explode("ACES.App")
df["Aces.App.Part"] = pd.json_normalize(df["ACES.App"]).set_index(df.index)["Part"]

I'm guessing this is the Part ID used in your unique checks. This will give you:

ACES.@version                 4.2         4.2
ACES.Header.Company           x           x
ACES.Header.SenderName        y           y
ACES.Header.SenderPhone       z           z
ACES.Header.TransferDate      a           a
ACES.Header.BrandAAIAID       b           b
ACES.Header.DocumentTitle     c           c
ACES.Header.DocFormNumber     2.0         2.0
ACES.Header.EffectiveDate     2023-02-22  2023-02-22
ACES.Header.SubmissionType    FULL        FULL
ACES.Header.MapperCompany     d           d
ACES.Header.MapperContact     e           e
ACES.Header.MapperPhone       f           f
ACES.Header.MapperEmail       g           g
ACES.Header.VcdbVersionDate   2023-01-26  2023-01-26
ACES.Header.QdbVersionDate    2023-01-26  2023-01-26
ACES.Header.PcdbVersionDate   2023-01-26  2023-01-26
ACES.App                      {'@action': 'A', '@id': '1', 'BaseVehicle': {'...  {'@action': 'B', '@id': '2', 'BaseVehicle': {'...
Aces.App.Part                 701940      12345

To go back to XML - you "reverse" the "json_normalize"

def df_to_xmltodict(df, sep="."):
    result = {"ACES": {"Header": {}, "App": []}}
    for _, row in df.iterrows():
        new_row = {}
        for name, value in row.items():
            keys = name.split(sep)
            count = len(keys) - 1
            parent = new_row
            for index, key in enumerate(keys):
                if index == count:
                    parent[key] = value
                else:
                    parent.setdefault(key, {})
                    parent = parent[key]
        result["ACES"]["App"].append(new_row["ACES"]["App"])
        del new_row["ACES"]["App"]
        del new_row["Aces"]
        for key, value in new_row.items():
            result["ACES"].update(value)
    return result

You can then give this to xmltodict.unparse to go back to XML:

>>> print(xmltodict.unparse(df_to_xmltodict(df), pretty=True))
x y z a b c 2.0 2023-02-22 FULL d e f g 2023-01-26 2023-01-26 2023-01-26 WITHOUT AUTO LEVELING SYSTEM 1 701940 YOLO 1 12345

r/learnpython

8

1

March 6, 2023

Videos

youtube.com

How to Convert an XML File to a Pandas DataFrame in Python

12:57

YouTube

Transforming Nested XML to Pandas DataFrame - YouTube

October 21, 2023

youtube.com

Pandas : How do convert a pandas dataframe to XML?

13:59

YouTube

Reading and writing XML with Pandas - YouTube

November 11, 2021

09:44

YouTube

How to transform an XML document into a Pandas DataFrame - YouTube

April 4, 2021


Medium

medium.com › @robertopreste › from-xml-to-pandas-dataframes-9292980b1c1c

From XML to Pandas dataframes. How to parse XML files to obtain proper… | by Roberto Preste | Medium

August 25, 2019 -

import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse("students.xml")
xroot = xtree.getroot()

df_cols = ["name", "email", "grade", "age"]
rows = []

for node in xroot:
    s_name = node.attrib.get("name")
    # findtext() returns None when the child element is missing
    s_mail = node.findtext("email")
    s_grade = node.findtext("grade")
    s_age = node.findtext("age")

    rows.append({"name": s_name, "email": s_mail, "grade": s_grade, "age": s_age})

out_df = pd.DataFrame(rows, columns = df_cols)

The downside to this approach is that you need to know the structure of the XML file in advance, and you have to hard-code column names accordingly. We can try to convert this code to a more useful and versatile function, without having to hard-code any values:
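The article's own function is cut off in this excerpt. As a hedged sketch of one way to drop the hard-coded column list, the helper below discovers columns from whatever attributes and child elements each record carries. The name records_to_dataframe, the students/student tag names, and the sample data are all assumptions made for illustration; only the name, email, grade, and age fields come from the snippet above.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data; only the field names come from the snippet.
SAMPLE_XML = """\
<students>
  <student name="Alice">
    <email>alice@example.com</email>
    <grade>A</grade>
    <age>18</age>
  </student>
  <student name="Bob">
    <email>bob@example.com</email>
    <grade>B</grade>
  </student>
</students>
"""


def records_to_dataframe(source):
    """Build one row per child of the root without hard-coding column names.

    Hypothetical helper: each record's attributes and the text of its direct
    child elements become columns; fields a record lacks are left missing.
    """
    root = ET.parse(source).getroot()
    rows = []
    for record in root:
        row = dict(record.attrib)
        for child in record:
            row[child.tag] = child.text
        rows.append(row)
    return pd.DataFrame(rows)


out_df = records_to_dataframe(io.StringIO(SAMPLE_XML))
print(out_df)
```

This only handles one level of nesting, and every value comes back as a string, so it is a starting point rather than a general converter.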

reddit.com › r/learnpython › parsing xml into a pandas dataframe

r/learnpython on Reddit: Parsing XML into a Pandas dataframe

December 9, 2022 -

I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.

The file looks roughly like this

<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <!--Build 18.0.1.69-->
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="ProcessID" name="ProcessID" />
    <column friendlyName="ThreadID" name="ThreadID" />
    <column friendlyName="TimeSpan" name="TimeSpan" />
    <column friendlyName="User" name="User" />
    <column friendlyName="HTTPSessionID" name="HTTPSessionID" />
    <column friendlyName="HTTPForward" name="HTTPForward" />
    <column friendlyName="SessionID" name="SessionID" />
    <column friendlyName="SessionGUID" name="SessionGUID" />
    <column friendlyName="Datasource" name="Datasource" />
    <column friendlyName="Sequence" name="Sequence" />
    <column friendlyName="LocalSequence" name="LocalSequence" />
    <column friendlyName="Message" name="Message" />
    <column friendlyName="AppPoolName" name="AppPoolName" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">0 ms</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
      <col name="Sequence">236419</col>
      <col name="LocalSequence">103825</col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="TimeSpan">N/A</col>
      <col name="ThreadID">0x00000025</col>
      <col name="User">USERNAME</col>
      <col name="HTTPSessionID"></col>
      <col name="HTTPForward">20.186.0.0</col>
      <col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
      <col name="SessionID">6096783</col>
      <col name="Datasource">datasourceName</col>
      <col name="AppPoolName">C 1801AppServer Ext</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
      <col name="Sequence">236420</col>
      <col name="LocalSequence">103826</col>
    </row>
  </rows>
</diagnosticsLog>

I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.

Top answer

1 of 1

5

To parse an XML file into a Pandas DataFrame, first use the ElementTree module to parse the file and extract the relevant data, then pass the collected rows to the DataFrame constructor. Here is an example of how this can be done:

import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file using ElementTree
tree = ET.parse('my_file.xml')
root = tree.getroot()

# Extract the column names from the 'columns' element
columns = [col.attrib['friendlyName'] for col in root.find('columns')]

# Create an empty list to store the data for each row
data = []

# Iterate over the 'row' elements and extract the data for each one
for row in root.find('rows'):
    row_data = {}
    for col in row:
        # Add the data for each column to the dictionary
        row_data[col.attrib['name']] = col.text
    # Add the dictionary for this row to the list
    data.append(row_data)

# Create a DataFrame from the list of row dictionaries, in the declared column order
df = pd.DataFrame(data, columns=columns)

This code parses the XML file and extracts the data for each row and column, storing each row in a dictionary. The list of dictionaries is then passed to the DataFrame constructor, giving a DataFrame with the column names as the columns and each row of data as a row. Note that the sample file spells one field name="sql" in the rows but "SQL" in the column list, so that column comes out empty unless the names are matched case-insensitively.
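As a self-contained sketch of the approach this answer describes, the block below runs against a trimmed copy of the question's file (fewer columns, same structure). It builds the frame with the plain pd.DataFrame constructor. Matching names case-insensitively is an assumption about what the poster wants, made because the rows use name="sql" while the column list declares "SQL".

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed copy of the question's file: fewer columns, same structure.
SAMPLE_XML = """\
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
  <columns>
    <column friendlyName="time" name="time" />
    <column friendlyName="Direction" name="Direction" />
    <column friendlyName="SQL" name="SQL" />
    <column friendlyName="Message" name="Message" />
  </columns>
  <rows>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">Out</col>
      <col name="sql">UPDATE SET </col>
    </row>
    <row>
      <col name="time">11/14/2022 23:31:12</col>
      <col name="Direction">In</col>
      <col name="sql">UPDATE SET</col>
    </row>
  </rows>
</diagnosticsLog>
"""

root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()

# Map lower-cased machine names to the declared friendly names, so that
# <col name="sql"> still lands in the column declared as "SQL".
friendly = {
    col.attrib["name"].lower(): col.attrib["friendlyName"]
    for col in root.find("columns")
}

data = []
for row in root.find("rows"):
    row_data = {}
    for col in row:
        name = col.attrib["name"]
        row_data[friendly.get(name.lower(), name)] = col.text
    data.append(row_data)

# columns= keeps the declared order and adds an all-missing column for
# fields that never appear in any row (here: "Message").
df = pd.DataFrame(data, columns=list(friendly.values()))
print(df)
```

Columns that are declared but never filled, such as Message in this sample, stay in the frame as empty columns, which keeps the layout stable across log files.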

GeeksforGeeks

geeksforgeeks.org › python › convert-xml-structure-to-dataframe-using-beautifulsoup-python

Convert XML structure to DataFrame using BeautifulSoup - Python - GeeksforGeeks

March 21, 2024 - Now we have extracted the data from the XML file using BeautifulSoup into the DataFrame, and it is stored as ‘df’. To see the DataFrame we use the print statement to print it. ...

# Python program to convert xml
# structure into dataframes using beautifulsoup

# Import libraries
from bs4 import BeautifulSoup
import pandas as pd

# Open XML file
file = open("gfg.xml", 'r')

# Read the contents of that file
contents = file.read()

soup = BeautifulSoup(contents, 'xml')

# Extracting the data
authors = soup.find_all('author')
titles = soup.find_all('title')
prices = soup.find_all('price')
pubdat

Saturn Cloud

saturncloud.io › blog › converting-complex-xml-files-to-pandas-dataframecsv-in-python

Converting Complex XML Files to Pandas DataFrame/CSV in Python | Saturn Cloud Blog

December 28, 2023 - The first step in converting an XML file to a DataFrame or CSV is parsing the XML file. We’ll use the xml.etree.ElementTree module in Python, which provides a lightweight and efficient API for parsing and creating XML data.


Pandas

pandas.pydata.org › docs › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 3.0.1 documentation - PyData |

xpath: The XPath that selects the set of nodes to turn into DataFrame rows. It should return a collection of elements, not a single element. Note: the etree parser supports only limited XPath expressions; for more complex XPath, use lxml, which must be installed separately. ... namespaces: The namespaces defined in the XML document, as a dict mapping each namespace prefix to its URI.
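A short sketch of the two parameters working together, assuming a made-up namespaced document (the doc prefix, the https://example.com URI, and the element names are invented for illustration). It uses parser="etree", so the path is written relative to the root with a leading dot, which the standard-library XPath subset accepts.

```python
import io

import pandas as pd

# Hypothetical namespaced document, invented for this sketch.
SAMPLE_XML = """\
<doc:data xmlns:doc="https://example.com">
  <doc:meta><doc:source>demo</doc:source></doc:meta>
  <doc:row><doc:shape>square</doc:shape><doc:degrees>360</doc:degrees></doc:row>
  <doc:row><doc:shape>triangle</doc:shape><doc:degrees>180</doc:degrees></doc:row>
</doc:data>
"""

df = pd.read_xml(
    io.StringIO(SAMPLE_XML),
    xpath=".//doc:row",  # select the row elements only, skipping <doc:meta>
    namespaces={"doc": "https://example.com"},  # prefix -> URI
    parser="etree",
)
print(df)
```

Without the xpath argument the default would also pick up the meta element as a row, so narrowing the selection is what keeps the frame limited to the actual records.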

YouTube

youtube.com › watch

Convert XML to DataFrame in Python using pandas - Part #2 - YouTube

15:13

This demo explains everything you need to successfully apply the steps in your project. Setup on Windows: python -m pip install -U pip setuptools, pip3 install ju...

Published December 4, 2019

Delft Stack

delftstack.com › home › howto › python pandas › convert xml file to python nice pandas dataframe

How to Convert XML File to Pandas DataFrame | Delft Stack

October 16, 2023 - This tutorial introduces how an XML file is converted into a Python Pandas nice dataframe. The library used for this is the xml.etree.ElementTree.

Table Convert

tableconvert.com › home › convert xml to pandas dataframe online

Convert XML to Pandas DataFrame Online - Table Convert

March 21, 2024 - Generate standard Pandas DataFrame code with support for data type specifications, index settings, and data operations. Generated code can be directly executed in Python environment for data analysis and processing. Extract tables from any website with one click. Convert to 30+ formats including Excel, CSV, JSON instantly - no copy-pasting required. Converting XML ...

Towards Data Science

towardsdatascience.com › home › latest › extracting information from xml files into a pandas dataframe

Extracting information from XML files into a Pandas dataframe | Towards Data Science

December 12, 2023 - Parse XML files with the Python's ElementTree package

Kaggle

kaggle.com › code › tinlla › conversion-of-the-xml-file-to-a-pandas-dataframe

Conversion of the XML file to a Pandas Dataframe


Educative

educative.io › answers › how-to-convert-xml-to-a-dataframe-using-beautifulsoup

How to convert XML to a DataFrame using BeautifulSoup

October 10, 2023 - Step 5: We get the text data from XML. ... Step 6: We create and print the DataFrame.

YouTube

youtube.com › watch

Convert XML to Pandas DataFrame in Python - YouTube

11:51

Learn how to convert XML data to a Pandas DataFrame in Python with this easy-to-follow tutorial. Start optimizing your data analysis process today!...

Published October 8, 2025

GeeksforGeeks

geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml

How to create Pandas DataFrame from nested XML? | GeeksforGeeks

May 22, 2017 - In this article, we will learn how to create Pandas DataFrame from nested XML. We will use the xml.etree.ElementTree module, which is a built-in module in Python for parsing or reading information from the XML file.

Stack Abuse

stackabuse.com › reading-and-writing-xml-files-in-python-with-pandas

Reading and Writing XML Files in Python with Pandas

August 21, 2024 - from lxml import objectify import ... 0 1.0 7020000.0 35237.0 1 3.0 10000000.0 32238.0 2 nan 4128000.0 44699.0 · The xmltodict module converts the XML data into a Python dictionary as the name suggests....

Stack Overflow

stackoverflow.com › questions › 54566730 › convert-xml-data-into-dataframe

python - convert xml data into dataframe - Stack Overflow

import xml.etree.ElementTree as ET
import pandas as pd

def getMetrics(file_name):
    tree = ET.parse(file_name)
    root = tree.getroot()
    result = []
    for setnode in root.iter('Set'):
        node = setnode.attrib["Parameter"]
        for ifnode in setnode:
            if "Parameter" in ifnode.attrib:
                result.append(dict(node=node, parameter=ifnode.attrib.get("Parameter")))
    return result

df = pd.DataFrame(getMetrics('sample.xml'))
print(df)

...

<?xml version="1.0" encoding="UTF-8"?>
<Rules>
  <Set Parameter="4" To="90">
    <If Parameter="1087" EqualsTo="90" />
  </Set>
  <Set Parameter="5" To="-5">
    <If Parameter="1087" EqualsTo="87" />
  </

Pandas

pandas.pydata.org › pandas-docs › stable › reference › api › pandas.read_xml.html

pandas.read_xml — pandas 2.2.2 documentation - PyData |

December 21, 2022 - Deprecated since version 2.1.0: Passing XML literal strings is deprecated. Wrap literal XML input in io.StringIO or io.BytesIO instead. ... xpath: The XPath that selects the set of nodes to turn into DataFrame rows. It should return a collection of elements, not a single element.
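A minimal sketch of the wrapping that the deprecation note asks for, using a made-up one-line document. The variable names are illustrative, and parser="etree" is used only so the sketch does not depend on lxml.

```python
import io

import pandas as pd

# Hypothetical literal XML held in a Python string.
xml_text = (
    "<data>"
    "<row><id>1</id><name>a</name></row>"
    "<row><id>2</id><name>b</name></row>"
    "</data>"
)

# Text input goes through io.StringIO ...
df_from_str = pd.read_xml(io.StringIO(xml_text), parser="etree")

# ... and already-encoded bytes go through io.BytesIO.
df_from_bytes = pd.read_xml(io.BytesIO(xml_text.encode("utf-8")), parser="etree")

print(df_from_str)
```

Both calls should yield the same frame; which wrapper to use depends only on whether the XML is held as str or as bytes.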

GitHub

gist.github.com › mattmc3 › 712f280ec81044ec7bd12a6dda560787

Python: Import XML to Pandas dataframe, and then dataframe to Sqlite database · GitHub

April 28, 2021 - Python: Import XML to Pandas dataframe, and then dataframe to Sqlite database - import_xml_to_dataframe_to_sql.py

Medium

medium.com › @sounder.rahul › reading-xml-file-using-python-pandas-and-converting-it-into-a-pyspark-dataframe-52fd798c8149

Finance Domain — Reading XML File using Python pandas and converting it into a PySpark DataFrame | by Rahul Sounder | Medium

January 11, 2019 - Code to Generate Big Data: https://medium.com/@sounder.rahul/python-faker-to-generate-data-for-marketing-domain-complex-xml-file-163a0649db4e

import xml.etree.ElementTree as ET
import pandas as pd

# Function to parse XML and create Pandas DataFrame
def parse_xml_to_dataframe(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Extract data from XML into a list of dictionaries
    data = []
    for transaction in root.findall("transaction"):
        record = {
            "id": transaction.find("id").text,
            "date": transaction.find("date").text,
            "type": transaction.find("type").text,
            "amount": float(transac