python parse nested xml into df

Parsing deeply nested XML into dataframe with python - struggling with deeper elements

stackoverflow.com › questions › 69579387 › parsing-deeply-nested-xml-into-dataframe-with-python-struggling-with-deeper-el

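See below. The attributes are read straight from each product element, and a './/' descendant search picks up the nested retail and product elements: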

import requests
import xml.etree.ElementTree as ET
import pandas as pd

URL = 'https://raw.githubusercontent.com/dgs2021/golfdeals/main/35386_3864840_mp_delta.xml'
# XML attribute name -> DataFrame column name
attrb_fields = {'manufacturer_name': 'manufacturer', 'name': 'name', 'part_number': 'part_number'}
# DataFrame column name -> nested element tag
sub_elements = {'retail': 'retail', 'product': 'product'}

r = requests.get(URL)
r.raise_for_status()  # stop early on an HTTP error
root = ET.fromstring(r.content)

data = []
for p in root.findall('product'):
    # Attributes sit directly on each <product> element
    entry = {v: p.attrib.get(k, 'NA') for k, v in attrb_fields.items()}
    for col, tag in sub_elements.items():
        # './/' searches every descendant, so nesting depth does not matter
        e = p.find(f'.//{tag}')
        entry[col] = e.text if e is not None else 'NA'
    data.append(entry)

# Column names must match the keys stored in each entry
columns = list(attrb_fields.values()) + list(sub_elements.keys())
df = pd.DataFrame(data, columns=columns)
print(df)

Output:

          manufacturer  ...                                            product
0           Champ Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
1         Stinger Tees  ...  https://click.linksynergy.com/link?id=83wh4zNK...
2           Vegas Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
3        Ray Cook Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
4     Rock Bottom Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
...                ...  ...                                                ...
4100     Callaway Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
4101        Cobra Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
4102      Odyssey Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
4103   TaylorMade Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...
4104     Titleist Golf  ...  https://click.linksynergy.com/link?id=83wh4zNK...

[4105 rows x 5 columns]

Answer from balderman on Stack Overflow
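The same pattern can be tried offline on a small inline sample. This is a minimal sketch and not part of the answer: the XML below is invented for illustration (the real feed's nesting is assumed, not verified), and `product_to_row` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element nesting is an assumption, not the real feed.
SAMPLE = """
<merchandiser>
  <product manufacturer_name="Champ Golf" name="Wood Tees" part_number="A1">
    <URL><product>https://example.com/a1</product></URL>
    <price><retail>4.99</retail></price>
  </product>
  <product manufacturer_name="Vegas Golf" name="Chip Game">
    <price><retail>9.99</retail></price>
  </product>
</merchandiser>
"""

ATTRS = ("manufacturer_name", "name", "part_number")
NESTED = ("retail", "product")


def product_to_row(product):
    """Hypothetical helper: flatten one <product> element into a dict."""
    # Missing attributes fall back to 'NA', as in the answer.
    row = {a: product.attrib.get(a, "NA") for a in ATTRS}
    for tag in NESTED:
        # './/tag' searches all descendants, so depth does not matter.
        node = product.find(f".//{tag}")
        row[tag] = node.text if node is not None else "NA"
    return row


root = ET.fromstring(SAMPLE)
rows = [product_to_row(p) for p in root.findall("product")]
```

Passing `rows` to `pd.DataFrame` would then give one row per product, with 'NA' wherever an attribute or nested element is absent.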

Pandas

pandas.pydata.org › docs › reference › api › pandas.read_xml.html

pandas.read_xml - pandas 3.0.1 documentation - PyData

The XPath to parse required set of nodes for migration to DataFrame. XPath should return a collection of elements and not a single element. Note: The etree parser supports limited XPath expressions. For more complex XPath, use lxml which requires installation. ... The namespaces defined in XML document as dicts with key being namespace prefix and value the URI.
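A short sketch of how the `xpath` argument picks the row elements. The XML document here is invented for illustration, and `parser="etree"` is chosen only so the example does not need lxml installed; the relative path `.//product` is used because the etree parser supports limited XPath.

```python
from io import StringIO

import pandas as pd

# Invented sample document: the rows of interest sit two levels below the root.
XML = """<catalog>
  <category label="tees">
    <product name="Wood Tees"><price>4.99</price></product>
  </category>
  <category label="balls">
    <product name="Range Balls"><price>19.99</price></product>
  </category>
</catalog>"""

# xpath selects the repeating elements that become rows; each row is built
# from that element's attributes and its immediate children.
df = pd.read_xml(StringIO(XML), xpath=".//product", parser="etree")
```

Each `product` element becomes one row, with its `name` attribute and its `price` child as columns. Elements nested deeper than the immediate children of the selected node are not expanded.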

GeeksforGeeks

geeksforgeeks.org › how-to-create-pandas-dataframe-from-nested-xml

How to create Pandas DataFrame from nested XML? | GeeksforGeeks

April 28, 2021 - In this article, we will learn how to create Pandas DataFrame from nested XML. We will use the xml.etree.ElementTree module, which is a built-in module in Python for parsing or reading information from the XML file.

Discussions

Parsing deeply nested XML into dataframe with python - struggling with deeper elements - Stack Overflow

I'm attempting to parse out a fairly nested XML file. I've spent the last few hours trying to find a solution with no luck. I'm not sure if the issue is with namespaces, or needing to findall within the loop. I am able to extract the higher level elements but the deeper nested elements are not being extracted. I am looking to export Part_number, manufacturer_name, name, Product and Retail to a df... More on stackoverflow.com


October 15, 2021
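On the namespace doubt raised in the question: when a document declares a default namespace, plain tag names stop matching in ElementTree. A minimal sketch, assuming an invented sample document and a placeholder namespace URI (neither comes from the original post):

```python
import xml.etree.ElementTree as ET

# Invented sample with a default namespace; the URI is a placeholder.
SAMPLE = """<feed xmlns="http://example.com/golf">
  <product name="Wood Tees">
    <price><retail>4.99</retail></price>
  </product>
</feed>"""

root = ET.fromstring(SAMPLE)

# Without a namespace mapping, the plain tag name matches nothing.
unqualified = root.findall("product")

# Map a prefix of our choosing to the URI and use it in every path step.
ns = {"g": "http://example.com/golf"}
rows = []
for product in root.findall("g:product", ns):
    # The nested lookup happens inside the loop, relative to this product.
    retail = product.find(".//g:retail", ns)
    rows.append({
        # Unprefixed attributes are not namespaced, so 'name' works as-is.
        "name": product.attrib.get("name", "NA"),
        "retail": retail.text if retail is not None else "NA",
    })
```

If `unqualified` comes back empty while the prefixed search finds elements, the namespace was the issue rather than the nesting depth.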

How to read nested xml file with python pandas? - Stack Overflow

While XML as a data format can ... deeply nested, data frames must adhere to a single structure of two dimensions: row by column. Hence, as noted in docs, pandas.read_xml, is a convenience method best for flatter, shallow XML files. You can use xpath to traverse different areas of the document, not just the default /*. However, you can use XSLT 1.0 (special purpose language designed to transform XML files) with the default parser, lxml, to ... More on stackoverflow.com
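To illustrate the row-by-column constraint, here is a plain-Python sketch that flattens a one-to-many nesting by repeating the parent fields on every child row. It is an alternative to the XSLT route mentioned above, not an XSLT example; the sample XML and the `flatten_orders` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: each order holds a variable number of items.
SAMPLE = """<orders>
  <order id="1" customer="Ann">
    <items>
      <item sku="T1"><qty>2</qty></item>
      <item sku="B7"><qty>1</qty></item>
    </items>
  </order>
  <order id="2" customer="Bob">
    <items>
      <item sku="T1"><qty>5</qty></item>
    </items>
  </order>
</orders>"""


def flatten_orders(xml_text):
    """Hypothetical helper: one output row per <item>, parent fields repeated."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        parent = {"order_id": order.get("id"), "customer": order.get("customer")}
        for item in order.findall("./items/item"):
            rows.append({
                **parent,
                "sku": item.get("sku"),
                "qty": int(item.findtext("qty")),
            })
    return rows


rows = flatten_orders(SAMPLE)
```

The resulting list of dicts has a single two-dimensional shape, so it can be handed to `pd.DataFrame(rows)` directly.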


python - How to convert an XML file to nice pandas dataframe? - Stack Overflow

@CristianCiupitu I see the question is tagged python-2.7, so the u prefix has been added. (2018-07-25) ... Actually, from this specific post, OP needs to adjust XPath to look one level deeper from root: pandas.read_xml(path_or_file, xpath="/Author/document") (2021-05-19) ... Here is another way of converting a xml to pandas data frame. For example i have parsing ... More on stackoverflow.com


python - Nested XML to Pandas dataframe - Stack Overflow

I'm trying to create a script to convert nested XML files to a Pandas dataframe. I've found this article https://medium.com/@robertopreste/from-xml-to-pandas-dataframes-9292980b1c1c, which does a g... More on stackoverflow.com

