I got the desired result with the following script.
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<base>
<element1>element 1</element1>
<element2>element 2</element2>
<element3>
<subElement3>subElement 3</subElement3>
</element3>
</base>
Python code:
import pandas as pd
from lxml import etree

data = "C:/Path/test.xml"
tree = etree.parse(data)

lstKey = []
lstValue = []
for p in tree.iter():
    # getpath() returns an XPath such as /base/element3/subElement3;
    # turn it into a dotted key and drop the leading separator
    lstKey.append(tree.getpath(p).replace("/", ".")[1:])
    lstValue.append(p.text)

df = pd.DataFrame({'key': lstKey, 'value': lstValue})
df.sort_values('key')
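The same key/value flattening can be sketched with only the standard library, which may help if lxml is not installed. This is a minimal sketch, not the script above: `flatten` is a hypothetical helper, the sample document is inlined without its XML declaration, and unlike lxml's `getpath()` it does not add `[n]` indices to repeated sibling tags. Whitespace-only text is reported as `None`.

```python
import xml.etree.ElementTree as ET

# Same sample document as the XML file above, inlined so the sketch is self-contained.
XML_TEXT = """<base>
<element1>element 1</element1>
<element2>element 2</element2>
<element3>
<subElement3>subElement 3</subElement3>
</element3>
</base>"""


def flatten(element, prefix=""):
    """Yield (dotted_path, text) for an element and all of its descendants."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    text = (element.text or "").strip()
    # Containers hold only whitespace between child tags; report that as None.
    yield path, text or None
    for child in element:
        yield from flatten(child, path)


rows = list(flatten(ET.fromstring(XML_TEXT)))
for key, value in rows:
    print(key, "=", value)
```

The resulting list of pairs can be passed to `pd.DataFrame(rows, columns=["key", "value"])` if a DataFrame is needed.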
Given the two levels of nodes that carry the Coluna attributes, consider XSLT, a special-purpose language designed to transform XML documents. Python's lxml can run XSLT 1.0 scripts and, as the default parser for pandas.read_xml, lets read_xml apply a stylesheet that flattens your raw XML before it is parsed into a DataFrame.
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:pace='http://www.ms.com/pace'>
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- REDESIGN XML TO ONLY RETURN AnaliseDiaria NODES -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="descendant::pace:AnaliseDiaria"/>
    </xsl:copy>
  </xsl:template>

  <!-- REDESIGN AnaliseDiaria NODES -->
  <xsl:template match="pace:AnaliseDiaria">
    <xsl:copy>
      <!-- BRING DOWN Produto ATTRIBUTES WITH CURRENT ATTRIBUTES -->
      <xsl:copy-of select="ancestor::pace:Produto/@*|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Python
import pandas as pd

analise_diaria_df = pd.read_xml("input.xml", stylesheet="style.xsl")
analise_diaria_df
# Coluna1 Coluna2 Coluna3 ... Coluna14 Coluna15 Coluna16
# 0 21-851611 CAMIO VO NaN ... NaN NaN NaN
# 1 21-3667984 SCA4X2 -1.0 ... NaN NaN NaN
# 2 21-3667994 SCA963 -1.0 ... NaN NaN NaN
# 3 21-3676543 SCA713 -1.0 ... NaN NaN NaN
# 4 21-3676601 SCA97 -1.0 ... NaN NaN NaN
# 5 21-3814014 CAMIX2 NaN ... NaN NaN NaN
# 6 21-3814087 SCA56 NaN ... NaN NaN NaN
# 7 21-3814087 SCA56 NaN ... 195.000,00 NF9 10203910A
# 8 21-3814087 SCA56 NaN ... 195.090,00 NaN NaN
# 9 21-3814087 SCA56 NaN ... 195.270,00 NaN NaN
# 10 21-3814087 SCA56 NaN ... 195.482,60 NaN NaN
# 11 21-3814087 SCA56 NaN ... 195.627,80 NaN NaN
# 12 21-3814087 SCA56 NaN ... 204.529,82 NaN NaN
# 13 21-3814087 SCA56 NaN ... NaN NaN 158PES
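To make clear what the stylesheet does, here is a rough equivalent of the same flattening without XSLT, using only the standard library: each AnaliseDiaria row starts from the attributes of its ancestor Produto and then adds its own. This is a sketch under assumptions: the real input file is not shown, so the `Relatorio` root, the nesting and the attribute values in `SAMPLE` are hypothetical, and `merged_rows` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the namespace URI and the element names Linha,
# Produto and AnaliseDiaria come from the answer; everything else is made up.
SAMPLE = """
<pace:Relatorio xmlns:pace="http://www.ms.com/pace">
  <pace:Linha>
    <pace:Produto Coluna1="A-1" Coluna2="X">
      <pace:AnaliseDiaria Coluna14="10" />
      <pace:AnaliseDiaria Coluna16="Z" />
    </pace:Produto>
  </pace:Linha>
</pace:Relatorio>
"""

NS = {"pace": "http://www.ms.com/pace"}


def merged_rows(root):
    """Yield one dict per AnaliseDiaria: Produto attributes plus its own."""
    for produto in root.iterfind(".//pace:Produto", NS):
        for analise in produto.iterfind(".//pace:AnaliseDiaria", NS):
            row = dict(produto.attrib)   # inherited from the ancestor
            row.update(analise.attrib)   # the node's own attributes win on clashes
            yield row


rows = list(merged_rows(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would give one column per attribute name, with NaN where a row lacks that attribute, which matches the shape of the output above.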
Fortunately, for the XML in your question you can use the pandas read_xml() method directly, although you'll have to work around the namespaces, here by matching on local-name():
import pandas as pd
pd.read_xml("file.xml", xpath='//*[local-name()="Linha"]//*[local-name()="Produto"]')
Output:
Coluna1 Coluna2 Coluna3 Coluna4 Coluna5 {http://www.ms.com/pace}AnaliseDiaria
0 21-851611 CAMIO VO NaN NaN NaN NaN
1 21-3667984 SCA4X2 -1.0 NaN NaN NaN
2 21-3667994 SCA963 -1.0 NaN NaN NaN
etc. If you are not interested in one column or another, you can simply drop() it.
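For comparison, the standard library's ElementTree (Python 3.8+) offers a similar namespace-agnostic match through the `{*}` wildcard, which plays the role that `local-name()` plays in the XPath above. The sketch below is an assumption-laden illustration, not the pandas call itself: the `Relatorio` root and the attribute values in `SAMPLE` are hypothetical, since the original file is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using a default namespace; only the namespace URI and
# the element names Linha and Produto are taken from the answer.
SAMPLE = """
<Relatorio xmlns="http://www.ms.com/pace">
  <Linha>
    <Produto Coluna1="A-1" Coluna2="X" />
    <Produto Coluna1="A-2" Coluna2="Y" Coluna3="-1" />
  </Linha>
</Relatorio>
"""

root = ET.fromstring(SAMPLE)

# "{*}Produto" matches a Produto element in any namespace (or none).
produto_rows = [
    dict(produto.attrib)
    for produto in root.iterfind(".//{*}Linha//{*}Produto")
]
print(produto_rows)
```

Each dict holds one Produto's attributes, so the list can be handed to `pd.DataFrame` to get one column per Coluna attribute.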
You can use the xml package (from the Python standard library) to build a pandas.DataFrame. Here's what I would do (when reading from a file, replace xml_data with the name of your file or file object):
import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    # every row starts from the author's attributes, then adds the document's own
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data)  # create an ElementTree object
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))
If there are multiple authors in your original document or the root of your XML is not an author, then I would add the following generator:
def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row
and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))
Have a look at the ElementTree tutorial provided in the xml library documentation.
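Since the XML string above is only a placeholder, here is a small end-to-end run of the two generators on made-up data. The `library` root, the attribute names `name`, `country` and `id`, and all values are hypothetical; only the `author` and `document` tags come from the answer. The sketch stops at a list of dicts so that it runs without pandas.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: two authors under a non-author root element.
SAMPLE = """<library>
  <author name="Ann" country="NZ">
    <document id="1">first text</document>
    <document id="2">second text</document>
  </author>
  <author name="Bob" country="UK">
    <document id="3">third text</document>
  </author>
</library>"""


def iter_docs(author):
    """Yield one dict per document: author attributes, document attributes, text."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def iter_author(tree):
    """Walk every author in the tree and yield its document rows."""
    for author in tree.iter('author'):
        yield from iter_docs(author)


tree = ET.parse(io.StringIO(SAMPLE))
rows = list(iter_author(tree))
for row in rows:
    print(row)
```

`pd.DataFrame(rows)` would then give one row per document with the columns name, country, id and data.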
As of pandas v1.3, you can simply use:
pandas.read_xml(path_or_file)
