python read xml file stack overflow

How to read xml file using python? [duplicate]

Answer from pawelbylina on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 57351894 › how-to-read-xml-file-using-python

How to read xml file using python? - Stack Overflow

Top answer

1 of 2

Use ElementTree:

import xml.etree.ElementTree as ET
tree = ET.parse('Config.xml')
root = tree.getroot()
print(root.findall('.//Log'))

Output:

pawel@pawel-XPS-15-9570:~/test$ python parse_xml.py 
[<Element 'Log' at 0x7fb3f2eee9f

2 of 2

Below:

import xml.etree.ElementTree as ET
xml = r'''<?xml version="1.0" encoding="UTF-8"?>
<Automation_Config>
    <Path>
        <Log>.\SERVER.log</Log>
        <Flag_Path>.\Flag</Flag_Path>
        <files>.\PO</files>
    </Path>

</Automation_Config>'''

root = ET.fromstring(xml)
for idx,log_element in enumerate(root.findall('.//Log')):
  print('{}) Log value: {}'.format(idx,log_element.text))

Output:

0) Log value: .\SERVER.log
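For a file on disk rather than an inline string, the same lookup might be sketched as follows. The file name Config.xml comes from the first answer; the contents are copied from the sample above and written to a temporary directory only so that the sketch is self-contained.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Sample content taken from the answer above; a raw string keeps the
# backslashes in the paths literal.
SAMPLE = r"""<Automation_Config>
    <Path>
        <Log>.\SERVER.log</Log>
        <Flag_Path>.\Flag</Flag_Path>
        <files>.\PO</files>
    </Path>
</Automation_Config>"""

with tempfile.TemporaryDirectory() as tmp:
    # The temporary location is an assumption made for illustration only.
    config_path = Path(tmp) / "Config.xml"
    config_path.write_text(SAMPLE, encoding="utf-8")
    root = ET.parse(config_path).getroot()

# findtext() returns the text of the first match, or None when nothing matches.
log_value = root.findtext(".//Log")
flag_value = root.findtext("Path/Flag_Path")
missing = root.findtext(".//NoSuchTag")
print(log_value, flag_value, missing)
```

Using findtext() avoids printing Element objects, which is what the first answer's output shows.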

Stack Overflow

stackoverflow.com › questions › 29267405 › read-xml-files-online › 29267495

python - read xml files online - Stack Overflow

response.read() takes an optional argument that is the number of bytes to read from the response; an integer, whole number.
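A small sketch of that optional argument, under the assumption that a local file served through a file:// URL stands in for the online resource (the file name and its contents are made up for illustration):

```python
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample document; a real script would use an http(s) URL.
    sample = Path(tmp) / "sample.xml"
    sample.write_text(
        "<feed><entry>first</entry><entry>second</entry></feed>",
        encoding="utf-8",
    )
    with urllib.request.urlopen(sample.as_uri()) as response:
        head = response.read(6)  # read at most 6 bytes
        rest = response.read()   # no argument: read everything that is left

# The two chunks together form the complete document.
root = ET.fromstring(head + rest)
entries = [entry.text for entry in root.findall("entry")]
print(head, entries)
```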


stackoverflow.com › questions › 12290091 › reading-xml-file-and-fetching-its-attributes-value-in-python

Reading XML file and fetching its attributes value in Python - Stack Overflow

Top answer

1 of 7

Here's an lxml snippet that extracts an attribute as well as element text (your question was a little ambiguous about which one you needed, so I'm including both):

from lxml import etree
doc = etree.parse(filename)

memoryElem = doc.find('memory')
print(memoryElem.text)        # element text
print(memoryElem.get('unit')) # attribute

You asked (in a comment on Ali Afshar's answer) whether minidom is a good alternative. Here's the equivalent code using minidom; judge for yourself which is nicer:

import xml.dom.minidom as minidom
doc = minidom.parse(filename)

memoryElem = doc.getElementsByTagName('memory')[0]
print(''.join([node.data for node in memoryElem.childNodes]))
print(memoryElem.getAttribute('unit'))

lxml seems like the winner to me.

2 of 7

XML

<data>
    <items>
        <item name="item1">item1</item>
        <item name="item2">item2</item>
        <item name="item3">item3</item>
        <item name="item4">item4</item>
    </items>
</data>

Python :

from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item') 
print("Len : ", len(itemlist))
print("Attribute Name : ", itemlist[0].attributes['name'].value)
print("Text : ", itemlist[0].firstChild.nodeValue)
for s in itemlist:
    print("Attribute Name : ", s.attributes['name'].value)
    print("Text : ", s.firstChild.nodeValue)
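For comparison, roughly the same attribute and text lookup can be written with the standard library's ElementTree. This is a sketch on a shortened copy of the XML above, not part of either answer:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the answer above.
ITEMS_XML = """<data>
    <items>
        <item name="item1">item1</item>
        <item name="item2">item2</item>
    </items>
</data>"""

root = ET.fromstring(ITEMS_XML)
items = root.findall("./items/item")

# get() reads an attribute; .text holds the element's text content.
pairs = [(item.get("name"), item.text) for item in items]
for name, text in pairs:
    print("Attribute Name :", name, "| Text :", text)
```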

Stack Overflow

stackoverflow.com › questions › 59157419 › how-to-read-data-from-xml-file-in-python

How to read data from xml file in python - Stack Overflow

Top answer

1 of 5

You need to iterate over each TExportCarcass tag and then use find to access BodyNum.

Ex:

from lxml import etree

doc = etree.parse('file.xml')
for elem in doc.findall('TExportCarcass'):
    print(elem.find("BodyNum").text)

Output:

6168
6169

print([i.text for i in doc.findall('TExportCarcass/BodyNum')]) #-->['6168', '6169']

2 of 5

When you pass find a plain tag name, it only searches the direct children of the element you start from. You can instead use XPath expressions within find to search for any element within the doc:

To get the first element only:

from lxml import etree
doc = etree.parse('file.xml')

memoryElem = doc.find('.//BodyNum')
memoryElem.text
# 6168

To get all elements:

[ b.text for b in doc.iterfind('.//BodyNum') ]
# ['6168', '6169']
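The difference between a bare tag name and a './/' path can be seen in a short standard-library sketch. The `Export` root element is an assumption; the question's actual file layout is not shown here, only the TExportCarcass and BodyNum names and the two values.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: BodyNum sits one level below TExportCarcass.
DOC = """<Export>
    <TExportCarcass><BodyNum>6168</BodyNum></TExportCarcass>
    <TExportCarcass><BodyNum>6169</BodyNum></TExportCarcass>
</Export>"""

root = ET.fromstring(DOC)

direct = root.find("BodyNum")       # only looks at direct children of root
first = root.find(".//BodyNum")     # searches the whole subtree
all_nums = [b.text for b in root.iterfind(".//BodyNum")]

print(direct, first.text, all_nums)
```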

Stack Overflow

stackoverflow.com › questions › 1912434 › how-can-i-parse-xml-and-get-instances-of-a-particular-node-attribute

python - How can I parse XML and get instances of a particular node attribute? - Stack Overflow

Top answer

1 of 16

919

I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself; but, in this context, what they chiefly add is even more speed -- the ease of programming part depends on the API, which ElementTree defines.

First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:

import xml.etree.ElementTree as ET
root = ET.parse('thefile.xml').getroot()

Or any of the many other ways shown at ElementTree. Then do something like:

for type_tag in root.findall('bar/type'):
    value = type_tag.get('foobar')
    print(value)

Output:

1
2
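A self-contained variant of the loop above, assuming a document shaped like the one below (the answer does not show its XML, so the `foo` root and the extra top-level `type` element are made up to illustrate what the 'bar/type' path does and does not match):

```python
import xml.etree.ElementTree as ET

# Assumed sample document; only bar/type and the foobar attribute
# come from the answer.
root = ET.fromstring("""<foo>
    <bar>
        <type foobar="1"/>
        <type foobar="2"/>
    </bar>
    <type foobar="ignored"/>
</foo>""")

# 'bar/type' matches <type> children of <bar> children of the root only.
values = [type_tag.get("foobar") for type_tag in root.findall("bar/type")]

# get() accepts a default for attributes that are absent.
fallback = root.find("bar/type").get("no_such_attribute", "n/a")
print(values, fallback)
```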

2 of 16

477

minidom is the quickest and pretty straightforward.

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>

Python:

from xml.dom import minidom

dom = minidom.parse('items.xml')
elements = dom.getElementsByTagName('item')

print(f"There are {len(elements)} items:")

for element in elements:
    print(element.attributes['name'].value)

Output:

There are 4 items:
item1
item2
item3
item4

Stack Overflow

stackoverflow.com › questions › 29755652 › read-xml-as-a-txt-in-python

Read xml as a txt in python - Stack Overflow

Top answer

1 of 2

This has nothing to do with the XML file format; it depends on the encoding of your file. Python 3 decodes text with a default encoding (often UTF-8) that may not match the one the file was written in, and if you are on Windows your file is probably in windows-1252. Pass the encoding explicitly:

f = open("text.txt", "r", encoding="cp1252")

2 of 2

This should do the job:

# The with block closes the file automatically.
with open('reboot.xml', 'r') as f:
    a = f.read()
print(a)
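The effect of the encoding argument can be checked with a short sketch. The file contents are invented for illustration, and the file is written to a temporary directory so that nothing on disk is assumed:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reboot.xml"
    # Write a non-ASCII character using the Windows-1252 code page.
    path.write_bytes("<note>café</note>".encode("cp1252"))

    # Reading with the matching encoding recovers the text.
    with open(path, "r", encoding="cp1252") as f:
        text = f.read()

    # Reading the same bytes as UTF-8 fails, because 0xE9 followed by '<'
    # is not a valid UTF-8 sequence.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(text, utf8_failed)
```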

Stack Overflow

stackoverflow.com › questions › 69432420 › how-to-read-xml-file-into-pandas-dataframe

python - How to read XML file into Pandas Dataframe - Stack Overflow

Top answer

1 of 2

Use [] to filter and reorganize columns:

import pandas as pd

cols = ['Application_ID', 'Product_Type', 'Product_ID']
df = pd.read_xml('product.xml')[cols]
print(df)

# Output:
  Application_ID  Product_Type  Product_ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22

If you want to replace '_' in your column names with ' ':

df.columns = df.columns.str.replace('_', ' ')
print(df)

# Output:
  Application ID  Product Type  Product ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22

2 of 2

As of Pandas 1.3.0 there is a read_xml() function that makes working with reading/writing XML data in/out of pandas much easier.

Once you upgrade to pandas 1.3.0 or later you can simply use:

df = pd.read_xml("___XML_FILEPATH___")
print(df)

(Note that in the XML sample above the <Rowset> tag needs to be closed)
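Where read_xml() is not available, flat row-shaped XML can be collected into dictionaries with the standard library. This is a rough sketch, not what pandas does internally: the `Row` element name and the exact sample document are assumptions, since only the column names, the values, and the `<Rowset>` tag appear in the answers above.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one <Row> per record, one child element per column.
XML = """<Rowset>
    <Row>
        <Application_ID>BBC#:1010</Application_ID>
        <Product_Type>1</Product_Type>
        <Product_ID>32</Product_ID>
    </Row>
    <Row>
        <Application_ID>NBA#:1111</Application_ID>
        <Product_Type>2</Product_Type>
        <Product_ID>22</Product_ID>
    </Row>
</Rowset>"""

root = ET.fromstring(XML)

# One dict per row: child tag -> child text (values stay strings).
rows = [{child.tag: child.text for child in row} for row in root]
print(rows)
```

A list of dictionaries like `rows` can be passed to pd.DataFrame(rows) if pandas is installed.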

Stack Overflow

stackoverflow.com › questions › 56754935 › how-to-parse-unstructured-xml-file-using-python › 56755810

xml parsing - how to parse unstructured xml file using python? - Stack Overflow

Top answer

1 of 2

Use BeautifulSoup (bs4) with the lxml parser library to scrape the XML data.

from bs4 import BeautifulSoup

xml_data = '''<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<id extension="4b78219a-1d02-4e7c-9870-dc7ce3b8a8fb" root="1.2.840.113619.21.1.3214775361124994304.5.1"/>
<code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of episode note"/>
<title>Summary</title>
<effectiveTime value="20170919160921ddfdsdsdsd31-0400"/>
<confidentialityCode code="N" codeSystem="2.16.840.dwdwddsd1.113883.5.25"/>
<recordTarget>
<patientRole><id extension="0" root="1.2.840.113619.21.1.3214775361124994304.2.1.1.2"/>
<addr use="HP"><streetAddressLine>addd2 </streetAddressLine><city>fgfgrtt</city><state>tr</state><postalCode>121213434</postalCode><country>rere</country></addr>
<patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
<birthTime value="19501025"/>
<maritalStatusCode code="M" codeSystem="2434.16.840.1.143434313883.5.2" displayName="M"/>
<languageCommunication>
<languageCode code="eng"/>
<proficiencyLevelCode nullFlavor="NI"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>'''


soup = BeautifulSoup(xml_data, "lxml")

title = soup.find("title")
print(title.text.strip())

patient = soup.find("patient")
given = patient.find("given").text.strip()
family = patient.find("family").text.strip()
gender = patient.find("administrativegendercode")['displayname'].strip()

print(given)
print(family)
print(gender)

Output:

Summary
fname
lname
Female

Install library dependency:

pip3 install beautifulsoup4==4.7.1
pip3 install lxml==4.3.3

2 of 2

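Or you can simply use lxml. Here is the tutorial that I used: https://lxml.de/tutorial.html. The code should look similar to:

from lxml import etree
# Note: etree.Element() creates a new, empty <patient> element rather than
# parsing a document, so each find() below returns None as written.
root = etree.Element("patient")
print(root.find("given"))
print(root.find("family"))
print(root.find("give"))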
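The document in this question declares a default namespace (urn:hl7-org:v3), which a namespace-aware parser requires in every lookup. A standard-library sketch, using a shortened and well-formed excerpt of the sample above (the excerpt and the `hl7` prefix are choices made here for illustration):

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt of the sample document.
CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
<title>Summary</title>
<recordTarget><patientRole><patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" displayName="Female"/>
</patient></patientRole></recordTarget>
</ClinicalDocument>"""

# The prefix is arbitrary; only the URI has to match the document.
ns = {"hl7": "urn:hl7-org:v3"}
root = ET.fromstring(CDA)

title = root.findtext("hl7:title", namespaces=ns)
patient = root.find(".//hl7:patient", ns)
given = patient.findtext("hl7:name/hl7:given", namespaces=ns)
family = patient.findtext("hl7:name/hl7:family", namespaces=ns)
gender = patient.find("hl7:administrativeGenderCode", ns).get("displayName")

# Without the namespace the same tag name does not match.
unqualified = root.find("title")
print(title, given, family, gender, unqualified)
```

Unlike the BeautifulSoup "lxml" parser used above, this keeps tag and attribute names case-sensitive, so displayName is spelled as in the source.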


Python

docs.python.org › 3 › library › xml.etree.elementtree.html

xml.etree.ElementTree — The ElementTree XML API

January 29, 2026 - If not given, the standard XMLParser parser is used. parser must be a subclass of XMLParser and can only use the default TreeBuilder as a target. Returns an iterator providing (event, elem) pairs; it has a root attribute that references the root element of the resulting XML tree once source is fully read. The iterator has the close() method that closes the internal file object if source is a filename.

GeeksforGeeks

geeksforgeeks.org › python › reading-and-writing-xml-files-in-python

Reading and Writing XML Files in Python - GeeksforGeeks

January 12, 2026 - To modify an XML file, you first parse it, then change attributes or content, and finally save or print it. ... from bs4 import BeautifulSoup with open('dict.xml', 'r') as f: data = f.read() bs_data = BeautifulSoup(data, 'xml') for tag in bs_data.find_all('child', {'name':'Frank'}): tag['test'] = "WHAT !!" print(bs_data.prettify()) ... ElementTree is included in Python’s standard library, so no installation is required.

Stack Overflow

stackoverflow.com › questions › 24720668 › python-read-data-from-xml-file

Python read data from XML file - Stack Overflow

Top answer

1 of 1

The shown XML isn't valid because it uses a namespace prefix (sys) but doesn't define it and the XML parser (xml.dom.expatbuilder module) chokes on that. You would have to go straight to the expatbuilder in order to give its parse() function the argument to ignore namespaces. And if you want to extract the text node in the second <span> your index is off by one:

from xml.dom import expatbuilder


def main():
    document = expatbuilder.parse('test.xml', False)
    node = document.getElementsByTagName('span')[1]
    print(float(node.firstChild.data))


if __name__ == '__main__':
    main()
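If the file can be edited, an alternative is to declare the missing prefix and parse normally. The sketch below uses ElementTree and a made-up two-span document; the `urn:example:sys` URI is a placeholder, not something from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical input that uses the sys prefix without declaring it.
BROKEN = "<root><sys:span>1.5</sys:span><sys:span>2.5</sys:span></root>"

try:
    ET.fromstring(BROKEN)
    error = None
except ET.ParseError as exc:
    error = str(exc)  # expat reports an unbound prefix

# Declaring the prefix on the root element makes the document parseable.
FIXED = BROKEN.replace("<root>", '<root xmlns:sys="urn:example:sys">', 1)
ns = {"sys": "urn:example:sys"}
spans = ET.fromstring(FIXED).findall("sys:span", ns)

# Index 1 is the second <span>, matching the off-by-one note above.
value = float(spans[1].text)
print(error, value)
```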

Stack Overflow

stackoverflow.com › questions › tagged › xml-parsing

Newest 'xml-parsing' Questions - Stack Overflow

Each one starts with a bit of text and then some XML output like this : 2025-02-21 16:45:55,760 - ... ... I'm trying to pull data out of a file in order to split it.

Stack Overflow

stackoverflow.com › questions › 324214 › what-is-the-fastest-way-to-parse-large-xml-docs-in-python

What is the fastest way to parse large XML docs in Python? - Stack Overflow

Top answer

1 of 8

It looks to me as if you do not need any DOM capabilities from your program. I would second the use of the (c)ElementTree library. If you use the iterparse function of the cElementTree module, you can work your way through the xml and deal with the events as they occur.

Note, however, Fredrik's advice on using the cElementTree iterparse function:

to parse large files, you can get rid of elements as soon as you’ve processed them:

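from xml.etree.ElementTree import iterparse

# source is a filename or a file object opened in binary mode
for event, elem in iterparse(source):
    if elem.tag == "record":
        # ... process record elements ...
        elem.clear()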

The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element. The easiest way to do this is to enable start events, and save a reference to the first element in a variable:

# get an iterable
context = iterparse(source, events=("start", "end"))

# turn it into an iterator
context = iter(context)

# get the root element (Python 2 syntax; on Python 3 write next(context))
event, root = context.next()

for event, elem in context:
    if event == "end" and elem.tag == "record":
        # ... process record elements ...
        root.clear()

The lxml.iterparse() does not allow this.

The previous snippet does not work on Python 3.7; consider the following way to get the first element.

import xml.etree.ElementTree as ET

# Get an iterable.
context = ET.iterparse(source, events=("start", "end"))
    
for index, (event, elem) in enumerate(context):
    # Get the root element.
    if index == 0:
        root = elem
    if event == "end" and elem.tag == "record":
        # ... process record elements ...
        root.clear()
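A runnable sketch of the same pattern on Python 3, using an in-memory document in place of a large file. The `log`/`record` element names and the `id` attribute are placeholders chosen for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file: five small <record> elements.
records = "".join(f'<record id="{i}"/>' for i in range(5))
source = io.BytesIO(f"<log>{records}</log>".encode("utf-8"))

seen = []
context = ET.iterparse(source, events=("start", "end"))

# The first event is the start of the root element.
_, root = next(context)

for event, elem in context:
    if event == "end" and elem.tag == "record":
        seen.append(elem.get("id"))  # process the record
        root.clear()                 # drop processed children from the root

remaining = len(root)
print(seen, remaining)
```

Clearing the root after each record keeps the in-memory tree from growing with the number of records processed.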

2 of 8

Have you tried the cElementTree module?

cElementTree is included with Python 2.5 and later, as xml.etree.cElementTree. Refer to the benchmarks.

Note that since Python 3.3, xml.etree.ElementTree uses the fast C implementation by default whenever it is available, so this change is not needed on Python 3.3+ (the xml.etree.cElementTree module itself was removed in Python 3.9).


Stack Exchange

codereview.stackexchange.com › questions › 194703 › importing-xml-dump-of-stack-overflow-questions-and-answers-into-sqlite3

python - Importing XML dump of Stack Overflow questions and answers into SQLite3 - Code Review Stack Exchange

Top answer

1 of 1

My first impression on glancing at your program is that there are code smells.
There are several things you can do to clean up this code, both in its general form and in the details, but I hesitate to list them until you consider redesigning before coding ... In your case, I would not hesitate to take the time to restructure the code in MVC terms (this can be a good start). Otherwise, as it stands, your code will be hard to reuse or maintain.

Being that the file is so large, will I run into speed issues once all is migrated to the database?

A general rule of thumb when dealing with large XML files: lxml is recommended because it is faster than the library you are using (and yes, it also implements iterparse()).
If you take the iterative parsing approach (iterparse()) on such a huge file, this article is essential reading: High-performance XML parsing in Python with lxml: Stretch the limits of this full-featured XML parsing and serializing suite.

Stack Overflow

stackoverflow.com › questions › 3106480 › really-simple-way-to-deal-with-xml-in-python

Really simple way to deal with XML in Python? - Stack Overflow

Top answer

1 of 9

lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.

>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'

2 of 9

You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:

# geetree.py
import xml.etree.ElementTree as ET

class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
       attribute foo, a.foo gets the first subelement foo."""
    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        res = self._getattr(name)
        if res is None:
            raise AttributeError("No attribute named '%s'" % name)
        return res

    def __getattr__(self, name):
        res = self._getelem(name)
        if res is None:
            raise IndexError("No element named '%s'" % name)
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)

class GeeTree(object):
    "Wrapper around an ElementTree."
    def __init__(self, fname):
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        if self.doc.getroot().tag != name:
            raise IndexError("No element named '%s'" % name)
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()

You invoke it so:

>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'

Edureka

edureka.co › blog › python-xml-parser-tutorial

Python XML Parser Tutorial | ElementTree and Minidom Parsing | Edureka

December 5, 2024 - In this Python XML Parser Tutorial, you will learn how to parse, read, modify and find elements from XML files in Python using ElementTree and Minidom.

Stack Overflow

stackoverflow.com › questions › 73377561 › read-xml-file-convert-it-to-table-dataframe

python - read xml file, convert it to table (dataframe) - Stack Overflow

Top answer

1 of 4

Given the two levels of nodes that cover the Coluna attributes, consider XSLT, the special-purpose language designed to transform or style original XML files. Python's lxml can run XSLT 1.0 scripts and, being the default parser for pandas.read_xml, can transform your raw XML into a flatter version to parse into a DataFrame.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:pace='http://www.ms.com/pace'>
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- REDESIGN XML TO ONLY RETURN AnaliseDiaria NODES -->
    <xsl:template match="/*">
     <xsl:copy>
       <xsl:apply-templates select="descendant::pace:AnaliseDiaria"/>
     </xsl:copy>
    </xsl:template>
    
    <!-- REDESIGN AnaliseDiaria NODES -->
    <xsl:template match="pace:AnaliseDiaria">
     <xsl:copy>
       <!-- BRING DOWN Produto ATTRIBUTES WITH CURRENT ATTRIBUTES -->
       <xsl:copy-of select="ancestor::pace:Produto/@*|@*"/>
     </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>


Python

analise_diaria_df = pd.read_xml("input.xml", stylesheet="style.xsl")

analise_diaria_df 
#        Coluna1   Coluna2  Coluna3  ...    Coluna14  Coluna15   Coluna16
# 0    21-851611  CAMIO VO      NaN  ...         NaN       NaN        NaN
# 1   21-3667984    SCA4X2     -1.0  ...         NaN       NaN        NaN
# 2   21-3667994    SCA963     -1.0  ...         NaN       NaN        NaN
# 3   21-3676543    SCA713     -1.0  ...         NaN       NaN        NaN
# 4   21-3676601     SCA97     -1.0  ...         NaN       NaN        NaN
# 5   21-3814014    CAMIX2      NaN  ...         NaN       NaN        NaN
# 6   21-3814087     SCA56      NaN  ...         NaN       NaN        NaN
# 7   21-3814087     SCA56      NaN  ...  195.000,00       NF9  10203910A
# 8   21-3814087     SCA56      NaN  ...  195.090,00       NaN        NaN
# 9   21-3814087     SCA56      NaN  ...  195.270,00       NaN        NaN
# 10  21-3814087     SCA56      NaN  ...  195.482,60       NaN        NaN
# 11  21-3814087     SCA56      NaN  ...  195.627,80       NaN        NaN
# 12  21-3814087     SCA56      NaN  ...  204.529,82       NaN        NaN
# 13  21-3814087     SCA56      NaN  ...         NaN       NaN     158PES

2 of 4

Fortunately, in the case of your xml in the question, you can use the pandas read_xml() method, although you'll have to skirt around the namespaces issue:

import pandas as pd
pd.read_xml("file.xml", xpath='//*[local-name()="Linha"]//*[local-name()="Produto"]')

Output:

    Coluna1        Coluna2    Coluna3     Coluna4   Coluna5     {http://www.ms.com/pace}AnaliseDiaria
0   21-851611   CAMIO VO    NaN     NaN     NaN     NaN
1   21-3667984  SCA4X2  -1.0    NaN     NaN     NaN
2   21-3667994  SCA963  -1.0    NaN     NaN     NaN

etc. If you are not interested in one column or another, you can simply drop() it.

Stack Overflow

stackoverflow.com › questions › 58939375 › get-xml-file-name-from-loaded-xml-files-using-python

Get XML file name from loaded XML files using Python - Stack Overflow

August 10, 2024 - My Python code reads XML files stored at location and loads it into Python list after parsing using lxml library as shown below: XMLFILEList = [] FilePath = 'C:\\plugin\\TestPlugin\\' XMLFilePath ...

The Hitchhiker's Guide to Python

docs.python-guide.org › scenarios › xml

XML parsing — The Hitchhiker's Guide to Python

xmltodict also lets you roundtrip back to XML with the unparse function, has a streaming mode suitable for handling files that don’t fit in memory, and supports XML namespaces.

Studytonight

studytonight.com › python-howtos › how-to-read-xml-file-in-python

How to read XML file in Python - Studytonight

In this article, we will learn how to use different parsing modules to read XML documents in Python and some related custom examples as well.