python script to convert xml to text

stackoverflow.com › questions › 53262708 › i-want-to-convert-a-xml-files-into-text-file-using-python

xml parsing - I want to convert a xml files into text file using python - Stack Overflow

github.com › imrrahul › XML-TMX-to-Text-.txt-file-converter-

Assuming you have a file called file.xml, containing:

<annotation>
    <folder>all_images</folder>
    <filename>0.jpg</filename>
    <path>/home/vishnu/Documents/all_images/0.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>4250</width>
        <height>5500</height>
        <depth>1</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>word</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>308</xmin>
            <ymin>45</ymin>
            <xmax>502</xmax>
            <ymax>162</ymax>
        </bndbox>
    </object>
</annotation>

Then the following Python script in the same folder gives you an idea how to use the Standard Library ElementTree API to parse the file:

import xml.etree.ElementTree as ET

tree = ET.parse("file.xml")
root = tree.getroot()

print(root.find("./folder").text)
print(root.find("./object/name").text)
print(root.find("./object/bndbox/xmin").text)

You will need to work out how to write the values to your own text files, but that should be straightforward. There are lots of resources such as this one.

GitHub

GitHub - imrrahul/XML-TMX-to-Text-.txt-file-converter-: This python script converts XML/TMX file into Text file (.txt).

This python script converts XML/TMX file into Text file (.txt) and uses beautifulSoup to extract only the text from the xml/tmx file and removes all html/xml/tmx tags.

Forked by 2 users

Languages Python 100.0% | Python 100.0%

Videos

29:00

Python XML Parser Tutorial | Read and Write XML in Python | Python ...

February 11, 2023

youtube.com

Parse XML Files with Python - Basics in 10 Minutes

17:34

Full XML Processing Guide in Python - YouTube

August 26, 2021

77.8K

youtube.com

Converting text to xml format

37:03

How to Convert Text Files to XML Files in Python (without Libraries) ...

August 6, 2020

13:25

Python Tutorial | Create XML file using python | python xml examples ...

October 2, 2019

View all

Aspose

products.aspose.com › aspose.cells › python via .net › conversion › xml to txt

Convert XML to TXT in Python Excel Library - Conversion

November 13, 2025 - Add a library reference (import the library) to your Python project. Load XML file with an instance of Workbook. Convert XML to TXT by calling Workbook.save method.

SourceForge

xml2txt.sourceforge.net

xml2txt Documentation

The script xml2txt converts a valid XML document to formatted text.

Aspose

products.aspose.com › aspose.cells › python via java › conversion › xml to txt

Convert XML to TXT in Python Excel Library

November 13, 2025 - Add a library reference (import the library) to your Python project. Load XML file with an instance of Workbook. Convert XML to TXT by calling Workbook.save method.

stackoverflow.com › questions › 31282067 › how-to-transform-xml-to-text

python - How to transform XML to text - Stack Overflow

gist.github.com › aspose-com-gists › de1e265bc7a9ee9b806fb7be86853933

Edit: updated now that OP's included his Python code.

Your problem is that lxml.etree.tostring and also the .write method are only meaningful on XML, not on an XSLT result with output method="text" which might not have a single root element like XML does. For some confusing reason, the functions do have a method= keyword argument but it does not do anything useful.

This is what you should do:

import lxml.etree as etree
data = etree.parse('data.xml')
transform = etree.XSLT(etree.parse('txt.xslt'))
res = transform(data)
bytes(res)

b'\nCEO and 1Director of Operations and 2Human Resources Manager and 3\n'

If you're interested in a real world example, I recently made a patch.

GitHub

Convert TXT to XML in Python | Python Text to XML Converter · GitHub

Clone this repository at <script ...;></script> Save aspose-com-gists/de1e265bc7a9ee9b806fb7be86853933 to your computer and use it in GitHub Desktop. ... Learn how to convert a TXT file to XML in Python : https://blog.aspose.com/2022/06/23/convert-txt-to-xml-in-python/ ... This file contains hidden or bidirectional Unicode text that may be ...

Quora

quora.com › How-can-I-convert-XML-files-to-text-files

How to convert XML files to text files - Quora

Use streaming parsers (Python: iterparse; Java: StAX; streaming xmlstarlet) to extract and write line-by-line without loading whole file. ... Keep XML content exactly: rename or copy file. Human-readable formatted XML: xmllint or editor pretty-print. Extract specific values to plain text lines: xmlstarlet or XPath scripts.

Find elsewhere

Google Bing Mojeek

GitHub

github.com › knipknap › Gelatin

GitHub - knipknap/Gelatin: Transform text files to XML, JSON, or YAML

<xml> <user lastname="Doe" firstname="John"> <office>1st Ave</office> <birth-date>1978-01-01</birth-date> </user> <user lastname="Foo" firstname="Jane"> <office>2nd Ave</office> <birth-date>1970-01-01</birth-date> </user> </xml> ... from Gelatin.util import compile, generate # Parse your .gel file. syntax = compile('syntax.gel') # Convert your input file to XML, YAML, and JSON.

Starred by 159 users

Forked by 42 users

stackoverflow.com › questions › 7691514 › extracting-text-from-xml-using-python

Extracting text from XML using python - Stack Overflow

1 of 6

There is already a built-in XML library, notably ElementTree. For example:

>>> from xml.etree import cElementTree as ET
>>> xmlstr = """
... <root>
... <page>
...   <title>Chapter 1</title>
...   <content>Welcome to Chapter 1</content>
... </page>
... <page>
...  <title>Chapter 2</title>
...  <content>Welcome to Chapter 2</content>
... </page>
... </root>
... """
>>> root = ET.fromstring(xmlstr)
>>> for page in list(root):
...     title = page.find('title').text
...     content = page.find('content').text
...     print('title: %s; content: %s' % (title, content))
...
title: Chapter 1; content: Welcome to Chapter 1
title: Chapter 2; content: Welcome to Chapter 2

2 of 6

You can also try this code to extract texts:

from bs4 import BeautifulSoup
import csv

data ="""<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>"""

soup = BeautifulSoup(data, "html.parser")

########### Title #############
required0 = soup.find_all("title")
title = []
for i in required0:
    title.append(i.get_text())

########### Content #############
required0 = soup.find_all("content")
content = []
for i in required0:
    content.append(i.get_text())

doc1 = list(zip(title, content))
for i in doc1:
    print(i)

Output:

('Chapter 1', 'Welcome to Chapter 1')
('Chapter 2', 'Welcome to Chapter 2')

stackoverflow.com › questions › 33814607 › converting-a-python-xml-elementtree-to-a-string

Converting a Python XML ElementTree to a String - Stack Overflow

sikathabis.wordpress.com › 2017 › 12 › 16 › how-to-convert-xml-file-into-text-file

1 of 2

This should work:-

xmlstr = ET.tostring(root, encoding='utf8', method='xml')

2 of 2

How do I convert `ElementTree.Element` to a String?

For Python 3:

xml_str = ElementTree.tostring(xml, encoding='unicode')

For Python 2:

xml_str = ElementTree.tostring(xml, encoding='utf-8')

For compatibility with both Python 2 & 3:

xml_str = ElementTree.tostring(xml).decode()

Example usage

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml).decode()
print(xml_str)

Output:

<Person Name="John" />

Explanation

Despite what the name implies, ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue in Python 3, which uses Unicode for strings.

In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. [...]

To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.

^{Source: Porting Python 2 Code to Python 3}

If we know what version of Python is being used, we can specify the encoding as unicode or utf-8. Otherwise, if we need compatibility with both Python 2 & 3, we can use decode() to convert into the correct type.

For reference, I've included a comparison of .tostring() results between Python 2 and Python 3.

ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode

ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />

Thanks to Martijn Peters for pointing out that the str datatype changed between Python 2 and 3.

Why not use str()?

In most scenarios, using str() would be the "cannonical" way to convert an object to a string. Unfortunately, using this with Element returns the object's location in memory as a hexstring, rather than a string representation of the object's data.

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
print(str(xml))  # <Element 'Person' at 0x00497A80>

Wordpress

How to convert xml into text file [python] | Oracle, Web, Script, SQLserver, Tips & Trick

December 16, 2017 - import xml.etree.ElementTree as ET tree = ET.parse('test.xml') notags = ET.tostring(tree.getroot(), encoding='utf-8',method='text') f = open('test.txt','w') f.write(notags) f.close() reference : link1 | link2

stackoverflow.com › questions › 58600631 › how-to-convert-a-txt-to-xml-in-python

How to convert a .txt to .xml in python - Stack Overflow

<?xml version="1.0" encoding="utf-8"?> <root> <filedata> </serialnumber> <operatorid>test</operatorid> <time>00:00:42 Test Step 2</time> <tp1>17.25</tp1> <tp2>2.46</tp2> </filedata> ... </root> I was using a code like this to convert my previous text file to xml...but right now I'm facing problems in splitting the lines.

stackoverflow.com › questions › 26056316 › how-to-convert-txt-file-into-xml-file-using-python

How to convert .txt file into xml file using python? - Stack Overflow

geeksforgeeks.org › python › reading-and-writing-xml-files-in-python

Try the following code as a starter:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, uggly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging            
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

It displays for your example data:

<root>
<celldata>
<latitude>23.1100348</latitude>
<longitude>72.5364922</longitude>
<datetime>30:August:2014 05:04:31 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

<celldata>
<latitude>23.1120549</latitude>
<longitude>72.5397988</longitude>
<datetime>30:August:2014 05:04:34 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

</root>

Update for the wanted neigbour list:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, uggly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'
        else:
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

Update for accepting empty line before neighbours -- also better implementation for general purposes:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    status = 0              # init status of the finite automaton
    for line in f:
        if status == 0:     # lines of the heading expected
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                    status = 1  # empty line and then list of neighbours expected
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
                    # keep the same status

        elif status == 1:   # empty line expected
            if line.isspace():
                status = 2  # list of neighbours must follow
            else:
                raise RuntimeError('Empty line expected. (status == {})'.format(status))
                status = 999 # error status

        elif status == 2:   # neighbour or the empty line as final separator

            if line.isspace():
                celldata = ET.SubElement(root, 'celldata')
                celldata.text = '\n'
                celldata.tail = '\n\n'
                status = 0  # go to the initial status
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed
                # keep the same status

        elif status == 999: # error status -- break the loop
            break

        else:
            raise LogicError('Unexpected status {}.'.format(status))
            break

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

The code implements so called finite automaton where the status variable represents its current status. You can visualize it using pencil and paper -- draw small circles with the status numbers inside (called nodes in the graph theory). Being at the status, you allow only some kind of input (line). When the input is recognized, you draw the arrow (oriented edge in the graph theory) to another status (possibly to the same status, as a loop returning back to the same node). The arrow is annotated `condition | action'.

The result may look complex at the beginning; however, it is easy in the sense that you can always focus ony on the part of the code that belongs to certain status. And also, the code can be easily modified. However, finite automatons have limited power. But they are just perfect for this kind of problems.

GeeksforGeeks

Reading and Writing XML Files in Python - GeeksforGeeks

January 12, 2026 - .text: Assigns text content inside the XML tags. ET.tostring(root): Converts the XML tree into bytes. Writing to "GFG.xml": Saves the generated XML to a file. XML parsing in Python · XML Tutorial · Comment · Article Tags: Article Tags: Python · python-utility ·

stackoverflow.com › questions › 57245432 › how-do-i-convert-xml-to-html-in-python

How do I convert XML to HTML in python? - Stack Overflow

1 of 2

One way to achieve this is to use XSLT Transformation. Most programming languages including Python will have support to convert an XML document into another document (e.g. HTML) when supplied with an XSL.

A good tutorial on XSLT Transformation can be found here

Use of Python to achieve transformation (once an XSL is prepared) is described here

2 of 2

There are several things wrong with your XHTML source. First, xmlns is not a correct attribute for the xml declaration; it should be put on the root element instead. And the root element for XHTML is <html>, not <xhtml>. So the valid XHTML input in this particular case would be

<?xml version=\"1.0\"?>\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head><title></title></head>\n<body>\n</body></html>

That said, I'm not sure if xml.etree.ElementTree accepts that, having no experience with it.

stackoverflow.com › questions › 53340100 › straight-conversion-of-xml-file-into-txt-file-no-parsing-required

python - Straight conversion of .xml file into .txt file...no parsing required - Stack Overflow

oreilly.com › library › view › python-cookbook › 0596001673 › ch12s08.html

Just read the data as string (or bytes if you want) and write it out as a different extension...

with open('myfile.xml', 'r') as file_in, open('myfile.txt', 'w') as file_out:
    data = file_in.read()
    file_out.write(data)

Or if you really just want to change the current file's extension:

import os
os.rename('myfile.xml','myfile.txt')

O'Reilly

Converting Ad-Hoc Text into XML Markup - Python Cookbook [Book]

def markup(text, paragraph_tag='paragraph', inline_tags={'_':'highlight'}): # First we must escape special characters, which have special meaning in XML text = text.replace('&', "&")\ .replace('<', "<")\ .replace('"', """)\ .replace('>', ">") # paragraph markup; pass any false value as the paragraph_tag argument to disable if paragraph_tag: # Get list of lines, removing leading and trailing empty lines: lines = text.splitlines(1) while lines and lines[-1].isspace(): lines.pop( ) while lines and lines[0].isspace( ): lines.pop(0) # Insert paragraph tags on empty lines: marker = '<

Stack Exchange

softwarerecs.stackexchange.com › questions › 86517 › convert-xml-with-binary-content-into-readable-text-and-back

python - Convert XML with binary content into readable text and back - Software Recommendations Stack Exchange