python xml encode example

stackoverflow.com › questions › 1546717 › escaping-strings-for-use-in-xml

Something like this?

>>> from xml.sax.saxutils import escape
>>> escape("< & >")   
'&lt; &amp; &gt;'

Answer from mbarkhau on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 1546717 › escaping-strings-for-use-in-xml

python - Escaping strings for use in XML - Stack Overflow

Top answer

1 of 8

Something like this?

>>> from xml.sax.saxutils import escape
>>> escape("< & >")   
'&lt; &amp; &gt;'

2 of 8

xml.sax.saxutils does not escape quotation characters (")

So here is another one:

def escape( str_xml: str ):
    str_xml = str_xml.replace("&", "&amp;")
    str_xml = str_xml.replace("<", "&lt;")
    str_xml = str_xml.replace(">", "&gt;")
    str_xml = str_xml.replace("\"", "&quot;")
    str_xml = str_xml.replace("'", "&apos;")
    return str_xml

if you look it up then xml.sax.saxutils only does string replace

Python

wiki.python.org › moin › EscapingXml

Escaping XML - Python Wiki

November 15, 2008 - This can be used to add additional entities specific to the DTD of the document being generated, or to cause particular characters to be encoded as character references: 1 >>> escape("abc", {"b": "b"}) 2 'abc' 3 >>> escape("My product, PyThingaMaJiggie, is really cool.", 4 ... {"PyThingaMaJiggie": "&productName;"}) 5 'My product, &productName;, is really cool.' The xml.sax.saxutils module provides an unescape() function as well.

Python

docs.python.org › 3 › library › xml.etree.elementtree.html

xml.etree.ElementTree — The ElementTree XML API

January 29, 2026 - Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring ...

Stack Overflow

stackoverflow.com › questions › 31390213 › how-to-parse-an-xml-file-with-encoding-declaration-in-python

How to parse an XML file with encoding declaration in Python? - Stack Overflow

Top answer

1 of 2

One thing that I tried, that worked for me is to open the xml file as a file object , then use ElementTree.fromstring() passing in the complete contents of the file.

Example -

>>> import xml.etree.ElementTree as ET
>>> ef = ET.parse('a.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
ValueError: multi-byte encodings are not supported
>>> with open('a.xml','r') as f:
...     ef = ET.fromstring(f.read())
...
>>> ef
<Element 'productMeta' at 0x028DF180>

You can also, create an XMLParser with the required encoding, and this should enable you to be able to parse strings from that encoding, Example -

import xml.etree.ElementTree as ET
xmlp = ET.XMLParser(encoding="utf-8")
f = ET.parse('a.xml',parser=xmlp)

2 of 2

 ET.parse('a.xml', parser=ET.XMLParser(encoding='iso-8859-5'))

solved my problem when dealed with xml excel in python

Stuffaboutcode

stuffaboutcode.com › 2012 › 06 › python-encode-xml-escape-characters.html

<Stuff about="code" />: Python - encode XML escape characters

xmlText = encodeXMLText("Some text with escape characters & " ' < > you want to make xml safe") ... Note: only a member of this blog may post a comment. ... .net (4) Adventures in Minecraft (5) asp (3) c (5) c# (4) camera (7) Car (4) games (24) get_iplayer (5) gpio (13) gps (4) html (3) microbit ...

XML.com

xml.com › pub › a › 2002 › 11 › 13 › py-xml.html

Proper XML Output in Python

November 13, 2002 - Consider one of the earlier examples, with a slight change -- the word "degrees" has been replaced with the byte B0 (octal 260), which is the degree symbol in the popular ISO-8859-1 and Windows-1252 character encodings: $ python -i listing2.py >>> write_xml_cdata_log_entry(2, "In any triangle, each interior angle < 90\260") <entry level="ERROR" date="Tue Oct 22 06:33:51 2002"> In any triangle, each interior angle < 90° </entry> >>>

The Hitchhiker's Guide to Python

docs.python-guide.org › scenarios › xml

XML parsing — The Hitchhiker's Guide to Python

from xmlschema import XMLSchema, etree_tostring # load a XSD schema file schema = XMLSchema("your_schema.xsd") # validate against the schema schema.validate("your_file.xml") # or schema.is_valid("your_file.xml") # decode a file data = schmema.decode("your_file.xml") # encode to string s = etree_tostring(schema.encode(data)) This opinionated guide exists to provide both novice and expert Python developers a best practice handbook to the installation, configuration, and usage of Python on a daily basis.

Python

docs.python.org › 3 › library › xml.sax.utils.html

xml.sax.saxutils — SAX Utilities

The resulting string can be used directly as an attribute value: >>> print("<element attr=%s>" % quoteattr("ab ' cd \" ef")) <element attr="ab ' cd " ef"> This function is useful when generating attribute values for HTML or any SGML using ...

GitHub

gist.github.com › walkermatt › 440b2c1ad444e9fa7fd0aebdc6d44cad

Reading and writing a UTF-8 encoded XML file with Python ElementTree · GitHub

Writing involves writing the XML doc to UTF8 encoded bytes by calling ElementTree.tostring and specifying encoding="utf-8", then open the destination for writing in UTF8 and write the decoded bytes.

Find elsewhere

Google Bing Mojeek

Virtualzero

blog.virtualzero.tech › entries › python › xml › how-to-encode-strings-for-xml-with-python

How to Encode Strings for XML with Python | VirtualZero Blog

March 22, 2020 - Stings within XML tags need to be properly formatted. Without proper formatting, a XML Parsing Error is likely to occur. This tutorial demonstrates how to encode strings properly for XML using Python.

O'Reilly

oreilly.com › library › view › python-cookbook › 0596001673 › ch12s11.html

Autodetecting XML Encoding - Python Cookbook [Book]

July 19, 2002 - Note that encoding_name might not have an installed decoder (e.g., EBCDIC) """ # A more efficient implementation would not decode the whole # buffer at once, but then we'd have to decode a character at # a time looking for the quote character, and that's a pain encoding = "utf_8" # According to the XML spec, this is the default # This code successively tries to refine the default: # Whenever it fails to refine, it falls back to # the last place encoding ...

Authors Alex MartelliDavid Ascher

Published 2002

Pages 608

ActiveState

code.activestate.com › recipes › 303668-encoding-unicode-data-for-xml-and-html

Encoding Unicode data for XML and HTML « Python recipes « ActiveState Code

September 7, 2004 - The following code contains a trivial function "encode_for_xml" that illustrates the "xmlcharrefreplace" error handler, and a function "_xmlcharref_encode" which emulates "xmlcharrefreplace" for Python pre-2.3.

Stack Overflow

stackoverflow.com › questions › 47883390 › parse-xml-in-python-with-encoding-other-than-utf-8

Parse XML in Python with encoding other than utf-8 - Stack Overflow

Top answer

1 of 1

data is a unicode str. The error is saying that such a thing which also contains an encoding="..." declaration is not supported, because a str is supposedly already decoded from its encoding and hence it's ambiguous/nonsensical that it would also contain an encoding declaration. It's telling you to use a bytes instead, e.g. data = b'<...>'. Presumably you should be opening a file in binary mode, read the data from there and let etree handle the encoding="...", instead of using string literals in your code, which complicates the encoding situation even further.

It's as simple as:

from xml.etree import ElementTree

#        open in binary mode ↓
with open('/tmp/test.xml', 'rb') as f:
    e = ElementTree.fromstring(f.read())

Et voilà, e contains your parsed file with the encoding having been (presumably) correctly interpreted by etree based on the file's internal encoding="..." header.

ElementTree in fact has a shortcut method for this:

e = ElementTree.parse('/tmp/test.xml')

Real Python

realpython.com › python-xml-parser

A Roadmap to XML Parsers in Python – Real Python

September 25, 2023 - If you’re interested in how to encapsulate this logic in a Python class and take advantage of a generator for lazy evaluation, then expand the collapsible section below. ... import mmap import untangle class XMLTagStream: def __init__(self, path, tag_name, encoding="utf-8"): self.file = open(path) self.stream = mmap.mmap( self.file.fileno(), 0, access=mmap.ACCESS_READ ) self.tag_name = tag_name self.encoding = encoding self.start_tag = f"<{tag_name}>".encode(encoding) self.end_tag = f"</{tag_name}>".encode(encoding) def __enter__(self): return self def __exit__(self, *args, **kwargs): self.s

GitHub

gist.github.com › 3067859

Different variants of writting XML in Python · GitHub

Different variants of writting XML in Python · Raw · writexml.py · This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

W3Schools

w3schools.com › python › ref_string_encode.asp

Python String encode() Method

Python Examples Python Compiler Python Exercises Python Quiz Python Challenges Python Practice Problems Python Server Python Syllabus Python Study Plan Python Interview Q&A Python Bootcamp Python Certificate Python Training ... The encode() method encodes the string, using the specified encoding.

Stack Overflow

stackoverflow.com › questions › 59062589 › encode-and-decode-bytes-in-xml-strings

python - encode and decode bytes in xml strings - Stack Overflow

Top answer

1 of 1

Im pretty sure there is no builtins that accomplish exactly what you ask

I think the best you could do is just iterate over the characters and "fix" each one (see example that i think is complete)

try: # python2
  from htmlentitydefs import codepoint2name
except: # python3
  from html.entities import codepoint2name

def encode_xml(c):
  # return the character or its &#XX; or &entity; representation
  ascii_val = ord(c)
  known_entity =  codepoint2name.get(ascii_val,None)
  if known_entity: # this is a named codepoint
    return "&%s;"%(known_entity,)  
  # printable characters are ascii values [32..127] inclusive
  is_normal_character =  32 <= ascii_val <= 127
  if is_normal_character:
      return c
  return hex(ascii_val).replace("0x","&#")+";"


def make_xml_entity_string(s):
  return "".join(encode_xml(c) for c in s)

print("R:", make_xml_entity_string( 'abc < ABC\x14\xF2&'))

you could then go the otherway ... in roughly the same manner (taking advantage of regex this time though)

try: # python2  
  from htmlentitydefs import name2codepoint
except: # python3
  from html.entities import name2codepoint
import re

def decode_xml_replacer(match):
  name=match.group(1);
  if(name.startswith("#")):
    return chr(int(name[1:],16))
  return chr(name2codepoint.get(name,'?'))

def decode_xml_string(s):
  return re.sub("&(.*?);",decode_xml_replacer,s)

... note that this wont work for codepoints > 255 i think

Stack Overflow

stackoverflow.com › questions › 2597622 › encoding-in-xml-declaration-python

Encoding in XML declaration python - Stack Overflow

Top answer

1 of 1

>>> from xml.dom.minidom import Document
>>> a=Document()
>>> a.toprettyxml(encoding="utf-8")
'<?xml version="1.0" encoding="utf-8"?>\n'

>>> a.toxml(encoding="utf-8")
'<?xml version="1.0" encoding="utf-8"?>'

you can set the encoding for the document.writexml() function in the same way.

Stack Overflow

stackoverflow.com › questions › 47519074 › write-xml-in-python-with-pretty-print-and-encoding-declaration

Write .xml in Python with pretty print and encoding declaration - Stack Overflow

Top answer

1 of 3

The most elegant solution is certainly using the third-party library lxml, which is being used a lot – for good reasons. It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:

>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
                         pretty_print=True)
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>

However, I understand your desire to use the "included batteries" only. As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically. But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!

>>> doc = minidom.parseString(ET.tostring(root))
>>> print doc.toprettyxml(encoding='utf8')
<?xml version="1.0" encoding="utf8"?>
<main>
    <sub>
        <name>Ana</name>
        <detail/>
        <type>smart</type>
    </sub>
</main>

(Be aware that the returned string is already encoded and that you should write it to a file opened in binary mode ("wb") and without further encoding.)

2 of 3

from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ", encoding='UTF-8')
with open(xmlPath, "w") as f:
    f.write(str(xmlstr.decode('UTF-8')))
    f.close()

Probably This will resolve your issue without using external libraries like lxml