Brave Search

stackoverflow.com › questions › 1546717 › escaping-strings-for-use-in-xml

python - Escaping strings for use in XML - Stack Overflow

1 of 8

Something like this?

>>> from xml.sax.saxutils import escape
>>> escape("< & >")   
'&lt; &amp; &gt;'

2 of 8

xml.sax.saxutils does not escape quotation characters (")

So here is another one:

def escape( str_xml: str ):
    str_xml = str_xml.replace("&", "&amp;")
    str_xml = str_xml.replace("<", "&lt;")
    str_xml = str_xml.replace(">", "&gt;")
    str_xml = str_xml.replace("\"", "&quot;")
    str_xml = str_xml.replace("'", "&apos;")
    return str_xml

if you look it up then xml.sax.saxutils only does string replace

wiki.python.org › moin › EscapingXml

Escaping XML - Python Wiki

November 15, 2008 - If we want character references to be considered, we can use the Expat XML parser included with all recent versions of Python. This function will do the trick: 1 import xml.parsers.expat 2 3 def unescape(s): 4 want_unicode = False 5 if isinstance(s, unicode): 6 s = s.encode("utf-8") 7 want_unicode = True 8 9 # the rest of this assumes that `s` is UTF-8 10 list = [] 11 12 # create and initialize a parser object 13 p = xml.parsers.expat.ParserCreate("utf-8") 14 p.buffer_text = True 15 p.returns_unicode = want_unicode 16 p.CharacterDataHandler = list.append 17 18 # parse the data wrapped in a dummy element 19 # (needed so the "document" is well-formed) 20 p.Parse("<e>", 0) 21 p.Parse(s, 0) 22 p.Parse("</e>", 1) 23 24 # join the extracted strings and return 25 es = "" 26 if want_unicode: 27 es = u"" 28 return es.join(list)

docs.python.org › 3 › library › xml.etree.elementtree.html

xml.etree.ElementTree — The ElementTree XML API

January 29, 2026 - Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring ...

stackoverflow.com › questions › 31390213 › how-to-parse-an-xml-file-with-encoding-declaration-in-python

How to parse an XML file with encoding declaration in Python? - Stack Overflow

stuffaboutcode.com › 2012 › 06 › python-encode-xml-escape-characters.html

1 of 2

One thing that I tried, that worked for me is to open the xml file as a file object , then use ElementTree.fromstring() passing in the complete contents of the file.

Example -

>>> import xml.etree.ElementTree as ET
>>> ef = ET.parse('a.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
ValueError: multi-byte encodings are not supported
>>> with open('a.xml','r') as f:
...     ef = ET.fromstring(f.read())
...
>>> ef
<Element 'productMeta' at 0x028DF180>

You can also, create an XMLParser with the required encoding, and this should enable you to be able to parse strings from that encoding, Example -

import xml.etree.ElementTree as ET
xmlp = ET.XMLParser(encoding="utf-8")
f = ET.parse('a.xml',parser=xmlp)

2 of 2

 ET.parse('a.xml', parser=ET.XMLParser(encoding='iso-8859-5'))

solved my problem when dealed with xml excel in python

Stuffaboutcode

<Stuff about="code" />: Python - encode XML escape characters

# encode xml escape characters def encodeXMLText(text): text = text.replace("&", "&") text = text.replace("\"", """) text = text.replace("'", "'") text = text.replace("<", "<") text = text.replace(">", ">") return text

docs.python.org › 3 › library › xml.sax.utils.html

xml.sax.saxutils — SAX Utilities

In other words, using an XMLGenerator as the content handler will reproduce the original document being parsed. out should be a file-like object which will default to sys.stdout. encoding is the encoding of the output stream which defaults to 'iso-8859-1'. short_empty_elements controls the formatting of elements that contain no content: if False (the default) they are emitted as a pair of start/end tags, if set to True they are emitted as a single self-closed tag.

GitHub

gist.github.com › walkermatt › 440b2c1ad444e9fa7fd0aebdc6d44cad

Reading and writing a UTF-8 encoded XML file with Python ElementTree · GitHub

Writing involves writing the XML doc to UTF8 encoded bytes by calling ElementTree.tostring and specifying encoding="utf-8", then open the destination for writing in UTF8 and write the decoded bytes.

O'Reilly

oreilly.com › library › view › python-cookbook › 0596001673 › ch12s11.html

Autodetecting XML Encoding - Python Cookbook [Book]

July 19, 2002 - Note that encoding_name might not have an installed decoder (e.g., EBCDIC) """ # A more efficient implementation would not decode the whole # buffer at once, but then we'd have to decode a character at # a time looking for the quote character, and that's a pain encoding = "utf_8" # According to the XML spec, this is the default # This code successively tries to refine the default: # Whenever it fails to refine, it falls back to # the last place encoding ...

Authors Alex MartelliDavid Ascher

Published 2002

Pages 608

XML.com

xml.com › pub › a › 2002 › 11 › 13 › py-xml.html

Proper XML Output in Python

November 13, 2002 - Consider one of the earlier examples, with a slight change -- the word "degrees" has been replaced with the byte B0 (octal 260), which is the degree symbol in the popular ISO-8859-1 and Windows-1252 character encodings: $ python -i listing2.py >>> write_xml_cdata_log_entry(2, "In any triangle, each interior angle < 90\260") <entry level="ERROR" date="Tue Oct 22 06:33:51 2002"> In any triangle, each interior angle < 90° </entry> >>>

Find elsewhere

Google Bing Mojeek

ActiveState

code.activestate.com › recipes › 303668-encoding-unicode-data-for-xml-and-html

Encoding Unicode data for XML and HTML « Python recipes « ActiveState Code

September 7, 2004 - The following code contains a trivial function "encode_for_xml" that illustrates the "xmlcharrefreplace" error handler, and a function "_xmlcharref_encode" which emulates "xmlcharrefreplace" for Python pre-2.3.

stackoverflow.com › questions › 59062589 › encode-and-decode-bytes-in-xml-strings

python - encode and decode bytes in xml strings - Stack Overflow

1 of 1

Im pretty sure there is no builtins that accomplish exactly what you ask

I think the best you could do is just iterate over the characters and "fix" each one (see example that i think is complete)

try: # python2
  from htmlentitydefs import codepoint2name
except: # python3
  from html.entities import codepoint2name

def encode_xml(c):
  # return the character or its &#XX; or &entity; representation
  ascii_val = ord(c)
  known_entity =  codepoint2name.get(ascii_val,None)
  if known_entity: # this is a named codepoint
    return "&%s;"%(known_entity,)  
  # printable characters are ascii values [32..127] inclusive
  is_normal_character =  32 <= ascii_val <= 127
  if is_normal_character:
      return c
  return hex(ascii_val).replace("0x","&#")+";"


def make_xml_entity_string(s):
  return "".join(encode_xml(c) for c in s)

print("R:", make_xml_entity_string( 'abc < ABC\x14\xF2&'))

you could then go the otherway ... in roughly the same manner (taking advantage of regex this time though)

try: # python2  
  from htmlentitydefs import name2codepoint
except: # python3
  from html.entities import name2codepoint
import re

def decode_xml_replacer(match):
  name=match.group(1);
  if(name.startswith("#")):
    return chr(int(name[1:],16))
  return chr(name2codepoint.get(name,'?'))

def decode_xml_string(s):
  return re.sub("&(.*?);",decode_xml_replacer,s)

... note that this wont work for codepoints > 255 i think

stackoverflow.com › questions › 2597622 › encoding-in-xml-declaration-python

Encoding in XML declaration python - Stack Overflow

blog.virtualzero.tech › entries › python › xml › how-to-encode-strings-for-xml-with-python

1 of 1

>>> from xml.dom.minidom import Document
>>> a=Document()
>>> a.toprettyxml(encoding="utf-8")
'<?xml version="1.0" encoding="utf-8"?>\n'

>>> a.toxml(encoding="utf-8")
'<?xml version="1.0" encoding="utf-8"?>'

you can set the encoding for the document.writexml() function in the same way.

Virtualzero

How to Encode Strings for XML with Python | VirtualZero Blog

March 22, 2020 - Stings within XML tags need to be properly formatted. Without proper formatting, a XML Parsing Error is likely to occur. This tutorial demonstrates how to encode strings properly for XML using Python.

W3Schools

w3schools.com › python › ref_string_encode.asp

Python String encode() Method

Well organized and easy to understand Web building tutorials with lots of examples of how to use HTML, CSS, JavaScript, SQL, Python, PHP, Bootstrap, Java, XML and more.

stackoverflow.com › questions › 47883390 › parse-xml-in-python-with-encoding-other-than-utf-8

Parse XML in Python with encoding other than utf-8 - Stack Overflow

1 of 1

data is a unicode str. The error is saying that such a thing which also contains an encoding="..." declaration is not supported, because a str is supposedly already decoded from its encoding and hence it's ambiguous/nonsensical that it would also contain an encoding declaration. It's telling you to use a bytes instead, e.g. data = b'<...>'. Presumably you should be opening a file in binary mode, read the data from there and let etree handle the encoding="...", instead of using string literals in your code, which complicates the encoding situation even further.

It's as simple as:

from xml.etree import ElementTree

#        open in binary mode ↓
with open('/tmp/test.xml', 'rb') as f:
    e = ElementTree.fromstring(f.read())

Et voilà, e contains your parsed file with the encoding having been (presumably) correctly interpreted by etree based on the file's internal encoding="..." header.

ElementTree in fact has a shortcut method for this:

e = ElementTree.parse('/tmp/test.xml')

docs.python.org › 3.2 › library › xml.etree.elementtree.html

19.6. xml.etree.ElementTree — The ElementTree XML API — Python v3.2.6 documentation

October 12, 2014 - Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string. method is either "xml", "html" or "text" (default is "xml").

stackoverflow.com › questions › 57708 › convert-xml-html-entities-into-unicode-string-in-python

Convert XML/HTML Entities into Unicode String in Python - Stack Overflow