Something like this?
>>> from xml.sax.saxutils import escape
>>> escape("< & >")
'&lt; &amp; &gt;'
Answer from mbarkhau on Stack Overflow
By default, xml.sax.saxutils.escape() does not escape quotation marks (").
So here is another one:
def escape(str_xml: str) -> str:
    # Replace "&" first so the entities added below are not escaped again.
    str_xml = str_xml.replace("&", "&amp;")
    str_xml = str_xml.replace("<", "&lt;")
    str_xml = str_xml.replace(">", "&gt;")
    str_xml = str_xml.replace("\"", "&quot;")
    str_xml = str_xml.replace("'", "&apos;")
    return str_xml
If you look at its source, xml.sax.saxutils.escape() only does string replacement as well.
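For what it's worth, escape() also accepts an optional mapping of extra replacements, so the quote characters can be covered without a hand-written function. A minimal sketch; the names escape_all and QUOTE_ENTITIES are made up here for illustration:

```python
from xml.sax.saxutils import escape, quoteattr

# Extra replacements applied after the built-in ones for &, < and >.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_all(text: str) -> str:
    """Escape &, <, > and both quote characters (hypothetical helper)."""
    return escape(text, QUOTE_ENTITIES)


print(escape_all('< & > " \''))

# quoteattr() escapes a value and wraps it in quotes, ready for use
# as an attribute value.
print(quoteattr("plain"))
```

If the value is going into an attribute, quoteattr() is probably the more convenient of the two, since it picks the quoting style for you.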
One thing that worked for me is to open the XML file as a file object and then pass its complete contents to ElementTree.fromstring().
Example:
>>> import xml.etree.ElementTree as ET
>>> ef = ET.parse('a.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
ValueError: multi-byte encodings are not supported
>>> with open('a.xml','r') as f:
...     ef = ET.fromstring(f.read())
...
>>> ef
<Element 'productMeta' at 0x028DF180>
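The approach above appears to work because the file is decoded to a str before the parser sees it, so the parser no longer has to handle the declared multi-byte encoding itself. A self-contained sketch of the same idea, assuming a made-up document stored as gb2312 (the file contents and the ENCODING constant are illustrative only); passing the encoding to open() avoids depending on the platform default:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample: a document declared and stored in a multi-byte encoding.
ENCODING = "gb2312"
document = (
    '<?xml version="1.0" encoding="gb2312"?>'
    "<productMeta>\u4ea7\u54c1</productMeta>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.xml")
    with open(path, "w", encoding=ENCODING) as f:
        f.write(document)

    # Decode the file ourselves, then hand the resulting str to fromstring().
    with open(path, "r", encoding=ENCODING) as f:
        root = ET.fromstring(f.read())

print(root.tag, root.text)
```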
You can also create an XMLParser with the required encoding, which should let you parse input in that encoding. Example:
import xml.etree.ElementTree as ET
xmlp = ET.XMLParser(encoding="utf-8")
f = ET.parse('a.xml',parser=xmlp)
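The same parser argument is accepted by ET.fromstring(), which makes the effect easy to try without a file. A small sketch, assuming Latin-1 input bytes that carry no XML declaration (the sample data is made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical input: Latin-1 bytes without an XML declaration.
# Parsed with the default settings, the byte 0xE9 is not valid UTF-8.
data = "<root>caf\u00e9</root>".encode("iso-8859-1")

# Tell the parser which encoding the bytes are in.
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(data, parser=parser)

print(root.text)
```

Note that a parser object is meant for a single document, so create a fresh XMLParser for each parse.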
ET.parse('a.xml', parser=ET.XMLParser(encoding='iso-8859-5'))
solved my problem when dealing with Excel XML files in Python.
The most elegant solution is certainly the third-party library lxml, which is widely used, and for good reasons.
It offers both a pretty_print and an xml_declaration parameter in the tostring() method, so you get both. And the API is quite close to that of the std-lib ElementTree, which you seem to be using now. Here's an example:
>>> from lxml import etree
>>> doc = etree.parse(xmlPath)
>>> print(etree.tostring(doc, encoding='UTF-8', xml_declaration=True,
...                      pretty_print=True).decode('UTF-8'))
<?xml version='1.0' encoding='UTF-8'?>
<main>
  <sub>
    <name>Ana</name>
    <detail/>
    <type>smart</type>
  </sub>
</main>
However, I understand your desire to use the "included batteries" only.
As far as I can see, xml.etree.ElementTree has no means of changing the indentation automatically (this changed in Python 3.9, which added ET.indent()).
But the minidom work-around has a solution to getting both pretty-printing and a full declaration: use the encoding parameter of the toprettyxml() method!
>>> from xml.dom import minidom
>>> doc = minidom.parseString(ET.tostring(root))
>>> print(doc.toprettyxml(encoding='utf8').decode('utf8'))
<?xml version="1.0" encoding="utf8"?>
<main>
	<sub>
		<name>Ana</name>
		<detail/>
		<type>smart</type>
	</sub>
</main>
(Be aware that the returned string is already encoded and that you should write it to a file opened in binary mode ("wb") and without further encoding.)
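On Python 3.9 or later, the standard library can also do this without minidom: ET.indent() adds the whitespace in place, and ElementTree.write() can emit the declaration. A minimal sketch, assuming a tree like the one in the question; the BytesIO buffer stands in for a file opened in "wb" mode:

```python
import io
import xml.etree.ElementTree as ET

# Sample tree resembling the one in the question.
root = ET.fromstring(
    "<main><sub><name>Ana</name><detail/><type>smart</type></sub></main>"
)

# Requires Python 3.9+. Modifies the tree in place.
ET.indent(root, space="  ")

# Stand-in for a real file opened with open(path, "wb").
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)

pretty = buffer.getvalue().decode("UTF-8")
print(pretty)
```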
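import xml.etree.ElementTree as ET
from xml.dom import minidom

# root is the ElementTree root element; xmlPath is the output file path.
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ", encoding='UTF-8')
# toprettyxml() returns bytes when an encoding is given, so decode before writing in text mode.
with open(xmlPath, "w", encoding='UTF-8') as f:
    f.write(xmlstr.decode('UTF-8'))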
This will probably resolve your issue without using external libraries like lxml.
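An alternative that skips the decode step is to write the encoded bytes straight to a file opened in binary mode. A rough sketch under the same assumptions; the sample tree and the temporary output path are made up for illustration:

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Sample tree; in practice this would be your own root element.
root = ET.fromstring("<main><sub><name>Ana</name></sub></main>")

# With an encoding argument, toprettyxml() returns bytes, declaration included.
xml_bytes = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="  ", encoding="UTF-8"
)

# Hypothetical output location; binary mode writes the bytes unchanged.
out_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")
with open(out_path, "wb") as f:
    f.write(xml_bytes)
```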