One way to achieve this is to use XSLT Transformation. Most programming languages including Python will have support to convert an XML document into another document (e.g. HTML) when supplied with an XSL.
A good tutorial on XSLT Transformation can be found here
Use of Python to achieve transformation (once an XSL is prepared) is described here
Answer from Gro on Stack OverflowOne way to achieve this is to use XSLT Transformation. Most programming languages including Python will have support to convert an XML document into another document (e.g. HTML) when supplied with an XSL.
A good tutorial on XSLT Transformation can be found here
Use of Python to achieve transformation (once an XSL is prepared) is described here
There are several things wrong with your XHTML source. First, xmlns is not a correct attribute for the xml declaration; it should be put on the root element instead. And the root element for XHTML is <html>, not <xhtml>. So the valid XHTML input in this particular case would be
<?xml version=\"1.0\"?>\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head><title></title></head>\n<body>\n</body></html>
That said, I'm not sure if xml.etree.ElementTree accepts that, having no experience with it.
Parsing HTML with the XML module
Converting .txt with RGB values to .xml or .qml?
Pandas dataframe to nested xml
How to parse a XML file into a kotlin Data Class
Videos
BeautifulSoup gets you almost all the way there:
>>> import BeautifulSoup
>>> f = open('a.html')
>>> soup = BeautifulSoup.BeautifulSoup(f)
>>> f.close()
>>> g = open('a.xml', 'w')
>>> print >> g, soup.prettify()
>>> g.close()
This closes all tags properly. The only issue remaining is that the doctype remains HTML -- to change that into the doctype of your choice, you only need to change the first line, which is not hard, e.g., instead of printing the prettified text directly,
>>> lines = soup.prettify().splitlines()
>>> lines[0] = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"'
'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
>>> print >> g, '\n'.join(lines)
lxml works well:
from lxml import html, etree
doc = html.fromstring(open('a.html').read())
out = open('a.xhtml', 'wb')
out.write(etree.tostring(doc))
» pip install html2py