These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.
The available options for that are:
- ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
- cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5. Deprecated as of 3.3, when the regular ElementTree started using the C accelerator automatically, and removed in 3.9.)
- LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well as XPath, CSS Selectors, and more)
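Here's an example of how to generate your example document using the stdlib ElementTree (on 3.3+ this import picks up the C implementation automatically; the separate xml.etree.cElementTree module was removed in Python 3.9):
import xml.etree.ElementTree as ET
# Build the tree: <root><doc><field1 .../><field2 .../></doc></root>
root = ET.Element("root")
doc = ET.SubElement(root, "doc")
# Keyword arguments become XML attributes; .text sets the element's content
ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"
# Wrap the root element in an ElementTree so it can be written to a file
tree = ET.ElementTree(root)
tree.write("filename.xml")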
I've tested it and it works, but I'm assuming whitespace isn't significant. If you need "prettyprint" indentation, the stdlib has ET.indent() as of Python 3.9, and LXML accepts pretty_print=True when serializing.
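For pretty-printed output with only the standard library, a minimal sketch follows. It assumes Python 3.9 or newer, where `ET.indent()` is available; the two-space indent is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
doc = ET.SubElement(root, "doc")
ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some value2"

# ET.indent() (Python 3.9+) adds indentation whitespace to the tree in place
ET.indent(root, space="  ")

# encoding="unicode" returns a str instead of bytes
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Note that `ET.indent()` modifies the tree itself, so call it just before serializing if the whitespace should not be visible to other code working on the same elements.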
For further reading, here are some useful links:
- API docs for the implementation in the Python standard library
- Introductory Tutorial (From the original author's site)
- LXML etree tutorial. (With example code for loading the best available option from all major ElementTree implementations)
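The "load the best available option" idea can be sketched roughly like this. It is a simplified version that only tries LXML and then the stdlib; the `backend` variable is a hypothetical name used here purely to show which import succeeded.

```python
try:
    # Third-party LXML, if installed: a superset of the ElementTree API
    from lxml import etree as ET
    backend = "lxml"
except ImportError:
    # Standard library fallback (C-accelerated automatically on 3.3+)
    import xml.etree.ElementTree as ET
    backend = "stdlib"

# Code restricted to the shared ElementTree API works with either import
root = ET.Element("root")
ET.SubElement(root, "doc").text = "hello"
xml_text = ET.tostring(root, encoding="unicode")
print(backend, xml_text)
```

This only stays portable as long as the code sticks to the common API subset; LXML-only features such as pretty_print=True would fail on the stdlib fallback.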
As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but if you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:
- LXML clearly wins for serializing (generating) XML
- As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.
The lxml library includes a very convenient syntax for XML generation, called the E-factory. Here's how I'd make the example you give:
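#!/usr/bin/env python3
import lxml.etree
import lxml.builder
# ElementMaker builds an element factory for any tag via attribute access
E = lxml.builder.ElementMaker()
ROOT = E.root
DOC = E.doc
FIELD1 = E.field1
FIELD2 = E.field2
# Positional arguments become text or child elements; keyword arguments become attributes
the_doc = ROOT(
    DOC(
        FIELD1('some value1', name='blah'),
        FIELD2('some value2', name='asdfasd'),
    )
)
# encoding='unicode' makes tostring() return str rather than bytes on Python 3
print(lxml.etree.tostring(the_doc, pretty_print=True, encoding='unicode'))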
Output:
<root>
  <doc>
    <field1 name="blah">some value1</field1>
    <field2 name="asdfasd">some value2</field2>
  </doc>
</root>
It also supports adding to an already-made node, e.g. after the above you could say
the_doc.append(FIELD2('another value again', name='hithere'))
For the last few days I've been thinking about, and searching for, a nice way to generate XML.
I have an XSD which extends the GraphML specification in order to draw more sophisticated diagrams (e.g. UML). The structure is something like this (I replaced attribute values with Python format placeholders):
<xs:Node>
  <xs:Geometry height="{height}" width="{width}" x="{x}" y="{y}"/>
  <xs:Fill color="{color}"/>
  <xs:NodeLabel ...>{node_label}</xs:NodeLabel>
</xs:Node>
Current idea/concept
Classes which represent a node and act as a mere data container.
class XMLBase(object):
    def __repr__(self):
        return str(self.__dict__)
    def __str__(self):
        return self.xml.format(**self.__dict__)
class Geometry(XMLBase):
    def __init__(self, height=28.0, width=100.0, x=0.0, y=0.0):
        self.xml = '<xs:Geometry height="{height}" width="{width}" x="{x}" y="{y}"/>'
        self.height = height
        self.width = width
        self.x = x
        self.y = y
class Fill(XMLBase):
    ...
The __str__ method inserts the attributes into the XML string and returns it. The class for the Node would contain an instance of Geometry, Fill, NodeLabel and others, and generate the respective XML representation. The composition of classes is actually the XML data represented by python objects. Other than using classes, one could use lists or dictionaries, but classes seem nicer to use.
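To make the composition idea concrete, here is a rough sketch of how a Node class might hold the other objects. The `Fill` body, the `Node` class, and their default values are hypothetical (the original elides them), and NodeLabel is left out because its attributes are not shown.

```python
class XMLBase(object):
    def __repr__(self):
        return str(self.__dict__)

    def __str__(self):
        # Extra keys in __dict__ (such as "xml" itself) are ignored by format()
        return self.xml.format(**self.__dict__)


class Geometry(XMLBase):
    def __init__(self, height=28.0, width=100.0, x=0.0, y=0.0):
        self.xml = '<xs:Geometry height="{height}" width="{width}" x="{x}" y="{y}"/>'
        self.height = height
        self.width = width
        self.x = x
        self.y = y


class Fill(XMLBase):
    # Hypothetical: assumes Fill carries a single color attribute
    def __init__(self, color="#FFFFFF"):
        self.xml = '<xs:Fill color="{color}"/>'
        self.color = color


class Node(XMLBase):
    # Hypothetical container: child objects are rendered via their own __str__
    def __init__(self, geometry=None, fill=None):
        self.xml = "<xs:Node>{geometry}{fill}</xs:Node>"
        self.geometry = geometry or Geometry()
        self.fill = fill or Fill()


node = Node(geometry=Geometry(width=120.0), fill=Fill(color="#FFCC00"))
node_xml = str(node)
print(node_xml)
```

One caveat with plain string formatting: values are inserted without escaping, so a `&`, `<`, or `"` inside an attribute value would produce invalid XML unless it is escaped first (for example with xml.sax.saxutils).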
A previous idea was to have a big template with placeholders and insert the values with format, but this approach is less flexible and I'd say more difficult to understand (and maintain).
Other ideas were to store the XML templates in variables and let functions generate the correct XML structure.
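For comparison, the "templates in variables plus functions" variant might look roughly like the sketch below. All template and function names are hypothetical, and the Fill template with a single color attribute is an assumption based on the structure shown earlier.

```python
from xml.sax.saxutils import escape

# Hypothetical templates, one per element
GEOMETRY_TEMPLATE = '<xs:Geometry height="{height}" width="{width}" x="{x}" y="{y}"/>'
FILL_TEMPLATE = '<xs:Fill color="{color}"/>'
NODE_TEMPLATE = "<xs:Node>{children}</xs:Node>"


def make_geometry(height=28.0, width=100.0, x=0.0, y=0.0):
    return GEOMETRY_TEMPLATE.format(height=height, width=width, x=x, y=y)


def make_fill(color):
    # escape() handles &, < and >; the extra mapping also protects double quotes
    return FILL_TEMPLATE.format(color=escape(color, {'"': "&quot;"}))


def make_node(*children):
    # Children are already-rendered XML strings, joined in order
    return NODE_TEMPLATE.format(children="".join(children))


node_xml = make_node(make_geometry(width=120.0), make_fill("#FFCC00"))
print(node_xml)
```

Compared with the class-based version, this keeps no state around, but the structure of the document lives only in the order of the function calls.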
Now I'd like to know your thoughts about these approaches, different approaches, or maybe someone knows best practices.