xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.
Once you have that data structure, you can serialize it to JSON:
import xmltodict, json
o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
Answer from Martin Blech on Stack Overflowxmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.
Once you have that data structure, you can serialize it to JSON:
import xmltodict, json
o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
There is no "one-to-one" mapping between XML and JSON, so converting one to the other necessarily requires some understanding of what you want to do with the results.
That being said, Python's standard library has several modules for parsing XML (including DOM, SAX, and ElementTree). As of Python 2.6, support for converting Python data structures to and from JSON is included in the json module.
So the infrastructure is there.
Many Python XML libraries support parsing XML sub elements incrementally, e.g. xml.etree.ElementTree.iterparse and xml.sax.parse in the standard library. These functions are usually called "XML Stream Parser".
The xmltodict library you used also has a streaming mode. I think it may solve your problem
https://github.com/martinblech/xmltodict#streaming-mode
Instead of trying to read the file in one go and then process it, you want to read it in chunks and process each chunk as it's loaded. This is a fairly common situation when processing large XML files and is covered by the Simple API for XML (SAX) standard, which specifies a callback API for parsing XML streams - it's part of the Python standard library under xml.sax.parse and xml.etree.ETree as mentioned above.
Here's a quick XML to JSON converter:
from collections import defaultdict
import json
import xml.etree.ElementTree as ET
def parse_xml(file_name):
events = ("start", "end")
context = ET.iterparse(file_name, events=events)
return pt(context)
def pt(context, cur_elem=None):
items = defaultdict(list)
if cur_elem:
items.update(cur_elem.attrib)
text = ""
for action, elem in context:
# print("{0:>6} : {1:20} {2:20} '{3}'".format(action, elem.tag, elem.attrib, str(elem.text).strip()))
if action == "start":
items[elem.tag].append(pt(context, elem))
elif action == "end":
text = elem.text.strip() if elem.text else ""
elem.clear()
break
if len(items) == 0:
return text
return { k: v[0] if len(v) == 1 else v for k, v in items.items() }
if __name__ == "__main__":
json_data = parse_xml("large.xml")
print(json.dumps(json_data, indent=2))
If you're looking at a lot of XML processing check out the lxml library, it's got a ton of useful stuff over and above the standard modules, while also being much easier to use.
http://lxml.de/tutorial.html
Convert JSON to XML in Python - Stack Overflow
How to convert XML to JSON in Python? - Stack Overflow
It's 2025. What's your favorite module or method for converting xml to json and vice versa?
Convert XML to Json
Can I convert complex XML structures to JSON seamlessly?
What are the advantages of using JSON over XML?
Videos
ยป pip install xmljson
Nothing came back right away, so I went ahead and wrote a script that solves this problem.
Python already allows you to convert from JSON into a native dict (using json or, in versions < 2.6, simplejson), so I wrote a library that converts native dicts into an XML string.
https://github.com/quandyfactory/dict2xml
It supports int, float, boolean, string (and unicode), array and dict data types and arbitrary nesting (yay recursion).
If you don't have such a package, you can try:
def json2xml(json_obj, line_padding=""):
result_list = list()
json_obj_type = type(json_obj)
if json_obj_type is list:
for sub_elem in json_obj:
result_list.append(json2xml(sub_elem, line_padding))
return "\n".join(result_list)
if json_obj_type is dict:
for tag_name in json_obj:
sub_obj = json_obj[tag_name]
result_list.append("%s<%s>" % (line_padding, tag_name))
result_list.append(json2xml(sub_obj, "\t" + line_padding))
result_list.append("%s</%s>" % (line_padding, tag_name))
return "\n".join(result_list)
return "%s%s" % (line_padding, json_obj)
For example:
s='{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
j = json.loads(s)
print(json2xml(j))
Result:
<main>
<aaa>
10
</aaa>
<bbb>
1
2
3
</bbb>
</main>
xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.
Once you have that data structure, you can serialize it to JSON:
import xmltodict, json
o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
Soviut's advice for lxml objectify is good. With a specially subclassed simplejson, you can turn an lxml objectify result into json.
import simplejson as json
import lxml
class objectJSONEncoder(json.JSONEncoder):
"""A specialized JSON encoder that can handle simple lxml objectify types
>>> from lxml import objectify
>>> obj = objectify.fromstring("<Book><price>1.50</price><author>W. Shakespeare</author></Book>")
>>> objectJSONEncoder().encode(obj)
'{"price": 1.5, "author": "W. Shakespeare"}'
"""
def default(self,o):
if isinstance(o, lxml.objectify.IntElement):
return int(o)
if isinstance(o, lxml.objectify.NumberElement) or isinstance(o, lxml.objectify.FloatElement):
return float(o)
if isinstance(o, lxml.objectify.ObjectifiedDataElement):
return str(o)
if hasattr(o, '__dict__'):
#For objects with a __dict__, return the encoding of the __dict__
return o.__dict__
return json.JSONEncoder.default(self, o)
See the docstring for example of usage, essentially you pass the result of lxml objectify to the encode method of an instance of objectJSONEncoder
Note that Koen's point is very valid here, the solution above only works for simply nested xml and doesn't include the name of root elements. This could be fixed.
I've included this class in a gist here: http://gist.github.com/345559