If you want two objects with the same elements but in a different order to compare equal, then the obvious thing to do is compare sorted copies of them - for instance, for the dictionaries represented by your JSON strings a and b:
import json
a = json.loads("""
{
"errors": [
{"error": "invalid", "field": "email"},
{"error": "required", "field": "name"}
],
"success": false
}
""")
b = json.loads("""
{
"success": false,
"errors": [
{"error": "required", "field": "name"},
{"error": "invalid", "field": "email"}
]
}
""")
>>> sorted(a.items()) == sorted(b.items())
False
... but that doesn't work, because in each case, the "errors" item of the top-level dict is a list with the same elements in a different order, and sorted() doesn't try to sort anything except the "top" level of an iterable.
To fix that, we can define an ordered function which will recursively sort any lists it finds (and convert dictionaries to lists of (key, value) pairs so that they're orderable):
def ordered(obj):
if isinstance(obj, dict):
return sorted((k, ordered(v)) for k, v in obj.items())
if isinstance(obj, list):
return sorted(ordered(x) for x in obj)
else:
return obj
If we apply this function to a and b, the results compare equal:
>>> ordered(a) == ordered(b)
True
Answer from Zero Piraeus on Stack OverflowIf you want two objects with the same elements but in a different order to compare equal, then the obvious thing to do is compare sorted copies of them - for instance, for the dictionaries represented by your JSON strings a and b:
import json
a = json.loads("""
{
"errors": [
{"error": "invalid", "field": "email"},
{"error": "required", "field": "name"}
],
"success": false
}
""")
b = json.loads("""
{
"success": false,
"errors": [
{"error": "required", "field": "name"},
{"error": "invalid", "field": "email"}
]
}
""")
>>> sorted(a.items()) == sorted(b.items())
False
... but that doesn't work, because in each case, the "errors" item of the top-level dict is a list with the same elements in a different order, and sorted() doesn't try to sort anything except the "top" level of an iterable.
To fix that, we can define an ordered function which will recursively sort any lists it finds (and convert dictionaries to lists of (key, value) pairs so that they're orderable):
def ordered(obj):
if isinstance(obj, dict):
return sorted((k, ordered(v)) for k, v in obj.items())
if isinstance(obj, list):
return sorted(ordered(x) for x in obj)
else:
return obj
If we apply this function to a and b, the results compare equal:
>>> ordered(a) == ordered(b)
True
Another way could be to use json.dumps(X, sort_keys=True) option:
import json
a, b = json.dumps(a, sort_keys=True), json.dumps(b, sort_keys=True)
a == b # a normal string comparison
This works for nested dictionaries and lists.
I need to compare two JSON files that contain a list of dictionaries whose basic format is:
[{"protocol": "S", "type": "", "network": "0.0.0.0", "mask": "0", "distance": "254", "metric": "0", "nexthop_ip": "192.168.122.1", "nexthop_if": "", "uptime": ""}, {"protocol": "O", "type": "", "network": "10.129.30.0", "mask": "24", "distance": "110", "metric": "2", "nexthop_ip": "172.20.10.1", "nexthop_if": "GigabitEthernet0/1", "uptime": "08:58:25"}]Though there are many, many more dictionary items in the list than that shown above. I am not quite sure how best to go about comparing the files to spot differences and return or save those difference to another file, preferably in JSON format, though a CSV would be fine too.
The one gotcha that there may be is I need to exclude, at a minimum, the uptime value as it is constantly changing so it will of course trigger anything looking for changes. Can anyone help get me started please?
Is there a way to compare two JSON files/json body?
how to compare two json file line by line using python?
python - Compare two JSON Files and Return the Difference - Stack Overflow
Diff two large JSON array or objects
Videos
» pip install json-files-compare
» pip install json-diff
» pip install jsoncomparison
with open("file1.json", "r") as f1:
file1 = json.loads(f1.read())
with open("file2.json", "r") as f2:
file2 = json.loads(f2.read())
for item in file2:
if item not in file1:
print(f"Found difference: {item}")
file1.append(item)
print(f"New file1: {file1}")
file1=[
{
"name": "John Wood",
"age": 35,
"country": "USA"
},
{
"name": "Mark Smith",
"age": 30,
"country": "USA"
}
]
file2=[
{
"name": "John Wood",
"age": 35,
"country": "USA"
},
{
"name": "Mark Smith",
"age": 30,
"country": "USA"
},
{
"name": "Oscar Bernard",
"age": 25,
"country": "Australia"
}
]
for item in file2:
if item['name'] not in [x['name'] for x in file1]:
print(f"Found difference: {item}")
file1.append(item)
print(f"New file1: {file1}")
I have a Python lambda function downloading a large excel file and converting it to JSON.
This file will be downloaded at least once a day (as the data can change)
I need to push the changed/updated data to an API.
Is there a way for me to compare two JSON files and output the diff?
It would be perfect if it would output multiple arrays of objects.
1 array of objects that have changed (I don’t care what has changed, just need to know that it has)
1 array of removed/deleted objects.
Two possible ways :
- Using the technique mentioned in the comment posted by Josh.
- Using the technique mentioned here : how to compare 2 json in python.
Given that you have a large file, you are better off using difflib technique described in point 1.
Edit based on response to my below answer:
After some research, it appears that the best way to deal with large data payloads is to process this payload in a streamed manner. This way we ensure a speedy processing of the data keeping in mind the memory usage and performance of the software in general.
Refer to this link that talks about Streaming JSON data objects using Python. Similarly take a look at ijson - this is an iterator based JSON parsing/processing library in python.
Hopefully, this helps you towards identifying a good fit library that will solve your use case
This seems to be a pretty solid start: https://github.com/ZoomerAnalytics/jsondiff
>>> pip install jsondiff
>>> from jsondiff import diff
>>> diff({'a': 1, 'b': 2}, {'b': 3, 'c': 4}, syntax='symmetric')
{insert: {'c': 4}, 'b': [2, 3], delete: {'a': 1}}
I'm also going to try it out for a current project, I'll try to maintain updates and edits as I go along.