If you want two objects with the same elements but in a different order to compare equal, then the obvious thing to do is compare sorted copies of them - for instance, for the dictionaries represented by your JSON strings a and b:
import json
a = json.loads("""
{
"errors": [
{"error": "invalid", "field": "email"},
{"error": "required", "field": "name"}
],
"success": false
}
""")
b = json.loads("""
{
"success": false,
"errors": [
{"error": "required", "field": "name"},
{"error": "invalid", "field": "email"}
]
}
""")
>>> sorted(a.items()) == sorted(b.items())
False
... but that doesn't work, because in each case, the "errors" item of the top-level dict is a list with the same elements in a different order, and sorted() doesn't try to sort anything except the "top" level of an iterable.
To fix that, we can define an ordered function which will recursively sort any lists it finds (and convert dictionaries to lists of (key, value) pairs so that they're orderable):
def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj
If we apply this function to a and b, the results compare equal:
>>> ordered(a) == ordered(b)
True
Answer from Zero Piraeus on Stack Overflow
Another way could be to use the sort_keys=True option of json.dumps:
import json
a, b = json.dumps(a, sort_keys=True), json.dumps(b, sort_keys=True)
a == b # a normal string comparison
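This works for nested dictionaries, including dictionaries inside lists. List order is still significant, though: two lists holding the same elements in a different order serialize to different strings.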
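A quick check of what sort_keys does and does not normalize may help here. This is only a small sketch: the sample values x, y, z and the helper name canonical are made up for illustration and are not from the answer above.

```python
import json

# Made-up sample data (an assumption, not from the original answer).
x = {"b": [1, 2], "a": {"d": 1, "c": 2}}
y = {"a": {"c": 2, "d": 1}, "b": [1, 2]}   # same data, keys in another order
z = {"a": {"c": 2, "d": 1}, "b": [2, 1]}   # list elements reordered


def canonical(obj):
    """Serialize with sorted keys so that dict key order no longer matters."""
    return json.dumps(obj, sort_keys=True)


# Key order is normalized at every nesting level...
same_keys_reordered = canonical(x) == canonical(y)
# ...but list order is left untouched, so reordered lists still differ.
list_reordered = canonical(x) == canonical(z)
```

So if list order should not matter either, the lists have to be sorted before serializing.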
I need to compare two JSON files that contain a list of dictionaries whose basic format is:
[{"protocol": "S", "type": "", "network": "0.0.0.0", "mask": "0", "distance": "254", "metric": "0", "nexthop_ip": "192.168.122.1", "nexthop_if": "", "uptime": ""}, {"protocol": "O", "type": "", "network": "10.129.30.0", "mask": "24", "distance": "110", "metric": "2", "nexthop_ip": "172.20.10.1", "nexthop_if": "GigabitEthernet0/1", "uptime": "08:58:25"}]
Though there are many, many more dictionary items in the list than shown above. I am not quite sure how best to compare the files to spot differences and return or save those differences to another file, preferably in JSON format, though a CSV would be fine too.
The one gotcha is that I need to exclude, at a minimum, the uptime value: it changes constantly, so it would trigger anything looking for changes. Can anyone help me get started, please?
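One possible starting point, as a rough sketch rather than a definitive answer: drop the ignored fields, turn each remaining dictionary into a canonical string, and compare the two sets of strings. The names IGNORED_KEYS, route_key and diff_routes are hypothetical, and the sample routes below are shortened, made-up variations on the format in the question.

```python
import json

IGNORED_KEYS = {"uptime"}  # fields that change constantly and should not count


def route_key(route):
    """Canonical string for one route, with the ignored fields dropped."""
    kept = {k: v for k, v in route.items() if k not in IGNORED_KEYS}
    return json.dumps(kept, sort_keys=True)


def diff_routes(old_routes, new_routes):
    """Return the routes that disappeared from or appeared in the list."""
    old = {route_key(r): r for r in old_routes}
    new = {route_key(r): r for r in new_routes}
    return {
        "removed": [old[k] for k in old if k not in new],
        "added": [new[k] for k in new if k not in old],
    }


# Shortened sample data (an assumption, for illustration only).
old_routes = [
    {"protocol": "S", "network": "0.0.0.0", "mask": "0", "uptime": ""},
    {"protocol": "O", "network": "10.129.30.0", "mask": "24", "uptime": "08:58:25"},
]
new_routes = [
    {"protocol": "O", "network": "10.129.30.0", "mask": "24", "uptime": "09:03:11"},
    {"protocol": "O", "network": "10.129.40.0", "mask": "24", "uptime": "00:01:02"},
]

changes = diff_routes(old_routes, new_routes)
report = json.dumps(changes, indent=2)  # JSON text, ready to write to a file
```

With this approach a route whose metric or next hop changed shows up once under "removed" and once under "added", since nothing here says which fields identify a route.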
pip install json-files-compare
pip install json-diff
pip install jsoncomparison
Hi all,
I'm trying to write my first Python script and I can't work out the best way to diff two JSON files and write the difference to a third.
I've managed to get the script to find differences, but only in entire lines. I want it to either treat the data as JSON or report any and all changes, regardless of the line.
Here it is:
a = open('markets.json', 'r').read().split('\n')
b = open('updatemarkets.json', 'r').read().split('\n')
c = open('newpairs.json', 'w')
c.write('\n'.join([comm for comm in b if not (comm in a)]))
c.close()
Obviously the above just duplicates the entire file if any changes are found (as the data is treated as one line).
edit: formatting
edit: this is the data for reference: https://bittrex.com/api/v1.1/public/getmarkets
edit: Difflib doesn't allow you to save the output in its original format, or create a patch-like file (in the style of bash's diff/patch). Still looking for a solution!
edit: Maybe a better question is, is it possible to separate JSON objects with a new line at the time of import (json.dump)?
edit: Found how to import the data with a new line using json.dump(data, f, indent=4), but still can't find a way to output the difference between the files.
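One way to compare the data itself rather than lines of text is to parse both files with json.load and walk the two structures side by side. The following is only a sketch under that assumption: the function name walk_diff and the sample data are made up, and lists are compared position by position, which may or may not suit the real data.

```python
def walk_diff(old, new, path=""):
    """Yield (path, old_value, new_value) for every difference found.

    None stands in for "missing on this side", so a missing value cannot be
    told apart from a JSON null in this simple version.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{key}"
            if key not in old:
                yield (child, None, new[key])
            elif key not in new:
                yield (child, old[key], None)
            else:
                yield from walk_diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        # Lists are compared index by index.
        for i in range(max(len(old), len(new))):
            child = f"{path}/{i}"
            if i >= len(old):
                yield (child, None, new[i])
            elif i >= len(new):
                yield (child, old[i], None)
            else:
                yield from walk_diff(old[i], new[i], child)
    elif old != new:
        yield (path, old, new)


# Made-up sample data (an assumption, not the real API response).
old = {"result": [{"name": "BTC-LTC", "active": True}]}
new = {"result": [{"name": "BTC-LTC", "active": False},
                  {"name": "BTC-DOGE", "active": True}]}

changes = list(walk_diff(old, new))
```

The resulting list of tuples can then be written to the third file with json.dump(changes, f, indent=4); tuples are serialized as JSON arrays.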
According to the documentation:
Each line of a Differ delta begins with a two-letter code:
| Code | Meaning |
|------|-------------------------------------------|
| '- ' | line unique to sequence 1 |
| '+ ' | line unique to sequence 2 |
| '  ' | line common to both sequences             |
| '? ' | line not present in either input sequence |
So basically, all you have to do is filter lines starting with either "- " or "+ ".
import difflib
diff = difflib.Differ()
result = diff.compare(target_file.readlines(), orig_file.readlines())
result = [line for line in result if line.startswith(("- ", "+ "))]
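If the inputs are JSON, the filtering above tends to behave better when both sides are first re-serialized the same way, so that only real data changes show up as changed lines. A minimal sketch of that idea follows; the helper name json_lines and the two sample strings are assumptions made for illustration.

```python
import difflib
import json


def json_lines(text):
    """Re-serialize JSON text with sorted keys and one value per line."""
    return json.dumps(json.loads(text), indent=4, sort_keys=True).splitlines()


# Made-up inputs: same keys in a different order, one value changed.
orig_text = '{"success": false, "count": 1}'
target_text = '{"count": 2, "success": false}'

differ = difflib.Differ()
delta = differ.compare(json_lines(orig_text), json_lines(target_text))

# Keep only the lines unique to one side; this also drops the "? " hint lines.
changed = [line for line in delta if line.startswith(("- ", "+ "))]
```

Here only the "count" line is reported, even though the key order differs between the two input strings.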
You could apply julienc's answer - which is "technically correct" given how your question is formulated - but doing a text diff on JSON strings is broken: there are many different valid JSON representations of the same data, due to insignificant whitespace and indentation and to the fact that JSON objects are defined as unordered collections (so {"foo":42, "bar":true} and {"bar": true, "foo":42} are textually different yet strictly equal once deserialized).
IOW, the working solution is to deserialize your JSON data and do a proper dict comparison.
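To make that concrete, here is a tiny sketch using the two strings from the paragraph above; the variable names are made up.

```python
import json

text_a = '{"foo":42, "bar":true}'
text_b = '{"bar": true, "foo":42}'

texts_equal = text_a == text_b                         # compares characters
data_equal = json.loads(text_a) == json.loads(text_b)  # compares parsed dicts
```

The string comparison fails on key order and spacing alone, while the dict comparison only looks at keys and values.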
I have a Python Lambda function downloading a large Excel file and converting it to JSON.
This file will be downloaded at least once a day (as the data can change).
I need to push the changed/updated data to an API.
Is there a way for me to compare two JSON files and output the diff?
It would be perfect if it would output multiple arrays of objects.
1 array of objects that have changed (I don't care what has changed, I just need to know that it has)
1 array of removed/deleted objects.
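A possible shape for this, sketched under one big assumption: every object carries a field that identifies it across downloads. The field name "id", the function name compare_snapshots and the sample rows are all hypothetical.

```python
def compare_snapshots(previous, current, key="id"):
    """Return (changed, removed) lists of objects, matched by a unique key.

    Assumes every object has the `key` field and that its value is unique.
    Objects that exist only in `current` are not reported by this sketch.
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Present in both downloads, but at least one field differs.
    changed = [row for k, row in curr_by_key.items()
               if k in prev_by_key and prev_by_key[k] != row]
    # Present yesterday, gone today.
    removed = [row for k, row in prev_by_key.items() if k not in curr_by_key]
    return changed, removed


# Made-up sample rows (an assumption, for illustration only).
previous = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 9}]
current = [{"id": 1, "qty": 5}, {"id": 2, "qty": 8}]

changed, removed = compare_snapshots(previous, current)
```

Both lists can be passed to json.dumps as they are. If the rows have no natural identifier, a combination of fields could serve as the key instead, at the cost of a slightly more involved lookup.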