You can achieve what you want with the help of a small Python script (you'll need Python installed, as well as the lxml toolkit).
tagsort.py:
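#!/usr/bin/env python3
import sys
from lxml import etree
# Usage: python tagsort.py filename tag
filename, tag = sys.argv[1:]
# Drop whitespace-only text nodes so that pretty_print can re-indent the output
doc = etree.parse(filename, etree.XMLParser(remove_blank_text=True))
root = doc.getroot()
# Reorder the root's children by the text of their <tag> child (a missing tag sorts first)
root[:] = sorted(root, key=lambda el: el.findtext(tag) or "")
print(etree.tostring(doc, pretty_print=True, encoding="unicode"))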
This script sorts the first-level elements under the XML document root by the content of a second-level element, sending the result to stdout. It's called like this:
$ python tagsort.py filename tag
Once you've got that, you can use process substitution to get a diff based on its output (I've added one element and changed another in your example files to show a non-empty result):
$ diff <(python tagsort.py file1 Id) <(python tagsort.py file2 Id)
4a5
> <AddedTag>Something</AddedTag>
17c18
< <Role>X</Role>
---
> <Role>S</Role>
Answer from user27282 on Stack Exchange
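If you would rather not shell out to diff, the same sort-then-compare idea can be sketched with only the Python standard library. This is a minimal sketch, not the script from the answer above: it uses xml.etree.ElementTree instead of lxml, the function names sorted_lines and diff_by_tag are made up for illustration, and the sample documents are hypothetical. It compares only the first-level elements and ignores attributes on the root.

```python
import difflib
import xml.etree.ElementTree as ET


def sorted_lines(xml_text, tag):
    """Return one serialized line per first-level element, sorted by <tag> text."""
    root = ET.fromstring(xml_text)
    # Same idea as tagsort.py: reorder first-level elements by a child's text.
    root[:] = sorted(root, key=lambda el: el.findtext(tag) or "")
    # Drop whitespace-only text so indentation differences do not show up.
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return [ET.tostring(child, encoding="unicode") for child in root]


def diff_by_tag(xml_a, xml_b, tag):
    """Unified diff of two documents after sorting both by the same tag."""
    return list(difflib.unified_diff(
        sorted_lines(xml_a, tag), sorted_lines(xml_b, tag),
        fromfile="a", tofile="b", lineterm=""))


# Hypothetical sample data: same users, different order, one changed Role.
SAMPLE_A = ("<Users><User><Id>2</Id><Role>X</Role></User>"
            "<User><Id>1</Id><Role>A</Role></User></Users>")
SAMPLE_B = ("<Users>\n  <User><Id>1</Id><Role>A</Role></User>\n"
            "  <User><Id>2</Id><Role>S</Role></User>\n</Users>")

for line in diff_by_tag(SAMPLE_A, SAMPLE_B, "Id"):
    print(line)
```

Only the changed Role shows up in the diff; the reordering and the indentation do not.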
I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests converting both files to canonical XML and then doing a diff. The following should work for you if you are on Linux or Mac, or on Windows with something like Cygwin installed:
$ xmllint --c14n File1.xml > 1.xml
$ xmllint --c14n File2.xml > 2.xml
$ diff 1.xml 2.xml
For what it's worth, I have created a Java tool (Kotlin, actually) for efficient and configurable canonicalization of XML files.
It will always:
- Sort nodes and attributes by name.
- Remove namespaces (yes - it could - hypothetically - be a problem).
- Prettyprint the result.
In addition you can tell it to:
- Remove a given list of node names: maybe you do not want to know that the value of a piece of metadata, say <RequestReceivedTimestamp>, has changed.
- Sort a given list of collections in the context of the parent: maybe you do not care that the order of <Contact> entries in <ListOfFavourites> has changed.
It uses XSLT and does all the above efficiently using chaining.
Limitations
It does support sorting nested lists - sorting innermost lists before outer. But it cannot reliably sort arbitrary levels of recursively nested lists.
If you have such needs you can, after having used this tool, compare the sorted byte arrays of the results. They will be equal if only list sorting issues remain.
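The sorted-bytes comparison could look roughly like the sketch below. It is an illustration only, not part of the tool, and the function name same_bytes_ignoring_order is made up. Note that equal sorted bytes is a necessary condition rather than a proof: two different documents can in principle contain the same bytes in a different arrangement.

```python
from collections import Counter


def same_bytes_ignoring_order(a: bytes, b: bytes) -> bool:
    """True if both inputs contain exactly the same bytes, in any order.

    Counting bytes is equivalent to comparing sorted(a) == sorted(b),
    without building two sorted lists.
    """
    return Counter(a) == Counter(b)


# Hypothetical normalized outputs that differ only in list order.
left = b"<List><Item>1</Item><Item>2</Item></List>"
right = b"<List><Item>2</Item><Item>1</Item></List>"
print(same_bytes_ignoring_order(left, right))
```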
Where to get it
You can get it here: XMLNormalize
One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.
$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Or as a one-liner.
$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
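If xmllint is not available, a similar canonicalize-then-diff step can be sketched in Python. This assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available; it implements C14N 2.0, so its output is not guaranteed to match what xmllint --c14n produces. The helper name canonical_lines and the sample strings are made up for illustration, and splitting on "><" is just a simple way to get one tag per line for diffing.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(xml_text):
    """Canonicalize (C14N 2.0) and split into roughly one tag per line."""
    # strip_text=True drops leading/trailing whitespace in text content,
    # so indentation differences between the inputs disappear.
    canon = ET.canonicalize(xml_text, strip_text=True)
    # The canonical form escapes '<' and '>' inside text, so "><" should
    # normally occur only between adjacent tags.
    return canon.replace("><", ">\n<").splitlines()


def canonical_diff(xml_a, xml_b):
    """Unified diff of the canonical forms of two XML strings."""
    return list(difflib.unified_diff(
        canonical_lines(xml_a), canonical_lines(xml_b),
        fromfile="one.xml", tofile="two.xml", lineterm=""))


# Hypothetical inputs: same content, different attribute order and indentation.
one = '<r b="1" a="2"><x>1</x></r>'
two = '<r a="2" b="1">\n  <x>1</x>\n</r>'
print(canonical_diff(one, two))  # empty list: no differences after canonicalization
```

To read from files instead of strings, canonicalize also accepts a from_file argument.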
Jukka's answer did not work for me, but it did point to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch did. --exc-c14n is not listed in the man page, but it is described on the command line as "W3C exclusive canonical format".
$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml
$ xmllint | grep c14
--c14n : save in W3C canonical format v1.0 (with comments)
--c14n11 : save in W3C canonical format v1.1 (with comments)
--exc-c14n : save in W3C exclusive canonical format (with comments)
$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686
$ cat /etc/system-release
CentOS release 6.5 (Final)
Warning: --exc-c14n strips out the XML header, whereas --c14n prepends the XML header if it is not there.
I've got a few XML files, each a couple thousand lines long, that I need to compare to see which values in file A are missing from file B.
Does anyone have an efficient way to diff these? They're not formatted the same at all, so normal text diff programs won't work. I just need to compare which "<value203>...</value203>" aren't in the one file.
Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.
A commercial tool that specializes in this problem is DeltaXML.
If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
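For the narrower question above (which values in file A are missing from file B), a set difference over the leaf elements may be enough. The following is a minimal sketch under that assumption: it treats each childless element as a (tag, text) pair, ignores attributes, position, and duplicates, and the names leaf_values and missing_from are made up for illustration. The element name value203 comes from the question; value7 and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_values(xml_text):
    """Set of (tag, stripped text) pairs for every element without children."""
    root = ET.fromstring(xml_text)
    return {(el.tag, (el.text or "").strip())
            for el in root.iter() if len(el) == 0}


def missing_from(xml_a, xml_b):
    """Leaf values present in document A but absent from document B."""
    return sorted(leaf_values(xml_a) - leaf_values(xml_b))


# Hypothetical inputs, formatted differently on purpose.
file_a = "<r><value203>x</value203><value7>y</value7></r>"
file_b = "<r>\n  <value7> y </value7>\n</r>"
print(missing_from(file_a, file_b))
```

Because formatting and element order are discarded before comparing, this answers "what is missing" but not "where in the document it was".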
Good answer here:
Question: How can I diff two XML files? | Super User
Answer: How can I diff two XML files? | Super User
$ xmllint --format --exc-c14n one.xml > 1.xml
$ xmllint --format --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.
Hello, is there a free tool for comparing XML files? I need to compare two files to check whether there are any differences in the new one.