One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.
$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Or as a one-liner.
$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
Answer from Jukka Matilainen on Stack ExchangeOne approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.
$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Or as a one-liner.
$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
Jukka's answer did not work for me, but it did point to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but i did find the --exc-c14n switch did sort the attributes. --exc-c14n is not listed in the man page, but described on the command line as "W3C exclusive canonical format".
$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml
$ xmllint | grep c14
--c14n : save in W3C canonical format v1.0 (with comments)
--c14n11 : save in W3C canonical format v1.1 (with comments)
--exc-c14n : save in W3C exclusive canonical format (with comments)
$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686
$ cat /etc/system-release
CentOS release 6.5 (Final)
Warning --exc-c14n strips out the xml header whereas the --c14n prepends the xml header if not there.
I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests doing a canonical xml sort then doing a diff. Being that you are on linux, this should work for you cleanly. It worked for me on my mac, and should work for people on windows if they have something like cygwin installed:
$ xmllint --c14n a.xml > sortedA.xml
$ xmllint --c14n b.xml > sortedB.xml
$ diff sortedA.xml sortedB.xml
You're requesting a sort based on the sequence of attributes in the elements being sorted. But your top-level tag elements here have only one attribute: name. If you want multiple tag elements with name="BBB" to sort differently, you need to give them distinct sort keys.
In your example, I'd try something like select="concat(name(), @name, name(*[1]), *[1]/@name)" -- but this is a very shallow key. It uses values from the first child in the input, but the children may shift position during the process. You may be able (knowing your data better than I do) to calculate a good key for each element in a single pass, or you may just need several passes.
» pip install xmldiff