Videos
I've got a few XML files that are a couple thousand lines long I need to compare and see which values in file A are missing from file B.
Does anyone have an efficient way to diff these? They're not formatted the same at all, so normal text diff programs won't work. I just need to compare which "<value203>...</value203>" aren't in the one file.
I had a similar problem and I eventually found: http://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests doing a canonical XML sort then doing a diff. The following should work for you if you are on Linux, Mac, or if you have Windows with something like Cygwin installed:
$ xmllint --c14n FileA.xml > 1.xml
$ xmllint --c14n FileB.xml > 2.xml
$ diff 1.xml 2.xml
For what it's worth, I have created a java tool (or kotlin actually) for effecient and configurable canonicalization of xml files.
It will always:
- Sort nodes and attributes by name.
- Remove namespaces (yes - it could - hypothetically - be a problem).
- Prettyprint the result.
In addition you can tell it to:
- Remove a given list of node names - maybe you do not want to know that the value of a piece of metadata - say
<RequestReceivedTimestamp>has changed. - Sort a given list of collections in the context of the parent - maybe you do not care that the order of
<Contact>entries in<ListOfFavourites>has changed.
It uses XSLT and does all the above efficiently using chaining.
Limitations
It does support sorting nested lists - sorting innermost lists before outer. But it cannot reliably sort arbitrary levels of recursively nested lists.
If you have such needs you can - after having used this tool - compare the sorted byte arrays of the results. they will be equal if only list sorting issues remain.
Where to get it
You can get it here: XMLNormalize