I wrote a simple Python tool for this called xmldiffs:
Compare two XML files, ignoring element and attribute order.
Usage:
xmldiffs [OPTION] FILE1 FILE2
Any extra options are passed to the diff command.
Get it at https://github.com/joh/xmldiffs
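xmldiffs itself is written in Python and hands its output to diff. Purely to illustrate the same "ignore element and attribute order" idea in Java (this is not a port of xmldiffs, and the class and method names are hypothetical), each element can be reduced to a signature built from its name, its attributes sorted by name, its trimmed text and the sorted signatures of its child elements. Two documents are then treated as equal when their root signatures match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML strings while ignoring the order of
// sibling elements and of attributes.
class OrderInsensitiveXml {

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Signature = name + attributes sorted by name + trimmed text + sorted child signatures.
    static String signature(Element element) {
        Map<String, String> attributes = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            attributes.put(attr.getNodeName(), attr.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        List<String> children = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                children.add(signature((Element) child));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        Collections.sort(children); // this is what makes sibling order irrelevant
        return element.getNodeName() + attributes + "[" + text.toString().trim() + "]" + children;
    }

    static boolean equalIgnoringOrder(String xmlA, String xmlB) {
        return signature(parse(xmlA)).equals(signature(parse(xmlB)));
    }
}
```

This only answers "equal or not"; to see where two files differ you would still serialize sorted trees and run diff on them. The string signature is a sketch: text containing bracket characters could in principle make two different trees collide, and comments, processing instructions and the position of text among child elements are ignored.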
Answer from joh on Stack Overflow
In Beyond Compare, you can enable the XML Sort conversion in the File Formats settings. With this option, the XML children are sorted before the diff.
A trial / portable version of Beyond Compare is available.
For XMLUnit 2.0 (which is what I was looking for), this is now done with DefaultNodeMatcher:
import org.xmlunit.builder.*;
import org.xmlunit.diff.*;
Diff diff = DiffBuilder.compare(Input.fromFile(control))
        .withTest(Input.fromFile(test))
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
        .checkForSimilar() // order changes are SIMILAR; without this, hasDifferences() reports them
        .build();
Hope this helps other people googling...
My original answer is outdated. If I had to build it again, I would use XMLUnit 2 and xmlunit-matchers. Please note that for XMLUnit a different order is always 'similar', never 'identical'.
import static org.hamcrest.MatcherAssert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isSimilarTo;

import org.junit.Test;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.ElementSelectors;

@Test
public void testXmlUnit() {
    String actual = "<test><elem>a</elem><elem>b</elem></test>";
    String expected = "<test><elem>b</elem><elem>a</elem></test>";
    // Same elements in a different order: similar, not identical
    assertThat(actual, isSimilarTo(expected)
            .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText)));
    // In case you want to ignore whitespace, add ignoreWhitespace().normalizeWhitespace()
    assertThat(actual, isSimilarTo(expected)
            .ignoreWhitespace()
            .normalizeWhitespace()
            .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText)));
}
If somebody still wants a pure Java implementation, here it is. It extracts the text content of the matching elements and compares the resulting list while ignoring order.
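import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static Document loadXMLFromString(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(xml));
    return builder.parse(is);
}

@Test
public void test() throws Exception {
    Document doc = loadXMLFromString("<test>\n" +
            "  <elem>b</elem>\n" +
            "  <elem>a</elem>\n" +
            "</test>");
    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//test//elem");
    NodeList all = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
    // Collect the text of every matching element, in document order
    List<String> values = new ArrayList<>();
    for (int i = 0; i < all.getLength(); i++) {
        values.add(all.item(i).getTextContent());
    }
    // Matches when the list holds exactly these items, in any order
    // (a List rather than a Set, so duplicate values are not silently collapsed)
    assertThat("List equality without order",
            values, containsInAnyOrder("a", "b"));
}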
I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests converting both files to canonical XML and then doing a diff. Since you are on Linux, this should work for you cleanly. It worked for me on my Mac, and should work on Windows if you have something like Cygwin installed:
$ xmllint --c14n a.xml > sortedA.xml
$ xmllint --c14n b.xml > sortedB.xml
$ diff sortedA.xml sortedB.xml
You're requesting a sort based on the sequence of attributes in the elements being sorted. But your top-level tag elements here have only one attribute: name. If you want multiple tag elements with name="BBB" to sort differently, you need to give them distinct sort keys.
In your example, I'd try something like select="concat(name(), @name, name(*[1]), *[1]/@name)", but this is a very shallow key: it uses values from the first child in the input, and the children may shift position during the process. You may be able (knowing your data better than I do) to calculate a good key for each element in a single pass, or you may just need several passes.
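To make the key idea concrete outside XSLT, here is a rough Java DOM sketch (hypothetical class and method names, assuming elements carry an optional name attribute as in the question) that sorts sibling elements by the same shallow key. It works bottom-up, so an element's children are already in their final order when its own key is computed; for this particular key that avoids the "children may shift position" problem in a single pass. Elements with equal keys keep their input order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: sorts sibling elements by the same shallow key as
// concat(name(), @name, name(*[1]), *[1]/@name).
class ShallowKeySort {

    static Element firstChildElement(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // getAttribute returns "" for a missing attribute, like @name inside concat().
    static String key(Element e) {
        Element first = firstChildElement(e);
        String firstPart = first == null ? "" : first.getTagName() + first.getAttribute("name");
        return e.getTagName() + e.getAttribute("name") + firstPart;
    }

    // Post-order: an element's children are sorted before its key is used,
    // so name(*[1]) refers to the first child after sorting.
    static void sortChildren(Element parent) {
        List<Element> children = new ArrayList<>();
        Node n = parent.getFirstChild();
        while (n != null) {
            Node next = n.getNextSibling();
            if (n instanceof Element) {
                children.add((Element) n);
            } else if (n.getNodeType() == Node.TEXT_NODE && n.getTextContent().trim().isEmpty()) {
                parent.removeChild(n); // drop indentation-only whitespace
            }
            n = next;
        }
        for (Element child : children) {
            sortChildren(child);
        }
        children.sort(Comparator.comparing(ShallowKeySort::key)); // stable: ties keep input order
        for (Element child : children) {
            parent.appendChild(child); // appending an existing node moves it to the end
        }
    }

    static String normalize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            sortChildren(doc.getDocumentElement());
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not normalize XML", e);
        }
    }
}
```

Two documents whose normalize() results are equal differ only in an order this key can resolve; deeper differences between same-keyed elements still need a richer key.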
Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.
A commercial tool that specializes in this problem is DeltaXML.
If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
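The JDK's built-in javax.xml.xpath implements XPath 1.0, so deep-equal() is not available there (it needs an XPath 2.0+ processor such as Saxon). A comparable check in plain Java is the DOM Level 3 method Node.isEqualNode(). The minimal sketch below (hypothetical class name) shows its behaviour: attributes are compared regardless of order, but child nodes must appear in the same order and whitespace-only text nodes count, so a normalizing transformation is still needed before comparing documents whose elements are reordered.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: a yes/no structural comparison, similar in spirit to deep-equal().
class DeepEqualCheck {

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // isEqualNode: same names, values and attributes (in any order),
    // and equal child nodes in the same order, including whitespace text nodes.
    static boolean deepEqual(String xmlA, String xmlB) {
        return root(xmlA).isEqualNode(root(xmlB));
    }
}
```

Like the approaches above, this tells you whether the documents match, not where they differ.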
Good answer here: How can I diff two XML files? (Super User question and accepted answer)
$ xmllint --format --exc-c14n one.xml > 1.xml
$ xmllint --format --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.
One difference that may need to become clearer in the 2.x documentation is the default ElementSelector, roughly what used to be ElementQualifier in 1.x. Where 1.x defaults to matching elements by name, 2.x defaults to matching elements in order. Maybe this is a bad idea.
Your Diff should work if you switch to matching on element names.
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
You might need to add something along the lines of the following to your DiffBuilder:
.withDifferenceEvaluator((comparison, outcome) -> {
    // Treat "same children, different order" as equal instead of different
    if (outcome == ComparisonResult.DIFFERENT &&
            comparison.getType() == ComparisonType.CHILD_NODELIST_SEQUENCE) {
        return ComparisonResult.EQUAL;
    }
    return outcome;
}).build();
For me the solution mentioned above does not work, because compareNodeLists has this hardcoded in DOMDifferenceEngine.compareNodes():
new Comparison(ComparisonType.CHILD_NODELIST_SEQUENCE...
I have raised a new ticket for this, though bear in mind it could just be my lack of understanding :-)
https://github.com/xmlunit/xmlunit/issues/258
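One approach is to first convert both XML files to Canonical XML and then compare the results with diff. For example, xmllint can be used to canonicalize XML. Per the Canonical XML specification, canonicalization puts attributes in a fixed order and normalizes other serialization details, but it does not reorder child elements.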
$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml
Or as a one-liner:
$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
Jukka's answer did not work for me, but it did point to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch did. --exc-c14n is not listed in the man page, but the command-line help describes it as "W3C exclusive canonical format".
$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml
$ xmllint | grep c14
--c14n : save in W3C canonical format v1.0 (with comments)
--c14n11 : save in W3C canonical format v1.1 (with comments)
--exc-c14n : save in W3C exclusive canonical format (with comments)
$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686
$ cat /etc/system-release
CentOS release 6.5 (Final)
Warning: --exc-c14n strips out the XML declaration, whereas --c14n prepends it if it is not there.
I had a similar problem and I eventually found: http://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests converting both files to canonical XML and then doing a diff. The following should work for you if you are on Linux or Mac, or on Windows with something like Cygwin installed:
$ xmllint --c14n FileA.xml > 1.xml
$ xmllint --c14n FileB.xml > 2.xml
$ diff 1.xml 2.xml
For what it's worth, I have created a Java tool (Kotlin, actually) for efficient and configurable canonicalization of XML files.
It will always:
- Sort nodes and attributes by name.
- Remove namespaces (yes, hypothetically this could be a problem).
- Pretty-print the result.
In addition you can tell it to:
- Remove a given list of node names: maybe you do not want to know that the value of a piece of metadata, say <RequestReceivedTimestamp>, has changed.
- Sort a given list of collections in the context of the parent: maybe you do not care that the order of <Contact> entries in <ListOfFavourites> has changed.
It uses XSLT and does all the above efficiently using chaining.
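The "remove a given list of node names" option can be pictured with a small Java DOM sketch. This is not XMLNormalize's code or API (XMLNormalize uses XSLT, and the class and method names here are hypothetical); it simply deletes every element with one of the given names before the document is serialized for comparison.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: strips volatile elements (timestamps, ids...) before a diff.
class RemoveVolatileNodes {

    static String removeElements(String xml, Set<String> names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            for (String name : names) {
                // getElementsByTagName returns a live list, so remove from the end
                NodeList matches = doc.getElementsByTagName(name);
                for (int i = matches.getLength() - 1; i >= 0; i--) {
                    Element e = (Element) matches.item(i);
                    e.getParentNode().removeChild(e);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }
}
```

Running both files through the same removal step means a changed <RequestReceivedTimestamp> no longer shows up in the diff.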
Limitations
It does support sorting nested lists, sorting innermost lists before outer ones. But it cannot reliably sort arbitrary levels of recursively nested lists.
If you have such needs, you can (after having used this tool) compare the sorted byte arrays of the results. They will be equal if only list sorting issues remain.
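The sorted-byte-array check can be sketched in a few lines of Java (hypothetical class and method names). Sorting the bytes discards their order, so two outputs that differ only in the order of their list entries give equal sorted arrays. The converse does not hold: equal sorted arrays do not prove the documents are equivalent, so treat this as a quick necessary-condition check.

```java
import java.util.Arrays;

// Hypothetical helper: compares two byte arrays as multisets of bytes.
class SortedBytesCheck {

    static boolean sameBytesIgnoringOrder(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        // Sort copies so the callers' arrays are left untouched
        byte[] sortedA = a.clone();
        byte[] sortedB = b.clone();
        Arrays.sort(sortedA);
        Arrays.sort(sortedB);
        return Arrays.equals(sortedA, sortedB);
    }
}
```

For files, the inputs could come from java.nio.file.Files.readAllBytes applied to each normalized output.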
Where to get it
You can get it here: XMLNormalize