To summarise, addressing your requirements:
- Comparison tool that can compare XSD files: Yes
- Can identify changes in structure, not just content of XSD files: Yes
- Can ignore expected changes: Yes
- Orderless comparisons; avoiding false positives from moved elements: Yes
- Compare all files at once: Yes, through batch processing
- Produce a report with the details: Unclear what is needed here
XML Compare, developed by DeltaXML, is a structurally aware XML comparison tool that is able to identify change in both the structure and the content of XML files, including XSD.
Where elements have moved within the file but their order is not significant to the comparison, XML Compare has built-in functionality for Comparing Orderless Elements. This can be customised extensively using XSLT as needed.
When you are expecting changes that do not need to be reported as changes, you can use XML Compare's built-in Ignoring Changes functionality, which can again be customised using XSLT.
For bulk processing of your files, although XML Compare only compares files 1-to-1, it can be run via the command line, REST API and Java API. This means you can write a batch file to queue up the comparison operations yourself or, if you use the REST API, the comparison operations will be queued automatically.
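As a rough illustration of the batching step, the sketch below pairs same-named `.xsd` files from an old and a new directory so that each pair can be queued as one 1-to-1 comparison. The class and method names (`BatchPairing`, `pairByName`, `listXsd`) are hypothetical, and the sketch deliberately stops short of calling XML Compare itself, since the exact command line or API call depends on your installation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: builds the list of 1-to-1 comparisons for a batch run. */
final class BatchPairing {

    /** Maps each old file to the new file with the same file name; unmatched files are skipped. */
    static Map<Path, Path> pairByName(List<Path> oldFiles, List<Path> newFiles) {
        Map<String, Path> newByName = new HashMap<>();
        for (Path newFile : newFiles) {
            newByName.put(newFile.getFileName().toString(), newFile);
        }
        Map<Path, Path> pairs = new TreeMap<>(); // sorted, so the batch order is reproducible
        for (Path oldFile : oldFiles) {
            Path match = newByName.get(oldFile.getFileName().toString());
            if (match != null) {
                pairs.put(oldFile, match);
            }
        }
        return pairs;
    }

    /** Lists the .xsd files directly inside a directory. */
    static List<Path> listXsd(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".xsd"))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BatchPairing <oldDir> <newDir>");
            return;
        }
        Map<Path, Path> pairs = pairByName(listXsd(Paths.get(args[0])), listXsd(Paths.get(args[1])));
        for (Map.Entry<Path, Path> pair : pairs.entrySet()) {
            // Placeholder: hand each pair to whichever interface you use (command line, REST or Java API).
            System.out.println(pair.getKey() + " <-> " + pair.getValue());
        }
    }
}
```

Files that exist on only one side are silently skipped here; in practice you would probably want to report them as added or removed schemas.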
Finally, to address your need for a report summarising the results, this depends on what you mean. XML Compare is capable of producing HTML reports from diff outputs that highlight changes in-line or side-by-side. It sounds however like you may be looking for a more statistical report on the results of the comparisons, for which XML Compare does not currently have a facility. It is worth noting however that the diff outputs produced by XML Compare are valid XML, and can be readily processed to identify the change attributes in the file.
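Because the diff output is ordinary XML, a simple statistical summary can be produced with any XML parser. The sketch below counts how often each value of a change-marker attribute occurs. The attribute's local name is passed in as a parameter, because the name used in the example (`deltaV2`) and the namespace in the sample are assumptions for illustration; check the delta format documentation for the actual names. `DeltaSummary` and `countChanges` are hypothetical names.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: tallies the values of a change-marker attribute in a diff result. */
final class DeltaSummary {

    /** Counts every value of the attribute with the given local name, in any namespace. */
    static Map<String, Integer> countChanges(String deltaXml, String attrLocalName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getLocalName() works on prefixed attributes
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(deltaXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the diff output", e);
        }
        Map<String, Integer> counts = new TreeMap<>();
        NodeList elements = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                if (attrLocalName.equals(attr.getLocalName())) {
                    counts.merge(attr.getNodeValue(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample; the namespace URI and attribute values are placeholders.
        String sample = "<r xmlns:d='urn:example:delta' d:deltaV2='A!=B'>"
                + "<a d:deltaV2='A'/><b d:deltaV2='B'/><c d:deltaV2='A=B'/></r>";
        System.out.println(countChanges(sample, "deltaV2"));
    }
}
```

Running the same count over every result file in a batch would give a per-file tally of added, deleted, and modified nodes, which may be close to the statistical report you have in mind.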
Disclosure: I am an employee of DeltaXML.
Answer from OliverXML on Stack Exchange
That's quite a challenge, because so many differences are possible: you may have to make compromises depending on what you actually encounter. (For example, you may or may not need to bother with named model groups, depending on whether either of the schemas you are comparing actually uses them. Most schemas don't.)
If I were doing this (and of course I'm biased towards using my own tools) I would start by using Saxon's schema validator to generate SCM (schema component model) files for both schemas. This will do a fair bit of normalization, e.g. handling the difference between inline types and references to named global types. I would then write an XSLT stylesheet to do further normalization on the SCM files, for example sorting components into a canonical order, sorting enumeration values, and so on; also, eliminating the parts of the SCM files that aren't relevant, such as finite state machine details. I would then probably write a custom XSLT comparison module to compare the two normalized SCM files (along the lines of https://dvcs.w3.org/hg/xslt30-test/file/tip/runner/compare.xsl which is used for comparing XSLT test results) - the key being that you don't just want a boolean answer saying whether they are the same or different, you want to highlight the differences. Alternatively you could use fn:deep-equal to test whether they are the same, and then use a visual diff tool to examine them side-by-side if not.
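To make the "highlight the differences, not just a boolean" point concrete, here is a minimal sketch of a comparison that walks two already-normalized XML trees in parallel and collects a path-labelled message for each mismatch. It is written in Java over the DOM rather than in XSLT, it is not the linked compare.xsl, and it knows nothing about the SCM format; `XmlDiffReport` and its methods are hypothetical names. Children are compared by position, which only makes sense once both inputs have been sorted into a canonical order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reports where two normalized XML trees differ. */
final class XmlDiffReport {

    /** Parses an XML string; parser exceptions are rethrown unchecked to keep the sketch short. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Returns one message per difference; an empty list means the trees match. */
    static List<String> diff(Element a, Element b) {
        List<String> out = new ArrayList<>();
        walk(a, b, "/" + a.getNodeName(), out);
        return out;
    }

    private static void walk(Element a, Element b, String path, List<String> out) {
        if (!a.getNodeName().equals(b.getNodeName())) {
            out.add(path + ": element " + a.getNodeName() + " -> " + b.getNodeName());
            return;
        }
        compareAttributes(a, b, path, out);
        List<Element> ca = childElements(a);
        List<Element> cb = childElements(b);
        if (ca.isEmpty() && cb.isEmpty()) {
            // Leaf elements: compare their text, ignoring surrounding whitespace.
            String ta = a.getTextContent().trim();
            String tb = b.getTextContent().trim();
            if (!ta.equals(tb)) out.add(path + ": text '" + ta + "' -> '" + tb + "'");
            return;
        }
        if (ca.size() != cb.size()) {
            out.add(path + ": " + ca.size() + " -> " + cb.size() + " child elements");
        }
        // Positional comparison; the index is the position among all child elements.
        for (int i = 0; i < Math.min(ca.size(), cb.size()); i++) {
            walk(ca.get(i), cb.get(i), path + "/" + ca.get(i).getNodeName() + "[" + (i + 1) + "]", out);
        }
    }

    private static void compareAttributes(Element a, Element b, String path, List<String> out) {
        NamedNodeMap left = a.getAttributes();
        for (int i = 0; i < left.getLength(); i++) {
            Node attr = left.item(i);
            String name = attr.getNodeName();
            if (!b.hasAttribute(name)) {
                out.add(path + "/@" + name + ": removed");
            } else if (!attr.getNodeValue().equals(b.getAttribute(name))) {
                out.add(path + "/@" + name + ": " + attr.getNodeValue() + " -> " + b.getAttribute(name));
            }
        }
        NamedNodeMap right = b.getAttributes();
        for (int i = 0; i < right.getLength(); i++) {
            String name = right.item(i).getNodeName();
            if (!a.hasAttribute(name)) out.add(path + "/@" + name + ": added");
        }
    }

    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) result.add((Element) n);
        }
        return result;
    }
}
```

For example, comparing `<s><e name='id' type='xs:string'/></s>` with `<s><e name='id' type='tns:IdentifierType'/></s>` yields the single message `/s/e[1]/@type: xs:string -> tns:IdentifierType`.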
My approach to this was to canonicalize the representation of the XML Schema.
Unfortunately, I can also tell you that, unlike canonicalization of XML documents (used, as an example, to calculate a digital signature), it is not that simple or even standardized.
So basically, you have to transform both XML Schemas to a "canonical form" - whatever the tool you build or use thinks that form is, and then do the compare.
My approach was to create an XML Schema set (could be more than one file if you have more namespaces) for each root element I needed, since I found it easier to compare XSDs authored using the Russian Doll style, starting from the PSVI model.
I then used options such as auto matching substitution group members coupled with replacement of substitution groups with a choice; removal of "superfluous" XML Schema sequences, collapsing of single option choices or moving minOccurs/maxOccurs around for single item compositors, etc.
Depending on the features of your XSD-aware comparison tool, or of the one you settle on building, you might also have to rearrange particles under compositors such as xsd:choice or xsd:all; etc.
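The particle rearrangement mentioned above could look roughly like the following sketch, which sorts the children of every xsd:choice and xsd:all into a fixed order while leaving xsd:sequence untouched. It is one possible normalization and not the tool I built: the sort key (local element name, then the `name` and `ref` attributes as written) is an assumption, and `ParticleSorter` is a hypothetical name. An xsd:annotation child is kept first, since the schema grammar requires that.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: puts particles of order-insensitive compositors into a canonical order. */
final class ParticleSorter {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Parses a schema document from a string with namespace awareness switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Recursively sorts the children of every xsd:choice and xsd:all below (and including) element. */
    static void sortUnorderedCompositors(Element element) {
        List<Element> children = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) children.add((Element) n);
        }
        for (Element child : children) {
            sortUnorderedCompositors(child); // innermost compositors first
        }
        boolean unordered = XSD_NS.equals(element.getNamespaceURI())
                && ("choice".equals(element.getLocalName()) || "all".equals(element.getLocalName()));
        if (unordered) {
            children.sort(Comparator.comparing(ParticleSorter::sortKey));
            for (Element child : children) {
                element.appendChild(child); // appendChild moves an existing child to the end
            }
        }
    }

    /** Annotation first, then by element kind, name and ref (getAttribute returns "" when absent). */
    private static String sortKey(Element particle) {
        String rank = "annotation".equals(particle.getLocalName()) ? "0" : "1";
        return rank + "|" + particle.getLocalName() + "|"
                + particle.getAttribute("name") + "|" + particle.getAttribute("ref");
    }
}
```

Whitespace text nodes are left where they were, so strip or ignore whitespace before comparing. Note also that `ref` values are compared as written, so the same QName under two different prefixes would sort differently.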
Anyway, what I learned after all of it was that it is extremely hard to build a tool that would work nicely for all "cool" XSD features out there... One test case I remember fondly was to deal with various xsd:any content.
I do wonder though if things have changed since...
Membrane SOA Model - Java API for WSDL and XML Schema
package sample.schema;

import java.util.List;

import com.predic8.schema.Schema;
import com.predic8.schema.SchemaParser;
import com.predic8.schema.diff.SchemaDiffGenerator;
import com.predic8.soamodel.Difference;

public class CompareSchema {

    public static void main(String[] args) {
        compare();
    }

    // Parse both schema versions and print every difference found between them.
    private static void compare() {
        SchemaParser parser = new SchemaParser();
        Schema schema1 = parser.parse("resources/diff/1/common.xsd");
        Schema schema2 = parser.parse("resources/diff/2/common.xsd");
        SchemaDiffGenerator diffGen = new SchemaDiffGenerator(schema1, schema2);
        List<Difference> lst = diffGen.compare();
        for (Difference diff : lst) {
            dumpDiff(diff, "");
        }
    }

    // Print one difference, then its nested differences with one more level of indentation.
    private static void dumpDiff(Difference diff, String level) {
        System.out.println(level + diff.getDescription());
        for (Difference localDiff : diff.getDiffs()) {
            dumpDiff(localDiff, level + " ");
        }
    }
}
After executing, you get the output shown below: a list of the differences between the two schema documents.
ComplexType PersonType has changed:
 Sequence has changed:
  Element id has changed:
   The type of element id has changed from xsd:string to tns:IdentifierType.
I had a similar problem and I eventually found: http://superuser.com/questions/79920/how-can-i-diff-two-xml-files
That post suggests converting both files to canonical XML (C14N) and then doing a diff. The following should work for you if you are on Linux, Mac, or if you have Windows with something like Cygwin installed:
$ xmllint --c14n FileA.xml > 1.xml
$ xmllint --c14n FileB.xml > 2.xml
$ diff 1.xml 2.xml
For what it's worth, I have created a Java tool (or Kotlin, actually) for efficient and configurable canonicalization of XML files.
It will always:
- Sort nodes and attributes by name.
- Remove namespaces (yes - it could - hypothetically - be a problem).
- Prettyprint the result.
In addition you can tell it to:
- Remove a given list of node names - maybe you do not want to know that the value of a piece of metadata, say <RequestReceivedTimestamp>, has changed.
- Sort a given list of collections in the context of the parent - maybe you do not care that the order of <Contact> entries in <ListOfFavourites> has changed.
It uses XSLT and does all the above efficiently using chaining.
Limitations
It does support sorting nested lists - sorting innermost lists before outer. But it cannot reliably sort arbitrary levels of recursively nested lists.
If you have such needs you can, after having used this tool, compare the sorted byte arrays of the results. They will be equal if only list sorting issues remain.
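The sorted-byte-array check is small enough to sketch directly. The class and method names below are hypothetical and not part of the tool. Keep in mind that the check only works in one direction: results that differ only in list order will have equal sorted bytes, but equal sorted bytes do not prove that the documents are otherwise identical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: tests whether two texts consist of exactly the same bytes in any order. */
final class SortedBytes {

    static boolean sameBytesIgnoringOrder(String first, String second) {
        byte[] a = first.getBytes(StandardCharsets.UTF_8);
        byte[] b = second.getBytes(StandardCharsets.UTF_8);
        Arrays.sort(a); // sorting discards all ordering information
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        String one = "<List><Item>a</Item><Item>b</Item></List>";
        String two = "<List><Item>b</Item><Item>a</Item></List>";
        System.out.println(sameBytesIgnoringOrder(one, two));
    }
}
```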
Where to get it
You can get it here: XMLNormalize