python xml compare ignore order

Compare XML ignoring order of child elements

stackoverflow.com › questions › 1767045 › compare-xml-ignoring-order-of-child-elements

I wrote a simple python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs

Answer from joh on Stack Overflow

2 of 14

In Beyond Compare, you can enable the XML Sort conversion in the File Formats settings. With this option, the XML children are sorted before the diff.

A trial / portable version of Beyond Compare is available.

GitHub

gist.github.com › dalelane › a0514b2e283a882d9ef3

Comparing XML files ignoring order of attributes and elements - see http://dalelane.co.uk/blog/?p=3225 for background · GitHub

@harryyuanfeng you may have comments in your xml. You can ignore them by changing the "sortfile" function:

Server Fault

serverfault.com › questions › 430671 › utility-to-logically-compare-two-xml-files

Utility to LOGICALLY compare two xml files? - Server Fault

Top answer

1 of 5

Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.

A commercial tool that specializes in this problem is DeltaXML.

If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
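The "normalize before comparison" idea above can be sketched in Python with the standard library. This is only an illustration, not the answerer's code: the helper names `normalize` and `logically_equal` are hypothetical, and the sketch assumes that attribute order, sibling order, and leading/trailing whitespace in text are all insignificant for your documents.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Sort attributes and children recursively, in place (hypothetical helper)."""
    for child in elem:
        normalize(child)
    # Assumption: surrounding whitespace in text and tail is not significant.
    elem.text = (elem.text or "").strip() or None
    elem.tail = (elem.tail or "").strip() or None
    # Re-insert attributes in sorted order so serialization is deterministic.
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Order siblings by their already-normalized serialization.
    elem[:] = sorted(elem, key=lambda e: ET.tostring(e, encoding="unicode"))


def logically_equal(xml_a, xml_b):
    """Return True if both documents match after normalization."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    normalize(a)
    normalize(b)
    return ET.tostring(a, encoding="unicode") == ET.tostring(b, encoding="unicode")


doc1 = '<r><a x="1" y="2"/><b>t</b></r>'
doc2 = '<r>\n  <b>t</b>\n  <a y="2" x="1"/>\n</r>'
doc3 = '<r><a x="1" y="2"/><b>changed</b></r>'
print(logically_equal(doc1, doc2))
print(logically_equal(doc1, doc3))
```

Like the approaches in the answer, this only says whether the documents match; it does not report where they differ.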

2 of 5

Good answer here:

Question: How can I diff two XML files? | Super User

Answer: How can I diff two XML files? | Super User

$ xmllint --format --exc-c14n one.xml > 1.xml
$ xmllint --format --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.

dale lane

dalelane.co.uk › blog

Comparing XML files ignoring order of attributes and child elements « dale lane

October 6, 2014 - I needed to compare some large XML files, which have big differences in the order of elements, and I couldn’t find a tool that would do the job. So I wrote a bit of Python to do it for me.

Stack Overflow

stackoverflow.com › questions › 24492895 › comparing-two-xml-files-in-python

Comparing two xml files in python - Stack Overflow

Top answer

1 of 4

This is actually a reasonably challenging problem (due to what "difference" means often being in the eye of the beholder here, as there will be semantically "equivalent" information that you probably don't want marked as differences).

You could try using xmldiff, which is based on work in the paper Change Detection in Hierarchically Structured Information.

2 of 4

My approach to the problem was transforming each XML into a xml.etree.ElementTree and iterating through each of the layers. I also included the functionality to ignore a list of attributes while doing the comparison.

The first block of code holds the class used:

import xml.etree.ElementTree as ET
import logging

class XmlTree():

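    def __init__(self):
        # xml_compare() reports mismatches through self.logger, so create it here
        # and attach the file handler and formatter to it.
        self.logger = logging.getLogger('xml_compare')
        self.logger.setLevel(logging.DEBUG)
        self.hdlr = logging.FileHandler('xml-comparison.log')
        self.formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.hdlr.setFormatter(self.formatter)
        self.logger.addHandler(self.hdlr)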

    @staticmethod
    def convert_string_to_tree(xmlString):

        return ET.fromstring(xmlString)

    def xml_compare(self, x1, x2, excludes=()):
        """
        Compares two xml etrees
        :param x1: the first tree
        :param x2: the second tree
        :param excludes: list of string of attributes to exclude from comparison
        :return:
            True if both files match
        """

        if x1.tag != x2.tag:
            self.logger.debug('Tags do not match: %s and %s' % (x1.tag, x2.tag))
            return False
        for name, value in x1.attrib.items():
            if not name in excludes:
                if x2.attrib.get(name) != value:
                    self.logger.debug('Attributes do not match: %s=%r, %s=%r'
                                 % (name, value, name, x2.attrib.get(name)))
                    return False
        for name in x2.attrib.keys():
            if not name in excludes:
                if name not in x1.attrib:
                    self.logger.debug('x2 has an attribute x1 is missing: %s'
                                 % name)
                    return False
        if not self.text_compare(x1.text, x2.text):
            self.logger.debug('text: %r != %r' % (x1.text, x2.text))
            return False
        if not self.text_compare(x1.tail, x2.tail):
            self.logger.debug('tail: %r != %r' % (x1.tail, x2.tail))
            return False
        # Element.getchildren() was removed in Python 3.9; list(elem) is equivalent.
        cl1 = list(x1)
        cl2 = list(x2)
        if len(cl1) != len(cl2):
            self.logger.debug('children length differs, %i != %i'
                         % (len(cl1), len(cl2)))
            return False
        i = 0
        for c1, c2 in zip(cl1, cl2):
            i += 1
            if not c1.tag in excludes:
                if not self.xml_compare(c1, c2, excludes):
                    self.logger.debug('children %i do not match: %s'
                                 % (i, c1.tag))
                    return False
        return True

    def text_compare(self, t1, t2):
        """
        Compare two text strings
        :param t1: text one
        :param t2: text two
        :return:
            True if a match
        """
        if not t1 and not t2:
            return True
        if t1 == '*' or t2 == '*':
            return True
        return (t1 or '').strip() == (t2 or '').strip()

The second block of code holds a couple of XML examples and their comparison:

xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"

xml2 = "<note><to>Tove</to><from>Daniel</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"

tree1 = XmlTree.convert_string_to_tree(xml1)
tree2 = XmlTree.convert_string_to_tree(xml2)

comparator = XmlTree()

if comparator.xml_compare(tree1, tree2, ["from"]):
    print("XMLs match")
else:
    print("XMLs don't match")

Most of the credit for this code must be given to syawar
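Note that the class above pairs children positionally with zip, so it is order-sensitive. A minimal order-insensitive sketch, assuming sibling order and tail text do not matter, is to reduce each element to a hashable signature and compare signatures. The function name `signature` is hypothetical and is not part of the answer above; as in that answer, `excludes` is applied to both attribute names and child tags.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def signature(elem, excludes=()):
    """Build an order-independent, hashable summary of an element (hypothetical helper)."""
    attrs = frozenset(
        (name, value) for name, value in elem.attrib.items() if name not in excludes
    )
    # Count identical child signatures so repeated siblings are still compared.
    children = Counter(
        signature(child, excludes) for child in elem if child.tag not in excludes
    )
    text = (elem.text or "").strip()
    return (elem.tag, attrs, text, frozenset(children.items()))


xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>"
xml2 = "<note><heading>Reminder</heading><from>Daniel</from><to>Tove</to></note>"

tree1 = ET.fromstring(xml1)
tree2 = ET.fromstring(xml2)

# The <from> elements differ, so the trees only match when "from" is excluded.
print(signature(tree1, ["from"]) == signature(tree2, ["from"]))
print(signature(tree1) == signature(tree2))
```

Using a Counter rather than a plain set keeps duplicates significant: two identical children on one side do not match a single child on the other.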

GitHub

github.com › joh › xmldiffs

GitHub - joh/xmldiffs: Compare two XML files, ignoring element and attribute order.

-x, --xml - instead of comparing two xml files, write sorted contents of FILE1 to FILE2. In this mode the --output option is ignored.

Starred by 96 users

Forked by 27 users

Languages: Python 100.0%

Scooter Forums

forum.scootersoftware.com › home › beyond compare 4 discussion › general

Ignore XML attribute order when comparing XML - Scooter Forums

QT for example will save XML every time with attributes in a random order. Is there a way to configure BC4 to get it to show those lines as the same? Thank you for any help ... Hello, The Text Compare considers the positioning as important, but can align character by character within a line.

Super User

superuser.com › questions › 79920 › how-can-i-diff-two-xml-files

linux - How can I diff two XML files? - Super User

Top answer

1 of 10

One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.

$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Or as a one-liner.

$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
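If xmllint is not available, roughly the same canonicalize-then-diff workflow can be sketched in Python. This is an illustrative equivalent rather than part of the answer: it assumes Python 3.9+ (for `ET.indent`), uses `ET.canonicalize`, which implements C14N 2.0 rather than the C14N 1.0 that `--c14n` produces, and the helper names `canonical_lines` and `xml_unified_diff` are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(xml_text):
    """Indent, canonicalize, and split a document into lines (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    ET.indent(root)  # one element per line, so the diff is readable
    return ET.canonicalize(ET.tostring(root, encoding="unicode")).splitlines()


def xml_unified_diff(xml_a, xml_b):
    """Return unified-diff lines between the canonical forms of two documents."""
    return list(
        difflib.unified_diff(
            canonical_lines(xml_a),
            canonical_lines(xml_b),
            fromfile="one.xml",
            tofile="two.xml",
            lineterm="",
        )
    )


one = '<note to="Tove" from="Jani"><body>Hi</body></note>'
two = '<note from="Jani" to="Tove"><body>Hi</body></note>'
three = '<note from="Jani" to="Tove"><body>Bye</body></note>'

print(xml_unified_diff(one, two))  # attribute order alone produces no diff lines
print("\n".join(xml_unified_diff(one, three)))
```

As with the xmllint approach, canonicalization normalizes attribute order but not the order of child elements.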

2 of 10

Jukka's answer did not work for me, but it did point to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch did. --exc-c14n is not listed in the man page, but it is described on the command line as "W3C exclusive canonical format".

$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

$ xmllint | grep c14
    --c14n : save in W3C canonical format v1.0 (with comments)
    --c14n11 : save in W3C canonical format v1.1 (with comments)
    --exc-c14n : save in W3C exclusive canonical format (with comments)

$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686

$ cat /etc/system-release
CentOS release 6.5 (Final)

Warning: --exc-c14n strips out the XML header, whereas --c14n prepends the XML header if it is not there.

PyPI

pypi.org › project › xmldiff

xmldiff · PyPI

Python versions 3.7 to 3.11 are now supported. Improved node matching method, that puts more emphasis on similarities than differences when weighing attributes vs children. Added a parameter to return error code 1 when there are differences between the files · Added a parameter for ignoring attributes in comparison. Solved a bug in xmlpatch ...

      » pip install xmldiff

Published May 13, 2024

Version 2.7.0

Repository https://github.com/Shoobx/xmldiff

Oxygen XML

oxygenxml.com › home › board index › oxygen xml editor/author/developer › feature request

Diff Files and Dirs: Add Ignore Element order for XML documents - Oxygen XML Forum

Hello, I've just filed a new improvement request on our internal issue tracking system to add the possibility to ignore the order of the elements, similar to what already is done for attributes. You'll be notified when this feature will be available. Best Regards, Florin ... Hi Florin, out of curiosity (and because I could very well need this feature): Did your feature request make it to market? I use oxygen file diff v18 but do not seem to be able to diff xml files such that the change of order of neighbouring xml elements is not considered to be a difference.

Lightrun

lightrun.com › answers › xmlunit-xmlunit-comparing-xml-files-ignoring-elements-attribute-order

Comparing XML files ignoring elements attribute Order

Compare XML ignoring order of child elements - Stack Overflow · I wrote a simple python tool for this called xmldiffs: Compare two XML files, ignoring element and attribute order.

Stack Exchange

unix.stackexchange.com › questions › 64188 › how-to-compare-two-xml-files-having-same-data-in-different-lines

bash - how to compare two xml files having same data in different lines? - Unix & Linux Stack Exchange

Top answer

1 of 4

You can achieve what you want with the help of a small Python script (you'll need Python installed, as well as the lxml toolkit).

tagsort.py:

#!/usr/bin/python

import sys
from lxml import etree

filename, tag = sys.argv[1:]

doc = etree.parse(filename, etree.XMLParser(remove_blank_text=True))
root = doc.getroot()
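# Elements that lack the given tag sort first instead of raising a TypeError.
root[:] = sorted(root, key=lambda el: el.findtext(tag) or '')
# encoding='unicode' returns str rather than bytes, so print() emits plain XML.
print(etree.tostring(doc, pretty_print=True, encoding='unicode'))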

This script sorts the first-level elements under the XML document root by the content of a second-level element, sending the result to stdout. It's called like this:

$ python tagsort.py filename tag

Once you've got that, you can use process substitution to get a diff based on its output (I've added one element and changed another in your example files to show a non-empty result):

$ diff <(python tagsort.py file1 Id) <(python tagsort.py file2 Id)
4a5
>     <AddedTag>Something</AddedTag>
17c18
<     <Role>X</Role>
---
>     <Role>S</Role>

2 of 4

I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files

That post suggests canonicalizing the XML and then doing a diff. The following should work for you if you are on Linux or macOS, or on Windows with something like Cygwin installed:

$ xmllint --c14n File1.xml > 1.xml
$ xmllint --c14n File2.xml > 2.xml
$ diff 1.xml 2.xml

PyPI

pypi.org › project › xml_diff

xml_diff

Stack Overflow

stackoverflow.com › questions › 3007330 › compare-xml-snippets

python - Compare XML snippets? - Stack Overflow

Top answer

1 of 10

You can use formencode.doctest_xml_compare -- the xml_compare function compares two ElementTree or lxml trees.

2 of 10

The order of the elements can be significant in XML. This may be why most of the other methods suggested will compare unequal if the order is different, even if the elements have the same attributes and text content.

But I also wanted an order-insensitive comparison, so I came up with this:

import json

from lxml import etree
import xmltodict  # pip install xmltodict


def normalise_dict(d):
    """
    Recursively convert dict-like object (eg OrderedDict) into plain dict.
    Sorts list values.
    """
    out = {}
    for k, v in dict(d).items():
        if hasattr(v, 'items'):
            out[k] = normalise_dict(v)
        elif isinstance(v, list):
            # Normalise each item first, then sort on a canonical JSON form,
            # because dicts cannot be ordered directly in Python 3.
            items = [normalise_dict(item) if hasattr(item, 'items') else item
                     for item in v]
            out[k] = sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
        else:
            out[k] = v
    return out


def xml_compare(a, b):
    """
    Compares two XML documents (as string or etree)

    Does not care about element order
    """
    # Anything that is not already str/bytes is assumed to be an etree object.
    if not isinstance(a, (str, bytes)):
        a = etree.tostring(a)
    if not isinstance(b, (str, bytes)):
        b = etree.tostring(b)
    a = normalise_dict(xmltodict.parse(a))
    b = normalise_dict(xmltodict.parse(b))
    return a == b


sourcehut

sr.ht › ~nolda › xdiff

xdiff: A Python script for comparing XML files for structural or textual differences.

usage: xdiff.py [-h] [-a] [-C] [-i] [-I re] [-n] [-N] [-p] [-P] [-q] [-v] [-w] file1 file2

positional arguments:
  file1                XML file 1
  file2                XML file 2

optional arguments:
  -h, --help           show this help message and exit
  -a, --all-context    output all context lines
  -C, --force-color    preserve color and formatting when piping output
  -i, --indent         indent XML trees
  -I re, --ignore re   ignore matching lines
  -n, --no-context     output no context lines
  -N, --no-meta        suppress metadata (files header and hunk numbers)
  -p, --pis            preserve processing-instructions in output
  -P, --comments       preserve comments in output
  -q, --quiet          only return exit status
  -v, --version        show program's version number and exit
  -w, --words          compare words

Readthedocs

xmldiff.readthedocs.io › en › stable › api.html

Python API — xmldiff documentation

ignored_attrs: A list of XML node attributes that will be ignored in comparison.

Altova

altova.com › xmlspy-xml-editor › compare-xml

Compare XML Files | Altova

The order of XML attributes is irrelevant because XML processors do not consider the sequence that attributes appear in a particular element. XMLSpy accounts for this and intelligently ignores the attribute order, but a conventional differencing utility cannot and would therefore report every ...

Python

docs.python.org › 3 › library › xml.etree.elementtree.html

xml.etree.ElementTree — The ElementTree XML API

Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduces the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace.
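A small sketch of what this means in practice, assuming Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` is available. The sample documents are made up for illustration. Canonicalization removes differences in attribute order and empty-element syntax, but it does not reorder child elements.

```python
import xml.etree.ElementTree as ET

# Same content; attribute order and empty-element syntax differ.
doc_a = '<root b="2" a="1"><x/><y/></root>'
doc_b = '<root a="1" b="2"><x></x><y></y></root>'
# Same elements as doc_b, but the children are swapped.
doc_c = '<root a="1" b="2"><y/><x/></root>'

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)
canon_c = ET.canonicalize(doc_c)

attrs_reordered_match = canon_a == canon_b
children_reordered_match = canon_a == canon_c

print(canon_a)
print(attrs_reordered_match, children_reordered_match)
```

To ignore child order as well, the children have to be sorted before canonicalizing, as several of the tools listed on this page do.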

GitHub

github.com › MartinPetkov › XMLFileCompare

GitHub - MartinPetkov/XMLFileCompare: Python script for comparing two XML files for content, regardless of order · GitHub

Python script for comparing two XML files for content, regardless of order - MartinPetkov/XMLFileCompare

Author MartinPetkov