compare xml files python

stackoverflow.com › questions › 24492895 › comparing-two-xml-files-in-python

This is actually a reasonably challenging problem, because what counts as a "difference" is often in the eye of the beholder: there will be semantically equivalent information (for example, attributes written in a different order) that you probably don't want marked as a difference.

You could try using xmldiff, which is based on work in the paper Change Detection in Hierarchically Structured Information.

Answer from Nick Bastin on Stack Overflow

Stack Overflow

Comparing two xml files in python - Stack Overflow

2 of 4

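My approach to the problem was to parse each XML document into an xml.etree.ElementTree element and recursively walk through each level of the tree. I also included the ability to ignore a list of attributes while doing the comparison (the same list is also checked against child element tags, which is how the example below ignores the <from> element).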

The first block of code holds the class used:

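import xml.etree.ElementTree as ET
import logging

class XmlTree():

    def __init__(self):
        # Every mismatch is logged at DEBUG level to xml-comparison.log
        self.logger = logging.getLogger('xml-comparison')
        self.logger.setLevel(logging.DEBUG)
        self.hdlr = logging.FileHandler('xml-comparison.log')
        self.formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.hdlr.setFormatter(self.formatter)
        self.logger.addHandler(self.hdlr)

    @staticmethod
    def convert_string_to_tree(xmlString):
        return ET.fromstring(xmlString)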

    def xml_compare(self, x1, x2, excludes=None):
        """
        Compares two xml etrees
        :param x1: the first tree
        :param x2: the second tree
        :param excludes: list of attribute names (also matched against child tag names) to exclude from comparison
        :return:
            True if both trees match
        """
        # Avoid a shared mutable default argument
        excludes = excludes or []

        if x1.tag != x2.tag:
            self.logger.debug('Tags do not match: %s and %s', x1.tag, x2.tag)
            return False
        # Each non-excluded attribute of x1 must have the same value in x2
        for name, value in x1.attrib.items():
            if name not in excludes:
                if x2.attrib.get(name) != value:
                    self.logger.debug('Attributes do not match: %s=%r, %s=%r',
                                      name, value, name, x2.attrib.get(name))
                    return False
        # ...and x2 must not have non-excluded attributes that x1 lacks
        for name in x2.attrib.keys():
            if name not in excludes:
                if name not in x1.attrib:
                    self.logger.debug('x2 has an attribute x1 is missing: %s', name)
                    return False
        if not self.text_compare(x1.text, x2.text):
            self.logger.debug('text: %r != %r', x1.text, x2.text)
            return False
        if not self.text_compare(x1.tail, x2.tail):
            self.logger.debug('tail: %r != %r', x1.tail, x2.tail)
            return False
        # Element.getchildren() was removed in Python 3.9; list(element) returns the children
        cl1 = list(x1)
        cl2 = list(x2)
        if len(cl1) != len(cl2):
            self.logger.debug('children length differs, %i != %i', len(cl1), len(cl2))
            return False
        # Children are compared pairwise in document order; children whose tag is excluded are skipped
        for i, (c1, c2) in enumerate(zip(cl1, cl2), start=1):
            if c1.tag not in excludes:
                if not self.xml_compare(c1, c2, excludes):
                    self.logger.debug('children %i do not match: %s', i, c1.tag)
                    return False
        return True

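    def text_compare(self, t1, t2):
        """
        Compare two text strings
        :param t1: text one
        :param t2: text two
        :return:
            True if a match
        """
        # None and empty strings are treated as equal
        if not t1 and not t2:
            return True
        # '*' is a wildcard that matches any text
        if t1 == '*' or t2 == '*':
            return True
        # Leading and trailing whitespace is ignored
        return (t1 or '').strip() == (t2 or '').strip()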

The second block of code holds a couple of XML examples and their comparison:

xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"

xml2 = "<note><to>Tove</to><from>Daniel</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"

tree1 = XmlTree.convert_string_to_tree(xml1)
tree2 = XmlTree.convert_string_to_tree(xml2)

comparator = XmlTree()

# "from" is excluded, so the differing <from> elements are skipped and the trees match
if comparator.xml_compare(tree1, tree2, ["from"]):
    print("XMLs match")
else:
    print("XMLs don't match")

Most of the credit for this code must be given to syawar.

PyPI

pypi.org › project › xmldiff

xmldiff · PyPI

result = main.patch_file('file.diff', 'file1.xml') ... Easier to maintain, the code is less complex and more Pythonic, and uses more custom classes instead of just nesting lists and dicts.

      » pip install xmldiff

Published May 13, 2024

Version 2.7.0

Repository https://github.com/Shoobx/xmldiff

Discussions

Best ways to compare xml files with python

What are your goals? We work with XML trees over RPC using the Python ElementTree XML API. Also, the Eclipse IDE has a nice XML editor. However, it seems that everything is moving to JSON, and Javascript is the way to process it.

r/Python

March 14, 2019

bash - how to compare two xml files having same data in different lines? - Unix & Linux Stack Exchange

You could try xmldiff, but I think ... is to use an XML parser & generator to put each file in a canonical order and format, then use xmldiff or diff. A job for your favorite scripting language (Perl, Ruby, Python, etc.)...

unix.stackexchange.com

compare xml files using python - Stack Overflow

Does anyone know how to identify the differences between the two XML files, i.e. what has been deleted compared to the file b.xml? Can anyone recommend another way of comparing XML files in Python?

stackoverflow.com

Question on parsing and comparing two different XML docs.

ESPError: Not enough information in post. You might want to post an example of the data contained in the XML files and what you would expect from the output. It's not clear exactly what you're after. If you're just looking to parse XML files, then have a look at the xml modules in the standard library. Personally, I'd use xml.etree, as I find it simple to use.

r/Python

February 10, 2012

Videos

01:45

YouTube

Compare XML with XML Compare (Windows) | DeltaXML - YouTube

XML Compare In Action | DeltaXML - YouTube

How to Compare XML Files - YouTube

December 14, 2017

07:30

YouTube

Generate a comparison report using Python - YouTube

May 27, 2017

30:02

YouTube

Python XML Parser Tutorial | Read and Write XML in Python | Python ...

sr.ht › ~nolda › xdiff

xdiff: A Python script for comparing XML files for structural or textual differences.

xdiff.py is a Python 3 script for comparing XML files. It outputs structural and textual differences -- i.e.

GeeksforGeeks

geeksforgeeks.org › python › compare-two-xml-files-in-python

Compare Two Xml Files in Python - GeeksforGeeks

July 23, 2025 - In this article, we will see how we can compare two XML files in Python.

Readthedocs

xmldiff.readthedocs.io › en › stable › api.html

Python API — xmldiff documentation - Read the Docs

By default xmldiff will compare each node from one tree with all nodes from the other tree.

reddit.com › r/python › best ways to compare xml files with python

r/Python on Reddit: Best ways to compare xml files with python

March 14, 2019 - ... A few weeks ago I started a GitHub project for comparing XML files.

GitHub

github.com › JoshData › xml_diff

GitHub - JoshData/xml_diff: Compares two XML documents by diffing their text. · GitHub

The comparison is completely blind to the structure of the two XML documents. It does a word-by-word comparison on the text content only, and then it goes back into the original documents and wraps changed text in new <del> and <ins> wrapper elements. The documents are then concatenated to form a new document and the new document is printed on standard output. Or use this as a library and call compare yourself with two lxml.etree.Element nodes (the roots of your documents). The script is written in Python 3.

Starred by 44 users

Forked by 10 users

Languages Python
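To make the idea of a structure-blind, text-only comparison concrete, here is a minimal sketch using only the standard library. It is not xml_diff's implementation: it uses xml.etree.ElementTree and difflib rather than lxml, the helper names text_words and word_diff are hypothetical, and it only reports which runs of words were deleted or inserted instead of wrapping them in <del> and <ins> elements inside the original documents.

```python
import difflib
import xml.etree.ElementTree as ET


def text_words(xml_text):
    # Collect every text node in document order and split into words;
    # tag names, attributes and nesting are ignored entirely.
    root = ET.fromstring(xml_text)
    return " ".join(root.itertext()).split()


def word_diff(xml_a, xml_b):
    # Hypothetical helper: returns ("del", words) / ("ins", words) pairs.
    a, b = text_words(xml_a), text_words(xml_b)
    changes = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            changes.append(("del", " ".join(a[i1:i2])))
        if op in ("insert", "replace"):
            changes.append(("ins", " ".join(b[j1:j2])))
    return changes
```

Because text from adjacent elements is joined with a space, a word split across inline markup (for example foo<b>bar</b>) is treated as two words; that is a deliberate simplification of this sketch.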

GitHub

gist.github.com › guillaumevincent › 74e5a9551ee14a774e5e

compare two XML in python · GitHub

compare two XML in python · test_xmldiff.py

dale lane

dalelane.co.uk › blog

Comparing XML files ignoring order of attributes and child elements « dale lane

October 6, 2014 - On Mac, I run: $ python xmldiff.py diffmerge testA.xml testB.xml ... The source showing how this works is available in a gist at gist.github.com/dalelane. It’s a quick hack to let me compare a handful of files, so it’s not been rigorously tested.

Quora

quora.com › If-you-need-to-compare-two-XML-files-and-generate-a-third-containing-subtraction-between-values-in-the-two-XML-files-on-a-Windows-machine-Had-you-used-Powershell-or-Python

If you need to compare two XML files and generate a third containing subtraction between values in the two XML files on a Windows machine. Had you used Powershell or Python? - Quora

Answer (1 of 3): > If you need to compare two XML files and generate a third containing subtraction between values in the two XML files on a Windows machine. Had you used Powershell or Python? PowerShell is built-in to all modern versions of Windows so the obvious solution starts with using it t...
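On the Python side of that question, the "third file containing the subtraction" can be sketched with the standard library. This is a hypothetical helper (subtract_xml does not come from any answer here) and it assumes both files have identical structure and that every leaf element with text holds a number; it copies the first document and replaces each such leaf's text with the difference of the two values.

```python
import copy
import xml.etree.ElementTree as ET


def subtract_xml(xml_a, xml_b):
    # Assumption: same element structure in both documents, numeric leaves.
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    result = copy.deepcopy(root_a)  # the "third" document starts as a copy of A
    # Walk both trees in document order, pairing elements position by position
    for out, b in zip(result.iter(), root_b.iter()):
        if out.tag != b.tag:
            raise ValueError(f"structure differs: {out.tag} vs {b.tag}")
        # Only leaf elements that carry text in both documents are subtracted
        if len(out) == 0 and out.text and b.text:
            out.text = str(float(out.text) - float(b.text))
    return ET.tostring(result, encoding="unicode")
```

Non-numeric leaf text raises ValueError from float(), which is a reasonable signal that the structural assumption does not hold.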

GitHub

github.com › MartinPetkov › XMLFileCompare

GitHub - MartinPetkov/XMLFileCompare: Python script for comparing two XML files for content, regardless of order · GitHub

Compares two XML files and returns true if they have the same elements with the same data and attributes, but not necessarily in the same order

Author MartinPetkov
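The "same elements, data and attributes, in any order" check can be sketched with the standard library by reducing each element to an order-insensitive fingerprint and comparing fingerprints. This is an illustration under stated assumptions, not MartinPetkov's code: canonical_key and same_content are hypothetical names, whitespace around text is ignored, and duplicate children still count (two identical <y/> elements are not the same as one).

```python
import xml.etree.ElementTree as ET


def canonical_key(elem):
    # Order-insensitive fingerprint: attributes and child fingerprints are
    # sorted; surrounding whitespace in text and tail is ignored.
    children = tuple(sorted(canonical_key(child) for child in elem))
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        (elem.tail or "").strip(),
        children,
    )


def same_content(xml_a, xml_b):
    # Hypothetical helper: True when both documents differ only in ordering
    return canonical_key(ET.fromstring(xml_a)) == canonical_key(ET.fromstring(xml_b))
```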

GitHub

gist.github.com › dalelane › a0514b2e283a882d9ef3

Comparing XML files ignoring order of attributes and elements - see http://dalelane.co.uk/blog/?p=3225 for background · GitHub

parser = le.XMLParser(remove_comments=True)
# parse the XML file and get a pointer to the top
xmldoc = le.parse(original, parser=parser)

Complianceascode

complianceascode.github.io › template › 2022 › 10 › 24 › xmldiff-unit-tests.html

Using xmldiff in Python unit tests - ComplianceAsCode Blog

October 24, 2022 - That method isn't sensitive to whitespace or formatting of the XML files, so we can save them in a pretty format. So we reworked our tests so that each test first saved the output of the tested method to a temporary file and then called xmldiff.main.diff_files() to compare this temporary file with our static file in test data.

Server Fault

serverfault.com › questions › 430671 › utility-to-logically-compare-two-xml-files

Utility to LOGICALLY compare two xml files? - Server Fault

Top answer

1 of 5

Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.

A commercial tool that specializes in this problem is DeltaXML.

If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
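Approach (a) is available in Python's standard library: xml.etree.ElementTree.canonicalize() (Python 3.8+) produces the C14N 2.0 serialization of a document. A minimal sketch, where c14n_equal is a hypothetical helper name; canonicalization does not reorder elements, so differently ordered children still compare unequal, which is exactly the normalization issue raised in this answer.

```python
import xml.etree.ElementTree as ET


def c14n_equal(xml_a, xml_b):
    # C14N 2.0 normalizes attribute order, quoting and empty-element syntax
    # and drops comments by default; strip_text=True also trims whitespace
    # around text, so indentation differences disappear.
    # Element order is NOT normalized.
    return (ET.canonicalize(xml_a, strip_text=True)
            == ET.canonicalize(xml_b, strip_text=True))
```

Like the approaches described above, this only answers "same or not"; it does not say where two documents differ.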

2 of 5

Good answer here:

Question: How can I diff two XML files? | Super User

Answer: How can I diff two XML files? | Super User

$ xmllint --format --exc-c14n one.xml > 1.xml
$ xmllint --format --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.
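For readers who would rather stay in Python, the canonicalize-then-diff pipeline above can be approximated with the standard library. This is a sketch, not an equivalent of xmllint: ElementTree.canonicalize() implements C14N 2.0 rather than the Exclusive C14N that --exc-c14n selects, ET.indent() (Python 3.9+) stands in for --format, difflib stands in for diff, and canonical_lines and c14n_diff are hypothetical names.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(xml_text):
    # C14N 2.0: attributes sorted, quoting normalized, comments dropped;
    # strip_text=True trims whitespace around text content.
    c14n = ET.canonicalize(xml_text, strip_text=True)
    # Re-parse and indent so each element gets its own line, which lets a
    # line-based diff point at the element that changed. Namespaced tags
    # come back out with ElementTree's generated prefixes.
    root = ET.fromstring(c14n)
    ET.indent(root)
    return ET.tostring(root, encoding="unicode").splitlines()


def c14n_diff(xml_a, xml_b):
    # Unified diff of the two canonical, indented serializations
    return list(difflib.unified_diff(
        canonical_lines(xml_a), canonical_lines(xml_b),
        fromfile="a.xml", tofile="b.xml", lineterm=""))
```

An empty list means the two documents are equal after canonicalization.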

Stack Exchange

unix.stackexchange.com › questions › 64188 › how-to-compare-two-xml-files-having-same-data-in-different-lines

bash - how to compare two xml files having same data in different lines? - Unix & Linux Stack Exchange

Top answer

1 of 4

You can achieve what you want with the help of a small Python script (you'll need Python installed, as well as the lxml toolkit).

tagsort.py:

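#!/usr/bin/env python3

import sys
from lxml import etree

filename, tag = sys.argv[1:]

# remove_blank_text discards ignorable whitespace so pretty_print can re-indent the output
doc = etree.parse(filename, etree.XMLParser(remove_blank_text=True))
root = doc.getroot()
# Sort first-level elements by the text of their <tag> child; elements without it sort first
root[:] = sorted(root, key=lambda el: el.findtext(tag) or '')
print(etree.tostring(doc, pretty_print=True, encoding='unicode'))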

This script sorts the first-level elements under the XML document root by the content of a second-level element, sending the result to stdout. It's called like this:

$ python tagsort.py filename tag

Once you've got that, you can use process substitution to get a diff based on its output (I've added one element and changed another in your example files to show a non-empty result):

$ diff <(python tagsort.py file1 Id) <(python tagsort.py file2 Id)
4a5
>     <AddedTag>Something</AddedTag>
17c18
<     <Role>X</Role>
---
>     <Role>S</Role>
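If lxml is not available, a comparable first-level sort can be sketched with the standard library's xml.etree.ElementTree. This is an adaptation, not the answer's script: sort_children_by is a hypothetical name, ET.indent() (Python 3.9+) replaces lxml's remove_blank_text/pretty_print combination, and elements that lack the key tag sort first.

```python
import sys
import xml.etree.ElementTree as ET


def sort_children_by(root, tag):
    # Reorder the first-level children by the text of their <tag> child;
    # findtext() returns None when the tag is missing, so fall back to "".
    root[:] = sorted(root, key=lambda el: el.findtext(tag) or "")
    # ET.indent overwrites whitespace-only text and tails, so the moved
    # elements are re-indented consistently.
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Command-line use mirrors tagsort.py: python <script> filename tag
if __name__ == "__main__" and len(sys.argv) == 3:
    filename, tag = sys.argv[1:]
    print(sort_children_by(ET.parse(filename).getroot(), tag))
```

Saved under some name such as tagsort_stdlib.py (hypothetical), it can be used in the same process-substitution command: diff <(python tagsort_stdlib.py file1 Id) <(python tagsort_stdlib.py file2 Id).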

2 of 4

I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files

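That post suggests canonicalizing the XML and then running diff. (Canonical XML normalizes attribute order, quoting and similar serialization details; it does not sort elements.) The following should work for you if you are on Linux or macOS, or on Windows if you have something like Cygwin installed: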

$ xmllint --c14n File1.xml > 1.xml
$ xmllint --c14n File2.xml > 2.xml
$ diff 1.xml 2.xml

Stack Overflow

stackoverflow.com › questions › 53432591 › compare-xml-files-using-python

compare xml files using python - Stack Overflow

Top answer

1 of 3

Use xmldiff to perform this exact task.

main.py

from xmldiff import main
diff = main.diff_files("file1.xml", "file2.xml")
print(diff)

output

[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]

2 of 3

You can switch to the XMLFormatter and manually filter out the results:

...
# Change formatter:
formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)

...

# after `out` has been retrieved:
import re
for i in out.splitlines():
  if re.search(r'\bdiff:\w+', i):
    print(i)

# Result:
#       <type st="9999" diff:delete=""/>

PyPI

pypi.org › project › xml_diff

xml_diff

Python Forum

python-forum.io › thread-36089.html

XML compare

I'm looking for a script that can compare XML and works similarly to the Compare function in Notepad++. Is there a script like that?

GitHub

github.com › cfpb › xtdiff

GitHub - cfpb/xtdiff: ⚠ THIS REPO IS DEPRECATED Python library to compare two XML trees and generate a set of actions that transform one into the other · GitHub

February 1, 2019 - XML Tree Diff is a Python library that implements "Change detection in hierarchically structured information", by Sudarshan S. Chawathe, Anand Rajaraman, Hector Garcia-Molina, and Jennifer Widom.

Starred by 27 users

Forked by 6 users

Languages Python

GitHub

github.com › joh › xmldiffs

GitHub - joh/xmldiffs: Compare two XML files, ignoring element and attribute order.

xmldiffs first parses each XML file and spits them out sorted by element (tag) name and attributes. The result is then passed to diff for a semantic XML comparison.

Starred by 96 users

Forked by 27 users

Languages Python 100.0%
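The sort-then-diff idea described for xmldiffs can be approximated in a few lines of standard-library Python. This is not xmldiffs' source: sort_tree, sorted_lines and order_insensitive_diff are hypothetical names, children are ordered by (tag, sorted attributes) with ties kept in document order, difflib replaces the external diff, and ET.indent() requires Python 3.9+.

```python
import difflib
import xml.etree.ElementTree as ET


def sort_tree(elem):
    # Rewrite attributes in name order (ElementTree serializes them in
    # insertion order), then recursively sort children by (tag, attributes).
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    for child in elem:
        sort_tree(child)
    elem[:] = sorted(elem, key=lambda e: (e.tag, sorted(e.attrib.items())))


def sorted_lines(xml_text):
    root = ET.fromstring(xml_text)
    sort_tree(root)
    ET.indent(root)  # one element per line so the diff can point at it
    return ET.tostring(root, encoding="unicode").splitlines()


def order_insensitive_diff(xml_a, xml_b):
    return list(difflib.unified_diff(
        sorted_lines(xml_a), sorted_lines(xml_b),
        fromfile="a.xml", tofile="b.xml", lineterm=""))
```

Siblings with the same tag and attributes but different content keep their original relative order, so reordering those still shows up as a difference in this sketch.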