There's diff_prettyHtml() in the diff-match-patch library from Google.
Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.
For instance, if you generate the HTML like this:
import difflib
import sys
fromfile = "xxx"
tofile = "zzz"
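# The 'U' mode flag was removed in Python 3.11; text mode already uses universal newlines.
with open(fromfile) as f:
    fromlines = f.readlines()
with open(tofile) as f:
    tolines = f.readlines()
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)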
sys.stdout.writelines(diff)
then you get green backgrounds on added lines, yellow on changed lines and red on deleted lines. If I were doing this, I would take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and either move it to the top or put it in a div so that CSS can position it.
Actually, I would give serious consideration to just fixing up the difflib module (which is written in Python) to generate better HTML and contributing the change back to the project. If you have a CSS expert to help you, or are one yourself, please consider doing this.
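The custom-CSS idea above can be sketched without parsing the generated page at all, because `difflib.HtmlDiff` also has a `make_table()` method that returns only the diff table, with no surrounding page or legend. The stylesheet and the `styled_diff_page` helper below are made up for illustration; only the CSS class names (`diff`, `diff_header`, `diff_add`, `diff_chg`, `diff_sub`) come from difflib's output.

```python
import difflib

# Hypothetical stylesheet; the class names are the ones difflib puts in its markup.
CUSTOM_CSS = """
table.diff { font-family: monospace; border-collapse: collapse; }
.diff_header { background: #eee; text-align: right; padding: 0 4px; }
.diff_add { background: #cfc; }
.diff_chg { background: #ffc; }
.diff_sub { background: #fcc; }
"""

def styled_diff_page(fromlines, tolines, fromdesc="", todesc=""):
    """Return a full HTML page that wraps difflib's diff table in custom CSS."""
    # make_table() returns just the <table> markup: no <html>, <style> or legend.
    table = difflib.HtmlDiff().make_table(fromlines, tolines, fromdesc, todesc)
    return (
        "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>\n"
        f"<style>{CUSTOM_CSS}</style></head>\n"
        f"<body>{table}</body></html>\n"
    )

page = styled_diff_page(["a\n", "b\n"], ["a\n", "c\n"], "xxx", "zzz")
```

Writing `page` to a file and opening it in a browser shows the same side-by-side table as `make_file()`, but styled entirely by your own rules.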
lxml can do something similar to what you want. From the docs:
>>> from lxml.html.diff import htmldiff
>>> doc1 = '''<p>Here is some text.</p>'''
>>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>'''
>>> print(htmldiff(doc1, doc2))
<p>Here is <ins><b>a lot</b> of <i>text</i>.</ins> <del>some text.</del> </p>
I don't know of any other Python library for this specific task, but you may want to look into word-by-word diffs. They may approximate what you want.
One example is this one, implemented in both PHP and Python (save it as diff.py, then import diff):
>>> diff.htmlDiff(a, b)
'<del><p>i</del> <ins><h2>i</ins> love <del>it</p></del> <ins>it </p></ins>'
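For reference, a word-by-word diff of this kind can be approximated with the standard library alone. This is a minimal sketch, not the linked script: it assumes plain-text input (any markup is escaped rather than treated as tokens), splits on whitespace, and uses a hypothetical helper name, `word_diff_html`.

```python
import difflib
from html import escape

def word_diff_html(old, new):
    """Mark word-level changes between two strings with <del>/<ins> tags."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(escape(" ".join(a[i1:i2])))
            continue
        # A "replace" opcode is emitted as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            out.append("<del>" + escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            out.append("<ins>" + escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(out)

print(word_diff_html("i love it", "i really love it"))
# prints: i <ins>really</ins> love it
```

Because the input is escaped, this version is only suitable for comparing text content; diffing real HTML structure needs a tag-aware tool such as the lxml function shown earlier.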
Check out diff2HtmlCompare (full disclosure: I'm the author). If you're just trying to visualize the differences, then this may help you. If you are trying to extract the differences and do something with them, then you can use difflib as suggested by others (the diff2HtmlCompare script just wraps difflib and uses pygments for syntax highlighting). Doug Hellmann has done a pretty good job detailing how to use difflib; I'd suggest checking out his tutorial.
If you use difflib.Differ, you can keep only the difference lines by filtering on the two-letter codes that are written at the start of every line. From the docs:
class difflib.Differ
This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.
Each line of a Differ delta begins with a two-letter code:
Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence
Lines beginning with '?' attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters.
If you keep only the lines that start with '- ' or '+ ', you are left with just the differences.
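A minimal sketch of that filter, using made-up sample lines and a hypothetical helper name, `changed_lines`:

```python
import difflib

def changed_lines(a_lines, b_lines):
    """Return only the Differ delta lines that start with '- ' or '+ '."""
    delta = difflib.Differ().compare(a_lines, b_lines)
    # Drop the common lines ('  ') and the intraline hint lines ('? ').
    return [line for line in delta if line.startswith(("- ", "+ "))]

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n"]
for line in changed_lines(old, new):
    print(line, end="")
# prints:
# - two
# + 2
```

The two-letter prefix stays on each returned line, so you can still tell which side a line came from; slice it off with `line[2:]` if you only need the text.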
I would start by trying to iterate through each html file line by line and checking to see if the lines are the same.
with open('file1.html') as file1, open('file2.html') as file2:
    for file1Line, file2Line in zip(file1, file2):
        if file1Line != file2Line:
            print(file1Line.strip('\n'))
            print(file2Line.strip('\n'))
You'll have to deal with newline characters and multiple line differences in a row, but this is probably a good start :)
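One possible refinement of the loop above, as a sketch only: `zip` stops at the end of the shorter file, so this version uses `itertools.zip_longest` to report trailing lines as well, strips line endings before comparing, and records the line number. The helper name `differing_lines` is made up for illustration.

```python
from itertools import zip_longest

def differing_lines(lines_a, lines_b):
    """Yield (line_number, a, b) for each position where the two inputs differ."""
    # zip_longest pads the shorter input with None instead of stopping early.
    pairs = zip_longest(lines_a, lines_b, fillvalue=None)
    for number, (a, b) in enumerate(pairs, start=1):
        a = a.rstrip("\r\n") if a is not None else None
        b = b.rstrip("\r\n") if b is not None else None
        if a != b:
            yield number, a, b

for number, a, b in differing_lines(["x\n", "y\n"], ["x\n", "z\n", "w\n"]):
    print(number, repr(a), repr(b))
# prints:
# 2 'y' 'z'
# 3 None 'w'
```

This is still a purely positional comparison: a single inserted line makes every later line look different, which is the case difflib's sequence matching is designed to handle.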