Did you have a look at diff-match-patch from google? Apparantly google Docs uses this set of algoritms. It includes not only a diff module, but also a patch module, so you can generate the newest file from older files and diffs.
A python version is included.
http://code.google.com/p/google-diff-match-patch/
Answer from Density 21.5 on Stack OverflowGenerating and applying diffs in python - Stack Overflow
Compare two files report difference in python - Stack Overflow
What's the best library to find differences in two text files and visulaize it?
Python - difference between two strings - Stack Overflow
Videos
Did you have a look at diff-match-patch from google? Apparantly google Docs uses this set of algoritms. It includes not only a diff module, but also a patch module, so you can generate the newest file from older files and diffs.
A python version is included.
http://code.google.com/p/google-diff-match-patch/
Does difflib.unified_diff do want you want? There is an example here.
The original link is broken. There is an example here
import difflib
lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()
lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()
# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
print line
Outputs the following:
--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
dog
-cat
bird
buffalo
-gophers
-hound
+gopher
horse
+mouse
This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
print line
Outputting this:
--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse
But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
print line
Giving us this output:
+cat
-cat
-gophers
-hound
+gopher
+mouse
Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:
diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']
print 'additions:'
for line in added:
print line
print
print 'additions, ignoring position'
for line in added:
if line not in removed:
print line
Outputting:
additions:
cat
gopher
mouse
additions, ignoring position:
gopher
mouse
You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
The difflib library is useful for this, and comes in the standard library. I like the unified diff format.
http://docs.python.org/2/library/difflib.html#difflib.unified_diff
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
)
for line in diff:
sys.stdout.write(line)
Outputs:
--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
one
two
-dogs
three
And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
n=0,
)
for line in diff:
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
sys.stdout.write(line[1:])
» pip install diff-match-patch
difflib is popular. Most of the answers on Stack Overflow uses it. But, I think Google's diff-match-patch is better. See an example here. Or, am I missing something?
You can use ndiff in the difflib module to do this. It has all the information necessary to convert one string into another string.
A simple example:
import difflib
cases=[('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
('abcdefg','xac')]
for a,b in cases:
print('{} => {}'.format(a,b))
for i,s in enumerate(difflib.ndiff(a, b)):
if s[0]==' ': continue
elif s[0]=='-':
print(u'Delete "{}" from position {}'.format(s[-1],i))
elif s[0]=='+':
print(u'Add "{}" to position {}'.format(s[-1],i))
print()
prints:
afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20
afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2
afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20
nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16
abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7
I like the ndiff answer, but if you want to spit it all into a list of only the changes, you could do something like:
import difflib
case_a = 'afrykbnerskojęzyczny'
case_b = 'afrykanerskojęzycznym'
output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
» pip install diff