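Just parse the output of the diff like this (change '- ' to '+ ' if you want the lines unique to the second file instead):
#!/usr/bin/env python
# difflib_test
import difflib

# Read both files up front so they are closed before diffing
with open('/home/saad/Code/test/new_tweets', 'r') as file1:
    lines1 = file1.readlines()
with open('/home/saad/PTITVProgs', 'r') as file2:
    lines2 = file2.readlines()

diff = difflib.ndiff(lines1, lines2)
# Keep only the lines unique to the first file, dropping the 2-character prefix
delta = ''.join(x[2:] for x in diff if x.startswith('- '))
print(delta)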
There are multiple diff styles, and the difflib library provides a separate function for each: unified_diff, ndiff and context_diff.
If you don't want the line number summaries, the ndiff function gives a Differ-style delta:
import difflib
f1 = '''1
2
3
4
5'''
f2 = '''1
3
4
5
6'''
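# ndiff expects sequences of lines; passing the raw strings would compare them character by character
diff = difflib.ndiff(f1.splitlines(), f2.splitlines())
for l in diff:
    print(l)
Output:
  1
- 2
  3
  4
  5
+ 6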
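EDIT:
You could also parse the diff to extract only the changes, if that's what you want. Note that ndiff returns a generator, so build it again (or wrap it in list()) if you have already iterated over it:
>>> diff = difflib.ndiff(f1.splitlines(), f2.splitlines())
>>> changes = [l for l in diff if l.startswith('+ ') or l.startswith('- ')]
>>> for c in changes:
...     print(c)
...
- 2
+ 6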
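If you want the removed and added lines kept apart rather than in one list, the same prefix check can be split into two buckets. This is only a minimal sketch: the helper name split_changes is made up for illustration and is not part of difflib.

```python
import difflib


def split_changes(old_text, new_text):
    """Return (removed, added) lines between two multi-line strings.

    split_changes is a hypothetical helper, not a difflib function.
    """
    removed, added = [], []
    # ndiff prefixes every line with a 2-character code: '- ', '+ ', '  ' or '? '
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


removed, added = split_changes("1\n2\n3\n4\n5", "1\n3\n4\n5\n6")
print("removed:", removed)
print("added:", added)
```

Lines starting with '? ' are ndiff's intraline hints and are skipped here, since they are not present in either input.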
Did you have a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from older files and diffs.
A Python version is included.
http://code.google.com/p/google-diff-match-patch/
Does difflib.unified_diff do what you want? There is an example here.
The original link is broken. There is an example here.
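Since the linked examples may not be reachable, here is a rough sketch of what a unified_diff call can look like. The sample lines and the old.txt / new.txt labels are placeholders picked for illustration; they are only used in the header and no files are opened.

```python
import difflib

# Sample inputs (assumed for illustration); each line keeps its trailing newline
old_lines = ["1\n", "2\n", "3\n", "4\n", "5\n"]
new_lines = ["1\n", "3\n", "4\n", "5\n", "6\n"]

# fromfile/tofile only label the '---' and '+++' header lines
unified = list(difflib.unified_diff(old_lines, new_lines,
                                    fromfile="old.txt", tofile="new.txt"))

# The diff lines already end in newlines, so suppress print's extra one
print("".join(unified), end="")
```

Unlike ndiff, this style uses single-character prefixes ('-', '+', ' ') and adds '@@' hunk headers with line ranges, which is the format the patch tool understands.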
For starters, you need to pass strings to difflib.SequenceMatcher, not files:
# Like so
difflib.SequenceMatcher(None, str1, str2)
# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())
That'll fix your error.
To get the first non-matching string, see the difflib documentation.
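One possible way to get at the first non-matching part, assuming "non-matching" means the first opcode that is not 'equal', is to walk get_opcodes(). The helper name first_difference is hypothetical and this is just a sketch, not the only approach.

```python
import difflib


def first_difference(str1, str2):
    """Return (tag, piece_of_str1, piece_of_str2) for the first change, or None.

    first_difference is a hypothetical helper, not a difflib function.
    """
    # autojunk=False turns off the popularity heuristic used on long sequences
    matcher = difflib.SequenceMatcher(None, str1, str2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            return tag, str1[i1:i2], str2[j1:j2]
    return None


print(first_difference("abcdef", "abXdef"))
print(first_difference("same", "same"))
```

With files, pass file1.read() and file2.read() as the two strings, as in the answer above.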
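Here is a quick example of comparing the contents of two files using Python's difflib:
import difflib
file1 = "myFile1.txt"
file2 = "myFile2.txt"
# Open both files in one with-statement so they are closed afterwards
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
# The diff lines keep their own newlines, so suppress print's extra one
print(''.join(diff), end='')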