You can use difflib.Differ to return a single sequence of lines with a marker at the start of each line which describes the line. The markers tell you the following information about the line:
| Marker | Description |
|---|---|
| '- ' | line unique to file 1 |
| '+ ' | line unique to file 2 |
| '  ' | line common to both files |
| '? ' | line not present in either input file |
You can use this information to decide how to display the data. For example, if the marker is a space, you put the line in both the left and right widgets. If it's +, you could put a blank line on the left and the actual line on the right, showing that the line is unique to the text on the right. Likewise, - means the line is unique to the left.
For example, you can create two text widgets t1 and t2, one for the left and one for the right. You can compare two files by creating a list of lines for each and then passing them to the compare method of the differ and then iterating over the results.
import difflib
import tkinter as tk

t1 = tk.Text(...)
t2 = tk.Text(...)
with open("file1.txt", "r") as f:
    f1 = f.readlines()
with open("file2.txt", "r") as f:
    f2 = f.readlines()
differ = difflib.Differ()
for line in differ.compare(f1, f2):
    marker = line[0]
    if marker == " ":
        # line is same in both
        t1.insert("end", line[2:])
        t2.insert("end", line[2:])
    elif marker == "-":
        # line is only on the left
        t1.insert("end", line[2:])
        t2.insert("end", "\n")
    elif marker == "+":
        # line is only on the right
        t1.insert("end", "\n")
        t2.insert("end", line[2:])
The above code ignores lines with the marker ? since those are extra lines that attempt to bring attention to the different characters on the previous line and aren't actually part of either file. You could use that information to highlight the individual characters if you wish.
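As a rough sketch of the character-level idea, you don't have to parse the ? lines themselves: difflib.SequenceMatcher can compare two lines directly and report which character ranges differ. The helper name changed_spans is made up for this illustration and is not part of difflib.

```python
import difflib

def changed_spans(old, new):
    """Return the non-equal opcodes between two strings.

    Each entry is (tag, old_start, old_end, new_start, new_end), where tag
    is 'replace', 'delete' or 'insert'. Hypothetical helper for illustration.
    """
    matcher = difflib.SequenceMatcher(None, old, new)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# The trailing "s" (index 6) exists only in the old line.
print(changed_spans("gophers", "gopher"))
```

In a text widget, the returned index ranges could then be turned into tag ranges on the corresponding row to color just the characters that differ.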
Building on @Bryan Oakley's answer, I wrote a quick Gist:
https://gist.github.com/jlumbroso/3ef433b4402b4f157728920a66cc15ed
with a side-by-side diff method (including the method to produce this side-by-side arrangement using the textwrap library) that you can call on two lists of lines:
print(better_diff(
    ["a", "c", "a", "a", "a", "a", "a", "a", "e"],
    ["a", "c", "b", "a", "a", "a", "a", "d", "a", "a"],
    width=20,
    as_string=True,
    left_title=" LEFT",
))
will produce:
LEFT |
-------- | --------
a | a
c | c
| b
a | a
a | a
a | a
a | a
| d
a | a
a | a
e |
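If you would rather not pull in the Gist, the same kind of two-column layout can be approximated with difflib.Differ alone. This is only a minimal stand-in and not the Gist's implementation: side_by_side and its width parameter are names invented here, and there is no wrapping of long lines.

```python
import difflib

def side_by_side(left_lines, right_lines, width=10):
    """Return 'left | right' rows for two lists of lines (hypothetical helper)."""
    rows = []
    for entry in difflib.Differ().compare(left_lines, right_lines):
        marker, text = entry[0], entry[2:]
        if marker == " ":
            rows.append((text, text))   # common to both sides
        elif marker == "-":
            rows.append((text, ""))     # only on the left
        elif marker == "+":
            rows.append(("", text))     # only on the right
        # "?" hint lines belong to neither input, so they are skipped
    # Pad the left column to a fixed width and drop trailing spaces
    return [f"{left:<{width}} | {right}".rstrip() for left, right in rows]

for row in side_by_side(["a", "c", "e"], ["a", "b", "c"]):
    print(row)
```

The input lines are assumed to have no trailing newlines; strip them first if they come from readlines().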
The best I have so far is:
diff -y --suppress-common-lines --speed-large-files file1 file2
Use --width=200 to control the width.
Unfortunately, that doesn't show you any context lines.
I found another solution using grep which seemed OK, but it uses a regex and is just too slow.
vimdiff worked great for me in the 'MINGW64' terminal (mintty 2.0.3, x86_64-pc-msys, 2015, by Andy Koppe) that was installed with my Vagrant setup following the VaryingVagrantVagrants instructions.
Ctrl+f to navigate forward;
Ctrl+b to navigate backward/up
:q-Enter and again :q-Enter (to close the 2nd file if you've compared two files as below.)
The arrow keys and mouse wheel also work; just be patient while it finishes loading. You can control the color scheme in your 'MINGW64' window options as well as in the .minttyrc file (see here for more info).
I made my cursor orange and blinking, so it's easier to locate.
import difflib
lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()
lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()
# Changes:
#   swapped positions of cat and dog
#   changed gophers to gopher
#   removed hound
#   added mouse
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
    print(line)
Outputs the following:
--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
 dog
-cat
 bird
 buffalo
-gophers
-hound
+gopher
 horse
+mouse
This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    print(line)
Outputting this:
--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse
But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    # Skip the file headers and the "@@" position lines
    for prefix in ('---', '+++', '@@'):
        if line.startswith(prefix):
            break
    else:
        print(line)
Giving us this output:
+cat
-cat
-gophers
-hound
+gopher
+mouse
Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:
diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]  # drop the '---' and '+++' header lines
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']
print('additions:')
for line in added:
    print(line)
print()
print('additions, ignoring position:')
for line in added:
    if line not in removed:
        print(line)
Outputting:
additions:
cat
gopher
mouse
additions, ignoring position:
gopher
mouse
You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
The difflib library is useful for this, and comes in the standard library. I like the unified diff format.
http://docs.python.org/2/library/difflib.html#difflib.unified_diff
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
        )
        for line in diff:
            sys.stdout.write(line)
Outputs:
--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
one
two
-dogs
three
And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
            n=0,
        )
        for line in diff:
            # Skip the file headers and the "@@" position lines
            for prefix in ('---', '+++', '@@'):
                if line.startswith(prefix):
                    break
            else:
                sys.stdout.write(line[1:])
I would like to compare two files with each other and show the differences. There are so many examples when you google this, but I can't quite find what I am looking for. My two files will be very similar, but a line or two will differ (added or removed). All I want is to compare the two files and show lines that have been added with a + and lines that have been removed with a -.
So, given two files A and B, where A is the original file and B is the new file: if lines have been added to or removed from B, show those lines with a + or a -.
I'm writing a script that checks a device configuration every day and if it's changed from the day before then show what has been added/removed.
Thank you
Hello, I need to compare two txt files in Python and save the difference in a third file. Maybe difflib will help, but I only need the difference between the two txt files. Any ideas?
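One way this might look, as a sketch only: read both files, run them through difflib.unified_diff, and write the result to the third file. The function name write_diff and the file paths are placeholders chosen for this example, and the files are assumed to be small enough to read into memory.

```python
import difflib
from pathlib import Path

def write_diff(path_a, path_b, out_path):
    """Write a unified diff of two text files to out_path.

    Returns the number of diff lines written (0 if the files are identical).
    Hypothetical helper for illustration.
    """
    a = Path(path_a).read_text().splitlines(keepends=True)
    b = Path(path_b).read_text().splitlines(keepends=True)
    diff = list(difflib.unified_diff(a, b, fromfile=str(path_a), tofile=str(path_b)))
    Path(out_path).write_text("".join(diff))
    return len(diff)
```

Calling write_diff("old.txt", "new.txt", "changes.diff") would leave an empty changes.diff when nothing changed, which makes a simple "did it change?" check for a daily script. Pass n=0 to unified_diff if you don't want context lines.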