HashMap solution

stackoverflow.com › questions › 7107517 › how-to-compare-large-text-files

java - How to compare large text files? - Stack Overflow

coderanch.com › t › 672590 › java › Compare-huge-text-files-java

1 of 14

I think your way is rather reasonable.

Next, you may read lines into a buffer like this:

final List<String> lines = new ArrayList<>();
try{
    final List<String> block = new ArrayList<>(BLOCK_SIZE);
    for(int i=0;i<BLOCK_SIZE;i++){
       final String line = ...;//read line from file
       block.add(line);
    }
    lines.addAll(block); 
}catch(OutOfMemoryError ooe){
    //break
}

So you read as many lines as you can, leaving the last BLOCK_SIZE lines' worth of memory free. BLOCK_SIZE should be big enough for the rest of your program to run without an OutOfMemoryError.

2 of 14

In an ideal world, you would be able to read in every line of file_2 into memory (probably using a fast lookup object like a HashSet, depending on your needs), then read in each line from file_1 one at a time and compare it to your data structure holding the lines from file_2.

As you have said you run out of memory however, I think a divide-and-conquer type strategy would be best. You could use the same method as I mentioned above, but read in a half (or a third, a quarter... depending on how much memory you can use) of the lines from file_2 and store them, then compare all of the lines in file_1. Then read in the next half/third/quarter/whatever into memory (replacing the old lines) and go through file_1 again. It means you have to go through file_1 more, but you have to work with your memory constraints.
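The chunked strategy described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `ChunkedLookup`, the method `forEachCommonLine`, and the `chunkLines` parameter are hypothetical, and "compare" is taken to mean "report every line of file_1 that also occurs in file_2".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

class ChunkedLookup {
    // Loads file2 in chunks of at most chunkLines distinct lines and,
    // for every chunk, re-reads file1 from the start to look for matches.
    static void forEachCommonLine(Path file1, Path file2, int chunkLines,
                                  Consumer<String> onMatch) throws IOException {
        try (BufferedReader r2 = Files.newBufferedReader(file2)) {
            String line = r2.readLine();
            while (line != null) {
                // Fill the next chunk; the previous one becomes garbage.
                Set<String> chunk = new HashSet<>();
                while (line != null && chunk.size() < chunkLines) {
                    chunk.add(line);
                    line = r2.readLine();
                }
                // One full pass over file1 per chunk of file2.
                try (BufferedReader r1 = Files.newBufferedReader(file1)) {
                    String candidate;
                    while ((candidate = r1.readLine()) != null) {
                        if (chunk.contains(candidate)) {
                            onMatch.accept(candidate);
                        }
                    }
                }
            }
        }
    }
}
```

With this shape, file_1 is read once per chunk, so a smaller `chunkLines` trades memory for extra passes. A line of file_1 is reported once for each chunk of file_2 that contains it, so the caller may want to de-duplicate.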

EDIT: In response to the added detail in your question, I would change my answer in part. Instead of reading in all of file_2 (or in chunks) and reading in file_1 a line at a time, reverse that, as file_1 holds the data to check against.

Also, with regard to searching for the matching lines, I think the best way would be to do some processing on file_1. Create a HashMap<String, List<Range>> that maps a String ("mat1" - "mat50") to a list of Ranges (just a wrapper for a startOfRange int and an endOfRange int) and populate it with the data from file_1. Then write a function like this (ignoring error checking)

boolean isInRange(String material, int value)
{
    List<Range> ranges = hashMapName.get(material);
    for (Range range : ranges)
    {
        if (value >= range.getStart() && value <= range.getEnd())
        {
            return true;
        }
    }
    return false;
}

and call it for each (parsed) line of file_2.
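For reference, a self-contained version of that lookup might look like the sketch below. The `Range` wrapper follows the description in the answer; the class name `RangeLookup` and the `addRange` helper are hypothetical, and parsing file_1 is left out because its exact line format is not shown here. Unlike the snippet above, an unknown material simply yields false.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RangeLookup {
    // Wrapper for an inclusive [start, end] interval.
    static final class Range {
        private final int start;
        private final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int getStart() { return start; }
        int getEnd() { return end; }
    }

    private final Map<String, List<Range>> rangesByMaterial = new HashMap<>();

    // Hypothetical helper: call once per parsed line of file_1.
    void addRange(String material, int start, int end) {
        rangesByMaterial.computeIfAbsent(material, k -> new ArrayList<>())
                        .add(new Range(start, end));
    }

    boolean isInRange(String material, int value) {
        List<Range> ranges =
                rangesByMaterial.getOrDefault(material, Collections.emptyList());
        for (Range range : ranges) {
            if (value >= range.getStart() && value <= range.getEnd()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RangeLookup lookup = new RangeLookup();
        lookup.addRange("mat1", 10, 20);
        lookup.addRange("mat1", 40, 50);
        System.out.println(lookup.isInRange("mat1", 15)); // true
        System.out.println(lookup.isInRange("mat1", 30)); // false
        System.out.println(lookup.isInRange("mat2", 15)); // false
    }
}
```

Each lookup scans only the ranges of one material, so the cost per line of file_2 depends on how many ranges that material has rather than on the size of file_1.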

Coderanch

Compare two huge text files in java and get non matching records (Java in General forum at Coderanch)

I will explain in detail by giving an example.

File1:
firstrowvalue1 | firstrowValue2 | firstrowvalue3
Secrowvalue4 | SecrowValue5 | Secrowvalue7
Thirdrowvalue8 | ThirdrowValue9 | Thirdrowvalue10

File2:
firstrowvalue1 | firstrowValue2 | firstrowvalue3
Secrowvalue4 | SecrowValue5 | Secrowvalue6

Expected Output:
Secrowvalue4 | SecrowValue5 | Secrowvalue7
Thirdrowvalue8 | ThirdrowValue9 | Thirdrowvalue10

Note: I highlighted the changes between two files. ...

Note that by using contains, line a could be longer than b by any arbitrary amount or could contain leading characters without returning false. If you want equality, use equals. ...

If this was my problem, I'd start by using one of the existing diff utilities, which you can invoke from within Java, and then work on its output.

codereview.stackexchange.com › questions › 187549 › compare-large-text-file-in-java-line-by-line

performance - Compare large text File in java line by line - Code Review Stack Exchange

baeldung.com › home › java › java io › compare the content of two files in java

1 of 1

Your program's time complexity is $O(n \cdot m)$ and space complexity is $O(n + m)$, where 'n' is the number of lines in the first file and 'm' is the number of lines in the second file.

Here is an optimised version of the above program, with time complexity $O(n + m)$ and space complexity $O(n)$. I have not tested this program, but it should be able to present output on the screen within a few seconds :)

import java.io.*;
import java.util.*;

class Main{
   public static void main(String args[]){
      try ( BufferedReader reader1 = new BufferedReader(new FileReader("file1.txt"));
            BufferedReader reader2 = new BufferedReader(new FileReader("file2.txt")) ){

            //assuming file1.txt is smaller than file2.txt in terms of no. of lines
            HashSet<String> file1 = new HashSet<String>();

            String s = null;
            while( ( s = reader1.readLine()) != null){
               file1.add(s);
            }

            while( (s = reader2.readLine()) != null ){
               if(file1.contains(s))
                  System.out.println(s);
            }
      }
      catch(IOException e){
         System.out.println(e);
      }

   }
}

Note: Only one file is in memory at a time and HashSet<> instead of nested loops for comparison.

Baeldung

Compare the Content of Two Files in Java | Baeldung

January 8, 2024 - If the files are of different sizes but the smaller file matches the corresponding lines of the larger file, then it returns the number of lines of the smaller file. The method Files::mismatch, added in Java 12, compares the contents of two files.
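As a rough illustration of Files.mismatch (Java 12 or later), the sketch below compares two temporary files. The class name `MismatchDemo` and the helper `sameContent` are made up for this example. Note that Files.mismatch works on bytes: it returns -1 when the contents are identical, otherwise the position of the first differing byte, or the size of the smaller file when that file is a prefix of the larger one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MismatchDemo {
    // True when both files hold exactly the same bytes.
    static boolean sameContent(Path a, Path b) throws IOException {
        return Files.mismatch(a, b) == -1L;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("mismatch-a", ".txt");
        Path b = Files.createTempFile("mismatch-b", ".txt");
        try {
            Files.write(a, "one\ntwo\n".getBytes(StandardCharsets.UTF_8));
            Files.write(b, "one\ntwx\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(Files.mismatch(a, b)); // 6
            System.out.println(sameContent(a, a));    // true
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```

This only answers "are the files identical, and if not, where do they first differ"; it does not list which lines are missing from either side.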

Stack Exchange

codereview.stackexchange.com › questions › 90147 › checking-for-differences-between-two-large-files

java - Checking for differences between two (large) files - Code Review Stack Exchange

1 of 1

There are some basic issues here, as well as some algorithmic complexities, and then some advanced suggestions.

Basic issues relate to Java code conventions, etc.

Basics

Use try-with-resources. You have code which may fail, and leave open files lying around to be garbage collected. Consider the following code:

try (FileInputStream fi1 = new FileInputStream(_File1);      
    FileInputStream fi2 = new FileInputStream(_File2);) {

    // do stuff with the files - they will be auto-closed.

}

The next thing, is why open the files if they are different lengths?

FileInputStream fi1 = new FileInputStream(_File1);      
FileInputStream fi2 = new FileInputStream(_File2);

byte[] fi1Content = new byte[step];
byte[] fi2Content = new byte[step];

if(_File1.length() == _File2.length()) { //Assumption 1

The code above, should be:

if(_File1.length() == _File2.length()) { //Assumption 1
    FileInputStream fi1 = new FileInputStream(_File1);      
    FileInputStream fi2 = new FileInputStream(_File2);

    byte[] fi1Content = new byte[step];
    byte[] fi2Content = new byte[step];

Use the power of the force, ... I mean parameters, Luke... I mean Daniel.

Your method should take the two files as parameters, not as class-level fields. As it stands, your code is not "reentrant", and it should be. Your method is:

public boolean compareStream() ....

but it should be

public boolean compareStream(File filea, File fileb) ....

Algorithm

Since you are comparing two files byte-by-byte, the hashing will make no difference. If the two files were on different machines, and you have a slow network between them, and if you could run the hashing algorithm remotely, then it probably makes sense to hash the two files on each side, and then just compare the small, and easy to transfer, hash result. Something like SHA-256.

So, there's no need to hash, just do byte-by-byte comparisons.

For large files like yours, why have such a small step size? Use something much larger like 4MB, not 4KB. It will make it much faster.

Alternatives

File IO is always slower than you want. Java has the NIO framework for higher-performance IO using Channels and Buffers. This would be a great time to learn how to use them, because, a 4MB Memory-Mapped IO operation on the two files will likely give you the best performance.

See the MemoryMapped IO JavaDoc

I ran up a test using NIO, and produced the following code:

public static final boolean compareFiles(final Path filea, final Path fileb) throws IOException {
    if (Files.size(filea) != Files.size(fileb)) {
        return false;
    }

    final long size = Files.size(filea);
    final int mapspan = 4 * 1024 * 1024;

    try (FileChannel chana = (FileChannel)Files.newByteChannel(filea);
            FileChannel chanb = (FileChannel)Files.newByteChannel(fileb)) {

        for (long position = 0; position < size; position += mapspan) {
            MappedByteBuffer mba = mapChannel(chana, position, size, mapspan);
            MappedByteBuffer mbb = mapChannel(chanb, position, size, mapspan);

            if (mba.compareTo(mbb) != 0) {
                return false;
            }

        }

    }
    return true;
}

private static MappedByteBuffer mapChannel(FileChannel channel, long position, long size, int mapspan) throws IOException {
    final long end = Math.min(size, position + mapspan);
    final long maplen = (int)(end - position);
    return channel.map(MapMode.READ_ONLY, position, maplen);
}

Note, the guts could be rewritten more concisely too:

            if (!mapChannel(chana, position, size, mapspan)
                   .equals(mapChannel(chanb, position, size, mapspan))) {
                return false;
            }

On my laptop, this compares 1.5GB files in under 2 seconds. Obviously, your mileage may vary, and my laptop is an unknown beast... but these are the things that may play into the equation:

I have 16GB mem
it's a 4 year old laptop
it has an SSD
there is file-system encryption
it runs linux.

stackoverflow.com › questions › 24681342 › compare-two-large-text-files-with-java

Compare two large text files with Java - Stack Overflow

public class SequenceComparator { private ArrayList<Sequence> bigSequences; private ArrayList<Sequence> smallSequences; public SequenceComparator() { bigSequences = new ArrayList<Sequence>(); smallSequences = new ArrayList<Sequence>(); } private String splitUpperSequences(String bigSeq) { StringBuilder sb = new StringBuilder(); for (char c : bigSeq.toCharArray()) { if (Character.isLetter(c) && Character.isUpperCase(c)) { sb.append(c); } } return sb.toString(); } public void readBigSequences() throws FileNotFoundException { Scanner s = new Scanner(new FileReader("test_ref_Aviso_bristol_k_31_c_4

stackoverflow.com › questions › 40519130 › java-comparing-two-huge-text-files

arrays - Java - Comparing two huge text files - Stack Overflow

Also suggest me whether this approach is performance efficient for comparing two huge text files. import java.io.*; public class CompareTwoFiles { static int count1 = 0 ; static int count2 = 0 ; static String arrayLines1[] = new String[countLines("\\Files_Comparison\\File1.txt")]; static String arrayLines2[] = new String[countLines("\\Files_Comparison\\File2.txt")]; public static void main(String args[]){ findDifference("\\Files_Comparison\\File1.txt","\\Files_Comparison\\File2.txt"); displayRecords(); } public static int countLines(String File){ int lineCount = 0; try { BufferedReader br = ne

DevGlan

devglan.com › corejava › comparing-files-in-java

Comparing Files In Java | DevGlan

Open your files by using the RandomAccessFile class and ask for the channel from this object if you want to memory map the two files. MappedByteBuffer, a representation of the memory area of your file's contents can be created from the channel ...

stackoverflow.com › questions › 27379059 › determine-if-two-files-store-the-same-content

java - Determine if two files store the same content - Stack Overflow

coderanch.com › t › 480861 › java › Compare-Text-Files

1 of 10

This is exactly what the FileUtils.contentEquals method of Apache Commons IO does.

Try something like:

File file1 = new File("file1.txt");
File file2 = new File("file2.txt");
boolean isTwoEqual = FileUtils.contentEquals(file1, file2);

It does the following checks before actually doing the comparison:

Existence of both files.
Both paths that are passed must be files, not directories.
The lengths in bytes should be the same.
The two paths must be different files, not one and the same.
Only then does it compare the contents.

2 of 10

If you don't want to use any external libraries, then simply read the files into byte arrays and compare them using Arrays.equals (this won't work before Java 7):

byte[] f1 = Files.readAllBytes(file1);
byte[] f2 = Files.readAllBytes(file2);
boolean same = Arrays.equals(f1, f2);

If the files are large, then instead of reading the entire files into arrays, you should use BufferedInputStream and read the files chunk-by-chunk as explained here.
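A chunk-by-chunk comparison along those lines might look like the following sketch. It assumes Java 9 or later (for `InputStream.readNBytes` and the range overload of `Arrays.equals`); the class name `ChunkedCompare` and the 64 KB chunk size are arbitrary choices for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ChunkedCompare {
    static final int CHUNK = 64 * 1024; // arbitrary chunk size

    static boolean contentEquals(Path a, Path b) throws IOException {
        // Files of different length can never be equal; skip the read.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(a));
             InputStream in2 = new BufferedInputStream(Files.newInputStream(b))) {
            byte[] buf1 = new byte[CHUNK];
            byte[] buf2 = new byte[CHUNK];
            int n1;
            int n2;
            do {
                // readNBytes blocks until CHUNK bytes are read or EOF is hit.
                n1 = in1.readNBytes(buf1, 0, CHUNK);
                n2 = in2.readNBytes(buf2, 0, CHUNK);
                if (n1 != n2 || !Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                    return false;
                }
            } while (n1 == CHUNK);
            return true;
        }
    }
}
```

Only two chunk-sized buffers are held in memory at any time, regardless of file size.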

Coderanch

Compare Two Text Files (Beginning Java forum at Coderanch)

Using Unix compare command via JAVA code is all time best option · Good, Better, Best, Don't take rest until, Good becomes Better, and Better becomes Best. Sidd : (SCJP 6 [90%] ) ... Thanks Siddesh, I am in the process of implementing it. I wll keep the thread updated. -Aditya ... Hi, Dear there is very big mistake either side of you or repliers.... I think you want to find such list of words which are common in both files...RIGHT?

Java Concept Of The Day

javaconceptoftheday.com › home › how to compare two text files in java?

How To Compare Two Text Files Line By Line In Java?

November 16, 2016 - BufferedReader br1 = new ... ... while(line1!=null || line2!=null) { if(!line1.contentEquals(line2)) { areEqual = false; System.out.println(“Two files have different content....

TutorialsPoint

tutorialspoint.com › compare-two-different-files-line-by-line-in-java

Compare Two Different Files Line by Line in Java

First, create a boolean variable called "areEqual" and initialise it to true. Second, create an int variable called "lineNum" and initialise it to 1. areEqual is a flag variable that is initially set to true and is changed to false when the input files' contents differ.

stackoverflow.com › questions › 17662747 › comparing-data-line-by-line-from-two-large-files

java - comparing data line by line from two large files - Stack Overflow

1 of 3

You need both files sorted by your search keys (recordIdx and topicIdx), so you can do kind of a merge operation like this

open file 1
open file 2
read lineA from file1
read lineB from file2
while (there is lineA and lineB) 
    if (key lineB < key lineA) 
        read lineB from file 2
        continue loop
    if (key lineB > key lineA)
        read lineA from file 1
        continue
    // at this point, you have lineA and lineB with matching keys
    process your data
    read lineB from file 2

Note that you'll only ever have two records in memory.
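In Java, the merge loop above could be sketched as follows. Several things here are assumptions for illustration only: the key is taken to be the text before the first comma (the real recordIdx/topicIdx layout is not shown), both inputs are assumed to be sorted by that key in `String.compareTo` order, and the names `SortedMergeJoin`, `key`, and `forEachMatch` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.BiConsumer;

class SortedMergeJoin {
    // Hypothetical key extraction: everything before the first comma.
    static String key(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Walks both sorted inputs in step and calls onMatch for each pair of
    // lines with equal keys. Only the two current lines are held in memory.
    static void forEachMatch(Reader file1, Reader file2,
                             BiConsumer<String, String> onMatch) throws IOException {
        try (BufferedReader a = new BufferedReader(file1);
             BufferedReader b = new BufferedReader(file2)) {
            String lineA = a.readLine();
            String lineB = b.readLine();
            while (lineA != null && lineB != null) {
                int cmp = key(lineB).compareTo(key(lineA));
                if (cmp < 0) {
                    lineB = b.readLine();      // file 2 is behind, advance it
                } else if (cmp > 0) {
                    lineA = a.readLine();      // file 1 is behind, advance it
                } else {
                    onMatch.accept(lineA, lineB);
                    lineB = b.readLine();      // keep lineA for repeated keys in file 2
                }
            }
        }
    }
}
```

As in the pseudocode, only file 2 advances after a match, so one line of file 1 can pair with several consecutive lines of file 2 that share its key.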

2 of 3

If you really need this in Java, why not use java-diff-utils? It implements a well-known diff algorithm.

stackoverflow.com › questions › 31426187 › want-to-find-content-difference-between-two-text-files-with-java

Want to find content difference between two text files with java - Stack Overflow

quora.com › What-is-the-best-way-to-compare-values-in-two-big-files-in-Java-line-by-line

1 of 4

The below code will serve your purpose irrespective of the content of the file.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

    public class Test {
        public Test(){
            System.out.println("Test.Test()");
        }

        public static void main(String[] args) throws Exception {
            BufferedReader br1 = null;
            BufferedReader br2 = null;
            String sCurrentLine;
            List<String> list1 = new ArrayList<String>();
            List<String> list2 = new ArrayList<String>();
            br1 = new BufferedReader(new FileReader("test.txt"));
            br2 = new BufferedReader(new FileReader("test2.txt"));
            while ((sCurrentLine = br1.readLine()) != null) {
                list1.add(sCurrentLine);
            }
            while ((sCurrentLine = br2.readLine()) != null) {
                list2.add(sCurrentLine);
            }
            List<String> tmpList = new ArrayList<String>(list1);
            tmpList.removeAll(list2);
            System.out.println("content from test.txt which is not there in test2.txt");
            for(int i=0;i<tmpList.size();i++){
                System.out.println(tmpList.get(i)); //content from test.txt which is not there in test2.txt
            }

            System.out.println("content from test2.txt which is not there in test.txt");

            tmpList = new ArrayList<String>(list2); // copy, so list2 itself is left unmodified
            tmpList.removeAll(list1);
            for(int i=0;i<tmpList.size();i++){
                System.out.println(tmpList.get(i)); //content from test2.txt which is not there in test.txt
            }
        }
    }

2 of 4

Memory will be a problem, as you need to load both files into the program. I am using HashSet to ignore duplicates. Try this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashSet;

public class FileReader1 {
    public static void main(String args[]) {

        String filename = "abc.txt";
        String filename2 = "xyz.txt";
        HashSet <String> al = new HashSet<String>();
        HashSet <String> al1 = new HashSet<String>();
        HashSet <String> diff1 = new HashSet<String>();
        HashSet <String> diff2 = new HashSet<String>();
        String str = null;
        String str2 = null;
        try {
            BufferedReader in = new BufferedReader(new FileReader(filename));
            while ((str = in.readLine()) != null) {
                al.add(str);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            BufferedReader in = new BufferedReader(new FileReader(filename2));
            while ((str2 = in.readLine()) != null) {
                al1.add(str2);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        for (String str3 : al) {
            if (!al1.contains(str3)) {
                diff1.add(str3);
            }
        }
        for (String str5 : al1) {
            if (!al.contains(str5)) {
                diff2.add(str5);
            }
        }
        for (String str4 : diff1) {
            System.out.println("Removed Path: "+str4);
        }
        for (String str4 : diff2) {
            System.out.println("Added Path: "+str4);
        }


    }

}

Output:

Removed Path: E:\Users\Documents\hello\b.properties
Added Path: E:\Users\Documents\hello\h.properties
Added Path: E:\Users\Documents\hello\g.properties

Quora

What is the best way to compare values in two big files in Java line by line? - Quora

Answer (1 of 3): You have too many different questions embedded in here. What is a "value"? A single line of text? How big are the files? Storing something efficiently and processing it efficiently are different matters. Are you trying to store the unique information found in both files with...

stackoverflow.com › questions › 50046170 › compare-2-text-files-in-java-and-write-the-difference-in-both-separately-into-an

performance - Compare 2 text files in java and write the difference in both separately into another file - Stack Overflow

1 of 3

HashMap solution

I thought about it and the HashMap solution is instant. I went ahead and coded up an example of it here.

It runs in 0ms while the arrayLists ran in 16ms for the same dataset

public static void main(String[] args) throws Exception {
    BufferedReader br1 = null;
    BufferedReader br2 = null;
    BufferedWriter bw3 = null;
    String sCurrentLine;
    int linelength;

    HashMap<String, Integer> expectedrecords = new HashMap<String, Integer>();
    HashMap<String, Integer> actualrecords = new HashMap<String, Integer>();

    br1 = new BufferedReader(new FileReader("expected.txt"));
    br2 = new BufferedReader(new FileReader("actual.txt"));

    while ((sCurrentLine = br1.readLine()) != null) {
        if (expectedrecords.containsKey(sCurrentLine)) {
            expectedrecords.put(sCurrentLine, expectedrecords.get(sCurrentLine) + 1);
        } else {
            expectedrecords.put(sCurrentLine, 1);
        }
    }
    while ((sCurrentLine = br2.readLine()) != null) {
        if (expectedrecords.containsKey(sCurrentLine)) {
            int expectedCount = expectedrecords.get(sCurrentLine) - 1;
            if (expectedCount == 0) {
                expectedrecords.remove(sCurrentLine);
            } else {
                expectedrecords.put(sCurrentLine, expectedCount);
            }
        } else {
            if (actualrecords.containsKey(sCurrentLine)) {
                actualrecords.put(sCurrentLine, actualrecords.get(sCurrentLine) + 1);
            } else {
                actualrecords.put(sCurrentLine, 1);
            }
        }
    }

    // expected is left with all records not present in actual
    // actual is left with all records not present in expected
    bw3 = new BufferedWriter(new FileWriter(new File("c.txt")));
    bw3.write("Records which are not present in actual\n");
    for (String key : expectedrecords.keySet()) {
        for (int i = 0; i < expectedrecords.get(key); i++) {
            bw3.write(key);
            bw3.newLine();
        }
    }
    bw3.write("Records which are in actual but not present in expected\n");
    for (String key : actualrecords.keySet()) {
        for (int i = 0; i < actualrecords.get(key); i++) {
            bw3.write(key);
            bw3.newLine();
        }
    }
    bw3.flush();
    bw3.close();
}

ex:

expected.txt

one
two
four
five
seven
eight

actual.txt

one
two
three
five
six

c.txt

Records which are not present in actual
four
seven
eight
Records which are in actual but not present in expected
three
six

ex 2:

expected.txt

one
two
four
five
seven
eight
duplicate
duplicate
duplicate

actual.txt

one
duplicate
two
three
five
six

c.txt

Records which are not present in actual
four
seven
eight
duplicate
duplicate
Records which are in actual but not present in expected
three
six
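The counting idea behind the HashMap answer can also be written more compactly with `Map.merge` and `Map.computeIfPresent`. The sketch below is not the answer's code: the class name `MultisetDiff` and the method `surplus` are hypothetical, and it works on in-memory line collections rather than reading the files itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultisetDiff {
    // For each distinct line, returns how many more times it occurs in
    // `left` than in `right`. Lines with no surplus are absent from the map.
    static Map<String, Integer> surplus(Iterable<String> left, Iterable<String> right) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : left) {
            counts.merge(line, 1, Integer::sum);
        }
        for (String line : right) {
            // Returning null from the remapping function removes the entry.
            counts.computeIfPresent(line, (k, v) -> v == 1 ? null : v - 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("one", "two", "four", "five", "seven",
                "eight", "duplicate", "duplicate", "duplicate");
        List<String> actual = Arrays.asList("one", "duplicate", "two", "three", "five", "six");

        // Map iteration order is unspecified, so entries may print in any order.
        System.out.println("Not present in actual: " + surplus(expected, actual));
        System.out.println("Not present in expected: " + surplus(actual, expected));
    }
}
```

Calling `surplus` once in each direction reproduces both report sections, at the cost of walking both inputs twice, whereas the original loop fills both maps in a single pass over actual.txt.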

2 of 3

In Java 8 you can use Collection.removeIf(Predicate<T>)

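Set<String> inList1 = new HashSet<>(list1);   // snapshot taken before list1 is modified
Set<String> inList2 = new HashSet<>(list2);
list1.removeIf(inList2::contains);
list2.removeIf(inList1::contains);

list1 will then contain everything that is NOT in list2, and list2 will contain everything that is NOT in list1. The snapshots matter: if the second call tested against the already-filtered list1, the common lines would no longer be in it and nothing would be removed from list2. The sets also make each contains check constant-time on average instead of a linear scan.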

stackoverflow.com › questions › 63521500 › how-to-find-difference-line-based-in-sorted-large-text-files-in-java-without-l

How to find difference (line-based) in sorted large text files in Java without loading them in full into memory? - Stack Overflow

itsallbinary.com › home › compare files side by side and hightlight diff using java | apache commons text diff | myers algorithm

1 of 2

You only need to read from the file whose current line is the smallest (from a compareTo perspective). If both current lines are the same, you read a line from both files; if one is bigger than the other, you read only from the file with the smaller line. If you don't read from the same file twice in a row, it means you have a difference. All lines read between switches are differences (a switch being a change from reading only file 1 to reading file 2 or both, or from reading only file 2 to reading file 1 or both).

A sample to make this clearer, for the case where you switch from reading file 1 to reading file 2:

            if(line1.compareTo(line2)>0){
                if(lastRead==1) {
                    System.out.println(previousLines+ " found in "+path1 +" but not in "+ path2);
                    previousLines.clear();
                }
                previousLines.add(line2);
                line2=in2.readLine();
                lastRead = 2; // the last read was from the second file only
            }

When line1 is bigger than line2 (line1 being the current line from file 1, line2 the current line from file 2), I next read only from the second file. If up to that point I was reading only from file 1 (not from both at the same time, and not from the second one), all lines in previousLines should be listed. I add lines to previousLines when they are different. lastRead keeps track of the last file I read from (0: both at the same time, 1: only the first, 2: only the second).

Late edit: here is the whole method body. As I mentioned in the comment, it doesn't check what happens if I finish reading one file before the other. As it is now, it works fine if the last line is the same in both files. You can add further checks for readLine returning null for one file or the other.

void toTitleCase(Path path1, Path path2) {

try(BufferedReader in1= Files.newBufferedReader(path1);
    BufferedReader in2= Files.newBufferedReader(path2)) {
    String line1=in1.readLine(),line2=in2.readLine();
    int lastRead=0;
    List<String> previousLines=new ArrayList<>();
    while(line1!=null && line2!=null){
        if(line1.compareTo(line2)>0){
            if(lastRead==1) {
                System.out.println(previousLines+ " found in "+path1 +" but not in "+ path2);
                previousLines.clear();
            }
            previousLines.add(line2);
            line2=in2.readLine();
            lastRead = 2;
        } else if(line1.compareTo(line2)<0){
                if(lastRead==2) {
                    System.out.println(previousLines+ " found in "+path2 +" but not in "+ path1);
                    previousLines.clear();
                }
                previousLines.add(line1);
                line1=in1.readLine();
                    lastRead = 1;

            } else{
                if(lastRead==2) {
                    System.out.println(previousLines+ " found in "+path2 +" but not in "+ path1);
                }
                if(lastRead==1) {
                    System.out.println(previousLines+ " found in "+path1 +" but not in "+ path2);
                }
                previousLines.clear();
                line1=in1.readLine();
                line2=in2.readLine();
                lastRead=0;
            }
    }
} catch (IOException e) {
    e.printStackTrace();
}
    }

2 of 2

I thought this might be an interesting problem, so I put something together to illustrate how a difference application might work.

I had a file of words for a different application. So, I grabbed the first 100 words and reduced the size of each down to something I could test with easily.

Word List 1

aback
abandon
abandoned
abashed
abatement
abbey
abbot
abbreviate
abdomen
abducted
aberrant
aberration
abetted
abeyance

Word List 2

aardvark
aback
abacus
abandon
abatement
abbey
abbot
abbreviate
abdicate
abdomen
aberrant
aberration

My example application produces two different outputs. Here's the first output from my test run, the full difference output.

Differences between /word1.txt and /word2.txt
-----------------------------------------------------

------   Inserted   ----- | aardvark                 
aback                     | aback                    
------   Inserted   ----- | abacus                   
abandon                   | abandon                  
abandoned                 | ------   Deleted   ------
abashed                   | ------   Deleted   ------
abatement                 | abatement                
abbey                     | abbey                    
abbot                     | abbot                    
abbreviate                | abbreviate               
------   Inserted   ----- | abdicate                 
abdomen                   | abdomen                  
abducted                  | ------   Deleted   ------
aberrant                  | aberrant                 
aberration                | aberration               
abetted                   | ------   Deleted   ------
abeyance                  | ------   Deleted   ------

Now, for two really long files, where most of the text will match, this output would be hard to read. So, I also created an abbreviated output.

Differences between /word1.txt and /word2.txt
-----------------------------------------------------

------   Inserted   ----- | aardvark                 
---------------   1 line is the same   --------------
------   Inserted   ----- | abacus                   
---------------   1 line is the same   --------------
abandoned                 | ------   Deleted   ------
abashed                   | ------   Deleted   ------
--------------   4 lines are the same   -------------
------   Inserted   ----- | abdicate                 
---------------   1 line is the same   --------------
abducted                  | ------   Deleted   ------
--------------   2 lines are the same   -------------
abetted                   | ------   Deleted   ------
abeyance                  | ------   Deleted   ------

With these small test files, there's not much difference between the two reports.

With two large text files, the abbreviated report would be a lot easier to read.

Here's the example code.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class Difference {

    public static void main(String[] args) {
        String file1 = "/word1.txt";
        String file2 = "/word2.txt";

        try {
            new Difference().compareFiles(file1, file2);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void compareFiles(String file1, String file2)
            throws IOException {
        int columnWidth = 25;
        int pageWidth = columnWidth + columnWidth + 3;
        boolean isFullReport = true;

        System.out.println(getTitle(file1, file2));
        System.out.println(getDashedLine(pageWidth));
        System.out.println();

        URL url1 = getClass().getResource(file1);
        URL url2 = getClass().getResource(file2);

        BufferedReader br1 = new BufferedReader(new InputStreamReader(
                url1.openStream()));
        BufferedReader br2 = new BufferedReader(new InputStreamReader(
                url2.openStream()));

        int countEqual = 0;
        String line1 = br1.readLine();
        String line2 = br2.readLine();

        while (line1 != null && line2 != null) {
            int result = line1.compareTo(line2);
            if (result == 0) {
                countEqual++;
                if (isFullReport) {
                    System.out.println(getFullEqualsLine(columnWidth,
                            line1, line2));
                }
                line1 = br1.readLine();
                line2 = br2.readLine();
            } else if (result < 0) {
                printEqualsLine(pageWidth, countEqual, isFullReport);
                countEqual = 0;
                System.out.println(getDifferenceLine(columnWidth,
                        line1, ""));
                line1 = br1.readLine();
            } else {
                printEqualsLine(pageWidth, countEqual, isFullReport);
                countEqual = 0;
                System.out.println(getDifferenceLine(columnWidth,
                        "", line2));
                line2 = br2.readLine();
            }
        }

        printEqualsLine(pageWidth, countEqual, isFullReport);

        while (line1 != null) {
            System.out.println(getDifferenceLine(columnWidth,
                    line1, ""));
            line1 = br1.readLine();
        }

        while (line2 != null) {
            System.out.println(getDifferenceLine(columnWidth,
                    "", line2));
            line2 = br2.readLine();
        }

        br1.close();
        br2.close();
    }

    private void printEqualsLine(int pageWidth, int countEqual,
            boolean isFullReport) {
        if (!isFullReport && countEqual > 0) {
            System.out.println(getEqualsLine(countEqual, pageWidth));
        }
    }

    private String getTitle(String file1, String file2) {
        return "Differences between " + file1 + " and " + file2;
    }

    private String getEqualsLine(int count, int length) {
        String lines = "lines are";
        if (count == 1) {
            lines = "line is";
        }
        String output = "   " + count + " " + lines +
                " the same   ";
        return getTextLine(length, output);
    }

    private String getFullEqualsLine(int columnWidth, String line1,
            String line2) {
        String format = "%-" + columnWidth + "s";
        return String.format(format, line1) + " | " +
            String.format(format, line2);
    }

    private String getDifferenceLine(int columnWidth, String line1,
            String line2) {
        String format = "%-" + columnWidth + "s";
        String deleted = getTextLine(columnWidth, "   Deleted   ");
        String inserted = getTextLine(columnWidth, "   Inserted   ");

        if (line1.isEmpty()) {
            return inserted + " | " + String.format(format, line2);
        } else {
            return String.format(format, line1) + " | " + deleted;
        }
    }

    private String getTextLine(int length, String output) {
        int half2 = (length - output.length()) / 2;
        int half1 = length - output.length() - half2;
        output = getDashedLine(half1) + output;
        output += getDashedLine(half2);
        return output;
    }

    private String getDashedLine(int count) {
        String output = "";
        for (int i = 0; i < count; i++) {
            output += "-";
        }
        return output;
    }

}

Its All Binary

Compare files side by side and hightlight diff using Java | Apache Commons Text diff | Myers algorithm - Its All Binary - Coding Posts, Examples, Projects & More

March 10, 2020 - Compare both files using Apache commons text & generate HTML output highlighting differences in both files. Create simple java main program to keep things simple.

TextCompare

textcompare.org › java

Online Java Compare Tool

Find difference between 2 text files. Just input or paste original and modified text and click Compare button. Fast, Private & Unlimited.

stackoverflow.com › questions › 68172616 › find-difference-of-two-large-files

java - Find difference of two large files - Stack Overflow