You've got the first case right. In the second case, only one a from aabc matches, so M = 1. In the third example, both as match so M = 2.

[P.S.: you're referring to the ancient Python 2.4 source code. The current source code is at hg.python.org.]

Answer from Fred Foo on Stack Overflow
🌐
TestDriven.io
testdriven.io › tips › 6de2820b-785d-4fc1-b107-ed8215528f49
Tips and Tricks - Python - Using SequenceMatcher.ratio() to find similarity between two strings | TestDriven.io
You can use difflib.SequenceMatcher.ratio() to get the distance between two strings: T - total number of elements in both strings (len(first_string) + len(second_string)) M - number of matches ·
🌐
Python
docs.python.org › 3 › library › difflib.html
difflib — Helpers for computing deltas
Source code: Lib/difflib.py This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences i...
🌐
Educative
educative.io › answers › what-is-sequencematcher-in-python
What is SequenceMatcher() in Python?
The ratio() function returns the similarity score (float in [0,1]) between input strings and sums the sizes of all matched sequences returned by the get_matching_blocks() function.
🌐
Beautiful Soup
tedboy.github.io › python_stdlib › generated › generated › difflib.SequenceMatcher.html
difflib.SequenceMatcher — Python Standard Library
Construct a SequenceMatcher. ... Set the two sequences to be compared. ... Set the first sequence to be compared. ... Set the second sequence to be compared. ... Find longest matching block in a[alo:ahi] and b[blo:bhi]. ... Return list of triples describing matching subsequences. ... Return list of 5-tuples describing how to turn a into b. ... Return a measure of the sequences’ similarity (float in [0,1]). ... Return an upper bound on .ratio() relatively quickly.
🌐
lxml
lxml.de › 3.1 › api › private › difflib.SequenceMatcher-class.html
difflib.SequenceMatcher
That may be because this is the ... Thread currentThread;", ... "private volatile Thread currentThread;") >>> .ratio() returns a float in [0, 1], measuring the "similarity" of the sequences....
🌐
GitHub
github.com › python › cpython › blob › main › Lib › difflib.py
cpython/Lib/difflib.py at main · python/cpython
Module difflib -- helpers for computing deltas between objects. · Function get_close_matches(word, possibilities, n=3, cutoff=0.6): Use SequenceMatcher to return list of the best "good enough" matches. · Function context_diff(a, b): For two lists of strings, return a delta in context diff format.
Author   python
Find elsewhere
🌐
SourceForge
epydoc.sourceforge.net › stdlib › difflib.SequenceMatcher-class.html
difflib.SequenceMatcher - Epydoc - SourceForge
That may be because this is the ... Thread currentThread;", ... "private volatile Thread currentThread;") >>> .ratio() returns a float in [0, 1], measuring the "similarity" of the sequences....
🌐
GeeksforGeeks
geeksforgeeks.org › python › compare-sequences-in-python-using-dfflib-module
Compare sequences in Python using dfflib module - GeeksforGeeks
February 24, 2021 - # import required module import difflib # assign parameters par1 = 'gfg' par2 = 'GFG' # compare print(difflib.SequenceMatcher(None, par1, par2).ratio())
🌐
Medium
medium.com › @zhangkd5 › a-tutorial-for-difflib-a-powerful-python-standard-library-to-compare-textual-sequences-096d52b4c843
A Tutorial of Difflib — A Powerful Python Standard Library to Compare Textual Sequences | by Kaidong Zhang | Medium
January 27, 2024 - from difflib import SequenceMatcher a = """The cat is sleeping on the red sofa.""" b = """The cat is sleeping on a blue sofa...""" seq_match = SequenceMatcher(None, a, b) ratio = seq_match.ratio() print(ratio) # Check the similarity of the two strings # The output similarity will be a decimal between 0 and 1, in our example it may output: # 0.821917808219178
🌐
Medium
ajinkya29.medium.com › what-is-difflib-41649066591c
What is Difflib?. So let's get started with this amazing… | by Ajinkya Mishrikotkar | Medium
June 14, 2021 - import difflib a = 'Medium' b = 'Median' seq = difflib.SequenceMatcher(None,a,b) d = seq.ratio()*100 print(d) 66.66666666666666
🌐
GitHub
github.com › seatgeek › fuzzywuzzy › issues › 128
Difflib and python-Levenshtein give different ratios in some cases · Issue #128 · seatgeek/fuzzywuzzy
August 12, 2016 - To show this, if we change the second sequence to "abaaaa", difflib will also score 67 (since it matches the first two characters of each sequence then recurses to the right). See as follows: >>> fuzz.ratio("ababab", "abaaaa") 67 #And switching pack to python-Levenshtein, no change: >>> fuzz.SequenceMatcher = fuzzywuzzy.StringMatcher.StringMatcher >>> fuzz.ratio("ababab", "abaaaa") 67
Author   theodickson
🌐
Beautiful Soup
tedboy.github.io › python_stdlib › generated › generated › difflib.SequenceMatcher.real_quick_ratio.html
difflib.SequenceMatcher.real_quick_ratio — Python Standard Library
difflib.SequenceMatcher.real_quick_ratio · View page source · SequenceMatcher.real_quick_ratio()[source]¶ · Return an upper bound on ratio() very quickly.
🌐
Python
bugs.python.org › issue31889
Issue 31889: difflib SequenceMatcher ratio() still have unpredictable behavior - Python tracker
This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python's Developer Guide · This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76070
🌐
CodeSpeedy
codespeedy.com › home › sequencematcher in python
SequenceMatcher in Python - CodeSpeedy
February 9, 2020 - The idea behind this is to find the longest matching subsequence which should be continued and compare it with full string and then get the ration as output. #import the class from difflib import SequenceMatcher s1 = "gun" s2 = "run" sequence ...
🌐
Runebook.dev
runebook.dev › en › docs › python › library › difflib › sequencematcher-examples
SequenceMatcher Secrets: Dealing with Junk, Speed, and Readable Diffs in Python
SequenceMatcher can be slow, especially when comparing two very long strings, as its complexity can approach O(N×M) in the worst-case scenario (where N and M are the lengths of the sequences). import difflib import time s1_long = "The quick brown fox jumps over the lazy dog " * 1000 s2_long = "The quick brown fox leaps over the sleepy dog " * 1000 # Using the full ratio (accurate but slow) start = time.time() sm = difflib.SequenceMatcher(None, s1_long, s2_long) full_ratio = sm.ratio() end = time.time() print(f"Full Ratio ({end-start:.4f}s): {full_ratio:.3f}") # Using a quicker ratio (faster but less accurate) start = time.time() quick_ratio = sm.quick_ratio() end = time.time() print(f"Quick Ratio ({end-start:.4f}s): {quick_ratio:.3f}")
Top answer
1 of 2
4

This gives some ideas of how matching works.

>>> import difflib
>>> 
>>> def print_matches(a, b):
...     s =  difflib.SequenceMatcher(None, a, b)
...     for block in s.get_matching_blocks():
...         print "a[%d] and b[%d] match for %d elements" % block
...     print s.ratio()
... 
>>> print_matches('01017', '14260')
a[0] and b[4] match for 1 elements
a[5] and b[5] match for 0 elements
0.2
>>> print_matches('14260', '01017')
a[0] and b[1] match for 1 elements
a[4] and b[2] match for 1 elements
a[5] and b[5] match for 0 elements
0.4

It looks as if it matches as much as it can on the first sequence against the second and continues from the matches. In this case ('01017', '14260'), the righthand match is on the 0, the last character, so no further matches on the right are possible. In this case ('14260', '01017'), the 1s match and the 0 still is available to match on the right, so two matches are found.

I think the matching algorithm is commutative against sorted sequences.

2 of 2
1

I was working with difflib lately, and though this answer is late, I thought it might add a little spice to the answer provided by hughdbrown as it shows what's happening visually.

Before I go to the code snippet, let me quote the documentation

The idea is to find the longest contiguous matching subsequence that contains no "junk" elements; these "junk" elements are ones that are uninteresting in some sense, such as blank lines or whitespace. (Handling junk is an extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.

I think comparing the first string against the second one and then finding matches looks right enough to people. This is explained nicely in the answer by hughdbrown.

Now try and run this code snippet:

def show_matching_blocks(a, b):
    s = SequenceMatcher(None, a, b)
    m = s.get_matching_blocks()
    seqs = [a, b]

    new_seqs = []
    for select, seq in enumerate(seqs):
        i, n = 0, 0
        new_seq = ''
        while i < len(seq):
            if i == m[n][select]:
                new_seq += '{' + seq[m[n][select]:m[n][select] + m[n].size] + '}'
                i += m[n].size
                n += 1
            elif i < m[n][select]:
                new_seq += seq[i:m[n][select]]
                i = m[n][select]
        new_seqs.append(new_seq)
    for seq, n in zip(seqs, new_seqs):
        print('{} --> {}'.format(seq, n))
    print('')

a, b = '10101789', '11426089'
show_matching_blocks(a, b)
show_matching_blocks(b, a)

Output:

10101789 --> {1}{0}1017{89}
11426089 --> {1}1426{0}{89}

11426089 --> {1}{1}426{0}{89}
10101789 --> {1}0{1}{0}17{89}

The parts inside braces ({}) are the matching parts. I just used SequenceMatcher.get_matching_blocks() to put the matching blocks within braces for better visibility. You can clearly see the difference when the order is reversed. With the first order, there are 4 matches, so the ratio is 2*4/16=0.5. But when the order is reversed, there are now 5 matches, so the ratio becomes 2*5/16=0.625. The ratio is calculated as given here in the documentation

🌐
Quora
quora.com › What-algorithm-is-Pythons-difflib-SequenceMatcher-based-on
What algorithm is Pythons' difflib SequenceMatcher based on? - Quora
Answer (1 of 2): According to difflib’s documentation, it is based on a variant of https://en.m.wikipedia.org/wiki/Gestalt_Pattern_Matching This algorithm calculates string similarity based on the length of the longest common subsequence and recursive lengths of common characters in other parts ...