Answer from duhaime on Stack Overflow
Top answer
1 of 2
241

In case you're interested in a quick visual comparison of Levenshtein and Difflib similarity, I calculated both for ~2.3 million book titles:

import difflib

import Levenshtein
import distance

# Each row of titles.tsv holds a record whose 4th and 5th
# tab-separated fields are the two titles being compared.
with open("titles.tsv", encoding="utf-8") as f:
    title_list = f.read().split("\n")[:-1]

for row in title_list:
    sr = row.lower().split("\t")

    diffl = difflib.SequenceMatcher(None, sr[3], sr[4]).ratio()
    lev = Levenshtein.ratio(sr[3], sr[4])
    sor = 1 - distance.sorensen(sr[3], sr[4])
    jac = 1 - distance.jaccard(sr[3], sr[4])

    print(diffl, lev, sor, jac)

I then plotted the results with R:

Strictly for the curious, I also compared the Difflib, Levenshtein, Sørensen, and Jaccard similarity values:

library(ggplot2)
library(GGally)

difflib <- read.table("similarity_measures.txt", sep = " ")
colnames(difflib) <- c("difflib", "levenshtein", "sorensen", "jaccard")

ggpairs(difflib)

Result:

The Difflib / Levenshtein similarity really is quite interesting.

2018 edit: If you're working on identifying similar strings, you could also check out minhashing--there's a great overview here. Minhashing is amazing at finding similarities in large text collections in linear time. My lab put together an app that detects and visualizes text reuse using minhashing here: https://github.com/YaleDHLab/intertext

2 of 2
137
  • difflib.SequenceMatcher uses the Ratcliff/Obershelp algorithm: it computes twice the number of matching characters divided by the total number of characters in the two strings.

  • Levenshtein uses the Levenshtein algorithm: it computes the minimum number of single-character edits needed to transform one string into the other.
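To make the Ratcliff/Obershelp formula above concrete, the same ratio can be reproduced from SequenceMatcher's matching blocks using only the standard library (the strings here are just an example pair):

```python
import difflib

a, b = "NEW YORK METS", "NEW YORK MEATS"
sm = difflib.SequenceMatcher(None, a, b)

# Ratcliff/Obershelp: 2 * M / T, where M is the total size of the
# matching blocks and T is the combined length of both strings.
matched = sum(block.size for block in sm.get_matching_blocks())
manual = 2 * matched / (len(a) + len(b))

print(manual)      # same value sm.ratio() returns
print(sm.ratio())
```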

Complexity

SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common (from the difflib documentation).

Levenshtein is O(m*n), where m and n are the lengths of the two input strings.
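That O(m*n) bound comes from the classic dynamic-programming formulation, which can be sketched in a few lines (a plain unweighted edit distance, not the exact implementation the C module uses):

```python
def levenshtein(s, t):
    # Dynamic programming over a (len(s)+1) x (len(t)+1) grid,
    # keeping only one previous row: O(m*n) time, O(n) space.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("hello world", "hello"))  # 6 (delete " world")
```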

Performance

According to the source code of the Levenshtein module: Levenshtein has some overlap with difflib (SequenceMatcher). It supports only strings, not arbitrary sequence types, but on the other hand it's much faster.

Top answer
1 of 2
7

FuzzyWuzzy.ratio using python-Levenshtein doesn't return the Levenshtein score, but rather the Levenshtein ratio, which is (a+b - LevenshteinScore)/(a+b), where a and b are the lengths of the two strings being compared.
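As a sketch of that formula, with the edit distance for this example pair filled in by hand (a single inserted 'A') rather than computed by the library:

```python
a, b = "NEW YORK METS", "NEW YORK MEATS"
dist = 1  # one insertion ('A') turns the first string into the second

# ratio = (a + b - LevenshteinScore) / (a + b), scaled to 0-100
ratio = (len(a) + len(b) - dist) / (len(a) + len(b))
print(round(ratio * 100))  # 96, the score fuzz.ratio reports for this pair
```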

If you don't have python-Levenshtein installed, fuzzywuzzy doesn't use Levenshtein at all. Fuzzywuzzy's home page is misleading in this regard, though it does recommend installing python-Levenshtein.

python-Levenshtein has some installation issues; I used the second response to this stackoverflow question to solve them.

If you don't have python-Levenshtein installed, FuzzyWuzzy falls back to difflib, which gives the same score for many input values, but not all. The developers recommend using python-Levenshtein. See this issue on fuzzywuzzy's git, which includes an example case where the results differ with the package installed compared to without it. This probably shouldn't happen, or at least the documentation should make it clear, but FuzzyWuzzy's devs seem content with the functionality.

2 of 2
1

Found an excellent article from the creator of FuzzyWuzzy here.

String Similarity

The simplest way to compare two strings is with a measurement of edit distance. For example, the following two strings are quite similar:

NEW YORK METS
NEW YORK MEATS

Looks like a harmless misspelling. Can we quantify it? Using python’s difflib, that’s pretty easy

from difflib import SequenceMatcher 
m = SequenceMatcher(None,"NEW YORK METS", "NEW YORK MEATS") 
m.ratio() ⇒ 0.962962962963 

So it looks like these two strings are about 96% the same. Pretty good! We use this pattern so frequently, we wrote a helper method to encapsulate it

fuzz.ratio("NEW YORK METS", "NEW YORK MEATS") ⇒ 96