Assuming ASCII strings:
string1 = 'Hello'
string2 = 'hello'
if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")
As of Python 3.3, casefold() is a better alternative:
string1 = 'Hello'
string2 = 'hello'
if string1.casefold() == string2.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")
If you want a more comprehensive solution that handles more complex Unicode comparisons, see the other answers.
Answer from Harley Holcombe on Stack Overflow
Comparing strings in a case-insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2's support is underdeveloped here.
The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":
>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'
But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want to compare "BUSSE" and "BUẞE" equal - that's the newer capital form. The recommended way is to use casefold:
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. [...]
Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).
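A quick check of the example above, written with explicit escapes so it does not depend on how the page renders the characters. This is only a sketch for Python 3.3+, and the variable names are made up for illustration:

```python
# Hypothetical sample strings: \u00df is the small sharp s, \u1e9e is its capital form.
upper_ss = "BUSSE"
lower_eszett = "Bu\u00dfe"
upper_eszett = "BU\u1e9eE"

# lower() keeps the sharp s as a single character, so it does not match "ss".
print(upper_ss.lower() == lower_eszett.lower())        # False

# casefold() expands both forms of the sharp s to "ss".
print(upper_ss.casefold() == lower_eszett.casefold())  # True
print(upper_ss.casefold() == upper_eszett.casefold())  # True
```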
Then you should consider accents. If your font renderer is good, you probably think "ê" == "ê" - but it doesn't:
>>> "ê" == "ê"
False
This is because the accent on the latter is a combining character.
>>> import unicodedata
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']
The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does
>>> unicodedata.normalize("NFKD", "ê") == unicodedata.normalize("NFKD", "ê")
True
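Since the two forms look identical on screen, it can help to spell them out as code points. A small sketch, where the names composed and decomposed are just for illustration:

```python
import unicodedata

composed = "\u00ea"      # LATIN SMALL LETTER E WITH CIRCUMFLEX
decomposed = "e\u0302"   # LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT

print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2

# Any of the four normalization forms makes this particular pair compare equal.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, composed) == unicodedata.normalize(form, decomposed)
    print(form, same)
```

Which form to pick depends on whether compatibility characters (ligatures, full-width letters and the like) should also be folded together; the K forms do that, the others do not.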
To finish up, here it is expressed as functions:
import unicodedata
def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())
def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
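As a possible variant, the Unicode Standard defines canonical caseless matching as normalizing both before and after case folding, i.e. NFD(toCasefold(NFD(X))). A sketch of that idea follows; the function names canonical_caseless and canonical_caseless_equal are hypothetical, not from the answer above, and NFD is used here instead of NFKD:

```python
import unicodedata

def canonical_caseless(text):
    # Hypothetical helper: decompose, case-fold, then decompose again,
    # because case folding can itself produce un-normalized text.
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())

def canonical_caseless_equal(left, right):
    return canonical_caseless(left) == canonical_caseless(right)

print(canonical_caseless_equal("BUSSE", "Bu\u00dfe"))  # True
print(canonical_caseless_equal("\u00ca", "e\u0302"))   # True
print(canonical_caseless_equal("hello", "help"))       # False
```

For most everyday text the simpler version above should give the same answer; the extra normalization pass only matters for a handful of unusual characters.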
So I have a script which replaces a set of strings in a text file, but it needs to be case sensitive. Is this a built-in function, or do I need to do some black magic?
Here is a benchmark showing that using str.lower is faster than the accepted answer's proposed method (libc.strcasecmp):
#!/usr/bin/env python2.7
import random
import timeit
from ctypes import *
libc = CDLL('libc.dylib') # change to 'libc.so.6' on linux
with open('/usr/share/dict/words', 'r') as wordlist:
    words = wordlist.read().splitlines()
random.shuffle(words)
print '%i words in list' % len(words)
setup = 'from __main__ import words, libc; gc.enable()'
stmts = [
    ('simple sort', 'sorted(words)'),
    ('sort with key=str.lower', 'sorted(words, key=str.lower)'),
    ('sort with cmp=libc.strcasecmp', 'sorted(words, cmp=libc.strcasecmp)'),
]
for (comment, stmt) in stmts:
    t = timeit.Timer(stmt=stmt, setup=setup)
    print '%s: %.2f msec/pass' % (comment, (1000*t.timeit(10)/10))
Typical times on my machine:
235886 words in list
simple sort: 483.59 msec/pass
sort with key=str.lower: 1064.70 msec/pass
sort with cmp=libc.strcasecmp: 5487.86 msec/pass
So, the version with str.lower is not only by far the fastest of the case-insensitive sorts (about five times faster than libc.strcasecmp here), but also the most portable and pythonic of all the proposed solutions here.
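The benchmark above is Python 2 only (the cmp= argument to sorted was removed in Python 3). A rough Python 3 sketch of the key-based part is shown here; the word list is made-up sample data rather than /usr/share/dict/words, so the timings are not comparable to the numbers above:

```python
import timeit

# Hypothetical sample data standing in for the system word list.
words = ["banana", "Apple", "cherry", "apple", "Banana"] * 1000

# Both keys give a case-insensitive ordering; casefold also handles non-ASCII text.
by_lower = sorted(words, key=str.lower)
by_casefold = sorted(words, key=str.casefold)

for label, key in (("key=str.lower", str.lower), ("key=str.casefold", str.casefold)):
    seconds = timeit.timeit(lambda: sorted(words, key=key), number=10)
    print("%s: %.2f msec/pass" % (label, 1000 * seconds / 10))
```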
I have not profiled memory usage, but the original poster has still not given a compelling reason to worry about it. Also, who says that a call into the libc module doesn't duplicate any strings?
NB: For Python 2 8-bit strings, the lower() string method also has the advantage of being locale-dependent, something you will probably not get right when writing your own "optimised" solution. Even so, due to bugs and missing features in Python, this kind of comparison may give you wrong results in a Unicode context.
Your question implies that you don't need Unicode. Try the following code snippet; if it works for you, you're done:
Python 2.5.2 (r252:60911, Aug 22 2008, 02:34:17)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_COLLATE, "en_US")
'en_US'
>>> sorted("ABCabc", key=locale.strxfrm)
['a', 'A', 'b', 'B', 'c', 'C']
>>> sorted("ABCabc", cmp=locale.strcoll)
['a', 'A', 'b', 'B', 'c', 'C']
Clarification: in case it is not obvious at first sight, locale.strcoll seems to be the function you need, since it avoids the "duplicate" strings that str.lower or locale.strxfrm create.
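The session above is Python 2. In Python 3, sorted no longer accepts cmp=, so a comparison function such as locale.strcoll has to be wrapped with functools.cmp_to_key. A sketch under the assumption that the default locale can be set on your system (which locales are installed, and how they order letters, is platform-dependent, so no output is shown):

```python
import functools
import locale

try:
    # "" selects the user's default locale from the environment.
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    # Keep whatever collation is already active (often the "C" locale).
    pass

letters = list("ABCabc")

# strxfrm builds a transformed sort key per string; strcoll compares pairs directly.
by_strxfrm = sorted(letters, key=locale.strxfrm)
by_strcoll = sorted(letters, key=functools.cmp_to_key(locale.strcoll))

print(by_strxfrm)
print(by_strcoll)
```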
Hi all,
I'm learning Python and have been trying to figure out how to properly compare lists case-insensitively. If I have two lists, where the first list contains the current usernames and the second is a list of new usernames, how do I make sure that a new username John won't conflict with a current username of JoHn or JOHN, and vice versa?
I have this so far:
current_users = ['John', 'BiLl', 'simcitizzon', 'mIke', 'cHarlie', 'admin']
new_users = ['john', 'simcitiZzon', 'ralphwiggum', 'cherrymcsperry', 'sweettooth347']
for user in new_users:
    if user in current_users:
        print("Sorry, " + user + " is taken.")
    else:
        print(user + ", this username is available")
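One possible way to approach this, using the casefold() method discussed earlier: fold the current usernames once into a set, then fold each new name before looking it up. This is just a sketch; the names taken and results are made up for illustration:

```python
current_users = ['John', 'BiLl', 'simcitizzon', 'mIke', 'cHarlie', 'admin']
new_users = ['john', 'simcitiZzon', 'ralphwiggum', 'cherrymcsperry', 'sweettooth347']

# Case-folded copies of the existing names; a set makes each lookup cheap.
taken = {name.casefold() for name in current_users}

results = {}  # hypothetical record of which new names are free
for user in new_users:
    available = user.casefold() not in taken
    results[user] = available
    if available:
        print(user + ", this username is available")
    else:
        print("Sorry, " + user + " is taken.")
```

With the lists above, john and simcitiZzon are reported as taken and the other three as available.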