Short answer: str slices, in general, copy. That means that a function that takes a slice for each of your string's n suffixes is doing O(n^2) work. That said, you can avoid copies if you can work with bytes-like objects, using memoryviews to get zero-copy views of the original bytes data. See "How you can do zero copy slicing" below for how to make it work.
Long answer: (C)Python str does not slice by referencing a view of a subset of the data. There are exactly three modes of operation for str slicing:

1. Complete slice, e.g. mystr[:]: returns a reference to the exact same str (not just shared data, the same actual object, mystr is mystr[:]; since str is immutable, there is no risk in doing so)
2. The zero-length slice and (implementation-dependent) cached length-1 slices: the empty string is a singleton (mystr[1:1] is mystr[2:2] is ''), and low-ordinal strings of length one are cached singletons as well (on CPython 3.5.0, it looks like all characters representable in latin-1, that is, Unicode ordinals in range(256), are cached)
3. All other slices: the sliced str is copied at creation time, and is thereafter unrelated to the original str
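These three behaviours can be checked directly. Note that the identity checks below rely on CPython implementation details, not language guarantees:

```python
s = "hello world"

# Mode 1: a complete slice returns the very same object
assert s[:] is s

# Mode 2: empty slices all share CPython's interned empty string
empty = ''
assert s[3:3] is empty and s[5:5] is s[3:3]

# Mode 3: any other slice is a brand-new copy
t = s[1:]
assert t == "ello world" and t is not s
```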
The reason why #3 is the general rule is to avoid issues with large str being kept in memory by a view of a small portion of it. If you had a 1GB file, read it in and sliced it like so (yes, it's wasteful when you can seek, this is for illustration):
with open(myfile) as f:
    data = f.read()[-1024:]
then you'd have 1 GB of data being held in memory to support a view that shows the final 1 KB, a serious waste. Since slices are usually smallish, it's almost always faster to copy on slice instead of create views. It also means str can be simpler; it needs to know its size, but it need not track an offset into the data as well.
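You can see that a slice gets its own independent storage with a quick check (a sketch; the exact byte counts are CPython implementation details):

```python
import sys

big = "x" * 10**6
small = big[-1024:]          # copies 1 KB out of the 1 MB string

# The slice is a separate, much smaller object; dropping `big`
# would free the full megabyte even while `small` stays alive.
assert small == "x" * 1024
assert sys.getsizeof(small) < sys.getsizeof(big)
```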
How you can do zero copy slicing
There are ways to perform view based slicing in Python, and in Python 2, it will work on str (because str is bytes-like in Python 2, supporting the buffer protocol). With Py2 str and Py3 bytes (as well as many other data types like bytearray, array.array, numpy arrays, mmap.mmaps, etc.), you can create a memoryview that is a zero copy view of the original object, and can be sliced without copying data. So if you can use (or encode) to Py2 str/Py3 bytes, and your function can work with arbitrary bytes-like objects, then you could do:
def do_something_on_all_suffixes(big_string):
    # In Py3, may need to encode as latin-1 or the like
    remaining_suffix = memoryview(big_string)
    # Rather than an explicit loop index, just replace the view with a
    # one-shorter view on each iteration
    while remaining_suffix:  # Stop when we've sliced down to an empty view
        some_constant_time_operation(remaining_suffix)
        remaining_suffix = remaining_suffix[1:]
The slices of memoryviews do make new view objects (they're just ultra-lightweight with fixed size unrelated to the amount of data they view), just not any data, so some_constant_time_operation can store a copy if needed and it won't be changed when we slice it down later. Should you need a proper copy as Py2 str/Py3 bytes, you can call .tobytes() to get the raw bytes obj, or (in Py3 only it appears), decode it directly to a str that copies from the buffer, e.g. str(remaining_suffix[10:20], 'latin-1').
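A small demonstration of the zero-copy behaviour described above (the `.obj` attribute used here to verify sharing is part of the standard memoryview API):

```python
data = b"abcdefghij" * 100
mv = memoryview(data)

sub = mv[10:20]                # a new, tiny view object; no bytes copied
assert sub.obj is data         # the view still references the original buffer
assert sub.tobytes() == b"abcdefghij"        # explicit copy out as bytes
assert str(sub, 'latin-1') == "abcdefghij"   # decoding also copies out of the buffer
```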
Recently I was thinking about interview questions I got as an undergrad:
Things like "reverse a string" and "check if a string is a palindrome".
I did most of these in C++ with a loop and scrolling through the index using logic.
When I learned Python, I realized that I could "reverse a string" by simply going:
return mystring[::-1]
Likewise with "check if it is a palindrome" by doing:
return mystring == mystring[::-1]
The problem now is that I don't know what complexity that has.
From my point of view it looks constant, so O(1), but I'm guessing that's too good to be true; the string slicing must be doing something behind the scenes.
Can anyone help me clarify?
The Python wiki page on time complexity shows that slicing lists is O(k), where k is the length of the slice. That's for lists, not strings, but the complexity can't be O(1) for strings either, since slicing must handle more characters as the size increases. At a guess, slicing strings is also O(k). We can write a little code to test that guess:
import time

StartSize = 2097152
size = StartSize
for _ in range(10):
    # create string of size "size"
    s = '*' * size
    # now time the reverse slice
    start = time.time()
    r = s[::-1]
    delta = time.time() - start
    print(f'Size {size:9d}, time={delta:.3f}')
    # double the size of the string
    size *= 2
This uses a crude timing method; better tools exist, but it suffices here. When run I get:
$ python3 test.py
Size 2097152, time=0.006
Size 4194304, time=0.013
Size 8388608, time=0.024
Size 16777216, time=0.050
Size 33554432, time=0.098
Size 67108864, time=0.190
Size 134217728, time=0.401
Size 268435456, time=0.808
Size 536870912, time=1.610
Size 1073741824, time=3.192
which shows the time doubles when doubling the size of the string for each reverse slice. So O(n) (k == n for whole-string slicing).
How difficult an algorithm is to write and how expensive it is to run are two separate things. Creating a reversed string with the slice shorthand still takes O(n) space and O(n) time. Keep in mind that, in most cases, creating a reversed copy isn't necessary: you can just start at the end and walk down, which is essentially what Python's reversed() function does.
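For instance, the palindrome check can be written without building the reversed copy at all, still O(n) time but O(1) extra space (a sketch, not from the original answer):

```python
def is_palindrome(s):
    # Compare characters pairwise from both ends; no reversed string is built
    return all(s[i] == s[-1 - i] for i in range(len(s) // 2))

assert is_palindrome("racecar")
assert is_palindrome("abba")
assert not is_palindrome("python")
```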
It all depends on how big your slices are. I threw together the following two benchmarks. The first slices the entire string and the second only a little bit. Curve fitting with this tool gives
# s[1:-1]
y = 0.09 x^2 + 10.66 x - 3.25
# s[1:1000]
y = -0.15 x + 17.13706461
The first looks quite linear for strings up to 4 MB; I guess it really measures the time taken to construct the second string. The second is pretty constant, although it's so fast that the measurement probably isn't very stable.
import time

def go(n):
    start = time.time()
    s = "abcd" * n
    for _ in range(50000):
        # benchmark one
        a = s[1:-1]
        # benchmark two (enable one benchmark per run)
        # a = s[1:1000]
    end = time.time()
    return (end - start) * 1000

for n in range(1000, 100000, 5000):
    print(n / 1000.0, go(n))
The time complexity is O(N) on average, O(NM) worst case (N being the length of the string being searched, M the length of the substring you search for). As of Python 3.10, heuristics are used to lower the worst-case scenario to O(N + M) by switching algorithms.
The same algorithm is used for str.index(), str.find(), str.__contains__() (the in operator) and str.replace(); it is a simplification of the Boyer-Moore with ideas taken from the Boyer–Moore–Horspool and Sunday algorithms.
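Because all of those entry points share the same search core, they agree on their results:

```python
s = "abracadabra"

assert "cad" in s                  # str.__contains__
assert s.find("cad") == 4          # returns -1 when absent
assert s.index("cad") == 4         # raises ValueError when absent
assert s.find("xyz") == -1
assert s.replace("cad", "CAD") == "abraCADabra"
```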
See the original stringlib discussion post, as well as the fastsearch.h source code; until Python 3.10, the base algorithm has not changed since introduction in Python 2.5 (apart from some low-level optimisations and corner-case fixes).
The post includes a Python-code outline of the algorithm:
def find(s, p):
    # find first occurrence of p in s
    n = len(s)
    m = len(p)
    skip = delta1(p)[p[m-1]]
    i = 0
    while i <= n-m:
        if s[i+m-1] == p[m-1]:  # (boyer-moore)
            # potential match
            if s[i:i+m-1] == p[:m-1]:
                return i
            if s[i+m] not in p:
                i = i + m + 1  # (sunday)
            else:
                i = i + skip  # (horspool)
        else:
            # skip
            if s[i+m] not in p:
                i = i + m + 1  # (sunday)
            else:
                i = i + 1
    return -1  # not found
as well as speed comparisons.
In Python 3.10, the algorithm was updated to use an enhanced version of Crochemore and Perrin's Two-Way string-searching algorithm for larger problems (with p and s longer than 100 and 2100 characters, respectively, and s at least 6 times as long as p), in response to a pathological edge case someone reported. The commit adding this change included a write-up on how the algorithm works.
The Two-Way algorithm has a worst-case time complexity of O(N + M), where O(M) is a cost paid up front to build a shift table from the search needle p. Once you have that table, the algorithm also has a best-case performance of O(N/M).
In Python 3.4.2, it looks like they resort to the same function, but there may nevertheless be a difference in timing. For example, s.find first has to look up the find method on the string, and so on.
The algorithm used is a mix between Boyer-Moore and Horspool.
Your algorithm (in terms of number of comparisons) is O(n), where n is the length of the string. In the worst case the string and the pattern are identical, and then for every character in subStr you move to the next character of string; it is equivalent to a simple comparison of two strings.
However, your implementation may be O(n^2) in terms of other operations, and the reason for this, as you mentioned in your question, is the following line:
fn = fn[index+1 ::]
This is effectively copying the string (assuming the slice above is implemented as a copy). Considering the previous example again, for every character in the string you'd have to copy all remaining characters, which is O(n^2): you copy n-1 characters first, then n-2, n-3, and so on, and at the last iteration you copy just one character. The total number of items copied is then n-1+n-2+...+1, which, as an arithmetic progression, equals (n-1)*((n-1)+1)/2 = (n-1)*n/2 = O(n^2). For other inputs this generalises to O(m*n), where m is the length of the pattern.
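The arithmetic-series count can be verified numerically (a quick check, not part of the original answer):

```python
n = 1000

# Characters copied per iteration when repeatedly slicing off the front:
# n-1, then n-2, ..., down to 1
copied = sum(n - k for k in range(1, n))

# Matches the closed form (n-1)*n/2, which grows as O(n^2)
assert copied == (n - 1) * n // 2
assert copied == 499500
```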
What your friend probably meant was: your algorithm is linear, but your implementation is not. It can easily be fixed, though. Use the solution presented by @thkang, or something more transparent that gets rid of the hidden complexity, for example:
try:
    si = iter(string)
    for c in subStr:
        while c != next(si):
            pass
except StopIteration:
    print("no match")
else:
    print("match")
Sorry to say, but this is neither O(n) nor O(n+m). Both of those are better than O(n^2), and this algorithm is O(n^2). Why?
- You iterate over the source string, which is O(n)
- In each iteration you call index on the string, which is also O(n)
So the worst-case performance is bounded by O(n^2); this is in fact an implementation of the naive search algorithm. If you want O(n)/O(mn) behaviour, I advise checking out the Boyer-Moore algorithm.
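For reference, the naive algorithm the answer refers to looks like this (a sketch with hypothetical names, showing why it is O(n*m)):

```python
def naive_find(s, pat):
    # Try every alignment; each check compares up to len(pat) characters,
    # so the worst case is O(len(s) * len(pat))
    for i in range(len(s) - len(pat) + 1):
        if s[i:i + len(pat)] == pat:
            return i
    return -1

assert naive_find("abracadabra", "abra") == 0
assert naive_find("abracadabra", "dab") == 6
assert naive_find("abracadabra", "xyz") == -1
```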