Yes, in your case*1 string concatenation requires all characters to be copied; this is an O(N+M) operation (where N and M are the lengths of the input strings). M appends of the same word will therefore trend towards O(M^2) time.
You can avoid this quadratic behaviour by using str.join():
word = ''.join(list_of_words)
which only takes O(N) (where N is the total length of the output). Or, if you are repeating a single character, you can use:
word = m * char
You are prepending characters, but building a list first and then reversing it (or using a collections.deque() object to get O(1) prepending behaviour) is still O(N) overall, easily beating the O(N^2) approach here.
*1 As of Python 2.4, the CPython implementation avoids creating a new string object when using strA += strB or strA = strA + strB, but this optimisation is both fragile and not portable. Since you use strA = strB + strA (prepending) the optimisation doesn't apply.
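The difference is easy to measure. Here is a minimal timing sketch (not part of the original answer; the function names are made up for illustration):

```python
import time

def build_by_prepending(n):
    # O(n^2): each prepend copies the entire string built so far
    word = ''
    for _ in range(n):
        word = 'x' + word
    return word

def build_by_join(n):
    # O(n): collect the pieces in a list, copy each character once at the end
    parts = []
    for _ in range(n):
        parts.append('x')
    return ''.join(parts)

for builder in (build_by_prepending, build_by_join):
    start = time.perf_counter()
    builder(30_000)
    print(f'{builder.__name__}: {time.perf_counter() - start:.3f}s')
```

Increasing the count makes the gap grow quadratically for the prepending version but only linearly for the join version.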
I am trying to determine the time complexity of the encode function below. It goes over every string in the input list, so that's O(n), where n is the length of the input list. But in each iteration we add to encodedStr, and since Python creates a new string every time the concatenation is performed, do we need to take the length of encodedStr into account for the time complexity of the function?
class Codec:
    def encode(self, strs: List[str]) -> str:
        # Encodes a list of strings to a single string.
        encodedStr = ''
        for s in strs:
            encodedStr += str(len(s)) + '#' + s
        return encodedStr
Recently I was thinking about interview questions I got as an undergrad:
Things like "reverse a string" and "check if a string is a palindrome".
I did most of these in C++ with a loop and scrolling through the index using logic.
When I learned Python, I realized that I could "reverse a string" by simply going:
return mystring[::-1]
Likewise with "check if it is a palindrome" by doing:
return mystring == mystring[::-1]
The problem now is that I don't know what kind of complexity this is.
From my point of view it is constant, so O(1), but I am guessing that is too good to be true, since the string slicing must be doing something behind the scenes.
Can anyone help me clarify?
The python page on time-complexity shows that slicing lists has a time-complexity of O(k), where "k" is the length of the slice. That's for lists, not strings, but the complexity can't be O(1) for strings since the slicing must handle more characters as the size is increased. At a guess, the complexity of slicing strings would also be O(k). We can write a little bit of code to test that guess:
import time

StartSize = 2097152
size = StartSize
for _ in range(10):
    # create string of size "size"
    s = '*' * size
    # now time reverse slice
    start = time.time()
    r = s[::-1]
    delta = time.time() - start
    print(f'Size {size:9d}, time={delta:.3f}')
    # double size of the string
    size *= 2
This uses a simple method of timing; other tools (such as timeit) exist, but this keeps the example short. When run I get:
$ python3 test.py
Size 2097152, time=0.006
Size 4194304, time=0.013
Size 8388608, time=0.024
Size 16777216, time=0.050
Size 33554432, time=0.098
Size 67108864, time=0.190
Size 134217728, time=0.401
Size 268435456, time=0.808
Size 536870912, time=1.610
Size 1073741824, time=3.192
which shows the time doubles when doubling the size of the string for each reverse slice. So O(n) (k == n for whole-string slicing).
Edit: spelling.
How easy an algorithm is to write and how expensive it is to run are two separate things. Creating a reversed string with the slice shorthand still requires O(n) space and O(n) time. Keep in mind that, in most cases, creating a reversed string isn't necessary: you can just start at the end and walk backwards, which is essentially what Python's reversed() function does.
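For instance, a palindrome check can walk inward from both ends without materialising a reversed copy at all. A small sketch (the function name is mine, not from the comment above):

```python
def is_palindrome(s: str) -> bool:
    # Compare characters pairwise from both ends: O(n) time, O(1) extra space.
    return all(s[i] == s[-1 - i] for i in range(len(s) // 2))

print(is_palindrome('racecar'))  # True
print(is_palindrome('python'))   # False
```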
It's a combination of Boyer-Moore and Horspool.
You can view the C code here:
Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.
From the link above:
When designing the new algorithm, I used the following constraints:
- should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
- small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
- sublinear search behaviour in good cases (O(n/m))
- no worse than the current algorithm in worst case (O(nm))
- should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
- many real-life searches should be good, very few should be worst case
- reasonably simple implementation
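The Horspool half of that design can be sketched in a few lines. This is an illustrative Python version under my own naming, not the actual C implementation, and it omits the extra "bells and whistles" (such as the bloom-filter-style bitmask CPython adds):

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the lowest index of needle in haystack, or -1 (simplified Horspool)."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each character of the needle except the last, record how far the
    # window may jump when that character lines up with the window's end.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Jump based on the haystack character under the needle's last position;
        # a character that never occurs in the needle allows a full jump of m.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

In good cases the window jumps m positions per comparison, which is where the sublinear O(n/m) behaviour comes from; in the worst case it degrades to O(nm), matching the constraints listed above.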
Since 2021, CPython uses Crochemore and Perrin's Two-Way algorithm for larger n.
From the source code:
If the strings are long enough, use Crochemore and Perrin's Two-Way algorithm, which has worst-case O(n) runtime and best-case O(n/k). Also compute a table of shifts to achieve O(n/k) in more cases, and often (data dependent) deduce larger shifts than pure C&P can deduce. See stringlib_find_two_way_notes.txt in this folder for a detailed explanation.
See https://github.com/python/cpython/pull/22904
As can be seen from the source code, the implementation of int.__str__ has runtime complexity of O(m*n) where m is the number of binary digits and n is the number of decimal digits. Since for an integer i the number of digits in an arbitrary base b is given by log(i, base=b) and logarithms in different bases differ only by a constant, the runtime complexity is O(log(i)**2), i.e. quadratic in the number of digits.
This can be verified by running a performance test:
import perfplot

perfplot.show(
    setup=lambda n: 10**n,
    kernels=[str],
    n_range=range(1, 1001, 10),
    xlabel='number of digits',
)

The quadratic time complexity in the number of digits is also mentioned in the issue for CVE-2020-10735:
[...] A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.
O(n) in the context of a data structure just means that if there are n items, an operation on that structure will require on the order of n iterations or passes to achieve the desired result. If you're constructing a string from an integer, then I guess the complexity would be O(log10(n)).
EDIT: from the Python docs:
If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object.
Python detects that the object is an int, therefore it will create a string from that int. One way of implementing this conversion is:
def int_to_str(n):
    if n == 0:
        return "0"
    negative = n < 0
    if negative:
        n = -n
    out = ""
    while n > 0:
        digit = n % 10
        n //= 10  # integer division; n /= 10 would produce a float
        out = chr(48 + digit) + out
    if negative:
        out = "-" + out
    return out
The number of iterations of that while loop depends on the number of decimal digits of n, which is on the order of log10(n).
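That digit count can be checked directly against log10 with a quick sketch (the exact relation is floor(log10(n)) + 1 for positive n):

```python
import math

for n in (1, 9, 10, 99, 12345, 10**15):
    digits = len(str(n))  # one loop iteration per decimal digit
    assert digits == math.floor(math.log10(n)) + 1
    print(n, digits)
```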