I know that lists in Python are implemented as arrays that store addresses (references) to the actual objects. So after enough appends, when the array is full, Python has to reserve new space and copy the entire array of addresses to the new location.
I've read on Stack Overflow that the array doubles in size when it runs out of space, so it has to copy all the addresses about log(n) times.
The bigger the list, the more copying each reallocation has to do. So how can the append operation have constant time complexity, O(1), if it has some dependence on the array size?
I assume that since it copies addresses, not the objects themselves, it shouldn't take long; an address is only 8 bytes on a 64-bit build, after all. Moreover, it does so very rarely. Does that mean O(1) is the average time complexity? Is my assumption right?
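You can actually watch the over-allocation happen with `sys.getsizeof`. A small experiment (the exact growth pattern is a CPython implementation detail and is not guaranteed to be exact doubling):

```python
import sys

# Append 64 elements and record the list's memory footprint after each
# append. The footprint only jumps when the underlying buffer is
# reallocated; most appends reuse already-reserved space.
sizes = []
lst = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Count how many appends actually triggered a buffer growth.
jumps = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(jumps, len(sizes))
```

On CPython you'll see far fewer jumps than appends, which is the "it does so very rarely" part of the question.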
It's amortized O(1), not worst-case O(1).
Let's say the list's reserved size is 8 elements and it doubles in size when space runs out. You want to push 50 elements.
The first 8 elements push in O(1). The ninth triggers a reallocation and 8 copies, followed by an O(1) push. The next 7 push in O(1). The seventeenth triggers a reallocation and 16 copies, followed by an O(1) push. The next 15 push in O(1). The thirty-third triggers a reallocation and 32 copies, followed by an O(1) push. The next 31 push in O(1). This continues, with the list doubling again when pushing the 65th, 129th, 257th element, etc.
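You can verify that walk-through with a toy model of a doubling dynamic array (initial capacity 8, doubling on overflow; `copies` counts address copies):

```python
def push_all(n, capacity=8):
    """Simulate n pushes onto a doubling dynamic array."""
    size = 0
    copies = 0
    realloc_at = []          # which push numbers triggered a reallocation
    for push in range(1, n + 1):
        if size == capacity:
            copies += size   # copy every existing address to the new buffer
            capacity *= 2
            realloc_at.append(push)
        size += 1            # the push itself is O(1)
    return copies, realloc_at

copies, realloc_at = push_all(50)
print(copies, realloc_at)    # 56 [9, 17, 33]
```

Reallocations hit exactly at pushes 9, 17, and 33, copying 8 + 16 + 32 = 56 addresses in total.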
So all of the pushes have O(1) complexity, we had 56 copies (each O(1)), and 3 reallocations at O(n), with n = 8, 16, and 32. Note that the copy counts form a geometric series, which sums to O(n) with n = the final size of the list. That means the whole operation of pushing n objects onto the list is O(n). If we amortize that per element, it's O(n)/n = O(1).
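The geometric-series argument can be checked numerically: for a doubling array, the total number of copies over n pushes stays below 2n, so the amortized cost per push is bounded by a constant.

```python
def total_copies(n, capacity=8):
    """Total address copies made while pushing n elements onto a
    doubling dynamic array (toy model of the scheme described above)."""
    size = copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size
            capacity *= 2
        size += 1
    return copies

for n in (50, 1_000, 100_000):
    c = total_copies(n)
    print(n, c, c / n)       # the copies-per-push ratio stays below 2
</```

The ratio never exceeds 2 no matter how large n gets, which is exactly what "amortized O(1)" means here.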
If you look at the footnote in the document you linked, you'll see it includes a caveat:
These operations rely on the "Amortized" part of "Amortized Worst Case". Individual actions may take surprisingly long, depending on the history of the container.
Using amortized analysis, even if we occasionally have to perform expensive operations, we can get an upper bound on the 'average' cost of operations when we consider them as a sequence instead of individually.
So any individual operation could be very expensive - O(n) or O(n^2) or something even bigger - but since we know these operations are rare, we can guarantee that a sequence of n operations completes in O(n) time.