That is correct. From C99 §6.5.2.1/2:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
There's no magic. It's a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it's pointing to a valid address.
Answer from Matthew Flaschen on Stack OverflowThat is correct. From C99 §6.5.2.1/2:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
There's no magic. It's a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it's pointing to a valid address.
This is only valid if arr is a pointer that points to the second element in an array or a later element. Otherwise, it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong:
int arr[10];
int x = arr[-2]; // invalid; out of range
But this would be okay:
int arr[10];
int* p = &arr[2];
int x = p[-2]; // valid: accesses arr[0]
It is, however, unusual to use a negative subscript.
The array indexing operation a[i] gains its meaning from the following features of C
The syntax
a[i]is equivalent to*(a + i). Thus it is valid to say5[a]to get at the 5th element ofa.Pointer-arithmetic says that given a pointer
pand an integeri,p + ithe pointerpadvanced byi * sizeof(*p)bytesThe name of an array
avery quickly devolves to a pointer to the 0th element ofa
In effect, array-indexing is a special case of pointer-indexing. Since a pointer can point to any place inside an array, any arbitrary expression that looks like p[-1] is not wrong by examination, and so compilers don't (can't) consider all such expressions as errors.
Your example a[-1] where a is actually the name of an array is actually invalid. IIRC, it is undefined if there's a meaningful pointer value as the result of the expression a - 1 where a is know to be a pointer to the 0th element of an array. So, a clever compiler could detect this and flag it as an error. Other compilers can still be compliant while allowing you to shoot yourself in the foot by giving you a pointer to a random stack slot.
The computer science answer is:
In C, the
[]operator is defined on pointers, not arrays. In particular, it's defined in terms of pointer arithmetic and pointer dereference.In C, a pointer is abstractly a tuple
(start, length, offset)with the condition that0 <= offset <= length. Pointer arithmetic is essentially lifted arithmetic on the offset, with the caveat that if the result of the operation violates the pointer condition, it is an undefined value. De-referencing a pointer adds an additional constraint thatoffset < length.C has a notion of
undefined behaviourwhich allows a compiler to concretely represent that tuple as a single number, and not have to detect any violations of the pointer condition. Any program that satisfies the abstract semantics will be safe with the concrete (lossy) semantics. Anything that violates the abstract semantics can be, without comment, accepted by the compiler and it can do anything it wants to do with it.
Arrays are simply laid out as contiguous chunks of memory. An array access such as a[i] is converted to an access to memory location addressOf(a)+i. This the code a[-1] is perfectly understandable, it simply refers to the address one before the start of the array.
This may seem crazy, but there are many reasons why this is allowed:
- it is expensive to check whether the index i to a[-] is within bounds of the array.
- some programming techniques actually exploit the fact that
a[-1]is valid. For instance, if I know thatais not actually the start of the array, but a pointer into the middle of the array, thena[-1]simply gets the element of the array that is to the left of the pointer.
To explain how negative indexes work, you first have to learn (or remember) that for any array or pointer a and index i, the expression a[i] is equal to *(a + i).
That means that you can have a pointer to the middle element of an array, and use it with a positive or negative index, and it's simple arithmetic.
Example:
int a[3] = { 1, 2, 3 };
int* p = &a[1]; // Make p point to the second element
std::cout << p[-1]; // Prints the first element of a, equal to *(p - 1)
std::cout << p[ 0]; // Prints the second element of a, equal to *p
std::cout << p[ 1]; // Prints the third element of a, equal to *(p + 1)
Somewhat graphically it can be seen like
+------+------+------+ | a[0] | a[1] | a[2] | +------+------+------+ ^ ^ ^ | | | p-1 p p+1
Now when using a negative index in an array, as in the example of a in the question, it will be something like this:
+-----+-------+-------+------+------+------+ | ... | a[-2] | a[-1] | a[0] | a[1] | a[2] | +-----+-------+-------+------+------+------+
That is, negative indexes here will be out of bounds of the array, and will lead to undefined behavior.
That is something you should never ever do! C++ doesn't check the boundaries of built in plain arrays so technically you can access locations which are out of the allocated space (which is only 3 ints not 5) but you will ultimately produce errors.
The calculation is done at runtime.
Negative indices don't necessarily have to cause a violation, and have their uses.
For example, let's say you have a pointer that is currently pointing to the 10th element in an array. Now, if you need to access the 8th element without changing the pointer, you can do that easily by using a negative index of -2.
char data[] = "01234567890123456789";
char* ptr = &data[9];
char c = ptr[-2]; // 7
Here is an example of use.
An Infinite Impulse Response filter is calculated partially from recent previous output values. Typically, there will be some array of input values and an array where output values are to be placed. If the current output element is yi, then yi may be calculated as yi = a0•xi + a1•xi–1 +a2•yi–1 +a3•yi–2.
A natural way to write code for this is something like:
void IIR(float *x, float *y, size_t n)
{
for (i = 0; i < n; ++i)
y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2];
}
Observe that when i is zero, y[i-1] and y[i-2] have negative indices. In this case, the caller is responsible for creating an array, setting the initial two elements to “starter values” for the output (often either zero or values held over from a previous buffer), and passing a pointer to where the first new value is to be written. Thus, this routine, IRR, normally receives a pointer into the middle of an array and uses negative indices to address some elements.
Inko currently allows you to use negative/signed indexes for arrays and byte arrays. When used, indexing starts at the end. For example:
let numbers = Array.new(10, 20, 30) numbers[-1] # => 30 numbers[-2] # => 20
I originally implemented this without much thought: Ruby did it, it was sometimes useful there, so I copied it.
Signed indexes bring some trouble though:
You can only represent indexes up to (263)-1. Now I think most programs won't store that many values in a single collection, but it's something one has to keep in mind.
You need to convert the signed indexes into unsigned indexes. This requires something like
((index % length) + length) % length, which is a fair number of instructions.I have doubts about how useful it really is.
Problem one and two come down to the same: complexity. For example, for Inko this comes in two pieces:
A function that takes a signed index and an unsigned length, and converts the index to an unsigned index. My function does support collections larger that (263)-1 by upcasting the length and index to an i128, but the index size is still limited to (263)-1.
A function to implement the modulo operator (not the remainder). Rust doesn't provide one (it uses remainder), so I had to write my own.
While this is hidden from the user, I do have to deal with it, and like most the less code I have to deal with the better.
Problem 3 is more subjective. The most commonly used negative index is simply -1 to get the last value. This however can be handled by introducing a dedicated last() function. I actually think this is more clear if one isn't used to signed indexes. Besides that, indexes like -2, -3, etc are rarely used in my experience. For example, in the GitLab Rails source code I can only find 37 instances of -1 being used, 11 instances of -2, and only one instance of -3.
With this in mind, I'm starting to think signed indexes aren't really worth the trouble. If you really need to index from the back, you can just do thing[thing.length - N].
What are the thoughts of the subreddit on this matter? Does your language support signed indexes? Or am I not too far off wanting to remove support for this?