In every video I watch or article I read, I see arrays referred to as "pointers", or the phrase "an array is a pointer in itself". I know that we can treat arrays in a pointer-like fashion and that &array-name[0] is the address of the 0th element, in essence the starting address of the array. But can we call an array a pointer? If my memory serves me right, whenever we create pointers, they are allocated memory from the heap instead of the stack, and it is exactly the opposite for static arrays.
If so, then why do people so often refer to arrays as pointers? I read this answer on StackOverflow and it seemed pretty valid to me, yet I do not get why people use the above phrases/jargon.
Could I please get some explanation and clarification on this topic? It would be of much help. This whole thing has been bugging me for the last 4-5 days.
Thank you :)
C: differences between char pointer and array - Stack Overflow
C strings pointer vs. arrays - Stack Overflow
In C, are arrays pointers or used as pointers? - Stack Overflow
What is the difference between an array and a pointer in C, and when should I use one over the other? - Stack Overflow
Here's a hypothetical memory map, showing the results of the two declarations:
            0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
0x00008000:  'n'  'o'  'w'  ' '  'i'  's'  ' '  't'
0x00008008:  'h'  'e'  ' '  't'  'i'  'm'  'e' '\0'
...
amessage:
0x00500000:  'n'  'o'  'w'  ' '  'i'  's'  ' '  't'
0x00500008:  'h'  'e'  ' '  't'  'i'  'm'  'e' '\0'
pmessage:
0x00500010: 0x00 0x00 0x80 0x00
The string literal "now is the time" is stored as a 16-element array of char at memory address 0x00008000. This memory may not be writable; it's best to assume that it's not. You should never attempt to modify the contents of a string literal.
The declaration
char amessage[] = "now is the time";
allocates a 16-element array of char at memory address 0x00500000 and copies the contents of the string literal to it. This memory is writable; you can change the contents of amessage to your heart's content:
strcpy(amessage, "the time is now");
The declaration
char *pmessage = "now is the time";
allocates a single pointer to char at memory address 0x00500010 and copies the address of the string literal to it.
Since pmessage points to the string literal, it should not be used as an argument to functions that need to modify the string contents:
strcpy(amessage, pmessage); /* OKAY */
strcpy(pmessage, amessage); /* NOT OKAY */
strtok(amessage, " "); /* OKAY */
strtok(pmessage, " "); /* NOT OKAY */
scanf("%15s", amessage); /* OKAY */
scanf("%15s", pmessage); /* NOT OKAY */
and so on. If you changed pmessage to point to amessage:
pmessage = amessage;
then it can be used everywhere amessage can be used.
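A minimal sketch of the same points in compilable form. The function name demo_messages is made up for illustration, and the code assumes a hosted C99 compiler:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if every check below holds, 0 otherwise. */
int demo_messages(void)
{
    char amessage[] = "now is the time";  /* writable 16-byte copy */
    char *pmessage = "now is the time";   /* points at the literal: read only */

    strcpy(amessage, "the time is now");  /* OK: 15 chars + '\0' fit in 16 bytes */
    if (strcmp(pmessage, "now is the time") != 0)
        return 0;                         /* the literal itself is unchanged */

    pmessage = amessage;                  /* now points at writable storage */
    pmessage[0] = 'T';                    /* OK: this writes into amessage */

    return sizeof amessage == 16 && strcmp(amessage, "The time is now") == 0;
}
```

Note that the write through pmessage is only acceptable after the reassignment; before it, the same statement would try to modify the string literal.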
True, but it's a subtle difference. Essentially, the former:
char amessage[] = "now is the time";
Defines an array whose members live in the current scope's stack space, whereas:
char *pmessage = "now is the time";
Defines a pointer that lives in the current scope's stack space, but that references memory elsewhere (in this one, "now is the time" is stored elsewhere in memory, commonly a string table).
Also, note that because the data belonging to the second definition (the explicit pointer) is not stored in the current scope's stack space, it is unspecified exactly where it will be stored and should not be modified.
As pointed out by Mark, GMan, and Pavel, there is also a difference when the address-of operator is used on either of these variables. For instance, &pmessage returns a pointer of type char**, or a pointer to a pointer to chars, whereas &amessage returns a pointer of type char(*)[16], or a pointer to an array of 16 chars (which, like a char**, needs to be dereferenced twice as litb points out).
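A small sketch of that type difference, assuming the same 16-byte amessage as above. The helper names are hypothetical and exist only for this illustration:

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between &amessage and &amessage + 1. */
size_t array_pointer_stride(void)
{
    char amessage[] = "now is the time";
    char (*pa)[16] = &amessage;   /* pointer to an array of 16 char */

    /* Stepping a char (*)[16] moves by the size of the whole array. */
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Hypothetical helper: reaches the first character through both pointer kinds. */
char first_char_via_double_deref(void)
{
    char amessage[] = "now is the time";
    char *pmessage = amessage;
    char (*pa)[16] = &amessage;   /* type char (*)[16] */
    char **pp = &pmessage;        /* type char ** */

    /* Both need two dereferences to reach a char: **pa and **pp. */
    return (**pa == **pp) ? **pa : '\0';
}
```

The two pointers hold different types even though both eventually lead to the same character, which is why they cannot be assigned to each other without a cast.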
You can (in general) use the expression (*ptr)++ to change the value that ptr points to when ptr is a pointer and not an array (i.e., if ptr is declared as char* ptr).
However, in your first example:
char *ptr = "Hello!";
ptr is pointing to a string literal, and string literals are not permitted to be modified (they may actually be stored in memory areas that are not writable, such as ROM or memory pages marked as read-only).
In your second example,
char ptr[] = "Hello!";
The array is declared and the initialization actually copies the data in the string literal into the allocated array memory. That array memory is modifiable, so (*ptr)++ works.
Note: for your second declaration, the ptr identifier itself is an array identifier, not a pointer, and it is not a modifiable lvalue, so it can't be assigned to or incremented (even though it converts readily to a pointer in most situations). For example, the expression ++ptr would be invalid. I think this is the point that some other answers are trying to make.
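As a rough illustration of the array case, here is a sketch with a made-up helper name (bump_first_char). It assumes an ASCII-compatible character set, so that the character after 'H' is 'I':

```c
/* Hypothetical helper; assumes an ASCII-compatible character set. */
char bump_first_char(void)
{
    char ptr[] = "Hello!";  /* writable copy of the literal */

    (*ptr)++;               /* modifies ptr[0]: 'H' becomes 'I' */
    /* ++ptr; */            /* would not compile: an array cannot be incremented */

    return ptr[0];
}
```

The commented-out line is the one that fails for an array but would be fine if ptr were declared as a char * pointing at writable memory.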
When pointing to a string literal, you should not declare the chars to be modifiable, and some compilers will warn you for this:
char *ptr = "Hello!"; /* WRONG, missing const! */
The reason is as noted by others that string literals may be stored in an immutable part of the program's memory.
The correct "annotation" for you is to make sure you have a pointer to constant char:
const char *ptr = "Hello!";
And now you see directly that you can't modify the text stored at the pointer.
Here's the exact language from the C standard (n1256):
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The important thing to remember here is that there is a difference between an object (in C terms, meaning something that takes up memory) and the expression used to refer to that object.
When you declare an array such as
int a[10];
the object designated by the expression a is an array (i.e., a contiguous block of memory large enough to hold 10 int values), and the type of the expression a is "10-element array of int", or int [10]. If the expression a appears in a context other than as the operand of the sizeof or & operators, then its type is implicitly converted to int *, and its value is the address of the first element.
In the case of the sizeof operator, if the operand is an expression of type T [N], then the result is the number of bytes in the array object, not in a pointer to that object: N * sizeof T.
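One way to see the sizeof exception is to compare sizeof a with sizeof (a + 0). This is only a sketch; the helper names are hypothetical, and the actual pointer size depends on the platform:

```c
#include <stddef.h>

/* Hypothetical helpers; the names are illustrative only. */

/* a is the operand of sizeof, so no conversion happens: whole array. */
size_t size_of_array(void)
{
    int a[10];
    return sizeof a;
}

/* a + 0 is an ordinary expression, so a converts to int * first. */
size_t size_of_converted(void)
{
    int a[10];
    return sizeof (a + 0);
}
```

The first function reports the size of ten int objects, while the second reports the size of a single int * on whatever platform compiles it.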
In the case of the & operator, the value is the address of the array, which is the same as the address of the first element of the array, but the type of the expression is different: given the declaration T a[N];, the type of the expression &a is T (*)[N], or pointer to N-element array of T. The value is the same as a or &a[0] (the address of the array is the same as the address of the first element in the array), but the difference in types matters. For example, given the code
int a[10];
int *p = a;
int (*ap)[10] = &a;
printf("p = %p, ap = %p\n", (void *) p, (void *) ap);
p++;
ap++;
printf("p = %p, ap = %p\n", (void *) p, (void *) ap);
you'll see output on the order of
p = 0xbff11e58, ap = 0xbff11e58
p = 0xbff11e5c, ap = 0xbff11e80
In other words, advancing p adds sizeof (int) (4 on this platform) to the original value, whereas advancing ap adds 10 * sizeof (int) (40).
More standard language:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
Thus, when you subscript an array expression, what happens under the hood is that the offset from the address of the first element in the array is computed and the result is dereferenced. The expression
a[i] = 10;
is equivalent to
*((a)+(i)) = 10;
which is equivalent to
*((i)+(a)) = 10;
which is equivalent to
i[a] = 10;
Yes, array subscripting in C is commutative; for the love of God, never do this in production code.
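For the curious, a short sketch that checks the four spellings against each other. The helper name subscript_forms_agree is invented for this example:

```c
/* Hypothetical helper: returns 1 if all four spellings read the same element. */
int subscript_forms_agree(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int i = 3;

    return a[i] == *(a + i)      /* definition of the subscript operator */
        && *(a + i) == *(i + a)  /* addition is commutative */
        && *(i + a) == i[a]      /* so the operands of [] can be swapped */
        && a[i] == 40;
}
```

This compiles without complaint precisely because the standard defines E1[E2] purely in terms of pointer addition.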
Since array subscripting is defined in terms of pointer operations, you can apply the subscript operator to expressions of pointer type as well as array type:
int *p = malloc(sizeof *p * 10);
int i;
for (i = 0; i < 10; i++)
    p[i] = some_initial_value();
Here's a handy table to remember some of these concepts:
Declaration: T a[N];
Expression   Type       Converts to   Value
----------   ----       -----------   -----
a            T [N]      T *           Address of the first element in a;
                                      identical to writing &a[0]
&a           T (*)[N]                 Address of the array; value is the same
                                      as above, but the type is different
sizeof a     size_t                   Number of bytes contained in the array
                                      object (N * sizeof T)
*a           T                        Value at a[0]
a[i]         T                        Value at a[i]
&a[i]        T *                      Address of a[i]
Declaration: T a[N][M];
Expression    Type          Converts to   Value
----------    ----          -----------   -----
a             T [N][M]      T (*)[M]      Address of the first subarray (&a[0])
&a            T (*)[N][M]                 Address of the array (same value as
                                          above, but different type)
sizeof a      size_t                      Number of bytes contained in the
                                          array object (N * M * sizeof T)
*a            T [M]         T *           Value of a[0], which is the address
                                          of the first element of the first subarray
                                          (same as &a[0][0])
a[i]          T [M]         T *           Value of a[i], which is the address
                                          of the first element of the i'th subarray
&a[i]         T (*)[M]                    Address of the i-th subarray; same value as
                                          above, but different type
sizeof a[i]   size_t                      Number of bytes contained in the i'th subarray
                                          object (M * sizeof T)
*a[i]         T                           Value of the first element of the i'th
                                          subarray (a[i][0])
a[i][j]       T                           Value at a[i][j]
&a[i][j]      T *                         Address of a[i][j]
Declaration: T a[N][M][O];
Expression   Type             Converts to
----------   ----             -----------
a            T [N][M][O]      T (*)[M][O]
&a           T (*)[N][M][O]
*a           T [M][O]         T (*)[O]
a[i]         T [M][O]         T (*)[O]
&a[i]        T (*)[M][O]
*a[i]        T [O]            T *
a[i][j]      T [O]            T *
&a[i][j]     T (*)[O]
*a[i][j]     T
a[i][j][k]   T
From here, the pattern for higher-dimensional arrays should be clear.
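A few of the two-dimensional rows can be checked mechanically. The following is only a sketch with T fixed to int and small made-up dimensions; the name check_2d_types is hypothetical:

```c
#include <stddef.h>

enum { N = 2, M = 3 };  /* illustrative dimensions */

/* Hypothetical helper: returns 1 if the table rows it exercises hold. */
int check_2d_types(void)
{
    int a[N][M] = { {1, 2, 3}, {4, 5, 6} };
    int (*row)[M] = a;        /* a converts to int (*)[M], i.e. &a[0] */
    int (*whole)[N][M] = &a;  /* &a has type int (*)[N][M] */
    int *first = *a;          /* *a is int [M], which converts to int * */

    return sizeof a == N * M * sizeof(int)
        && sizeof a[1] == M * sizeof(int)
        && sizeof *whole == sizeof a
        && (void *)row == (void *)whole  /* same address, different types */
        && *first == 1
        && (*(row + 1))[0] == 4          /* row + 1 skips a whole subarray */
        && *a[1] == 4;                   /* *a[i] is a[i][0] */
}
```

The initializations of row, whole, and first compile without casts, which is itself a check that the "Converts to" column is right.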
So, in summary: arrays are not pointers. In most contexts, array expressions are converted to pointer types.
Arrays are not pointers, though in most expressions an array name evaluates to a pointer to the first element of the array. So it is very, very easy to use an array name as a pointer. You will often see the term 'decay' used to describe this, as in "the array decayed to a pointer".
One exception is as the operand to the sizeof operator, where the result is the size of the array (in bytes, not elements).
A couple of additional issues related to this:
An array parameter to a function is a fiction - the compiler really passes a plain pointer (this doesn't apply to reference-to-array parameters in C++), so you cannot determine the actual size of an array passed to a function - you must pass that information some other way (maybe using an explicit additional parameter, or using a sentinel element - like C strings do).
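Both workarounds can be sketched briefly. The names sum, my_strlen, and sum_through_pointer below are invented for this example, and the code assumes C99 for the loop declarations:

```c
#include <stddef.h>

/* The parameter is written as an array, but the compiler adjusts it to int *,
   so the element count has to arrive as a separate argument. */
int sum(int values[], size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Sentinel approach: walk until the terminating '\0', as C strings do. */
size_t my_strlen(const char s[])
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Compiles only because int values[] and int * are the same parameter type. */
int (*sum_through_pointer)(int *, size_t) = sum;
```

A caller that still has the real array in scope can compute the count with sizeof data / sizeof data[0] and pass it along; inside sum that information is gone.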
Also, a common idiom to get the number of elements in an array is to use a macro like:
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
This has the problem of accepting either an array name, where it will work, or a pointer, where it will give a nonsense result without warning from the compiler. There exist safer versions of the macro (particularly for C++) that will generate a warning or error when it's used with a pointer instead of an array. See the following SO items:
- C++ version
- a better (though still not perfectly safe) C version
Note: C99 VLAs (variable length arrays) might not follow all of these rules (in particular, they can be passed as parameters with the array size known by the called function). I have little experience with VLAs, and as far as I know they're not widely used. However, I do want to point out that the above discussion might apply differently to VLAs.
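To make the VLA remark concrete, here is a tentative sketch of a variably modified parameter. It assumes a compiler with C99 VLA support (optional since C11), and the function name row_bytes is hypothetical:

```c
#include <stddef.h>

/* grid is still a pointer (int (*)[cols]), but cols is part of its type,
   so the size of one row is known inside the function at run time. */
size_t row_bytes(size_t rows, size_t cols, int grid[rows][cols])
{
    (void)rows;             /* the outermost dimension is not recoverable from grid */
    return sizeof grid[0];  /* cols * sizeof(int) */
}
```

Even here only the inner dimension travels with the type; the number of rows still has to be passed explicitly, just as with ordinary array parameters.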
Arrays are contiguous blocks of memory; a local array is typically created on the stack. You can't guarantee contiguous memory for separately declared variables without this syntactic sugar, and even if you could, you'd have to allocate a separate pointer in order to do the pointer arithmetic (unless you wanted to write *(&foo + x), which is undefined behavior for any x other than 0, is quite awkward, and would scream out for some kind of syntactic sugar). Design-wise, an array is also a form of encapsulation, since you can refer to the collection with a single identifier (which would otherwise require a separate pointer). And even if you could allocate the variables contiguously and allocate a separate pointer to reference them, you'd have either
int fooForSomething, fooForSomethingElse...
which forces a fair amount of creativity as your collection grows, so you might think to simplify with
int foo1, foo2 ...,
which looks just like an array but is harder to maintain.
Array notation is convenient, easier to read, and less prone to errors. It provides a formalism over pointers. It might be syntactic sugar, but we all need a little sweetness once in a while, don't we?
As with all abstractions, you give up a little flexibility for the convenience that the abstraction provides.