I am currently working on an assignment in C where we are required to make a stack by simply using pointers.
I know the line: int *ptr = &val; declares ptr to be a "pointer to" (which is my interpretation of what the asterisk * means in C) the "address of" the integer variable val.
When I want to create a double pointer, or a pointer to a pointer, I do so like:
int **ptr_ptr = &ptr; This sets ptr_ptr to a "pointer to" the address of pointer ptr.
When we use the asterisk anywhere other than in a declaration, it is usually referred to as dereferencing that pointer (I think), and grabbing the value that the pointer actually points to. This goes against my intuition that an asterisk means "pointer to".
Could anybody explain the proper meaning of the asterisk in C? Is it just that it means different things depending on how it is used (i.e. in a declaration versus anywhere else)?
Thanks!
You have pointers and values:
int* p; // variable p is pointer to integer type
int i; // integer value
You turn a pointer into a value with *:
int i2 = *p; // integer i2 is assigned with integer value that pointer p is pointing to
You turn a value into a pointer with &:
int* p2 = &i; // pointer p2 will point to the integer i
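To tie this back to the question, here is a minimal sketch of both roles of * side by side: in a declaration it is part of the type, and in an expression it follows the pointer. The function name demo_pointer_levels is made up for illustration; val, ptr and ptr_ptr are the names from the question.

```c
#include <assert.h>

/* Hypothetical demo function; not part of the original answer. */
void demo_pointer_levels(void)
{
    int val = 42;
    int *ptr = &val;        /* declaration: ptr has type "pointer to int" */
    int **ptr_ptr = &ptr;   /* declaration: "pointer to pointer to int" */

    /* In expressions, unary * follows the pointer to the object it refers to. */
    assert(*ptr == 42);         /* *ptr is the int that ptr points to */
    assert(*ptr_ptr == ptr);    /* *ptr_ptr is the int * that ptr_ptr points to */
    assert(**ptr_ptr == 42);    /* two dereferences reach val */

    **ptr_ptr = 7;              /* writes through both levels into val */
    assert(val == 7);
}
```

Calling demo_pointer_levels() from main runs the checks; an assert that fails would abort the program.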
Edit:
In the case of arrays, they are treated very much like pointers. If you think of them as pointers, you'll be using * to get at the values inside of them as explained above, but there is also another, more common way using the [] operator:
int a[2]; // array of integers
int i = *a; // the value of the first element of a
int i2 = a[0]; // another way to get the first element
To get the second element:
int a[2]; // array
int i = *(a + 1); // the value of the second element
int i2 = a[1]; // the value of the second element
So the [] indexing operator is a special form of the * operator, and it works like this:
a[i] == *(a + i); // these two expressions are the same thing
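As a rough check of that equivalence, the sketch below sums an array once with pointer arithmetic and once with []. The helper names sum_with_pointer, sum_with_index and demo_subscript are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints using pointer arithmetic only. */
int sum_with_pointer(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);      /* same element as a[i] */
    return total;
}

/* Hypothetical helper: sums n ints using the [] operator. */
int sum_with_index(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

void demo_subscript(void)
{
    int a[3] = {10, 20, 30};

    assert(a[1] == *(a + 1));
    assert(&a[2] == a + 2);     /* the addresses agree as well */
    assert(sum_with_pointer(a, 3) == sum_with_index(a, 3));
}
```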
There is a pattern when dealing with arrays and functions; it's just a little hard to see at first.
When dealing with arrays, it's useful to remember the following: when an array expression appears in most contexts, the type of the expression is implicitly converted from "N-element array of T" to "pointer to T", and its value is set to point to the first element in the array. The exceptions to this rule are when the array expression appears as an operand of either the & or sizeof operators, or when it is a string literal being used as an initializer in a declaration.
Thus, when you call a function with an array expression as an argument, the function will receive a pointer, not an array:
int arr[10];
...
foo(arr);
...
void foo(int *arr) { ... }
This is why you don't use the & operator for arguments corresponding to "%s" in scanf():
char str[STRING_LENGTH];
...
scanf("%s", str);
Because of the implicit conversion, scanf() receives a char * value that points to the beginning of the str array. This holds true for any function called with an array expression as an argument (just about any of the str* functions, *scanf and *printf functions, etc.).
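One way to see the conversion, and the sizeof exception, is to compare what sizeof reports in the caller and in the callee. This is only a sketch; size_seen_by_callee and demo_decay are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the parameter is a pointer, so sizeof sees a pointer. */
size_t size_seen_by_callee(int *arr)
{
    return sizeof arr;          /* sizeof(int *), not the caller's array size */
}

void demo_decay(void)
{
    int arr[10];

    /* sizeof is one of the exceptions: no conversion to a pointer here. */
    assert(sizeof arr == 10 * sizeof(int));

    /* In the call, arr is converted to a pointer to its first element. */
    assert(size_seen_by_callee(arr) == sizeof(int *));
}
```

This is also why functions that take an array usually take its length as a separate argument.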
In practice, you will probably never call a function with an array expression using the & operator, as in:
int arr[N];
...
foo(&arr);
void foo(int (*p)[N]) {...}
Such code is not very common; you have to know the size of the array in the function declaration, and the function only works with pointers to arrays of specific sizes (a pointer to a 10-element array of T is a different type than a pointer to an 11-element array of T).
When an array expression appears as an operand to the & operator, the type of the resulting expression is "pointer to N-element array of T", or T (*)[N], which is different from an array of pointers (T *[N]) and a pointer to the base type (T *).
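The difference between T (*)[N] and T * shows up in pointer arithmetic: both start at the same address, but adding one moves by a whole array in the first case and by one element in the second. A small sketch, assuming N is 10 and using the made-up names last_element and demo_pointer_to_array:

```c
#include <assert.h>
#include <stddef.h>

#define N 10    /* assumed array length for this sketch */

/* Accepts only pointers to arrays of exactly N ints. */
int last_element(int (*p)[N])
{
    return (*p)[N - 1];         /* dereference first, then subscript */
}

void demo_pointer_to_array(void)
{
    int arr[N] = {0};
    arr[N - 1] = 99;

    int (*pa)[N] = &arr;        /* pointer to N-element array of int */
    int *pe = arr;              /* pointer to int (the first element) */

    /* Same address, different types: one step covers different distances. */
    assert((void *)pa == (void *)pe);
    assert((char *)(pa + 1) - (char *)pa == (ptrdiff_t)sizeof arr);
    assert((char *)(pe + 1) - (char *)pe == (ptrdiff_t)sizeof(int));

    assert(last_element(&arr) == 99);
}
```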
When dealing with functions and pointers, the rule to remember is: if you want to change the value of an argument and have it reflected in the calling code, you must pass a pointer to the thing you want to modify. Again, arrays throw a bit of a monkey wrench into the works, but we'll deal with the normal cases first.
Remember that C passes all function arguments by value; the formal parameter receives a copy of the value in the actual parameter, and any changes to the formal parameter are not reflected in the actual parameter. The common example is a swap function:
void swap(int x, int y) { int tmp = x; x = y; y = tmp; }
...
int a = 1, b = 2;
printf("before swap: a = %d, b = %d\n", a, b);
swap(a, b);
printf("after swap: a = %d, b = %d\n", a, b);
You'll get the following output:
before swap: a = 1, b = 2
after swap: a = 1, b = 2
The formal parameters x and y are distinct objects from a and b, so changes to x and y are not reflected in a and b. Since we want to modify the values of a and b, we must pass pointers to them to the swap function:
void swap(int *x, int *y) {int tmp = *x; *x = *y; *y = tmp; }
...
int a = 1, b = 2;
printf("before swap: a = %d, b = %d\n", a, b);
swap(&a, &b);
printf("after swap: a = %d, b = %d\n", a, b);
Now your output will be
before swap: a = 1, b = 2
after swap: a = 2, b = 1
Note that, in the swap function, we don't change the values of x and y, but the values of what x and y point to. Writing to *x is different from writing to x; we're not updating the value in x itself, we get a location from x and update the value in that location.
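For comparison, here are both versions in one compilable sketch. The names swap_by_value, swap_by_pointer and demo_swap are chosen here only to keep the two variants apart; the bodies follow the fragments above.

```c
#include <assert.h>

/* Receives copies; the caller's variables are untouched. */
void swap_by_value(int x, int y)
{
    int tmp = x;
    x = y;
    y = tmp;
    (void)x;    /* silence "set but unused" warnings */
    (void)y;
}

/* Receives addresses; writes through them to the caller's variables. */
void swap_by_pointer(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

void demo_swap(void)
{
    int a = 1, b = 2;

    swap_by_value(a, b);
    assert(a == 1 && b == 2);   /* unchanged: only the copies were swapped */

    swap_by_pointer(&a, &b);
    assert(a == 2 && b == 1);   /* changed: written through the pointers */
}
```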
This is equally true if we want to modify a pointer value; if we write
void myFopen(FILE *stream) { stream = fopen("myfile.dat", "r"); }
...
FILE *in;
myFopen(in);
then we're modifying the value of the input parameter stream, not what stream points to, so changing stream has no effect on the value of in; in order for this to work, we must pass in a pointer to the pointer:
void myFopen(FILE **stream) { *stream = fopen("myfile.dat", "r"); }
...
FILE *in;
myFopen(&in);
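The same pointer-to-pointer pattern is what a pointer-based stack typically needs, since push and pop have to change the caller's "top" pointer. The following is only a sketch of one possible design (a singly linked list); struct node, push, pop and demo_stack are assumptions made for illustration, not something the assignment in the question prescribes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for a linked stack of ints. */
struct node {
    int value;
    struct node *next;
};

/* Pushes value; returns 0 on success, -1 if allocation fails.
   top points at the caller's "top of stack" pointer. */
int push(struct node **top, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *top;     /* read the caller's pointer */
    *top = n;           /* update the caller's pointer */
    return 0;
}

/* Pops into *out; returns 0 on success, -1 if the stack is empty. */
int pop(struct node **top, int *out)
{
    struct node *n = *top;
    if (n == NULL)
        return -1;
    *out = n->value;
    *top = n->next;
    free(n);
    return 0;
}

void demo_stack(void)
{
    struct node *top = NULL;    /* empty stack */
    int v = 0;

    assert(push(&top, 1) == 0);
    assert(push(&top, 2) == 0);
    assert(pop(&top, &v) == 0 && v == 2);   /* last in, first out */
    assert(pop(&top, &v) == 0 && v == 1);
    assert(top == NULL);
}
```

Had push taken a plain struct node *, the assignment to top would only change the function's local copy, exactly as with the first myFopen.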
Again, arrays throw a bit of a monkey wrench into the works. When you pass an array expression to a function, what the function receives is a pointer. Because of how array subscripting is defined, you can use a subscript operator on a pointer the same way you can use it on an array:
int arr[N];
init(arr, N);
...
void init(int *arr, size_t n) { size_t i; for (i = 0; i < n; i++) arr[i] = i * i; }
Note that array objects may not be assigned; i.e., you can't do something like
int a[10], b[10];
...
a = b;
so you want to be careful when you're dealing with pointers to arrays; something like
void bar(int (*foo)[N])
{
...
*foo = ...;
}
won't work.
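If the goal is to replace the contents of the array that foo points to, copying the elements is one option. A minimal sketch using memcpy, assuming N is 4; copy_into and demo_array_copy are hypothetical names:

```c
#include <assert.h>
#include <string.h>

#define N 4     /* assumed array length for this sketch */

/* Copies N ints from src into the array that dst points to. */
void copy_into(int (*dst)[N], const int *src)
{
    memcpy(*dst, src, sizeof *dst);     /* sizeof *dst is N * sizeof(int) */
}

void demo_array_copy(void)
{
    int a[N] = {0};
    int b[N] = {1, 2, 3, 4};

    /* a = b; would not compile: an array is not a modifiable lvalue. */
    copy_into(&a, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```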
Why does C use the asterisk for pointers?
Simply - because B did.
Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and BCPL supplies an operator for this purpose. In the original language it was spelled rv, and later !, while B uses the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.
From The Development of the C Language
That's it. At this point, the question is as uninteresting as "why does Python 3 use . to call a method? Why not ->?" Well... because Python 2 uses . to call a method.
Rarely does a language exist from nothing. It has influences and is based on something that came before.
So, why didn't B use ! for dereferencing a pointer like its predecessor BCPL did?
Well, BCPL was a bit wordy. Instead of && or || BCPL used logand and logor. This was because most keyboards didn't have ∧ or ∨ keys, and not equal was actually the word NEQV (see The BCPL Reference Manual).
B appears to have been partly motivated by a wish to tighten up the syntax rather than keep long words for the logical operators that programmers used fairly frequently. Thus ! for dereference became * so that ! could be used for logical negation. Note there's a difference between the unary * operator (dereference) and the binary * operator (multiplication).
Well, what about other options, like ->?
The -> was taken as syntactic sugar for field dereferences: struct_pointer->field is the same as (*struct_pointer).field.
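For readers who want to see these operator roles in C itself, here is a short sketch. struct point, scale and demo_arrow are invented for the example.

```c
#include <assert.h>

/* Hypothetical struct used only for illustration. */
struct point {
    int x;
    int y;
};

/* Binary * (multiplication) followed by unary * (dereference). */
int scale(const int *p, int k)
{
    return k * *p;
}

void demo_arrow(void)
{
    struct point pt = {3, 4};
    struct point *p = &pt;

    assert(p->x == (*p).x);     /* both name the same member */
    p->y = 10;                  /* same effect as (*p).y = 10; */
    assert(pt.y == 10);

    /* The parentheses matter: *p.x parses as *(p.x), which is an error
       here because p is a pointer, not a struct. */

    assert(scale(&pt.x, 5) == 15);
}
```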
Other options like <- could create ambiguous parsings. For example:
foo <- bar
Is that to be read as:
(foo) <- (bar)
or
(foo) < (-bar)
Making a unary operator that is composed of a binary operator and another unary operator is quite likely to have problems as the second unary operator may be a prefix for another expression.
Furthermore, it is again important to try to keep the things being typed frequently to a minimum. I would hate to have to write:
int main(int argc, char->-> argv, char->-> envp)
This also becomes difficult to read.
Other characters might have been possible (the @ wasn't used until Objective C appropriated it). Though again, this goes to the core of 'C uses * because B did'. Why didn't B use @? Well, B didn't use all the characters. There was no bpp program (compare cpp) and other characters were available in B (such as # which was later used by cpp).
If I may hazard a guess as to why - it's because of where the keys are. From a manual on B:
To facilitate manipulation of addresses when it seems advisable, B provides two unary address operators, * and &. & is the address operator so &x is the address of x, assuming it has one. * is the indirection operator; *x means "use the content of x as an address."
Note that & is shift-7 and * is shift-8. Their proximity to each other may have been a hint to the programmer as to what they do... but that's only a guess. One would have to ask Ken Thompson about why that choice was made.
So, there you have it. C is that way because B was. B is that way because it wanted to change from how BCPL was.
I was asked by a student if & and * were chosen because they were next to each other on the keyboard (something I had never noticed before). Much googling led me to B and BCPL documentation, and this thread. However, I couldn't find much at all. It seemed like there were lots of reasons for * in B, but I couldn't find anything for &.
So following @MichaelT's suggestion, I asked Ken Thompson:
From: Ken Thompson
near on the keyboard: no.
c copied from b so & and * are same there.
b got * from earlier languages - some assembly,
bcpl and i think pl/1.
i think that i used & because the name (ampersand)
sounds like "address." b was designed to be run with
a teletype model 33 teletype. (5 bit baud-o code)
so the use of symbols was restricted.
They are EXACTLY equivalent.
However, in:
int *myVariable, myVariable2;
It seems obvious that myVariable has type int*, while myVariable2 has type int.
In
int* myVariable, myVariable2;
it may seem implied that both are of type int*, but that is not correct as myVariable2 has type int.
Therefore, the first programming style is more intuitive.
If you look at it another way, *myVariable is of type int, which makes some sense.
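A quick way to convince yourself is to let the compiler check the types. In the sketch below (demo_declarators, first and second are made-up names), the assignments only compile because first is an int * and second is a plain int:

```c
#include <assert.h>

void demo_declarators(void)
{
    int value = 5;

    /* The * binds to the declarator, not to the type name:
       first is an int *, second is a plain int. */
    int* first, second;

    second = value;     /* fine: second is an int */
    first = &second;    /* fine: first is a pointer to int */

    /* "Declaration mirrors use": *first is an expression of type int. */
    assert(*first == 5);
    assert(sizeof *first == sizeof(int));
    assert(sizeof second == sizeof(int));
}
```

Declaring one variable per line avoids the question altogether.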