Videos
R.. is correct in his answer to the second part of your question: this code is not advised when using a modern C compiler.
But to answer the first part of your question, what this is actually doing is:
(
(int)( // 4.
&( ( // 3.
(a*)(0) // 1.
)->b ) // 2.
)
)
Working from the inside out, this is ...
- Casting the value zero to the struct pointer type
a* - Getting the struct field b of this (illegally placed) struct object
- Getting the address of this
bfield - Casting the address to an
int
Conceptually this is placing a struct object at memory address zero and then finding out at what the address of a particular field is. This could allow you to figure out the offsets in memory of each field in a struct so you could write your own serializers and deserializers to convert structs to and from byte arrays.
Of course if you would actually dereference a zero pointer your program would crash, but actually everything happens in the compiler and no actual zero pointer is dereferenced at runtime.
In most of the original systems that C ran on the size of an int was 32 bits and was the same as a pointer, so this actually worked.
It has no advantages and should not be used, since it invokes undefined behavior (and uses the wrong type - int instead of size_t).
The C standard defines an offsetof macro in stddef.h which actually works, for cases where you need the offset of an element in a structure, such as:
#include <stddef.h>
struct foo {
int a;
int b;
char *c;
};
struct struct_desc {
const char *name;
int type;
size_t off;
};
static const struct struct_desc foo_desc[] = {
{ "a", INT, offsetof(struct foo, a) },
{ "b", INT, offsetof(struct foo, b) },
{ "c", CHARPTR, offsetof(struct foo, c) },
};
which would let you programmatically fill the fields of a struct foo by name, e.g. when reading a JSON file.
At no point in the above code is anything dereferenced. A dereference occurs when the * or -> is used on an address value to find referenced value. The only use of * above is in a type declaration for the purpose of casting.
The -> operator is used above but it's not used to access the value. Instead it's used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer
SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);
The second line does not actually cause a dereference (implementation dependent). It simply returns the address of SomeIntMember within the pSomeType value.
What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it's one of the only type (perhaps the only) type in the C89 standard which has an explicit size. The size is 1. By ensuring the size is one, the above code can do the evil magic of calculating the true offset of the value.
In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.
The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.
It's a builtin provided by the GCC compiler to implement the offsetof macro that is specified by the C and C++ Standard:
GCC - offsetof
It returns the offset in bytes that a member of a POD struct/union is at.
Sample:
struct abc1 { int a, b, c; };
union abc2 { int a, b, c; };
struct abc3 { abc3() { } int a, b, c; }; // non-POD
union abc4 { abc4() { } int a, b, c; }; // non-POD
assert(offsetof(abc1, a) == 0); // always, because there's no padding before a.
assert(offsetof(abc1, b) == 4); // here, on my system
assert(offsetof(abc2, a) == offsetof(abc2, b)); // (members overlap)
assert(offsetof(abc3, c) == 8); // undefined behavior. GCC outputs warnings
assert(offsetof(abc4, a) == 0); // undefined behavior. GCC outputs warnings
@Jonathan provides a nice example of where you can use it. I remember having seen it used to implement intrusive lists (lists whose data items include next and prev pointers itself), but I can't remember where it was helpful in implementing it, sadly.
As @litb points out and @JesperE shows, offsetof() provides an integer offset in bytes (as a size_t value).
When might you use it?
One case where it might be relevant is a table-driven operation for reading an enormous number of diverse configuration parameters from a file and stuffing the values into an equally enormous data structure. Reducing enormous down to SO trivial (and ignoring a wide variety of necessary real-world practices, such as defining structure types in headers), I mean that some parameters could be integers and others strings, and the code might look faintly like:
#include <stddef.h>
typedef stuct config_info config_info;
struct config_info
{
int parameter1;
int parameter2;
int parameter3;
char *string1;
char *string2;
char *string3;
int parameter4;
} main_configuration;
typedef struct config_desc config_desc;
static const struct config_desc
{
char *name;
enum paramtype { PT_INT, PT_STR } type;
size_t offset;
int min_val;
int max_val;
int max_len;
} desc_configuration[] =
{
{ "GIZMOTRON_RATING", PT_INT, offsetof(config_info, parameter1), 0, 100, 0 },
{ "NECROSIS_FACTOR", PT_INT, offsetof(config_info, parameter2), -20, +20, 0 },
{ "GILLYWEED_LEAVES", PT_INT, offsetof(config_info, parameter3), 1, 3, 0 },
{ "INFLATION_FACTOR", PT_INT, offsetof(config_info, parameter4), 1000, 10000, 0 },
{ "EXTRA_CONFIG", PT_STR, offsetof(config_info, string1), 0, 0, 64 },
{ "USER_NAME", PT_STR, offsetof(config_info, string2), 0, 0, 16 },
{ "GIZMOTRON_LABEL", PT_STR, offsetof(config_info, string3), 0, 0, 32 },
};
You can now write a general function that reads lines from the config file, discarding comments and blank lines. It then isolates the parameter name, and looks that up in the desc_configuration table (which you might sort so that you can do a binary search - multiple SO questions address that). When it finds the correct config_desc record, it can pass the value it found and the config_desc entry to one of two routines - one for processing strings, the other for processing integers.
The key part of those functions is:
static int validate_set_int_config(const config_desc *desc, char *value)
{
int *data = (int *)((char *)&main_configuration + desc->offset);
...
*data = atoi(value);
...
}
static int validate_set_str_config(const config_desc *desc, char *value)
{
char **data = (char **)((char *)&main_configuration + desc->offset);
...
*data = strdup(value);
...
}
This avoids having to write a separate function for each separate member of the structure.
One of the ways I've used it in embedded systems is where I have a struct which represents the layout of non-volatile memory (e.g. EEPROM), but where I don't want to actually create an instance of this struct in RAM. You can use various nice macro tricks to allow you to read and write specific fields from the EEPROM, where offsetof does the work of calculating the address of a field within the struct.
With regard to 'evil', you have to remember that lots of stuff which was traditionally done in 'C' programming, particularly on resource-limited platforms, now looks like evil hackery when viewed from the luxurious surrounding of modern computing.
One legitimate use of offsetof() is to determine the alignment of a type:
#define ALIGNMENT_OF( t ) offsetof( struct { char x; t test; }, test )
It may be a bit low-level to need the alignment of an object, but in any case I'd consider this a legitimate use.