Structure Alignment? CUDA Structure Alignment differs?
Structures in C: From Basics to Memory Alignment
How does data structure alignment help in efficient reading of memory?
c - Structure padding and packing - Stack Overflow
Videos
At least on most machines, a type is only ever aligned to a boundary as large as the type itself [Edit: you can't really demand any "more" alignment than that, because you have to be able to create arrays, and you can't insert padding into an array]. On your implementation, short is apparently 2 bytes, and int 4 bytes.
That means your first struct is aligned to a 2-byte boundary. Since all the members are 2 bytes apiece, no padding is inserted between them.
The second contains a 4-byte item, which gets aligned to a 4-byte boundary. Since it's preceded by 6 bytes, 2 bytes of padding is inserted between v3 and i, giving 6 bytes of data in the shorts, two bytes of padding, and 4 more bytes of data in the int for a total of 12.
Forget about having different members, even if you write two structs whose members are exactly same, with a difference is that the order in which they're declared is different, then size of each struct can be (and often is) different.
For example, see this,
#include <iostream>
using namespace std;
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i; //note the order is different!
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, sizes are different even though both structs has same members!
Code at Ideone : http://ideone.com/HGGVl
The bottomline is that the Standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.
I know that data structure alignment is done to prevent a data structure being stored across two memory blocks. This was explained by u/AuntieSauce in another post I created regarding the internal working of structs.
Now, why does the second rule ensure we don’t have one element of a struct stretched across two blocks? Here is an example: lets say we have a struct that an 8 byte element and a 1 byte element:
| 8 bytes | 1 byte
Now, lets say we want to make an array of structs. Each struct is 8 bytes in size, and arrays are stored contiguously in memory, so we have
8+1 bytes | 8+1 bytes | 8+1 bytes | etc…
If our array is large enough we will eventually approach the end of a block. Will a block size ever be divisible by 9? Like, almost definitely not. So you’ll get the end of the block with less than 9 bytes of space left, and the struct will get chopped!
In the Wikipedia article about data structure alignment, it is written that CPU can efficiently read memory when it is data structure aligned.
The CPU in modern computer hardware performs reads and writes to memory most efficiently when the data is naturally aligned, which generally means that the data's memory address is a multiple of the data size
So my question is that how does data structure alignment actually help in the efficient memory reads by the CPU.
Thanks!
Padding aligns structure members to "natural" address boundaries - say, int members would have offsets, which are mod(4) == 0 on 32-bit platform. Padding is on by default. It inserts the following "gaps" into your first structure:
struct mystruct_A {
char a;
char gap_0[3]; /* inserted by compiler: for alignment of b */
int b;
char c;
char gap_1[3]; /* -"-: for alignment of the whole struct in an array */
} x;
Packing, on the other hand prevents compiler from doing padding - this has to be explicitly requested - under GCC it's __attribute__((__packed__)), so the following:
struct __attribute__((__packed__)) mystruct_A {
char a;
int b;
char c;
};
would produce structure of size 6 on a 32-bit architecture.
A note though - unaligned memory access is slower on architectures that allow it (like x86 and amd64), and is explicitly prohibited on strict alignment architectures like SPARC.
(The above answers explained the reason quite clearly, but seems not totally clear about the size of padding, so, I will add an answer according to what I learned from The Lost Art of Structure Packing, it has evolved to not limit to C, but also applicable to Go, Rust.)
Memory align (for struct)
Rules:
- Before each individual member, there will be padding so that to make it start at an address that is divisible by its alignment requirement.
E.g., on many systems, anintshould start at an address divisible by 4 and ashortby 2. charandchar[]are special, could be any memory address, so they don't need padding before them.- For
struct, other than the alignment need for each individual member, the size of whole struct itself will be aligned to a size divisible by strictest alignment requirement of any of its members, by padding at end.
E.g., on many systems, if struct's largest member isintthen by divisible by 4, ifshortthen by 2.
Order of member:
- The order of member might affect actual size of struct, so take that in mind.
E.g., the
stu_candstu_dfrom example below have the same members, but in different order, and result in different size for the 2 structs.
Address in memory (for struct)
Empty space:
- Empty space between 2 structs could be used by non-struct variables that could fit in.
e.g intest_struct_address()below, the variablexresides between adjacent structgandh.
No matter whetherxis declared,h's address won't change,xjust reused the empty space thatgwasted.
Similar case fory.
Example
(for 64 bit system)
memory_align.c:
/**
* Memory align & padding - for struct.
* compile: gcc memory_align.c
* execute: ./a.out
*/
#include <stdio.h>
// size is 8, 4 + 1, then round to multiple of 4 (int's size),
struct stu_a {
int i;
char c;
};
// size is 16, 8 + 1, then round to multiple of 8 (long's size),
struct stu_b {
long l;
char c;
};
// size is 24, l need padding by 4 before it, then round to multiple of 8 (long's size),
struct stu_c {
int i;
long l;
char c;
};
// size is 16, 8 + 4 + 1, then round to multiple of 8 (long's size),
struct stu_d {
long l;
int i;
char c;
};
// size is 16, 8 + 4 + 1, then round to multiple of 8 (double's size),
struct stu_e {
double d;
int i;
char c;
};
// size is 24, d need align to 8, then round to multiple of 8 (double's size),
struct stu_f {
int i;
double d;
char c;
};
// size is 4,
struct stu_g {
int i;
};
// size is 8,
struct stu_h {
long l;
};
// test - padding within a single struct,
int test_struct_padding() {
printf("%s: %ld\n", "stu_a", sizeof(struct stu_a));
printf("%s: %ld\n", "stu_b", sizeof(struct stu_b));
printf("%s: %ld\n", "stu_c", sizeof(struct stu_c));
printf("%s: %ld\n", "stu_d", sizeof(struct stu_d));
printf("%s: %ld\n", "stu_e", sizeof(struct stu_e));
printf("%s: %ld\n", "stu_f", sizeof(struct stu_f));
printf("%s: %ld\n", "stu_g", sizeof(struct stu_g));
printf("%s: %ld\n", "stu_h", sizeof(struct stu_h));
return 0;
}
// test - address of struct,
int test_struct_address() {
printf("%s: %ld\n", "stu_g", sizeof(struct stu_g));
printf("%s: %ld\n", "stu_h", sizeof(struct stu_h));
printf("%s: %ld\n", "stu_f", sizeof(struct stu_f));
struct stu_g g;
struct stu_h h;
struct stu_f f1;
struct stu_f f2;
int x = 1;
long y = 1;
printf("address of %s: %p\n", "g", &g);
printf("address of %s: %p\n", "h", &h);
printf("address of %s: %p\n", "f1", &f1);
printf("address of %s: %p\n", "f2", &f2);
printf("address of %s: %p\n", "x", &x);
printf("address of %s: %p\n", "y", &y);
// g is only 4 bytes itself, but distance to next struct is 16 bytes(on 64 bit system) or 8 bytes(on 32 bit system),
printf("space between %s and %s: %ld\n", "g", "h", (long)(&h) - (long)(&g));
// h is only 8 bytes itself, but distance to next struct is 16 bytes(on 64 bit system) or 8 bytes(on 32 bit system),
printf("space between %s and %s: %ld\n", "h", "f1", (long)(&f1) - (long)(&h));
// f1 is only 24 bytes itself, but distance to next struct is 32 bytes(on 64 bit system) or 24 bytes(on 32 bit system),
printf("space between %s and %s: %ld\n", "f1", "f2", (long)(&f2) - (long)(&f1));
// x is not a struct, and it reuse those empty space between struts, which exists due to padding, e.g between g & h,
printf("space between %s and %s: %ld\n", "x", "f2", (long)(&x) - (long)(&f2));
printf("space between %s and %s: %ld\n", "g", "x", (long)(&x) - (long)(&g));
// y is not a struct, and it reuse those empty space between struts, which exists due to padding, e.g between h & f1,
printf("space between %s and %s: %ld\n", "x", "y", (long)(&y) - (long)(&x));
printf("space between %s and %s: %ld\n", "h", "y", (long)(&y) - (long)(&h));
return 0;
}
int main(int argc, char * argv[]) {
test_struct_padding();
// test_struct_address();
return 0;
}
Execution result - test_struct_padding():
stu_a: 8
stu_b: 16
stu_c: 24
stu_d: 16
stu_e: 16
stu_f: 24
stu_g: 4
stu_h: 8
Execution result - test_struct_address():
stu_g: 4
stu_h: 8
stu_f: 24
address of g: 0x7fffd63a95d0 // struct variable - address dividable by 16,
address of h: 0x7fffd63a95e0 // struct variable - address dividable by 16,
address of f1: 0x7fffd63a95f0 // struct variable - address dividable by 16,
address of f2: 0x7fffd63a9610 // struct variable - address dividable by 16,
address of x: 0x7fffd63a95dc // non-struct variable - resides within the empty space between struct variable g & h.
address of y: 0x7fffd63a95e8 // non-struct variable - resides within the empty space between struct variable h & f1.
space between g and h: 16
space between h and f1: 16
space between f1 and f2: 32
space between x and f2: -52
space between g and x: 12
space between x and y: 12
space between h and y: 8
Thus address start for each variable is g:d0 x:dc h:e0 y:e8

I have a struct that inherits from a 16 byte aligned struct:
struct __declspec( align(16) ) SLIST_ENTRY
{
struct SLIST_ENTRY* Next;
};
struct CListEntry : SLIST_ENTRY { void* p; };
C_ASSERT( sizeof ( CListEntry ) == sizeof( void* ) * 4 );
C_ASSERT( offsetof( CListEntry, p ) == sizeof( void* ) * 2 );I don't understand why both assertions succeed, p could be placed immediately after Next and CListEntry would still have 16 byte alignment. Even weirder is when I compile the code:
CListEntry* pListEntry; // mov rax, qword ptr [pListEntry] void* p; // mov rcx, qword ptr [p] pListEntry->p = p; // mov qword ptr [rax+8], rcx
And in the debugger window I see ( (char*)pListEntry + 8 ) being updated, not ( (char*)pListEntry + 16 ) as the assertions suggest. What's going on?