Can someone explain how does array of structures work?
How do you make an array of structs in C? - Stack Overflow
c++ - Structure of Arrays vs Array of Structures - Stack Overflow
ARRAY of STRUCTS vs STRUCT of ARRAYS
What is an array of structure in C?
Why do we use array of structure in C programming?
What is the default value in a structure array?
Videos
Good day,
So I created a structure Position with two int members.
struct Position {
int x;
int y;
};
typedef struct Position Position;
int main() {
Position *current = (Position*)malloc(sizeof(Position));
...
}If I want to create an array of it, should I just multiply it to how many elements I want to use?Does pointer arithmetic will move me to the next element struct every time I add 1 to the pointer?Why when I try to reallocate it by multiplying to higher number, it returns invalid pointer even though I just passed the pointer coming from malloc(). Thanks for answering and have a nice day.
Use:
#include<stdio.h>
#define n 3
struct body
{
double p[3]; // Position
double v[3]; // Velocity
double a[3]; // Acceleration
double radius;
double mass;
};
struct body bodies[n];
int main()
{
int a, b;
for(a = 0; a < n; a++)
{
for(b = 0; b < 3; b++)
{
bodies[a].p[b] = 0;
bodies[a].v[b] = 0;
bodies[a].a[b] = 0;
}
bodies[a].mass = 0;
bodies[a].radius = 1.0;
}
return 0;
}
This works fine. Your question was not very clear by the way, so match the layout of your source code with the above.
Another way of initializing an array of structs is to initialize the array members explicitly. This approach is useful and simple if there aren't too many struct and array members.
Use the typedef specifier to avoid re-using the struct statement everytime you declare a struct variable:
typedef struct
{
double p[3];//position
double v[3];//velocity
double a[3];//acceleration
double radius;
double mass;
}Body;
Then declare your array of structs. Initialization of each element goes along with the declaration:
Body bodies[n] = {{{0,0,0}, {0,0,0}, {0,0,0}, 0, 1.0},
{{0,0,0}, {0,0,0}, {0,0,0}, 0, 1.0},
{{0,0,0}, {0,0,0}, {0,0,0}, 0, 1.0}};
To repeat, this is a rather simple and straightforward solution if you don't have too many array elements and large struct members and if you, as you stated, are not interested in a more dynamic approach. This approach can also be useful if the struct members are initialized with named enum-variables (and not just numbers like the example above) whereby it gives the code-reader a better overview of the purpose and function of a structure and its members in certain applications.
Choice of AoS versus SoA for optimum performance usually depends on access pattern. This is not just limited to CUDA however - similar considerations apply for any architecture where performance can be significantly affected by memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (e.g. coalesced memory accesses in CUDA).
E.g. for RGB pixels versus separate RGB planes:
struct {
uint8_t r, g, b;
} AoS[N];
struct {
uint8_t r[N];
uint8_t g[N];
uint8_t b[N];
} SoA;
If you are going to be accessing the R/G/B components of each pixel concurrently then AoS usually makes sense, since the successive reads of R, G, B components will be contiguous and usually contained within the same cache line. For CUDA this also means memory read/write coalescing.
However if you are going to process color planes separately then SoA might be preferred, e.g. if you want to scale all R values by some scale factor, then SoA means that all R components will be contiguous.
One further consideration is padding/alignment. For the RGB example above each element in an AoS layout is aligned to a multiple of 3 bytes, which may not be convenient for CUDA, SIMD, et al - in some cases perhaps even requiring padding within the struct to make alignment more convenient (e.g. add a dummy uint8_t element to ensure 4 byte alignment). In the SoA case however the planes are byte aligned which can be more convenient for certain algorithms/architectures.
For most image processing type applications the AoS scenario is much more common, but for other applications, or for specific image processing tasks this may not always be the case. When there is no obvious choice I would recommend AoS as the default choice.
See also this answer for more general discussion of AoS v SoA.
I just want to provide a simple example showing how a Struct of Arrays (SoA) performs better than an Array of Structs (AoS).
In the example, I'm considering three different versions of the same code:
- SoA (v1)
- Straight arrays (v2)
- AoS (v3)
In particular, version 2 considers the use of straight arrays. The timings of versions 2 and 3 are the same for this example and result to be better than version 1. I suspect that, in general, straight arrays could be preferable, although at the expense of readability, since, for example, loading from uniform cache could be enabled through const __restrict__ for this case.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <thrust\device_vector.h>
#include "Utilities.cuh"
#include "TimingGPU.cuh"
#define BLOCKSIZE 1024
/******************************************/
/* CELL STRUCT LEADING TO ARRAY OF STRUCT */
/******************************************/
struct cellAoS {
unsigned int x1;
unsigned int x2;
unsigned int code;
bool done;
};
/*******************************************/
/* CELL STRUCT LEADING TO STRUCT OF ARRAYS */
/*******************************************/
struct cellSoA {
unsigned int *x1;
unsigned int *x2;
unsigned int *code;
bool *done;
};
/*******************************************/
/* KERNEL MANIPULATING THE ARRAY OF STRUCT */
/*******************************************/
__global__ void AoSvsSoA_v1(cellAoS *d_cells, const int N) {
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid < N) {
cellAoS tempCell = d_cells[tid];
tempCell.x1 = tempCell.x1 + 10;
tempCell.x2 = tempCell.x2 + 10;
d_cells[tid] = tempCell;
}
}
/******************************/
/* KERNEL MANIPULATING ARRAYS */
/******************************/
__global__ void AoSvsSoA_v2(unsigned int * __restrict__ d_x1, unsigned int * __restrict__ d_x2, const int N) {
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid < N) {
d_x1[tid] = d_x1[tid] + 10;
d_x2[tid] = d_x2[tid] + 10;
}
}
/********************************************/
/* KERNEL MANIPULATING THE STRUCT OF ARRAYS */
/********************************************/
__global__ void AoSvsSoA_v3(cellSoA cell, const int N) {
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid < N) {
cell.x1[tid] = cell.x1[tid] + 10;
cell.x2[tid] = cell.x2[tid] + 10;
}
}
/********/
/* MAIN */
/********/
int main() {
const int N = 2048 * 2048 * 4;
TimingGPU timerGPU;
thrust::host_vector<cellAoS> h_cells(N);
thrust::device_vector<cellAoS> d_cells(N);
thrust::host_vector<unsigned int> h_x1(N);
thrust::host_vector<unsigned int> h_x2(N);
thrust::device_vector<unsigned int> d_x1(N);
thrust::device_vector<unsigned int> d_x2(N);
for (int k = 0; k < N; k++) {
h_cells[k].x1 = k + 1;
h_cells[k].x2 = k + 2;
h_cells[k].code = k + 3;
h_cells[k].done = true;
h_x1[k] = k + 1;
h_x2[k] = k + 2;
}
d_cells = h_cells;
d_x1 = h_x1;
d_x2 = h_x2;
cellSoA cell;
cell.x1 = thrust::raw_pointer_cast(d_x1.data());
cell.x2 = thrust::raw_pointer_cast(d_x2.data());
cell.code = NULL;
cell.done = NULL;
timerGPU.StartCounter();
AoSvsSoA_v1 << <iDivUp(N, BLOCKSIZE), BLOCKSIZE >> >(thrust::raw_pointer_cast(d_cells.data()), N);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
printf("Timing AoSvsSoA_v1 = %f\n", timerGPU.GetCounter());
//timerGPU.StartCounter();
//AoSvsSoA_v2 << <iDivUp(N, BLOCKSIZE), BLOCKSIZE >> >(thrust::raw_pointer_cast(d_x1.data()), thrust::raw_pointer_cast(d_x2.data()), N);
//gpuErrchk(cudaPeekAtLastError());
//gpuErrchk(cudaDeviceSynchronize());
//printf("Timing AoSvsSoA_v2 = %f\n", timerGPU.GetCounter());
timerGPU.StartCounter();
AoSvsSoA_v3 << <iDivUp(N, BLOCKSIZE), BLOCKSIZE >> >(cell, N);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
printf("Timing AoSvsSoA_v3 = %f\n", timerGPU.GetCounter());
h_cells = d_cells;
h_x1 = d_x1;
h_x2 = d_x2;
// --- Check results
for (int k = 0; k < N; k++) {
if (h_x1[k] != k + 11) {
printf("h_x1[%i] not equal to %i\n", h_x1[k], k + 11);
break;
}
if (h_x2[k] != k + 12) {
printf("h_x2[%i] not equal to %i\n", h_x2[k], k + 12);
break;
}
if (h_cells[k].x1 != k + 11) {
printf("h_cells[%i].x1 not equal to %i\n", h_cells[k].x1, k + 11);
break;
}
if (h_cells[k].x2 != k + 12) {
printf("h_cells[%i].x2 not equal to %i\n", h_cells[k].x2, k + 12);
break;
}
}
}
The following are the timings (runs performed on a GTX960):
Array of struct 9.1ms (v1 kernel)
Struct of arrays 3.3ms (v3 kernel)
Straight arrays 3.2ms (v2 kernel)