The choice of AoS versus SoA for optimum performance usually depends on the access pattern. This is not limited to CUDA: similar considerations apply to any architecture where performance can be significantly affected by the memory access pattern, e.g. where you have caches or where contiguous memory access is faster (such as coalesced memory accesses in CUDA).

For example, RGB pixels (AoS) versus separate RGB planes (SoA):

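#include <stdint.h>

/* Array of structures: one 3-byte element per pixel (N is the pixel count) */
struct {
    uint8_t r, g, b;
} AoS[N];

/* Structure of arrays: one contiguous plane per color component */
struct {
    uint8_t r[N];
    uint8_t g[N];
    uint8_t b[N];
} SoA;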

If you are going to access the R, G and B components of each pixel together, then AoS usually makes sense, since the successive reads of the R, G and B components will be contiguous and usually fall within the same cache line. For CUDA this also helps memory read/write coalescing.
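As a rough CPU-side illustration of this per-pixel access pattern, the sketch below converts AoS pixels to grayscale. The names PixelRGB, luma and to_gray are hypothetical, and the integer luma weights (77, 150, 29) are just one common approximation chosen for the example; none of this comes from the original answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type mirroring the AoS layout above.
struct PixelRGB {
    std::uint8_t r, g, b;
};

// Integer approximation of luma: (77*R + 150*G + 29*B) / 256.
// The weights sum to 256, so the result always fits in 8 bits.
inline std::uint8_t luma(const PixelRGB &p) {
    return static_cast<std::uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8);
}

// Each iteration reads r, g and b of one pixel, which sit next to
// each other in memory in the AoS layout.
std::vector<std::uint8_t> to_gray(const std::vector<PixelRGB> &pixels) {
    std::vector<std::uint8_t> gray(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        gray[i] = luma(pixels[i]);
    }
    return gray;
}
```

Because every iteration needs all three components of the same pixel, the three reads touch adjacent bytes, which is the situation where AoS tends to work well.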

However, if you are going to process color planes separately, then SoA might be preferred. For example, if you want to scale all R values by some factor, SoA means that all R components are contiguous.
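A minimal sketch of that plane-wise case, assuming a hypothetical ImageSoA container and a scale factor expressed as the fraction num/den (den must be non-zero); both are illustration-only choices rather than anything from the original answer:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical planar image: one contiguous array per color component.
struct ImageSoA {
    std::vector<std::uint8_t> r, g, b;
};

// Scale every R value by num/den, saturating at 255.
// Only the r plane is read and written; g and b are never touched.
// Assumes den != 0.
void scale_red(ImageSoA &img, unsigned num, unsigned den) {
    for (std::uint8_t &v : img.r) {
        unsigned scaled = v * num / den;
        v = static_cast<std::uint8_t>(std::min(scaled, 255u));
    }
}
```

The loop walks a single contiguous byte array, so no memory bandwidth is spent on the G and B values that the operation does not need.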

One further consideration is padding/alignment. For the RGB example above, each element in an AoS layout is 3 bytes long, so elements start at multiples of 3 bytes, which may not be convenient for CUDA, SIMD, etc. In some cases this may even require padding within the struct to make alignment more convenient (e.g. adding a dummy uint8_t member so that each element occupies 4 bytes). In the SoA case, however, each plane is a plain byte array, which can be more convenient for certain algorithms/architectures.
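The size difference can be checked at compile time. The sketch below uses hypothetical type names, and the alignas(4) on the padded variant is an addition for illustration: the dummy byte alone makes the element 4 bytes long, while alignas additionally requests 4-byte alignment from the compiler. The sizeof values hold on mainstream ABIs but are not strictly guaranteed by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// Packed AoS element: three bytes, so element i starts at byte 3*i.
struct PixelRGB {
    std::uint8_t r, g, b;
};

// Padded variant: x is a dummy byte, alignas(4) requests 4-byte alignment.
struct alignas(4) PixelRGBX {
    std::uint8_t r, g, b, x;
};

static_assert(sizeof(PixelRGB) == 3, "expected no padding on common ABIs");
static_assert(sizeof(PixelRGBX) == 4, "dummy byte pads the element to 4 bytes");
static_assert(alignof(PixelRGBX) == 4, "alignas(4) sets the alignment");

// Byte offset of element i within an array of T.
template <typename T>
constexpr std::size_t element_offset(std::size_t i) {
    return i * sizeof(T);
}
```

With the padded type every element starts on a 4-byte boundary, at the cost of one unused byte per pixel (a third more memory than the packed layout).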

For most image-processing applications the AoS scenario is much more common, but for other applications, or for specific image-processing tasks, this may not be the case. When there is no obvious choice I would recommend AoS as the default.

See also this answer for a more general discussion of AoS vs. SoA.

Answer from Paul R on Stack Overflow


2 of 3
14

I just want to provide a simple example showing how a Struct of Arrays (SoA) performs better than an Array of Structs (AoS).

In the example, I'm considering three different versions of the same code:

  1. AoS (v1)
  2. Straight arrays (v2)
  3. SoA (v3)

In particular, version 2 uses straight arrays. The timings of versions 2 and 3 are essentially the same for this example and turn out to be better than those of version 1. I suspect that, in general, straight arrays could be preferable, although at the expense of readability, since, for example, loading through the read-only data cache could be enabled with const __restrict__ for input arrays in this case.

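#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Author's own helper headers; iDivUp, gpuErrchk and TimingGPU used below are expected to come from them
#include "Utilities.cuh"
#include "TimingGPU.cuh"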

#define BLOCKSIZE   1024

/******************************************/
/* CELL STRUCT LEADING TO ARRAY OF STRUCT */
/******************************************/
struct cellAoS {

    unsigned int    x1;
    unsigned int    x2;
    unsigned int    code;
    bool            done;

};

/*******************************************/
/* CELL STRUCT LEADING TO STRUCT OF ARRAYS */
/*******************************************/
struct cellSoA {

    unsigned int    *x1;
    unsigned int    *x2;
    unsigned int    *code;
    bool            *done;

};


/*******************************************/
/* KERNEL MANIPULATING THE ARRAY OF STRUCT */
/*******************************************/
__global__ void AoSvsSoA_v1(cellAoS *d_cells, const int N) {

    const int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < N) {
        cellAoS tempCell = d_cells[tid];

        tempCell.x1 = tempCell.x1 + 10;
        tempCell.x2 = tempCell.x2 + 10;

        d_cells[tid] = tempCell;
    }

}

/******************************/
/* KERNEL MANIPULATING ARRAYS */
/******************************/
__global__ void AoSvsSoA_v2(unsigned int * __restrict__ d_x1, unsigned int * __restrict__ d_x2, const int N) {

    const int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < N) {

        d_x1[tid] = d_x1[tid] + 10;
        d_x2[tid] = d_x2[tid] + 10;

    }

}

/********************************************/
/* KERNEL MANIPULATING THE STRUCT OF ARRAYS */
/********************************************/
__global__ void AoSvsSoA_v3(cellSoA cell, const int N) {

    const int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < N) {

        cell.x1[tid] = cell.x1[tid] + 10;
        cell.x2[tid] = cell.x2[tid] + 10;

    }

}

/********/
/* MAIN */
/********/
int main() {

    const int N = 2048 * 2048 * 4;

    TimingGPU timerGPU;

    thrust::host_vector<cellAoS>    h_cells(N);
    thrust::device_vector<cellAoS>  d_cells(N);

    thrust::host_vector<unsigned int>   h_x1(N);
    thrust::host_vector<unsigned int>   h_x2(N);

    thrust::device_vector<unsigned int> d_x1(N);
    thrust::device_vector<unsigned int> d_x2(N);

    for (int k = 0; k < N; k++) {

        h_cells[k].x1 = k + 1;
        h_cells[k].x2 = k + 2;
        h_cells[k].code = k + 3;
        h_cells[k].done = true;

        h_x1[k] = k + 1;
        h_x2[k] = k + 2;

    }

    d_cells = h_cells;

    d_x1 = h_x1;
    d_x2 = h_x2;

    cellSoA cell;
    cell.x1 = thrust::raw_pointer_cast(d_x1.data());
    cell.x2 = thrust::raw_pointer_cast(d_x2.data());
    cell.code = NULL;
    cell.done = NULL;

    timerGPU.StartCounter();
    AoSvsSoA_v1<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(thrust::raw_pointer_cast(d_cells.data()), N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());
    printf("Timing AoSvsSoA_v1 = %f\n", timerGPU.GetCounter());

    // v2 updates the same d_x1/d_x2 arrays as v3, so run only one of the two or the checks below will fail
    //timerGPU.StartCounter();
    //AoSvsSoA_v2<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(thrust::raw_pointer_cast(d_x1.data()), thrust::raw_pointer_cast(d_x2.data()), N);
    //gpuErrchk(cudaPeekAtLastError());
    //gpuErrchk(cudaDeviceSynchronize());
    //printf("Timing AoSvsSoA_v2 = %f\n", timerGPU.GetCounter());

    timerGPU.StartCounter();
    AoSvsSoA_v3<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(cell, N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());
    printf("Timing AoSvsSoA_v3 = %f\n", timerGPU.GetCounter());

    h_cells = d_cells;

    h_x1 = d_x1;
    h_x2 = d_x2;

    // --- Check results (each value should have been incremented by 10 exactly once)
    for (int k = 0; k < N; k++) {
        if (h_x1[k] != k + 11) {
            printf("h_x1[%i] not equal to %i\n", k, k + 11);
            break;
        }
        if (h_x2[k] != k + 12) {
            printf("h_x2[%i] not equal to %i\n", k, k + 12);
            break;
        }
        if (h_cells[k].x1 != k + 11) {
            printf("h_cells[%i].x1 not equal to %i\n", k, k + 11);
            break;
        }
        if (h_cells[k].x2 != k + 12) {
            printf("h_cells[%i].x2 not equal to %i\n", k, k + 12);
            break;
        }
    }

}

The following are the timings (runs performed on a GTX960):

Array of struct        9.1ms (v1 kernel)
Struct of arrays       3.3ms (v3 kernel)
Straight arrays        3.2ms (v2 kernel)
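
The straight-array and SoA kernels above consume one array per field. As a rough host-side sketch of how existing AoS records could be split into that layout, assuming plain std::vector in place of the Thrust vectors (CellAoS, CellsSoA and to_soa are hypothetical names, not part of the code above):

```cpp
#include <cstdint>
#include <vector>

// Same fields as the cellAoS struct in the benchmark.
struct CellAoS {
    unsigned int x1, x2, code;
    bool done;
};

// Hypothetical host-side SoA container: one contiguous array per field.
// done is stored as bytes to avoid the bit-packed std::vector<bool>.
struct CellsSoA {
    std::vector<unsigned int> x1, x2, code;
    std::vector<std::uint8_t> done;
};

// Split an array of structs into one array per field, preserving order,
// so that index i in every array refers to the same original record.
CellsSoA to_soa(const std::vector<CellAoS> &cells) {
    CellsSoA out;
    out.x1.reserve(cells.size());
    out.x2.reserve(cells.size());
    out.code.reserve(cells.size());
    out.done.reserve(cells.size());
    for (const CellAoS &c : cells) {
        out.x1.push_back(c.x1);
        out.x2.push_back(c.x2);
        out.code.push_back(c.code);
        out.done.push_back(c.done ? 1 : 0);
    }
    return out;
}
```

The conversion itself costs one pass over the data, so it only pays off if the per-field arrays are then reused across enough kernel launches.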
r/bigquery on Reddit: ARRAY of STRUCTS vs STRUCT of ARRAYS
September 8, 2024

Hi,

So I'm trying to learn the concept of STRUCTS, ARRAYS and how to use them.

I asked AI to create two sample tables: one using ARRAY of STRUCTS and another using STRUCT of ARRAYS.

This is what it created.

ARRAY of STRUCTS:

STRUCT of ARRAYS:

When it comes to this table- what is the 'correct' or 'optimal' way of storing this data?

I assume that if purchases is a collection of information about purchases (which product was bought, quantity and price) then we should use STRUCT of ARRAYS here, to 'group' data about purchases. Meaning, purchases would be the STRUCT and product_names, prices, quantities would be ARRAYS of data.

In such example- is it even logical to use ARRAY of STRUCTS? What if purchases was an ARRAY of STRUCTS inside. It doesn't really make sense to me here.

This is the data in both of them:

I guess ChatGPT brought up a good point:

"Each purchase is an independent entity with a set of associated attributes (e.g., product name, price, quantity). You are modeling multiple purchases, and each purchase should have its attributes grouped together. This is precisely what an Array of Structs does—it groups the attributes for each item in a neat, self-contained way.

If you use a Struct of Arrays, you are separating the attributes (product name, price, quantity) into distinct arrays, and you have to rely on index alignment to match them correctly. This is less intuitive for this case and can introduce complexities and potential errors in querying."

Top answer
1 of 5
3
Fun question. I think ChatGPT is right. When it comes to extracting the data, an array of structs will ensure a single struct within the array has the correct data sitting "on the same line". With a struct of arrays, there could be use cases where null values in an array may be ignored or may break iteration-based processing if they aren't handled properly. It also has the potential for population to put things in the wrong order if it's not specified. No idea if one way is faster than another, though. I'd feel doing a struct of arrays would have to be a lot faster for me to consider doing it that way, and if I did I'd try to keep it isolated from anything else.
2 of 5
3
ChatGPT is totally correct (in this case). An “array of structs” is an intrinsically useful structure. As an exercise, consider each struct as a form, like an employment application, and an array as a folder. Putting a bunch of structs into an array makes total sense, since you’re keeping each application intact, and grouping them. That’s why an “Array of Structs” is the most common user-enumerated data structure you’ll find. It’s even more useful when you consider that you can order the records within a given array, so you can enable logical assumptions about the first element being the earliest, smallest…whatever.

A “Struct of Arrays”, though, is like taking each form, cutting it apart into its component fields…and then putting all the name slips into one container, all the address slips into a different container, and all the previous employment slips in a third. You’d need to have a whole separate system in place just to keep track of exactly which slips of paper came from which original forms. That’s what ChatGPT is referring to as “index alignment”.

So one variation has clear implications that make it useful and common, while the other has implications that make it complicated, easy to screw up, and rare. That doesn’t mean that you’d never want to create a struct of arrays, but you’d want to have a very specific logical reason why that’s the most appropriate structure. Arrays of structs, however, are basically a “best-practice pattern” in BigQuery.