This is a broad topic, but I'll give a few thoughts.

First off, Spectrum is an (often large) set of compute elements embedded in S3 that can perform some parts of the query plan. These parts center on applying WHERE conditions and performing aggregation (GROUP BY). Other parts of the query plan cannot be performed in the S3 layer, such as JOINs and advanced functions like window functions.

The next thing to understand is that while these embedded compute elements are close to S3 in terms of access speed, the S3 service is far from the Redshift cluster in terms of network distance. If the large amount of data stored in S3 can be pared down to a small set that is shipped to Redshift, then Spectrum can be a huge performance improvement. However, if the data stored in S3 has to be moved to the Redshift cluster in full to perform the query, there can be a large hit to performance.

Spectrum can be a huge benefit, allowing a very large amount of data to be filtered down quickly by a fleet of small compute elements. This can result in a big win both in performance and in the amount of data that can be addressed.
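To make the "pare it down before shipping" point concrete, here is a toy Python model. It is not Redshift code: the table layout, the row counts, and the function names are all made up for illustration. It only counts how many rows would have to cross the network with and without the filter and aggregation happening on the S3 side.

```python
from collections import defaultdict

# Hypothetical stand-in for a large fact table stored in S3:
# one row per (sale_date, store_id, amount).
FACT_ROWS = [
    (f"2023-01-{day:02d}", store_id, 10.0 * store_id)
    for day in range(1, 31)
    for store_id in range(1, 101)
]


def ship_everything(rows):
    """No pushdown: every row crosses the network to the cluster."""
    return list(rows)


def pushdown_filter_and_aggregate(rows, wanted_date):
    """Pushdown: apply WHERE and GROUP BY next to the data, ship only the result."""
    totals = defaultdict(float)
    for sale_date, store_id, amount in rows:
        if sale_date == wanted_date:      # WHERE sale_date = wanted_date
            totals[store_id] += amount    # GROUP BY store_id, SUM(amount)
    return sorted(totals.items())


shipped_all = ship_everything(FACT_ROWS)
shipped_reduced = pushdown_filter_and_aggregate(FACT_ROWS, "2023-01-15")

print("rows shipped without pushdown:", len(shipped_all))
print("rows shipped with pushdown:   ", len(shipped_reduced))
```

In this made-up dataset the filtered and aggregated result is a small fraction of the full table, which is the situation where Spectrum tends to pay off.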

With these points in mind, the data you keep in Spectrum should be data from which your queries only need a subset transferred from S3 to Redshift. In general this applies to your fact tables and not to your dim tables. However, if your queries aren't going to apply a WHERE clause to the fact table or aggregate the data down, then you won't see the advantages. Also, for this to work the WHERE clause needs to apply to a column in the fact table: JOINs cannot be done in S3, so filtering on dim columns won't help. Similarly, any GROUP BY needs to be applied only on fact table columns, or it won't reduce the data coming to Redshift from S3.
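The fact-column versus dim-column distinction can be sketched with another toy Python model. Again, this is only an illustration under assumed names and data (`FACT_SALES`, `DIM_STORE`, `scan_s3` are hypothetical), not how Redshift is implemented. The S3-side scan here can only evaluate predicates on the fact table's own columns, and the join happens in the cluster. The second variant, which looks up the matching keys first and then filters the fact table on them, is my own rewrite for illustration.

```python
# Hypothetical toy data: fact rows live in S3, the small dim table lives in the cluster.
FACT_SALES = [(store_id, day, 5.0) for store_id in range(1, 51) for day in range(1, 21)]
DIM_STORE = {store_id: ("west" if store_id <= 10 else "east") for store_id in range(1, 51)}


def scan_s3(fact_rows, fact_predicate=None):
    """S3-side scan: can only evaluate predicates on the fact table's own columns."""
    if fact_predicate is None:
        return list(fact_rows)
    return [row for row in fact_rows if fact_predicate(row)]


def join_and_filter_in_cluster(shipped_rows, region):
    """Cluster-side work: the JOIN to the dim table and the dim-column filter."""
    return sum(
        amount
        for store_id, _day, amount in shipped_rows
        if DIM_STORE[store_id] == region
    )


# Filter only on a dim column (region): nothing can be pruned in S3.
shipped_dim_filter = scan_s3(FACT_SALES)
total_a = join_and_filter_in_cluster(shipped_dim_filter, "west")

# Same question expressed as a filter on a fact column (store_id).
west_ids = {sid for sid, region in DIM_STORE.items() if region == "west"}
shipped_fact_filter = scan_s3(FACT_SALES, lambda row: row[0] in west_ids)
total_b = join_and_filter_in_cluster(shipped_fact_filter, "west")

print("rows shipped, dim-column filter: ", len(shipped_dim_filter))
print("rows shipped, fact-column filter:", len(shipped_fact_filter))
```

Both variants give the same total, but only the one with a predicate on a fact column reduces what leaves S3.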

So fact tables.

Data generally gets into Redshift through S3, and this can be done with the COPY command. You can also get data into Redshift from S3 using Spectrum. This can be a useful approach if other tools are also using S3 for the same shared data. S3 can then act as a common data store for separate data systems, which is useful for some data solutions.
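As a rough sketch of the two loading routes, the Python helpers below just build the SQL text for each. The table names, bucket path, external schema name, and IAM role ARN are placeholders I made up, and the external schema and table are assumed to exist already. Check the statements against the Redshift documentation before running anything like this.

```python
def copy_statement(table, s3_prefix, iam_role):
    """Build a COPY statement that bulk-loads Parquet files from S3 into a local table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )


def spectrum_insert_statement(table, external_table, where_clause):
    """Build an INSERT ... SELECT that reads through a Spectrum external table.

    The WHERE clause is on the external (fact) table, so it can be applied
    before the rows reach the cluster.
    """
    return (
        f"INSERT INTO {table}\n"
        f"SELECT * FROM {external_table}\n"
        f"WHERE {where_clause};"
    )


# Placeholder names, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"

copy_sql = copy_statement("sales", "s3://example-bucket/sales/", ROLE)
spectrum_sql = spectrum_insert_statement(
    "sales", "spectrum_schema.sales_ext", "sale_date >= '2023-01-01'"
)

print(copy_sql)
print()
print(spectrum_sql)
```

The COPY route loads whatever files sit under the prefix, while the Spectrum route lets you select and filter while loading, with the files staying in S3 for other systems to read.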

You also bring up very large, infrequently used data, such as older historical data that is usually not needed but is sometimes needed. Spectrum can help here: older data can be offloaded from the Redshift cluster, and the access time for this data isn't important because it is used so infrequently. There is a potential issue, though. The Redshift cluster can only work on a certain size of data given its disk space and memory, so you can clog up your cluster if the amount of historical data is too large. This may mean that looking at the full set of historical data in one query is not possible. Again, if the data is aggregated or filtered in S3, this isn't a problem.

Bottom line - Spectrum is a great tool but isn't the right tool for every problem.

Answer from Bill Weiner on Stack Overflow
2 of 2
2

In general, put everything into 'normal' Amazon Redshift.

Redshift Spectrum is handy for accessing data stored in Amazon S3 without having to load it into the Redshift cluster, but it will not be as fast as accessing data stored in 'normal' Redshift.

Therefore, it is useful for rarely-accessed data or for one-off queries on a dataset without having to import the data into Redshift.

Do not use Spectrum as part of your normal ETL flow. One exception might be if you are receiving 'landing' data via Amazon S3 (e.g. seed files). Rather than importing those tables into Redshift, you could reference them via Spectrum. However, normal loading tools such as Fivetran can load the data directly into Redshift, which is preferable to using Spectrum.
