🌐
AWS
docs.aws.amazon.com › amazon redshift › database developer guide › automatic table optimization
Automatic table optimization - Amazon Redshift
Automatic table optimization continuously observes how queries interact with tables. It uses advanced artificial intelligence methods to choose sort and distribution keys to optimize performance for the cluster's workload. If Amazon Redshift determines that applying a key improves cluster ...
🌐
ProsperOps
prosperops.com › home › amazon redshift optimization: 12 tuning techniques to boost performance
Amazon Redshift Optimization: 12 Tuning Techniques To Boost Performance - ProsperOps
September 19, 2024 - More memory and faster CPUs lead to better performance optimization but at a higher cost. ... Workload type: Analyze the nature and demands of your workload. Memory needs: More complex queries require more memory. Storage requirements: Gauge the data volume to store internally vs. externally on S3. Tip: Evaluate whether the benefits of RA3 nodes' scalability justify the additional expense for your specific use case. When setting up large tables in Amazon Redshift, choosing an optimal sort key is crucial for enhancing query performance.
Discussions

c# - Improving performance reading from large Redshift table - Stack Overflow
In what ways can I improve reading the entirety of a large table (> 100 mil rows) on Redshift? I have a dotnet program accessing data from a large table and SELECT * is taking around 2 hours to ... More on stackoverflow.com
๐ŸŒ stackoverflow.com
Optimize My Redshift SQL
Is it faster if you rank your income across all of raw_cache (so you only ever have to order them once instead of doing it over and over in each query_1, query_2...) and then for every dimension and each 'All' cut of data you count how many records you have and use math to get at the 90th and 99th percentile? And I guess some extra work to replicate the interpolating that percentile_cont does. More on reddit.com
๐ŸŒ r/SQL
December 24, 2023
What can I do about redshift slowness?
You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns. More on reddit.com
๐ŸŒ r/dataengineering
June 5, 2023
Optimizing dist and sort keys in Redshift
Hey fellow Redshift user. When you join two tables with a shared dist key, the performance is better. And yes, if your WHERE clauses filter on the sort column, performance will improve. However, there are many cases where sort keys don't improve performance (depending on the encoding of the sort column, and whether you have joins). If you want a detailed (and correct) answer, I would recommend asking Bill Weiner on Stack Overflow. He is a Redshift expert and is quite active on SO. And best of all would be to share your SQL queries; that helps a lot with performance assessment. Have fun with Redshift. More on reddit.com
๐ŸŒ r/dataengineering
August 11, 2022
🌐
AWS
docs.aws.amazon.com › amazon redshift › database developer guide › query performance tuning
Query performance tuning - Amazon Redshift
March 19, 2026 - Once your system is set up, you typically work with DML the most, especially the SELECT command for retrieving and viewing data. To write effective data retrieval queries in Amazon Redshift, become familiar with SELECT and apply the tips outlined in Amazon Redshift best practices for designing tables to maximize query efficiency.
🌐
AWS
aws.amazon.com › blogs › big-data › top-10-performance-tuning-techniques-for-amazon-redshift
Top 10 performance tuning techniques for Amazon Redshift | Amazon Web Services
April 20, 2022 - If tables that are frequently accessed with complex patterns are missing statistics, Amazon Redshift Advisor creates a critical recommendation to run ANALYZE. If tables that are frequently accessed with complex patterns have out-of-date statistics, Advisor creates a suggested recommendation to run ANALYZE. The following screenshot shows a table statistics recommendation. Auto WLM simplifies workload management and maximizes query throughput by using ML to dynamically manage memory and concurrency, which ensures optimal utilization of the cluster resources.
🌐
Integrate.io
integrate.io › home › blog › big data › 15 performance tuning techniques for amazon redshift
15 Performance Tuning Techniques for Amazon Redshift | Integrate.io
November 25, 2025 - As a best practice, we recommend running ANALYZE on any tables with a "stats off" percentage greater than 10%. Amazon Redshift is a distributed, shared-nothing database that scales horizontally across multiple nodes. Query execution time is very tightly correlated with: ... Below is an example of a poorly written query, and two optimizations to make it run faster.
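The "stats off" check described here can be run against Redshift's system catalog (a sketch; the 10% threshold is the article's, the query itself is ours, and the schema/table names in the ANALYZE line are placeholders):

```sql
-- Find tables whose planner statistics have drifted more than 10%
-- (stats_off indicates how stale a table's statistics are: 0 = current)
SELECT "table", stats_off
FROM svv_table_info
WHERE stats_off > 10
ORDER BY stats_off DESC;

-- Then refresh statistics on each offender (placeholder name)
ANALYZE my_schema.my_table;
```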
🌐
E6data
e6data.com › query-and-cost-optimization-hub › how-to-optimize-aws-redshift-queries
AWS Redshift Query Optimization Guide 2025: 15 Code Hacks and Examples
September 16, 2025 - Each technique includes specific implementation thresholds (validated across large datasets), complete runnable SQL examples, and clear guidance on when to apply them. ... Redshift performs optimally when related data is co-located on identical compute nodes, eliminating expensive cross-node data movement. Implementation example for sales analytics dashboards:

-- Original table with default distribution
CREATE TABLE sales_facts (
    sale_id BIGINT,
    customer_id BIGINT,
    product_id BIGINT,
    sale_date DATE,
    amount DECIMAL(10,2)
) DISTSTYLE AUTO;

-- Optimized version with strategic dis...
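The snippet above is cut off mid-line; a minimal sketch of what a key-based version of that table might look like (the key choices are illustrative, not from the source):

```sql
-- Hypothetical optimized version: co-locate rows that join on customer_id
-- and prune scans by date (illustrative choices, not from the source)
CREATE TABLE sales_facts_optimized (
    sale_id     BIGINT,
    customer_id BIGINT,
    product_id  BIGINT,
    sale_date   DATE,
    amount      DECIMAL(10,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)  -- joins on customer_id avoid cross-node shuffles
SORTKEY (sale_date);   -- range filters on sale_date can skip whole blocks
```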
🌐
Dwgeek
dwgeek.com › home › optimize redshift table design to improve performance
Optimize Redshift Table Design to Improve Performance - DWgeek.com
January 31, 2023 - For more details Read: How to Select Redshift Sort Key - Choose Best Sort Key · Use "Create Table AS" method whenever you are re-creating tables. With CTAS option, data distributes on the data slices without involving leader node, hence ...
🌐
AWS
aws.amazon.com › blogs › big-data › automate-your-amazon-redshift-performance-tuning-with-automatic-table-optimization
Automate your Amazon Redshift performance tuning with automatic table optimization | Amazon Web Services
October 6, 2021 - You can find more details of this process in the scientific paper Fast and Effective Distribution-Key Recommendation for Amazon Redshift. For sort keys, a table's queries are monitored for columns that are frequently used in filter and join predicates. A column is then chosen based on the frequency and selectivity of those predicates. When an optimal configuration is found, ATO implements the new keys in the background, redistributing rows across the cluster and sorting tables.
🌐
Airbyte
airbyte.com › blog › optimize-redshift-performance-and-reduce-costs
How to optimize Redshift performance and reduce costs | Airbyte
November 18, 2022 - Similarly, if you perform frequent joins on a particular table, the join column should be specified as the distribution key. Amazon Redshift also uses machine learning models to automatically optimize your tables.
🌐
ChaosGenius
chaosgenius.io › blog › optimizing-redshift-performance
10 Query Optimization Tips for Faster Redshift Performance
December 11, 2025 - Redshift will store your data in sorted order if you define sort keys for your tables. It is crucial to choose sort key columns that are commonly used in JOIN conditions or WHERE clauses. Queries can also execute more quickly because compression lowers the amount of storage and I/O needed. Redshift offers automatic compression, but you may also specify column-specific compression encodings. Try out various compression techniques to discover the best compromise between storage and performance.
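A minimal sketch combining both suggestions, a sort key on a commonly filtered column plus explicit column encodings, with illustrative names (Redshift chooses encodings automatically when ENCODE is omitted):

```sql
-- Illustrative only: sort on the column most queries filter by,
-- and spell out per-column compression encodings
CREATE TABLE events (
    event_id   BIGINT       ENCODE az64,
    user_id    BIGINT       ENCODE az64,
    event_type VARCHAR(32)  ENCODE lzo,
    event_time TIMESTAMP    ENCODE az64
)
SORTKEY (event_time);  -- WHERE event_time BETWEEN ... can skip blocks
```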
🌐
Matillion
matillion.com › uploads › pdf › optimizing-amazon-redshift.pdf (PDF)
Optimizing Amazon Redshift A REAL-WORLD GUIDE
Define table distribution styles ... to data loads by anticipating typical access paths for the table in question. Choose the best sort key, optimizing first for joins and then for filtering...
🌐
Eyer
eyer.ai › blog › 12-amazon-redshift-query-optimization-techniques
12 Amazon Redshift Query Optimization Techniques
October 9, 2024 - One user queried a 443,744-row table 374,372 times. Result? 125 minutes of query time due to queuing. ... Use Redshift Advisor. It watches your cluster and suggests ways to boost efficiency and cut costs. ... Pick the right sort key and distribution style. This can make your queries WAY faster. ... Use automatic compression. It saves space AND speeds up I/O. ... They help the query optimizer make smarter plans.
🌐
AWS
docs.aws.amazon.com › amazon redshift › database developer guide › automatic table optimization › enabling, disabling, and monitoring automatic table optimization
Enabling, disabling, and monitoring automatic table optimization - Amazon Redshift
Initially, a table has no distribution key or sort key. The distribution style is set to either EVEN or ALL depending on table size. As the table grows in size, Amazon Redshift applies the optimal distribution keys and sort keys. Optimizations are applied within hours after a minimum number ...
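On an existing table, automatic table optimization is opted into per table by setting the keys to AUTO (the table name is illustrative):

```sql
-- Let Redshift manage distribution and sort keys for this table
ALTER TABLE my_table ALTER DISTSTYLE AUTO;
ALTER TABLE my_table ALTER SORTKEY AUTO;
```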
🌐
Flexera
flexera.com › blog › finops › optimizing-redshift-performance
10 SQL query optimization tips for faster Redshift performance (2026)
January 27, 2026 - Redshift will store your data in sorted order if you define sort keys for your tables. It is crucial to choose sort key columns that are commonly used in JOIN conditions or WHERE clauses. Queries can also execute more quickly because compression lowers the amount of storage and I/O needed. Redshift offers automatic compression, but you may also specify column-specific compression encodings. Try out various compression techniques to discover the best compromise between storage and performance.
🌐
Medium
medium.com › @madhuyengala › designing-efficient-redshift-tables-a-guide-to-performance-scalability-2cc1356a1a27
Designing Efficient Redshift Tables: A Guide to Performance & Scalability | by Madhuyengala | Medium
March 20, 2025 - Query Execution Plan: Use the EXPLAIN command to analyze and optimize query execution plans. Based on the results, you can adjust your sort keys, distribution strategy, or even query logic. Vacuuming: Regularly vacuum tables to reclaim space and reorganize data for efficient query processing, especially after large data deletions or updates. Analyzing: Use the ANALYZE command to update the statistics about your data distribution. This helps Redshift make better decisions about query execution plans.
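The three commands mentioned combine into a simple maintenance pass (table and column names are illustrative):

```sql
-- Inspect the plan before tuning (illustrative table/columns)
EXPLAIN
SELECT user_id, COUNT(*)
FROM events
WHERE event_time > '2024-01-01'
GROUP BY user_id;

-- Reclaim space and restore sort order after heavy deletes/updates
VACUUM FULL events;

-- Refresh planner statistics so the optimizer has current row counts
ANALYZE events;
```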
🌐
Hevo
hevodata.com › home › learn › amazon redshift performance tuning: 4 best techniques
Amazon Redshift Performance Tuning: 4 Best Techniques - Learn | Hevo
December 29, 2022 - When Amazon Redshift organizes ... Amazon Redshift to ignore entire blocks of data that fall outside your filtering/predicate range. Before executing any query, the optimizer redistributes the rows to the compute nodes to perform joins, aggregations, and processing. A few techniques for optimal distribution style are listed below: To minimize the impact of this redistribution, you must designate the primary key of the dimension table ( and the ...
🌐
Medium
medium.com › @opcfrance › mastering-sql-query-optimization-in-amazon-redshift-40d3b0ba1726
Mastering SQL Query Optimization in Amazon Redshift | by opcfrance | Medium
July 16, 2024 - Optimizing SQL queries in Amazon Redshift involves understanding and leveraging indexes, execution plans, precise SELECT statements, efficient joins, minimizing subqueries, and using stored procedures and CTEs.
🌐
LinkedIn
linkedin.com › all › engineering › data warehousing
How can you optimize query performance when using Amazon Redshift?
November 24, 2023 - Identify the table and columns usage for each operation along with row volume and data size etc. 2) Workload Management (WLM): Configure WLM to prioritize critical queries and manage system resources effectively.
🌐
Hevo
hevodata.com › home › learn › data warehousing
Amazon Redshift Performance Tuning: 4 Best Techniques
January 10, 2026 - When Amazon Redshift organizes ... Amazon Redshift to ignore entire blocks of data that fall outside your filtering/predicate range. Before executing any query, the optimizer redistributes the rows to the compute nodes to perform joins, aggregations, and processing. A few techniques for optimal distribution style are listed below: To minimize the impact of this redistribution, you must designate the primary key of the dimension table ( and the ...
Top answer
1 of 2

There are a number of things you can do (depending on what you are trying to do, which you haven't explained):

  1. Don't read all the columns (I expect you have thought of this).
  2. Make sure the data is compressed (encoded).
  3. Ensure your data isn't badly skewed (i.e. most of your data is on one slice).
  4. Allocate more memory to the query reading all this data. I expect that there is quite a bit of spill to disk; reducing this could have a big impact.
  5. Increase the number / size of nodes in your cluster. The disk bandwidth is directly proportional to the number of nodes.
  6. Use Redshift Spectrum to do the initial paring down of data. If you are doing group by / aggregation of the data then Spectrum can greatly increase the bandwidth for performing these initial actions of your query. This is only a win if you are not moving all the data to the Redshift cluster.
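Points 2 and 3 above can be checked directly in the system view (a sketch; the skew threshold is illustrative):

```sql
-- Tables whose rows are unevenly spread across slices (high skew_rows)
-- or that have no column encoding at all; thresholds are illustrative
SELECT "table", diststyle, skew_rows, encoded
FROM svv_table_info
WHERE skew_rows > 4 OR encoded = 'N'
ORDER BY skew_rows DESC;
```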

With all that said, I am doubtful that you are really having issues with disk reads for only 100M rows. This is peanuts for Redshift. Unless you have 1000 columns and a tiny cluster, this won't take 2 hours. Did you do a SELECT * with the result landing on your computer? If so, the 2 hours was spent moving the data to you over the network, not reading it from disk.

I hope the suggestions above help, but if my guess is correct and there is something wrong with your measurements, you will need to provide more information. How large in GB is the table? How big is the cluster? What queries are you running? Share table info like skew and compression, and the query's actual execution timing. Something seems amiss.

2 of 2

I now understand that the speed in question is pulling the data down to an EC2 instance. There are ways to speed this up as well.

The issue you are running into is that you are moving all the data through a single network connection. A single network connection has a lot of handshake overhead, and since Redshift requires a fairly small network MTU (packet size), there is a lot of handshaking. In addition, the data is sent uncompressed over the JDBC connection, which takes more bandwidth than compressed data. So even though you are bringing the data to a single computer (EC2), there is significant speedup to be had.

So if the question is how to speed up the data coming from Redshift over the JDBC connection, I'm sorry, you can't do much (other than a higher-network-speed EC2 instance). If instead you want to get the data to the EC2 instance as fast as possible, there are improvements that can be made.

Believe it or not, the fastest way is a 2-step approach. First unload the data to S3, making sure it is compressed and with "parallel on". This will cause Redshift to start a data transfer from each slice to S3 - in your case, 4 parallel connections. (If you had a bigger cluster, the parallelism would be even higher.) Now you will have at least 4 files in S3.

Next you start parallel gets of these files from the EC2 instance. You want around 4 parallel gets, so this could work simply in your case. A bash script can be used to keep 4 parallel AWS CLI gets of the data running at all times (if you have more than 4 files). As each file is downloaded you can uncompress it on the fly - "aws s3 cp s3://bucket/key - | gunzip -c > file". The last step is to cat these files together (if you need to) and read them into whatever tool needs the data.

Because TCP connections carry a lot of overhead, the reads from S3 overlap, and the files are compressed, this 2-step process can be significantly faster than the 1-step JDBC route for pulling large amounts of data from Redshift. The limiting factor is likely the single network card of the EC2 instance, but this process maximizes the use of that limited resource.
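A sketch of the 2-step approach described above; the bucket, table, and IAM role names are placeholders, not from the source:

```shell
# Step 1: in Redshift, unload compressed and in parallel (placeholder names):
#   UNLOAD ('SELECT * FROM big_table')
#   TO 's3://my-bucket/big_table/part_'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
#   GZIP PARALLEL ON;

# Step 2: on the EC2 instance, fetch the parts with up to 4 parallel gets
# (xargs -P 4), decompressing each on the fly
aws s3 ls s3://my-bucket/big_table/ | awk '{print $4}' |
  xargs -P 4 -I {} sh -c \
    'aws s3 cp "s3://my-bucket/big_table/{}" - | gunzip -c > "{}.csv"'

# Optionally concatenate the parts for a single-file consumer
cat part_*.csv > big_table.csv
```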