redshift table optimization - Brave Search

docs.aws.amazon.com › amazon redshift › database developer guide › automatic table optimization

Automatic table optimization - Amazon Redshift

Automatic table optimization is a self-tuning capability in Amazon Redshift that automatically optimizes the design of tables by applying sort and distribution keys.

aws.amazon.com › blogs › big-data › automate-your-amazon-redshift-performance-tuning-with-automatic-table-optimization

Automate your Amazon Redshift performance tuning with automatic table optimization | Amazon Web Services

October 6, 2021 - Amazon Redshift automatically monitors the workload on the cluster and uses AI algorithms to calculate the optimal sort and distribution keys. Then ATO implements the table changes online, without disrupting running queries.

Discussions

c# - Improving performance reading from large Redshift table - Stack Overflow

In what ways can I improve reading the entirety of a large table (> 100 mil rows) on Redshift? I have a dotnet program accessing data from a large table and SELECT * is taking around 2 hours to ...

stackoverflow.com

Redshift Automatic Table Optimization and table swaps

A customer swaps out a table nightly. For example, table_1 was built last night. Then tonight:
- table_1_tmp is built
- table_1 is renamed to table_1_stale
- table_1_tmp is renamed to table_1
- table...

repost.aws

December 16, 2020

sorting - Which sortkey is Redshift's automatic table optimization using for a table with auto sortkey set? - Stack Overflow

When creating a table on Redshift it's possible to specify (a) column(s) to use as sortkey(s). However, one can also set the sortkey to auto. For this case, Redshift comes with "automatic table ...

stackoverflow.com

Optimize My Redshift SQL

Is it faster if you rank your income across all of raw_cache (so you only ever have to order them once instead of doing it over and over in each query_1, query_2...) and then, for every dimension and each 'All' cut of data, count how many records you have and use math to get at the 90th and 99th percentiles? I guess it would take some extra work to replicate the interpolation that percentile_cont does.

r/SQL

December 24, 2023
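The Reddit suggestion above (sort once, then derive each percentile arithmetically) can be sketched in Python. This is an in-memory illustration, not SQL; the (dimension, income) row shape and the function names are hypothetical. The interpolation follows PERCENTILE_CONT's definition: the fractional row number is RN = 1 + P * (N - 1), and the result interpolates between the values at rows floor(RN) and ceil(RN).

```python
import math


def percentile_cont(sorted_values, p):
    """Linear-interpolation percentile over values already sorted ascending.

    Mirrors PERCENTILE_CONT: RN = 1 + p * (N - 1), interpolating between
    the values at rows floor(RN) and ceil(RN) (1-based row numbers).
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    rn = 1 + p * (len(sorted_values) - 1)
    frn, crn = math.floor(rn), math.ceil(rn)
    low, high = sorted_values[frn - 1], sorted_values[crn - 1]
    return low + (rn - frn) * (high - low)


def percentiles_by_group(rows, ps=(0.9, 0.99)):
    """rows: iterable of (dimension, income) pairs (hypothetical shape).

    Sorts once by income, then splits into per-dimension lists plus an
    'All' cut; appending in sorted order keeps every list sorted.
    """
    groups = {"All": []}
    for dim, income in sorted(rows, key=lambda r: r[1]):
        groups.setdefault(dim, []).append(income)
        groups["All"].append(income)
    return {g: {p: percentile_cont(v, p) for p in ps} for g, v in groups.items()}


if __name__ == "__main__":
    data = [("east", 40_000), ("west", 55_000), ("east", 72_000), ("west", 38_000)]
    print(percentiles_by_group(data))
```

The single global sort is the point of the idea: every per-group list inherits the ordering, so no group is re-sorted.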

Videos

Mastering AWS Redshift: Optimizing and Reducing Costs - YouTube

January 18, 2024

Best Practices for Amazon Redshift Optimization - YouTube

October 20, 2021

Optimizing Your Amazon Redshift Cluster for Peak Performance - YouTube

November 21, 2018

Amazon Redshift Optimization - YouTube

November 6, 2020

AWS on Air 2020: AWS What’s Next ft. Redshift Table Optimization ...

December 31, 2020

AWS Redshift Query Tuning and Performance Optimization - YouTube

December 4, 2018

docs.aws.amazon.com › amazon redshift › database developer guide › automatic table optimization › enabling, disabling, and monitoring automatic table optimization

Enabling, disabling, and monitoring automatic table optimization - Amazon Redshift

Initially, a table has no distribution ... in size, Amazon Redshift applies the optimal distribution keys and sort keys. Optimizations are applied within hours after a minimum number of queries are run....

docs.aws.amazon.com › amazon redshift › database developer guide › query performance tuning

Query performance tuning - Amazon Redshift

March 19, 2026 - Once your system is set up, you typically work with DML the most, especially the SELECT command for retrieving and viewing data. To write effective data retrieval queries in Amazon Redshift, become familiar with SELECT and apply the tips outlined in Amazon Redshift best practices for designing tables to maximize query efficiency.

prosperops.com › home › amazon redshift optimization: 12 tuning techniques to boost performance

Amazon Redshift Optimization: 12 Tuning Techniques To Boost Performance - ProsperOps

September 19, 2024 - Here are some things to consider when selecting your distribution key: Data distribution: Choose a column with high cardinality as your distribution key to prevent data skew. Table statistics: Monitor your table statistics regularly to verify ...
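To see why a low-cardinality distribution key causes skew, here is a small Python simulation of KEY distribution. MD5 is used purely as a stable stand-in hash (Redshift's internal hash function is not assumed), and the slice count is a hypothetical parameter. The ratio it computes, rows on the fullest slice divided by rows on the emptiest slice, is similar in spirit to the skew_rows column of SVV_TABLE_INFO.

```python
import hashlib
from collections import Counter


def slice_for(value, num_slices):
    """Assign a distribution-key value to a slice.

    MD5 is only a stable stand-in; Redshift's real hash function differs.
    """
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def skew_ratio(key_values, num_slices):
    """Rows on the fullest slice divided by rows on the emptiest slice."""
    counts = Counter(slice_for(v, num_slices) for v in key_values)
    per_slice = [counts.get(s, 0) for s in range(num_slices)]
    emptiest = min(per_slice)
    return float("inf") if emptiest == 0 else max(per_slice) / emptiest


if __name__ == "__main__":
    status_column = ["active", "inactive"] * 50_000   # 2 distinct values
    order_ids = range(100_000)                        # 100,000 distinct values
    print("low cardinality :", skew_ratio(status_column, 8))
    print("high cardinality:", skew_ratio(order_ids, 8))
```

With only two distinct key values, at most two of the eight slices receive any rows, so the ratio is infinite; with 100,000 distinct values the rows spread out close to evenly.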

integrate.io › home › blog › big data › 15 performance tuning techniques for amazon redshift

15 Performance Tuning Techniques for Amazon Redshift | Integrate.io

November 25, 2025 - As a best practice, we recommend running ANALYZE on any tables with a “stats off” percentage greater than 10%. Amazon Redshift is a distributed, shared-nothing database that scales horizontally across multiple nodes. Query execution time is very tightly correlated with: ... Below is an example of a poorly written query, and two optimizations to make it run faster.
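A minimal sketch of the "stats off greater than 10%" rule, assuming the (schema, table, stats_off) rows have already been fetched from the SVV_TABLE_INFO system view with whatever driver you use. The query text and helper names are illustrative, and the code only builds the ANALYZE commands; it does not run them.

```python
# Illustrative query for fetching the inputs; run it with your own driver.
STATS_OFF_QUERY = """
SELECT "schema", "table", stats_off
FROM svv_table_info
ORDER BY stats_off DESC;
"""


def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def analyze_statements(table_stats, threshold=10.0):
    """Build ANALYZE commands for tables whose stats_off exceeds threshold.

    table_stats: iterable of (schema, table, stats_off) tuples, e.g. rows
    returned by STATS_OFF_QUERY. Rows with a NULL stats_off are skipped.
    """
    return [
        f"ANALYZE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table, stats_off in table_stats
        if stats_off is not None and stats_off > threshold
    ]


if __name__ == "__main__":
    sample = [("public", "sales", 25.3), ("public", "users", 4.0)]
    for stmt in analyze_statements(sample):
        print(stmt)
```

Keeping the threshold as a parameter makes it easy to tighten or relax the 10% guideline for a given workload.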

dwgeek.com › home › optimize redshift table design to improve performance

Optimize Redshift Table Design to Improve Performance - DWgeek.com

January 31, 2023 - For more details, read: How to Select Redshift Sort Key - Choose Best Sort Key. Use the “CREATE TABLE AS” (CTAS) method whenever you re-create tables. With CTAS, data is distributed across the slices without involving the leader node, so it is faster and easier. You should define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans.

aws.amazon.com › blogs › big-data › top-10-performance-tuning-techniques-for-amazon-redshift

Top 10 performance tuning techniques for Amazon Redshift | Amazon Web Services

April 20, 2022 - Advisor doesn’t provide recommendations when there isn’t enough data or the expected benefit of sorting is small. Amazon Redshift is optimized to reduce your storage footprint and improve query performance by using compression encodings.

matillion.com › uploads › pdf › optimizing-amazon-redshift.pdf

Optimizing Amazon Redshift A REAL-WORLD GUIDE

Define table distribution styles ... to data loads by anticipating typical access paths for the table in question. Choose the best sort key, optimizing first for joins and then for filtering...

medium.com › @madhuyengala › designing-efficient-redshift-tables-a-guide-to-performance-scalability-2cc1356a1a27

Designing Efficient Redshift Tables: A Guide to Performance & Scalability | by Madhuyengala | Medium

March 20, 2025 - Query Execution Plan: Use the EXPLAIN command to analyze and optimize query execution plans. Based on the results, you can adjust your sort keys, distribution strategy, or even query logic. Vacuuming: Regularly vacuum tables to reclaim space and reorganize data for efficient query processing, especially after large data deletions or updates. Analyzing: Use the ANALYZE command to update the statistics about your data distribution. This helps Redshift make better decisions about query execution plans.

chaosgenius.io › blog › optimizing-redshift-performance

10 Query Optimization Tips for Faster Redshift Performance

December 11, 2025 - Choosing the best data distribution technique is one of the most important choices you must make when constructing your Redshift tables. There are four settings available in Redshift: AUTO, EVEN, KEY, and ALL. AUTO: This evenly distributes data ...

airbyte.com › blog › optimize-redshift-performance-and-reduce-costs

How to optimize Redshift performance and reduce costs | Airbyte

November 18, 2022 - Another way to optimize Redshift performance is by using sort keys and distribution keys. Sort keys determine the order in which rows in your Redshift table are stored. A sort key can consist of a single column or multiple columns (for example, a compound sort key).

stackoverflow.com › questions › 67133807 › improving-performance-reading-from-large-redshift-table

c# - Improving performance reading from large Redshift table - Stack Overflow

There are a number of things you can do (depending on what you are trying to do, which you haven't explained):

Don't read all the columns (I expect you have thought of this).
Make sure the data is compressed (encoded).
Ensure your data isn't badly skewed (i.e., most of your data is on one slice).
Allocate more memory to the query reading all this data. I expect that there is quite a bit of spill to disk, reducing this could have a big impact.
Increase the number / size of nodes in your cluster. The disk bandwidth is directly proportional to the number of nodes.
Use Redshift Spectrum to do the initial paring down of data. If you are doing group by / aggregation of the data then Spectrum can greatly increase the bandwidth for performing these initial actions of your query. This is only a win if you are not moving all the data to the Redshift cluster.

With all that said, I doubt that you are really having issues with disk reads for only 100M rows. This is peanuts for Redshift. Unless you have 1000 columns and a tiny cluster, this won't take 2 hours. Did you do a SELECT * with the result landing on your computer? If so, the 2 hours was spent moving the data to you over the network, not reading it from disk.

I hope the suggestions above help, but if my guess is correct and there is something wrong with your measurements, you will need to provide more information. How large in GB is the table? How big is the cluster? What queries are you running? Table info like skew and compression? Actual query execution timing? Something seems amiss.

I now understand that the speed in question is pulling the data down to an EC2 instance. There are ways to speed this up as well.

The issue you are running into is that you are moving all the data through a single network connection. A single network connection has a lot of handshake overhead, and since Redshift requires a fairly small network MTU (packet size), there is a lot of handshaking. In addition, the data is sent uncompressed over the JDBC connection, which takes more bandwidth than compressed data. So even though you are bringing the data to a single computer (EC2), a significant speedup is still possible.

So if the question is how to speed up the data coming from Redshift over the JDBC connection, I'm afraid there isn't much you can do (a higher-network-speed EC2 instance?). If instead you want to get the data to the EC2 instance as fast as possible, there are improvements that can be made.

Believe it or not, the fastest way is a 2-step approach. First, unload the data to S3, making sure it is compressed and written with "parallel on". This causes Redshift to start a data transfer from each slice to S3 - in your case, 4 parallel connections. (If you had a bigger cluster, the parallelism would be even higher.) Now you will have at least 4 files in S3.

Next, start parallel gets of these files from the EC2 instance. You want around 4 parallel gets, so this could work simply in your case. A bash script can automate the process of keeping 4 parallel AWS CLI gets running at all times (if you have more than 4 files). As each file is downloaded, you want to uncompress it, and this can be done on the fly: "aws s3 cp s3://bucket/key - | gunzip -c > file". The last step is to cat these files together (if you need to) and read them into whatever tool needs the data.

Because there is a lot of overhead in TCP connections, because the reads from S3 overlap, and because the files are compressed, this 2-step process can be significantly faster than the 1-step JDBC route for pulling large amounts of data from Redshift. The limiting factor is likely the single network card of the EC2 instance, but this process can make the most of that limited resource.
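The decompress-and-concatenate half of the two-step approach can be sketched in Python with only the standard library. This assumes the gzip-compressed UNLOAD part files have already been downloaded to local disk (for example with the aws s3 cp commands described in the answer); the file names and the four-worker default are assumptions, and parts are concatenated in sorted file-name order.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decompress_part(gz_path):
    """Stream-decompress one gzip part file next to the original."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")        # 0000_part_00.gz -> 0000_part_00
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def merge_parts(gz_paths, merged_path, workers=4):
    """Decompress parts in parallel, then concatenate them in file-name order."""
    parts = sorted(Path(p) for p in gz_paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        plain = list(pool.map(decompress_part, parts))  # map preserves input order
    merged_path = Path(merged_path)
    with open(merged_path, "wb") as dst:
        for part in plain:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
    return merged_path
```

A thread pool fits here because the work is dominated by file and network I/O; the same pool could also drive the downloads themselves.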

linkedin.com › all › engineering › data warehousing

How can you optimize query performance when using Amazon Redshift?

November 24, 2023 - Identify table and column usage for each operation, along with row volume, data size, etc. 2) Workload Management (WLM): Configure WLM to prioritize critical queries and manage system resources effectively.

e6data.com › query-and-cost-optimization-hub › how-to-optimize-aws-redshift-queries

AWS Redshift Query Optimization Guide 2025: 15 Code Hacks and Examples

September 16, 2025 - Redshift performs optimally when related data is co-located on identical compute nodes, eliminating expensive cross-node data movement. Implementation example for sales analytics dashboards:

-- Original table with default distribution
CREATE TABLE sales_facts (
  sale_id BIGINT,
  customer_id ...
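The co-location point can be illustrated with a toy model: if a fact table and a dimension table are both KEY-distributed on the join column, every matching pair hashes to the same slice and nothing has to move for the join. The sketch counts fact rows that would have to move under each choice of distribution key. The MD5 stand-in hash, the four-slice cluster, and the customers dimension table are hypothetical; only the sales_facts column names come from the snippet above.

```python
import hashlib

NUM_SLICES = 4  # hypothetical cluster with four slices


def slice_for(value):
    """Stand-in hash: maps a key value to a slice (not Redshift's hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLICES


def fact_rows_to_move(fact_rows, dim_rows, fact_distkey, join_col):
    """Count fact rows stored on a different slice than their matching dim row.

    dim_rows are assumed to be KEY-distributed on join_col, and fact_rows
    on fact_distkey. A result of 0 means the join needs no redistribution.
    """
    dim_slice = {row[join_col]: slice_for(row[join_col]) for row in dim_rows}
    moved = 0
    for row in fact_rows:
        target = dim_slice.get(row[join_col])
        if target is not None and target != slice_for(row[fact_distkey]):
            moved += 1
    return moved


if __name__ == "__main__":
    customers = [{"customer_id": c} for c in range(50)]
    sales_facts = [{"sale_id": i, "customer_id": i % 50} for i in range(1_000)]
    print("DISTKEY(sale_id):    ", fact_rows_to_move(sales_facts, customers, "sale_id", "customer_id"))
    print("DISTKEY(customer_id):", fact_rows_to_move(sales_facts, customers, "customer_id", "customer_id"))
```

Distributing both tables on customer_id always yields zero moves, because equal key values hash to the same slice regardless of the hash function used.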

repost.aws › questions › QUL3LDRSdSR8GgJIlKavWyGA › redshift-automatic-table-optimization-and-table-swaps

Redshift Automatic Table Optimization and table swaps | AWS re:Post

December 16, 2020 - More details. Redshift Automatic Table Optimization (ATO) uses the same mechanism as Redshift Advisor for sort and distribution key recommendations. With ATO, all recommendations are recorded in the SVV_ALTER_TABLE_RECOMMENDATIONS system table.

stackoverflow.com › questions › 78272348 › which-sortkey-is-redshifts-automatic-table-optimization-using-for-a-table-with

sorting - Which sortkey is Redshift's automatic table optimization using for a table with auto sortkey set? - Stack Overflow

One way to attack this is to look at the block metadata for the table and see which columns are in sort order. This assumes that the table is sorted (vacuumed) and analyzed. STV_BLOCKLIST contains this data - see: https://docs.aws.amazon.com/redshift/latest/dg/r_STV_BLOCKLIST.html

Since this table is very detailed, I recommend looking at only one slice and pivoting the data so that each column's metadata is a column. You should see that only one column has non-overlapping min and max values across its blocks.
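The block-metadata check described in this answer can be sketched in Python once the rows are in hand. The sketch assumes you have already fetched (col, blocknum, minvalue, maxvalue) for one slice of one table from STV_BLOCKLIST; the query text and function name are illustrative, and the placeholder style depends on your driver. Redshift also appends hidden system columns to every table, and these appear in STV_BLOCKLIST too, so you may want to restrict the input to the table's user columns.

```python
from collections import defaultdict

# Illustrative query for the inputs (one table, one slice); supply the table id.
BLOCKLIST_QUERY = """
SELECT col, blocknum, minvalue, maxvalue
FROM stv_blocklist
WHERE tbl = %s AND slice = 0
ORDER BY col, blocknum;
"""


def sorted_columns(block_rows):
    """Return column numbers whose block ranges never overlap in block order.

    block_rows: (col, blocknum, minvalue, maxvalue) tuples for one slice of
    one table. A column qualifies when each block's max is <= the next
    block's min; single-block columns are skipped as uninformative.
    """
    by_col = defaultdict(list)
    for col, blocknum, lo, hi in block_rows:
        by_col[col].append((blocknum, lo, hi))
    result = []
    for col in sorted(by_col):
        blocks = sorted(by_col[col])                 # order by blocknum
        if len(blocks) < 2:
            continue
        if all(prev[2] <= nxt[1] for prev, nxt in zip(blocks, blocks[1:])):
            result.append(col)
    return result
```

The comparison uses <= rather than < because a run of equal values can straddle a block boundary in a properly sorted column.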

On a side note: While Redshift can do a fair job of picking a table sort key after some time running actual production queries, it won't fully optimize the table for your query load. It won't make stupid choices, and it's not a bad place to start if you have no usage info, but it likely won't get the most out of your cluster.

Amazon Web Services

pages.awscloud.com › rs › 112-TZM-766 › images › Optimize your Data Warehouse Performance with Amazon Redshift Autonomics.pdf

Amazon Redshift Autonomics

Redshift Distribution Styles: Automatic Table Optimization; Distribution Styles and Keys; Additional Documentation. © 2023, Amazon Web Services, Inc. or its affiliates. Redshift Sorting: When data is initially loaded into an empty table, the rows are stored on ...

hevodata.com › home › learn › amazon redshift performance tuning: 4 best techniques

Amazon Redshift Performance Tuning: 4 Best Techniques - Learn | Hevo

December 29, 2022 - When Amazon Redshift organizes ... Amazon Redshift to ignore entire blocks of data that fall outside your filter predicate range. When executing a query, the optimizer redistributes the rows to the compute nodes as needed to perform joins, aggregations, and processing. A few techniques for optimal distribution style are listed below: To minimize the impact of this redistribution, you must designate the primary key of the dimension table (and the ...
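Block skipping with min/max metadata (zone maps) can be modeled in a few lines. This is a simplified illustration, not Redshift's implementation: blocks here are a fixed number of rows rather than storage blocks, and the Block type and helper names are hypothetical. It shows why sorting on the filtered column matters: on sorted data each block covers a narrow value range, so most blocks can be skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    min_value: int
    max_value: int


def build_zone_maps(values, rows_per_block):
    """Split a column into fixed-size blocks and record each block's min/max."""
    maps = []
    for block_id, start in enumerate(range(0, len(values), rows_per_block)):
        chunk = values[start:start + rows_per_block]
        maps.append(Block(block_id, min(chunk), max(chunk)))
    return maps


def blocks_to_scan(zone_maps, lo, hi):
    """Keep only blocks whose [min, max] range can hold a value in [lo, hi]."""
    return [b for b in zone_maps if b.max_value >= lo and b.min_value <= hi]


if __name__ == "__main__":
    sorted_col = list(range(1_000))
    scrambled_col = [(i * 37) % 1_000 for i in range(1_000)]  # same values, unsorted
    for name, col in (("sorted", sorted_col), ("scrambled", scrambled_col)):
        maps = build_zone_maps(col, 100)
        kept = blocks_to_scan(maps, 150, 180)
        print(f"{name}: scan {len(kept)} of {len(maps)} blocks")
```

For a filter on values 150 to 180, the sorted column needs one of its ten blocks, while the scrambled copy of the same values needs all ten, because every scrambled block spans nearly the full range.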